Modifying Jaccard Coefficient for Texts Similarity
Authors
Mahmood Abdullah, Sura
Mazin Ali, Sura
Abduljaleel Makttof, Mohammed
Institution
Abstract
Calculating the similarity between texts written in any language remains one of the most important challenges in natural language processing. This paper presents a modified Jaccard similarity coefficient for texts. The main aim of the modification is to count the number of similar sentences between texts, rather than the number of similar words as in previous work. The modification is implemented as an equation that combines the Jaccard coefficient with the similarity coefficient. Two criteria are employed in the proposed equation: the first is multiplied by the Jaccard coefficient and the second by the similarity coefficient. The purpose of these criteria is to keep the similarity degree between 0 and 1. The experimental results are consistent with expectations: the similarity degree given by the proposed equation was approximately 3% higher than the Jaccard coefficient when the texts were chosen from the same class, and lower than the Jaccard coefficient when the texts were chosen from different classes.
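
To make the word-level versus sentence-level distinction concrete, the following is a minimal sketch of the standard Jaccard coefficient, |A ∩ B| / |A ∪ B|, computed once over word sets and once over sentence sets. It does not reproduce the proposed equation or its two criteria, which the abstract does not state. The helper names (`jaccard`, `words`, `sentences`), the punctuation-based sentence splitter, exact matching of sentences, the example texts, and the convention of returning 0.0 for two empty sets are all assumptions made for illustration.

```python
import re


def jaccard(a, b):
    """Standard Jaccard coefficient |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


def words(text):
    """Lowercased alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentences(text):
    """Lowercased sentences split on ., ! or ?, with whitespace normalized."""
    parts = re.split(r"[.!?]+", text.lower())
    return [" ".join(p.split()) for p in parts if p.strip()]


# Hypothetical example texts, not taken from the paper's data.
text_a = "The cat sat on the mat. Dogs bark loudly."
text_b = "The cat sat on the mat. Birds sing softly."

word_score = jaccard(words(text_a), words(text_b))              # 5 shared / 11 total words
sentence_score = jaccard(sentences(text_a), sentences(text_b))  # 1 shared / 3 total sentences

print(f"word-level Jaccard:     {word_score:.3f}")
print(f"sentence-level Jaccard: {sentence_score:.3f}")
```

In this sketch two sentences count as similar only when they match exactly after normalization; the paper may use a looser notion of sentence similarity.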