Article
Automatic Readability Classification of Crowd-Sourced Data based on Linguistic and Information-Theoretic Features
Date
2013-06-07
Registered in:
Revista Computación y Sistemas; Vol. 17, No. 2
1405-5546
Authors
Islam, Zahurul
Mehler, Alexander
Institution
Abstract
This paper presents a classifier of text
readability based on information-theoretic features.
The classifier was developed based on a linguistic
approach to readability that explores lexical, syntactic
and semantic features. For the evaluation, we extracted a
corpus of 645 articles from Wikipedia together with their
quality judgments. We show that information-theoretic
features perform as well as their linguistic counterparts,
even when several linguistic levels are explored at once.
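
The abstract does not say which information-theoretic features the classifier uses. As an illustration only, the sketch below computes one common feature of this kind: the Shannon entropy of a text's word-frequency distribution. The function name `word_entropy` and the whitespace tokenization are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the word-frequency distribution of `text`.

    Hypothetical helper for illustration; not the authors' feature set.
    """
    # Assumption: lowercase and split on whitespace instead of a real tokenizer.
    tokens = text.lower().split()
    if not tokens:
        return 0.0  # an empty text carries no information
    counts = Counter(tokens)
    total = len(tokens)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(word_entropy("the cat sat on the mat"))
```

A value like this could serve as one numeric input to a classifier alongside other features. A text that repeats a few words scores low, while a text with a varied vocabulary scores high.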