Artificial Neural Networks For Content-based Web Spam Detection

Web spam has become a serious problem for Internet users, causing personal harm and economic losses. Although several approaches have been proposed to detect and avoid it automatically, spammers improve their techniques so quickly that classifiers must be more generic, efficient, and highly adaptive. Although it is widely accepted in the literature that neural-based techniques generalize and adapt well, as far as we know no work has explored such methods to combat web spam. To fill this gap, this paper presents a performance evaluation of different artificial neural network models used to automatically classify and filter real samples of web spam based on their content. The results indicate that some of the evaluated approaches have great potential: they are well suited to the problem and clearly outperform state-of-the-art techniques.

pp. 209-215

George Mason Univ., Bioinformatics Comput. Biol. Program, HST Harvard Univ. MIT, Biomed. Cybern. Lab., University of Minnesota, Minnesota Supercomputing Institute, Center for Cyber Defense, NCAT, Argonne's Leadersh. Comput. Facil. Argonne Natl. Lab.

