Conference proceedings
An Analysis Of Machine Learning Methods For Spam Host Detection
Record in:
9780769549132
Proceedings - 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012, v. 2, p. 227-232, 2012.
10.1109/ICMLA.2012.161
2-s2.0-84873580735
Author
Silva R.M.
Yamakami A.
Almeida T.A.
Institution
Abstract
The web is becoming an increasingly important source of entertainment, communication, research, news and trade. As a result, web sites compete for users' attention, and many of them gain visibility through malicious strategies that try to circumvent search engines. Such sites are known as web spam, and they are generally responsible for personal injury and economic losses. Given this scenario, this paper presents a comprehensive performance evaluation of several established machine learning techniques used to automatically detect and filter hosts that disseminate web spam. Our experiments were carefully designed to ensure statistically sound results, and they indicate that bagging of decision trees, multilayer perceptron neural networks, random forest and adaptive boosting of decision trees are promising for web spam classification and can therefore serve as a good baseline for further comparison. © 2012 IEEE.