Occam's Razor-based Spam Filter

Almeida T.A.; Yamakami A.

dc.creator	Almeida T.A.
dc.creator	Yamakami A.
dc.date	2012
dc.date	2015-06-25T20:24:37Z
dc.date	2015-11-26T15:20:12Z
dc.date.accessioned	2018-03-28T22:29:42Z
dc.date.available	2018-03-28T22:29:42Z
dc.identifier
dc.identifier	Journal of Internet Services and Applications, v. 3, n. 3, p. 245-253, 2012.
dc.identifier	18674828
dc.identifier	10.1007/s13174-012-0067-x
dc.identifier	http://www.scopus.com/inward/record.url?eid=2-s2.0-84888619395&partnerID=40&md5=d97d7ad953d447dbce0b4f5ef93262cd
dc.identifier	http://www.repositorio.unicamp.br/handle/REPOSIP/90271
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/90271
dc.identifier	2-s2.0-84888619395
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1259922
dc.description	E-mail spam is no longer a novelty, but it remains a growing problem with a large economic impact on society. Spammers manage to circumvent current spam filters and harm the communication system by consuming resources, damaging the reliability of e-mail as a communication instrument, and tricking recipients into reacting to spam messages. Consequently, spam filtering poses a special problem in text categorization, whose defining characteristic is that filters face an active adversary that constantly attempts to evade filtering. In this paper, we present a novel approach to spam filtering based on the minimum description length principle. Furthermore, we have conducted an empirical experiment on six public, real, non-encoded datasets. The results indicate that the proposed filter is fast to construct and incrementally updatable, and that it clearly outperforms state-of-the-art spam filters. © The Brazilian Computer Society 2012.
dc.description	3
dc.description	3
dc.description	245
dc.description	253
dc.language	en
dc.publisher
dc.relation	Journal of Internet Services and Applications
dc.rights	open
dc.source	Scopus
dc.title	Occam's Razor-based Spam Filter
dc.type	Journal articles

This item belongs to the following institution

Universidade Estadual de Campinas (Brazil)