Proceedings of ISP RAS


Combined Classifier for Website Messages Filtration

V. Tarasov (PSUTII, Samara), E. Mezenceva (PSUTII, Samara), D. Karbaev (PSUTII, Samara)

Abstract

This paper describes a new approach to filtering website messages with a combined classifier. Information security standards for Internet resources require the protection of user data, yet the growing volume of spam in the interactive sections of websites poses a particular problem. Spam messages vary significantly in content, but they share one feature: they are usually of little interest to most recipients. Many filtering approaches are based on the naive Bayes classifier, an effective method for automatically constructing high-performance anti-spam filters. Unlike many email filtering solutions, the proposed approach combines the Bayes and Fisher methods, which allows us to build an accurate and stable spam filter. We consider how to organize the combined classifier according to defined optimization criteria based on statistical methods, probability calculations, and decision rules, including the criteria for grading messages. Classifiers normally strike a compromise between acceptable levels of false-positive and false-negative errors and make decisions using threshold values, which may vary. To obtain more reliable spam detection results, we need to analyze the sets of results produced by the different filters and the subsets where they overlap. The approach we suggest is a classifier organization that uses the Bayes and Fisher methods together to improve filtration quality, based on an analysis of the subsets and set overlaps identified by both methods (spam, non-spam, false triggering, and spam leaks).
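The abstract does not give the combination rule itself, so the following is only a minimal sketch of how a Bayes score and a Fisher score might be combined under two thresholds. Everything in it is an assumption made for illustration: the names `WordStats`, `smooth`, `classify` and `overlap_report`, the 0.5 prior used for smoothing, the 0.7 thresholds, the "spam only when both methods agree" rule, and the toy corpus. The Fisher score uses the common chi-square combination of per-word spam probabilities, which may differ from the variant used in the paper.

```python
import math
from collections import defaultdict

LABELS = ("spam", "ham")


class WordStats:
    """Per-word document counts for each class (illustrative helper)."""

    def __init__(self):
        self.word_counts = defaultdict(lambda: dict.fromkeys(LABELS, 0))
        self.doc_counts = dict.fromkeys(LABELS, 0)

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in set(text.lower().split()):
            self.word_counts[word][label] += 1

    def freq(self, word, label):
        # Fraction of documents of this class that contain the word.
        n = self.doc_counts[label]
        return self.word_counts[word][label] / n if n else 0.0

    def seen(self, word):
        return sum(self.word_counts[word].values())


def smooth(raw, n_seen, prior=0.5, weight=1.0):
    # Pull estimates for rarely seen words toward an assumed prior of 0.5.
    return (weight * prior + n_seen * raw) / (weight + n_seen)


def bayes_spam_prob(stats, text):
    """Naive Bayes posterior P(spam | text); assumes both classes were trained."""
    words = set(text.lower().split())
    total_docs = sum(stats.doc_counts.values())
    log_score = {}
    for label in LABELS:
        log_score[label] = math.log(stats.doc_counts[label] / total_docs)
        for w in words:
            log_score[label] += math.log(smooth(stats.freq(w, label), stats.seen(w)))
    top = max(log_score.values())
    odds = {k: math.exp(v - top) for k, v in log_score.items()}
    return odds["spam"] / sum(odds.values())


def inv_chi2(chi, df):
    """Upper-tail probability of the chi-square distribution for even df."""
    m = chi / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spam_score(stats, text):
    """Fisher-style score: per-word P(spam | word) combined via chi-square."""
    words = set(text.lower().split())
    log_sum = 0.0
    for w in words:
        f_spam, f_ham = stats.freq(w, "spam"), stats.freq(w, "ham")
        raw = f_spam / (f_spam + f_ham) if f_spam + f_ham else 0.5
        log_sum += math.log(smooth(raw, stats.seen(w)))
    return inv_chi2(-2.0 * log_sum, 2 * len(words))


def classify(stats, text, t_bayes=0.7, t_fisher=0.7):
    # Assumed rule: "spam" only if both methods agree, "ham" if neither fires.
    votes = (bayes_spam_prob(stats, text) >= t_bayes,
             fisher_spam_score(stats, text) >= t_fisher)
    if all(votes):
        return "spam"
    return "ham" if not any(votes) else "uncertain"


def overlap_report(messages, true_labels, stats, t_bayes=0.7, t_fisher=0.7):
    """Index sets flagged by each method, their overlap, and the two error types."""
    by_bayes = {i for i, m in enumerate(messages) if bayes_spam_prob(stats, m) >= t_bayes}
    by_fisher = {i for i, m in enumerate(messages) if fisher_spam_score(stats, m) >= t_fisher}
    spam_idx = {i for i, y in enumerate(true_labels) if y == "spam"}
    both = by_bayes & by_fisher
    return {
        "flagged_by_both": both,
        "flagged_by_one": by_bayes ^ by_fisher,
        "false_triggering": both - spam_idx,  # non-spam flagged as spam
        "spam_leaks": spam_idx - both,        # spam that passed the filter
    }


# Toy training corpus (made up for this sketch).
stats = WordStats()
for text in ("buy cheap pills online", "cheap pills buy now", "win money online now"):
    stats.train(text, "spam")
for text in ("meeting notes for the project", "project schedule and notes",
             "thanks for the helpful answer"):
    stats.train(text, "ham")

print(classify(stats, "buy cheap pills"))  # spam
print(classify(stats, "project notes"))    # ham
```

Requiring both scores to exceed their thresholds trades some spam leaks for fewer false triggerings; messages on which the two methods disagree fall into the `flagged_by_one` set, and this sketch labels them "uncertain" rather than deciding either way.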

Keywords

combined classifier; spam filter; optimization criterion

Edition

Proceedings of the Institute for System Programming, vol. 27, issue 3, 2015, pp. 291-302.

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2015-27(3)-20
