View Related Documents

Abstract

Along with the widespread concern of spam problem, at present, there are spam filtering system nowadays about the problem of semantic imperfection and spam filter low effect in the multi-send spam. This paper proposes a model of spam filtering which based on latent semantic analysis (LSA) and message-digest algorithm 5 (SHA). Making use of the LSA marks the latent feature phrase in the spam, semantic analysis is led into the spam filtering technique; the "e-mail fingerprint" of multi-send spam is born with SHA on the LSA analytical foundation, the problem of filtering technique’s low effect in the multi-send spam is resolved with this kind of method. We have designed a spam filtering system based on this model. Our designed system was evaluated with an optional dataset. The results obtained were compared with KNN algorithm filter experiment results show that system based on Latent Semantic Analysis and SHA performs KNN. The experiments show the expected results obtained, and the feasibility and advantage of the new spam filtering method is validated.

Keywords  Latent Semantic Analysis - Secure Hash Algorithm - Mail Characteristic ID - Slipping Windows - Spam Filtering

Fulltext Preview

Image of the first page of the fulltext document