A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme


Abstract:
E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary idea of similarity matching for spam detection is to maintain a database of known spam, formed by user feedback, and use it to block subsequent near-duplicate spam. To achieve efficient similarity matching and reduce storage use, prior works mainly represent each e-mail by a succinct abstraction derived from the text of the e-mail content. However, these abstractions cannot fully capture the evolving nature of spam and are thus not effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme that represents each e-mail by its layout structure. We present a procedure to generate the e-mail abstraction from the HTML content of the e-mail, and this newly devised abstraction captures the near-duplicate phenomenon of spam more effectively. Moreover, we design a complete spam detection system, Cosdes (Collaborative Spam Detection System), which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes on a live data set collected from a real e-mail server and show that our system outperforms prior approaches in detection results and is applicable to the real world.

Existing System:
- Prior works mainly represent each e-mail by a succinct abstraction derived from the text of the e-mail content.
- These abstractions cannot fully capture the evolving nature of spam and are thus not effective enough in near-duplicate detection.
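
To illustrate why a text-derived abstraction can miss near-duplicates, here is a minimal Python sketch. It assumes one simple, hypothetical content-based abstraction, a hash of the normalized body text; the prior works referred to above may use different abstractions, and `text_digest` is a name introduced only for this example.

```python
import hashlib
import re


def text_digest(body):
    """Hypothetical text-based abstraction: SHA-256 of lowercased,
    whitespace-normalized body text (an assumption for illustration)."""
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


original = text_digest("Cheap pills   today")
same_text = text_digest("cheap pills today")        # only case/spacing differ
padded = text_digest("Cheap pills today xq7")       # spammer appends a random token

print(original == same_text)  # normalization absorbs case and spacing
print(original == padded)     # one extra token yields a different digest
```

Under this assumption, a spammer who appends or swaps a few words in each copy produces a new digest every time, so a database keyed on such digests stops matching.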




Proposed System:
- We propose a more sophisticated and robust e-mail abstraction scheme that represents each e-mail by its layout structure.
- A specific procedure, SAG, generates the e-mail abstraction from the HTML content of the e-mail.
- This newly devised abstraction captures the near-duplicate phenomenon of spam more effectively.
- The progressive update scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection.
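
The idea of a layout-based abstraction with a continuously updated spam database can be sketched as follows. This is a simplified Python illustration, not the SAG procedure or the Cosdes matching scheme: it assumes the abstraction is just the sequence of HTML opening tags with text runs collapsed to a placeholder, that matching is exact equality of sequences, and that updating means discarding reports older than a fixed age. The names `layout_abstraction`, `KnownSpamStore`, and `max_age` are hypothetical.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening tag names in order; text runs become '#text'."""

    def __init__(self):
        super().__init__()
        self.sequence = []

    def handle_starttag(self, tag, attrs):
        # Attributes (URLs, image names) are ignored on purpose.
        self.sequence.append(tag)

    def handle_data(self, data):
        # Any non-blank text run maps to a single placeholder, so changed
        # wording does not change the abstraction.
        if data.strip() and (not self.sequence or self.sequence[-1] != "#text"):
            self.sequence.append("#text")


def layout_abstraction(html):
    """Hypothetical layout abstraction: a tuple of tag names and placeholders."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return tuple(parser.sequence)


class KnownSpamStore:
    """Hypothetical store mapping abstraction -> time of most recent report."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.last_reported = {}

    def report(self, html, now):
        # User feedback: remember the layout of a reported spam.
        self.last_reported[layout_abstraction(html)] = now

    def purge(self, now):
        # Assumed update policy: drop entries not reported within max_age.
        self.last_reported = {
            a: t for a, t in self.last_reported.items() if now - t <= self.max_age
        }

    def is_near_duplicate(self, html, now):
        self.purge(now)
        return layout_abstraction(html) in self.last_reported


spam1 = '<html><body><p>Cheap pills today</p><a href="http://x.example/a"><img src="a.png"></a></body></html>'
spam2 = '<html><body><p>Best ch3ap p1lls now!!</p><a href="http://y.example/b"><img src="b.png"></a></body></html>'
ham = '<html><body><p>Hi</p><p>Meeting at noon</p></body></html>'

store = KnownSpamStore(max_age=10)
store.report(spam1, now=0)
print(store.is_near_duplicate(spam2, now=5))  # same layout, different text and links
print(store.is_near_duplicate(ham, now=5))    # different layout
```

In this sketch the two spam messages differ in wording, links, and image names but share one layout, so the second matches the reported first, while the ordinary message does not. Exact sequence equality is the simplest possible matching rule and is chosen here only to keep the example short.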


KEYWORDS:
Generic Technology Keywords: Database, User Interface, Programming
Specific Technology Keywords: C#.Net, ASP.NET, MS SQL Server 2008
Project Keywords: Presentation, Business Object, Data Access Layer, Database
SDLC Keywords: Analysis, Design, Code, Testing, Implementation, Maintenance





SYSTEM CONFIGURATION

HARDWARE CONFIGURATION
S.NO   HARDWARE                  CONFIGURATIONS
1      Operating System          Windows 2000 & XP
2      RAM                       1 GB
3      Processor (with Speed)    Intel Pentium IV (3.0 GHz) and upwards
4      Hard Disk Size            40 GB and above
5      Monitor                   15" CRT

SOFTWARE CONFIGURATION
S.NO   SOFTWARE                  CONFIGURATIONS
1      Platform                  Microsoft Visual Studio
2      Framework                 .NET Framework 4.0
3      Language                  C#.Net
4      Front End                 ASP.NET, HTML
5      Back End                  SQL Server 2008

