No spam filter is perfect.
A small number of real emails will end up in your spam bin.
We've developed an easy way to visually scan more than 100 messages in a few seconds, making it a breeze to identify any real mail that might have been misclassified.

Here's how it works...

SpamCrunchers is a web-based mail system, developed to let you remotely read mail, filter spam, and sort known, wanted mail into a "white list"; a POP connection then downloads only that wanted mail to your computer.

Once it is determined that all of the mail in the spam bin is spam, it is sent to the "Autoreporter", a cron job running on the server that reads each spam, extracts its IP address (which cannot be forged), and splits the mail into two sections. The first section is email with IP addresses already in the database (now usually about 90 - 95%), which is ready to be sent to the auto-spam reporting system. The other section, NOT in the database, is sent to a "Whois" queue, where each IP is looked up and its CIDR block is extracted from the Whois record. The database (PostgreSQL) is then used to determine whether the new CIDRs are adjacent to any already in the database and owned by the same company. Amazingly, the database is less than 250K in size and contains IP blocks for the entire internet.

The mail is sorted into these categories...

1. White list mail (known mail - definitely not spam).
2. Green list mail (unknown mail - probably not spam).
3. Yellow list mail (419 mail)
4. Blue list mail (HTML)
5. Pink list mail (probably spam)
6. Red list mail (definitely spam)

White-classified mail is at the top, spam at the bottom. The user can manually reclassify items. Only the White list mail is POP'ed down to your local computer. Large blocks of messages can be "checked" for moving, removal, etc. with just two clicks: click the start message and the end message, then press the "Check between" button to check hundreds at once. Red-classified mail, determined to be spam, is then reported. 419 mail is reported to the Treasury Department automatically.
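The Autoreporter's split into "already in the database" and "Whois queue" sections, and the adjacency check on new CIDR blocks, can be sketched roughly as follows. This is only an illustration using Python's standard `ipaddress` module, not the SpamCrunchers code: the in-memory `KNOWN_BLOCKS` dictionary stands in for the PostgreSQL table, and the function names, sample blocks, and contact addresses are all assumptions made for the example.

```python
import ipaddress

# Hypothetical stand-in for the CIDR database: block -> owner's report address.
KNOWN_BLOCKS = {
    ipaddress.ip_network("192.0.2.0/25"): "abuse@example.net",
    ipaddress.ip_network("198.51.100.0/24"): "abuse@example.org",
}


def split_by_known_block(source_ips, known_blocks):
    """Split source IPs into (ready_to_report, whois_queue)."""
    ready, whois_queue = [], []
    for text in source_ips:
        ip = ipaddress.ip_address(text)
        # Find the contact of the first known block containing this IP.
        contact = next(
            (c for block, c in known_blocks.items() if ip in block), None
        )
        if contact is None:
            whois_queue.append(text)        # unknown: needs a Whois lookup
        else:
            ready.append((text, contact))   # known: can be reported now
    return ready, whois_queue


def are_adjacent(a, b):
    """True if two networks sit directly next to each other in address space."""
    first, second = sorted((a, b), key=lambda n: int(n.network_address))
    return int(first.broadcast_address) + 1 == int(second.network_address)


def add_block(new_block, contact, known_blocks):
    """Store a new CIDR, merging it with an adjacent block of the same owner
    when the two combine into a single valid CIDR."""
    for block, owner in list(known_blocks.items()):
        if owner == contact and are_adjacent(block, new_block):
            merged = list(ipaddress.collapse_addresses([block, new_block]))
            if len(merged) == 1:
                del known_blocks[block]
                known_blocks[merged[0]] = contact
                return merged[0]
    known_blocks[new_block] = contact
    return new_block
```

Merging adjacent blocks that share an owner is one plausible way a table covering many networks could stay small, since two neighbouring entries collapse into one.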
The information is displayed on a web page, allowing manual scanning. The user then selects the proper email address to use for the report. Once the addresses have been gathered, another part of the program sends a probe email to the ISP, which might read... "Is this the proper Email to use for spam reports...?", giving the ISP the option to receive aggregated or individual reports. Those that bounce are automatically queued into an automatic "Bogus Whois" reporting system, which sends the report to the appropriate IP block agency (ARIN, APNIC, LACNIC, etc.) and records the date and time in a database. The working ones are then merged into the database, and the second section (about 5%) is then reported. This happens fast: spam sent to our system usually gets reported within half an hour of receipt.

Participating ISPs can have their reports aggregated into either XML or CSV format, with dates and times recorded, making it easier for the ISP to identify the infected host.

The web-based mail browser can also display additional columns such as IP address. Clicking on an IP address brings up a new web page with the Whois data for that IP. URLs embedded in mail are also used to pull in the IP addresses of the spammer's web server (e.g. the fake email from PayPal about your account). The user just clicks on the IP address, and if the Whois data doesn't say PayPal or eBay, they know it's bogus. Bogus IP addresses are also reported to the provider as a possible "phishing" site. All this can be accomplished in two clicks. Participating ISPs can receive a daily report of where their infected users are and can take appropriate action.

Advantages of this system

Put to the test

SpamCrunchers was beta tested last year by 60 users, and was responsible for shutting down more than 250,000 infected hosts in 3 months. About 2000 spams per day were filtered out. Other features tested include the Spam Analysis display.
A graph of the first octet of the IP is plotted against the second, giving a good 3D model of the "spam neighborhoods" on the internet. Large concentrations of spam sources can be quickly identified, and an ISP "spam report card" can be displayed based on the amount of spam that comes from each ISP. Mail can be sorted by IP address, putting it in order of where it "really" came from. The country of origin is also a sortable column in the display. The database is an excellent source of spam trend information, showing just how and when scared spammers flee one ISP and go to another.

Most of the code is written in Python, using PostgreSQL or BSD Database.
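The "spam neighborhoods" display described above amounts to counting spam sources per (first octet, second octet) cell, with the count serving as the third dimension of the plot. A minimal sketch of that counting step, assuming IPv4 addresses in dotted form; the function names and sample addresses are made up for illustration and are not taken from the SpamCrunchers code:

```python
from collections import Counter


def octet_grid(source_ips):
    """Count spam sources per (first octet, second octet) cell."""
    grid = Counter()
    for text in source_ips:
        first, second = text.split(".")[:2]
        grid[(int(first), int(second))] += 1
    return grid


def hot_spots(grid, top=3):
    """Return the cells with the highest spam counts, largest first."""
    return grid.most_common(top)


# Hypothetical sample of spam source addresses.
sample = ["192.0.2.1", "192.0.2.77", "198.51.100.9"]
grid = octet_grid(sample)
print(hot_spots(grid, top=1))
```

Each cell count could then be drawn as a bar height or colour on a 256 x 256 grid, which would make dense clusters of spam sources stand out.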