Mass mailing and antispam algorithms 

Most modern mail servers have built-in facilities for making those DNS requests: you just specify which list you want to use, and that's about it. One of the good things about these lists, and sometimes also one of the bad things, is that they are maintained elsewhere. You don't maintain the list, so you don't control it.
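To make the DNS request concrete, here is a minimal sketch of a blocklist lookup. It assumes the common convention of reversing the IPv4 octets and appending the list's zone. The zone `dnsbl.example.org` and the function names are placeholders invented for illustration, not a real list or a real server API.

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets plus the list zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the list answers for this address.

    Assumption: a listed address resolves to some A record, while an
    unlisted one fails to resolve. `resolve` is injectable so the logic
    can be exercised without touching the network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

A real mail server does this for you once you name the list in its configuration; the sketch only shows what happens behind that one setting.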

by Steve Brody Friday, May 13, 2011
Inside the message itself, when you actually get to the message, there are a lot of stop words that spammers use. A lottery scam would use stuff like “you won the lottery”, “become a millionaire”, “get rich quick” or something like that. The stop words are hunted like crazy.

There is also a list of headers that can indicate a fake mail client. Each mail client has a certain way of positioning its headers inside the message. Let’s say you have Outlook: it always encodes the message so that first comes a text portion stripped of HTML and any formatting such as fonts, and after it comes a MIME-encoded HTML rich-text message, and it is always like this. Spam checks definitely know it.
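The stop-word check described above can be sketched in a few lines. The phrase list is taken from the examples in this post; the function name and the whitespace normalisation are assumptions made for illustration, and a real filter would use a much larger, regularly updated list.

```python
# Phrases quoted in the text; a real list would be far longer.
STOP_PHRASES = ("you won the lottery", "become a millionaire", "get rich quick")

def stop_phrase_hits(body, phrases=STOP_PHRASES):
    """Return the stop phrases found in a message body.

    The body is lower-cased and runs of whitespace are collapsed, so a
    phrase broken across lines or written in capitals still matches.
    """
    text = " ".join(body.lower().split())
    return [phrase for phrase in phrases if phrase in text]
```

A filter could reject on any hit or, more cautiously, add each hit to a running suspicion score.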

Whenever they see a message that has a header saying "Yes, I am Outlook Express" but with positions and ordering that do not match, they will either reject the message or treat it as suspicious.
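A rough sketch of that consistency check, using Python's standard `email` package. It assumes the client announces itself in the `X-Mailer` header and that a genuine message from it carries exactly a plain-text part followed by an HTML part. Both the expected layout and the function names are simplifications for illustration, not a description of any real filter's fingerprint database.

```python
from email import message_from_string

def part_types(msg):
    """Content types of the leaf parts, in the order they appear."""
    return [p.get_content_type() for p in msg.walk() if not p.is_multipart()]

def looks_forged(raw, client="Outlook Express",
                 expected=("text/plain", "text/html")):
    """True if the message claims `client` but its layout does not match.

    Assumption: the claimed client is read from X-Mailer, and `expected`
    is the part order that client is supposed to produce.
    """
    msg = message_from_string(raw)
    if client.lower() not in msg.get("X-Mailer", "").lower():
        return False  # makes no claim, so nothing to contradict
    return tuple(part_types(msg)) != expected
```

The strict equality would wrongly flag a genuine message that also carries an attachment, so a practical check would compare only the leading parts or feed the mismatch into a score instead of rejecting outright.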

Some filters block content such as exe files, zip files and others. Some block any attachment at all and simply say: “OK. If you want to attach something, put it on a website and just send the link. Our mail servers will not pass any exe files.” Gmail does it and Yahoo does it. They don’t accept exe files and the like.
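Attachment blocking of this kind can be sketched with the standard `email` package. The blocked extensions come from the examples above; the function name is hypothetical, and checking only the file name (not the file contents) is a deliberate simplification.

```python
import os
from email import message_from_string

# Extensions mentioned in the text; a real policy would list more.
BLOCKED_EXTENSIONS = {".exe", ".zip"}

def blocked_attachments(raw, blocked=BLOCKED_EXTENSIONS):
    """Return the file names of attachments whose extension is blocked."""
    msg = message_from_string(raw)
    names = []
    for part in msg.walk():
        filename = part.get_filename()  # None for parts without a file name
        if filename and os.path.splitext(filename)[1].lower() in blocked:
            names.append(filename)
    return names
```

A server following the stricter policy described above would reject the message as soon as any part has a file name at all.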

We will now proceed to AI filters, the most sophisticated checks.

By this point we have already filtered out most of the dumb spam. I used to call it stupid spam, because anyone who actually hopes to deliver something this way is pretty stupid: he already knows that 99.9999% of it is going to be rejected. After this dumb spam has been rejected by the previous filters, we are left with far fewer messages. With 99% of the messages rejected by the previous filters, AI filters now come into play.

Artificial intelligence filters are filters that really require training. They are trained by feeding them spam and ham, that is, messages that are spam and messages that are not. Suppose a message has been classified as spam by the other filters: we look at it, check whether it really is spam, and then classify it correctly. This process is what training means.
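To show what feeding a filter spam and ham can look like, here is a toy trainable filter. It uses a naive Bayes word-count model, chosen only because it is short. The post does not say which technique any particular filter uses, and the class and method names are invented for this sketch.

```python
import math
from collections import Counter

class TinySpamFilter:
    """Toy word-count classifier (naive Bayes with add-one smoothing)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one correctly labelled message ('spam' or 'ham')."""
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_score(self, text):
        """Log-odds of spam versus ham; positive means more spam-like."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in text.lower().split():
            for label, sign in (("spam", 1), ("ham", -1)):
                total = sum(self.counts[label].values())
                # The +1 terms keep unseen words from producing log(0).
                p = (self.counts[label][word] + 1) / (total + vocab + 1)
                score += sign * math.log(p)
        return score
```

Every call to `train` with a corrected label is one training step in the sense described above: the more labelled messages the filter sees, the better its word statistics become.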

There are many types of complex filters. Some of them depend on fuzzy logic, some on neural networks, while others are expert systems such as model-based reasoning systems and so on. It is very complex stuff, and there are certainly pros and cons to each. I will not expand on this subject, because there are so many different types and they are very complex.

One advantage they offer is that they become pretty accurate after a lot of training by users, which makes them very useful in large environments such as Gmail.

Gmail has hundreds of millions of users, and at least some percentage of them press the built-in buttons such as “this is spam” or “this is not spam”. Each time you push one of these buttons, you are actually training the filter to be more and more sophisticated. Imagine the community effect of millions of users training this artificial intelligence filter. It is very useful in this type of environment, where a large corporation or a big provider allows every user to train the filter.
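One way to picture the community effect is a pool that collects button presses and only turns them into a training label once enough users agree. This is a hypothetical sketch: the class name, the vote threshold and the majority rule are assumptions for illustration, not a description of how Gmail or any other provider works.

```python
from collections import Counter

class FeedbackPool:
    """Collects 'spam' / 'ham' button presses per message."""

    def __init__(self, min_votes=3):
        self.min_votes = min_votes  # assumed threshold before trusting votes
        self.votes = {}             # message_id -> {user_id: label}

    def report(self, message_id, user_id, label):
        """Record a button press; a user's later press replaces the earlier one."""
        self.votes.setdefault(message_id, {})[user_id] = label

    def consensus(self, message_id):
        """Return the majority label, or None if votes are too few or tied."""
        tally = Counter(self.votes.get(message_id, {}).values())
        total = sum(tally.values())
        if total < self.min_votes:
            return None
        label, count = tally.most_common(1)[0]
        return label if count * 2 > total else None
```

Once `consensus` returns a label, the message and that label can be handed to whatever trainable filter the provider runs, so a single mistaken click does not retrain the filter on its own.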

