Corporate Bayesian Spam Filters

Recently I’ve encountered some problems with our company’s spam filter being a little overzealous, blocking emails from my personal account and from two different lawyers whose emails made it through in the past with no difficulty. But apparently they’ve tightened the screws on our protection.

While it sucks that they blocked emails from those three sources, that’s trivial: I contacted the IT guys and they’ve now whitelisted those three addresses. My serious concern (which the IT guys unfortunately don’t seem to understand) is that I may have discovered just the tip of the iceberg. It’s possible that, unbeknownst to all of us, the system is blocking countless legitimate, possibly critical, emails to thousands of employees in the company. I know one colleague and I are both having legit emails blocked, so it seems logical that everyone is.

Here’s the system they use:
clearswift.com/products/msw/ … logic.aspx

Anyone familiar with that? Is it a relatively good product?

Here’s a truly wacky thing about its recent performance. After I complained to the IT guys, they gave me a readout showing a 97.448% chance that the latest blocked message was spam. There was no mention of sexual organs, viagra or mortgages in the email. It looks totally unspam-like to me.

And here, stranger yet, are the key words that the filter based its decision on: nearby, impending, occasion, newly, him, had, chance, opened, and a couple of others. Each of those words had a rating of over 90%. Huh? Why the heck would a Bayesian filter have such common words programmed in as potential spam indicators?
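For anyone wondering how a few per-word ratings turn into one overall score, here is a rough sketch of the combining rule used by textbook naive Bayesian filters (the one popularised by Paul Graham's "A Plan for Spam"). It is only an assumption that the product in question does something similar. The function name and the 0.91 ratings are made up for illustration; the readout only said each word was "over 90%".

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive Bayes rule:
    P = prod(p) / (prod(p) + prod(1 - p)).
    """
    # Clamp each rating so a single 0.0 or 1.0 cannot cause a 0/0 division.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam_evidence = prod(clamped)
    ham_evidence = prod(1.0 - p for p in clamped)
    return spam_evidence / (spam_evidence + ham_evidence)

# Hypothetical ratings: the readout only said each word scored "over 90%".
ratings = {
    "nearby": 0.91, "impending": 0.91, "occasion": 0.91, "newly": 0.91,
    "him": 0.91, "had": 0.91, "chance": 0.91, "opened": 0.91,
}

score = combined_spam_probability(ratings.values())
print(f"combined spam probability: {score:.5f}")
```

Under this rule a handful of words rated just above 0.9 is enough to push the combined score very close to 1, even though none of them looks like spam on its own.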

I haven’t thoroughly looked into this for a while.

One article says the best anti-spam products of 2004 are:

Clearswift, Cloudmark, Postini, SurfControl and Tumbleweed

If they are using MS Exchange, Trend Micro was (and maybe still is) the leader.

Before I searched, I was leaning towards Postini. I have heard of Cloudmark as well. Both had already been around for at least 3-4 years the last time I looked into corporate anti-spam.

The last time I checked, Cloudmark was community-based: people mark emails as spam, and the company works out what is spam from those marks.

Postini is outsourced. At least, that used to be the only thing they offered; I’m not sure about now. You repoint your MX record so that all email goes to Postini first and then on to you. Everyone gets a web interface to control how much filtering they want. You can also import allowed addresses.

Since you may have no control over IT decisions, ah well. Who knows if the IT people there look into and care about the very best available solutions. Someone else here can talk about the state of IT in Taiwan.

Why would they use words such as

nearby, impending, occasion, newly, him, had, chance, opened

as potential indicators of spam?
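One possible explanation, as a rough sketch: in a textbook Bayesian filter nobody programs the words in by hand. Each word's rating is learned from how often it shows up in the spam versus the legitimate mail the filter was trained on. Whether this particular product works that way is an assumption, and the function name and all the counts below are invented for illustration.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Estimate how 'spammy' a word is from training counts.

    spam_count / ham_count: number of spam / legitimate training
    messages that contain the word.
    n_spam / n_ham: total spam / legitimate messages in the training set.
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("need at least one message of each kind")
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen in training: no evidence either way
    return spam_freq / (spam_freq + ham_freq)

# Invented counts: "nearby" seen in 40 of 1000 spam messages
# but in only 3 of 1000 legitimate ones.
rating = token_spam_probability(spam_count=40, ham_count=3,
                                n_spam=1000, n_ham=1000)
print(f"rating for 'nearby': {rating:.3f}")
```

If the training mail happened to include a lot of spam padded with ordinary prose, perfectly common words could end up with ratings above 90% this way.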

Your IT department has people that can speak? Wow. What do you feed them?