One of my primary concerns in writing my own web publishing platform was how to manage spam. In the WordPress world I used both Akismet and another third-party plugin to keep things under control. While looking around at various options, I stumbled upon a terrific article on Ned Batchelder's blog about how he manages spam.
His technique can prevent both playback bots and form-filling bots from submitting garbage data. The process is fairly basic:
- A timestamp field is inserted as a part of the commenting form.
- A spinner field is included, its value being a hash of four key data elements:
  - The timestamp
  - The client's IP address
  - The entry ID of the post being commented on
  - A secret
- Field names on the form are all randomized, with the exception of the spinner. The randomization process uses the spinner value, the real field name, and a secret.
- Honeypots are scattered throughout the form, and made invisible to humans through CSS.
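To make the form-building steps above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not Ned's code: the use of HMAC-SHA256, the `SECRET` value, the helper names (`make_spinner`, `field_name`, `build_form`), and the example field names are all hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real one belongs in configuration, not in source.
SECRET = b"change-me"


def _digest(*parts: str) -> str:
    """Keyed hash of the given parts (HMAC-SHA256 is an assumed choice)."""
    return hmac.new(SECRET, "|".join(parts).encode(), hashlib.sha256).hexdigest()


def make_spinner(timestamp: int, client_ip: str, entry_id: str) -> str:
    """Combine the timestamp, client IP, entry ID, and secret into one value."""
    return _digest(str(timestamp), client_ip, entry_id)


def field_name(spinner: str, real_name: str) -> str:
    """Derive an obscured field name from the spinner, real name, and secret."""
    # The leading letter keeps the result usable as an HTML name or id.
    return "f" + _digest(spinner, real_name)[:16]


def build_form(client_ip: str, entry_id: str, now=None) -> dict:
    """Return the values a template would need to render the comment form."""
    timestamp = int(time.time() if now is None else now)
    spinner = make_spinner(timestamp, client_ip, entry_id)
    real = ("timestamp", "name", "email", "comment")  # assumed field set
    honeypots = ("website",)  # rendered in the form, hidden from humans via CSS
    return {
        "spinner": spinner,  # the one field that keeps its real name
        "timestamp": timestamp,
        "names": {n: field_name(spinner, n) for n in real},
        "honeypots": {n: field_name(spinner, n) for n in honeypots},
    }
```

Because the spinner depends on the timestamp, the IP address, and the entry ID, the obscured field names change for every visitor, post, and page load, so a bot cannot simply replay a form it recorded earlier.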
Once the form data is submitted, validation occurs to detect whether a bot was present:
- The spinner gets read to figure out which form fields match which data.
- The timestamp is checked, and the submission is rejected if it's too far in the past, in the future, or not present.
- The spinner value is checked to ensure it hasn't been tampered with.
- Honeypots are checked to see if data is provided in any of them.
- The rest of the data is validated as usual.
Ned's article goes into more detail than I have above, so I highly recommend reading it if you're interested in this kind of thing. Time will tell whether this technique will be successful at keeping bad comments out, but I'm optimistic that it will.