Detection of duplicate green job postings

Here we go! I just sent out a press release announcing a new feature that detects duplicate job postings.

You may not think it's important, but it is. It's incredible that the top job search engines don't even try to detect duplicates. That goes for indeed.com and simplyhired.com, as well as my green job search competitor, greenjobspider.com.

In the case of greenjobspider.com, they don't bother to clean house at all: they have duplicates, and very old jobs. To complicate things for the job seeker, the date format changes from one posting to another; sometimes it's mm/dd/yyyy, and sometimes it's dd/mm/yyyy.
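
To show why mixed date formats are a real problem for the reader, here is a minimal Python sketch that tries both interpretations of a date string. The function name `possible_dates` and the sample dates are made up for illustration; this is not how any of the sites mentioned here handle dates.

```python
from datetime import datetime


def possible_dates(text):
    """Return every valid reading of a date string under the two formats."""
    readings = {}
    for name, fmt in (("mm/dd/yyyy", "%m/%d/%Y"), ("dd/mm/yyyy", "%d/%m/%Y")):
        try:
            readings[name] = datetime.strptime(text, fmt).date()
        except ValueError:
            # The string is not a valid date under this format (e.g. month 25).
            pass
    return readings


# Hypothetical post dates: the first is ambiguous, the second is not.
for text in ("07/06/2010", "25/06/2010"):
    print(text, "->", possible_dates(text))
```

Any date whose day is 12 or lower can be read both ways, so when a site mixes the two formats a job seeker cannot tell a July posting from a June one.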

To see this for yourself, run a search on greenjobspider.com with the keyword "greenpeace".
You'll spot the duplicates easily, and the different date formats as well. It's also easy to tell that they make no attempt to detect duplicates: the "from" field, where they display the source, never lists more than one source.
When I ran this search on July 7th, 2010, I found:

  • IT Solution team leader: 3 times, from SustainLane, Green Jobs Free and GenGreenLife
  • IT Systems Specialist: 3 times, from SustainLane, Green Jobs Free and GenGreenLife
  • Development Director at Greenpeace: twice, both from Eco.org, with different post dates (April and June)
  • Online Director: 6 times, but at 2 locations, so it's really 2 jobs listed 3 times each.
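
For the curious, the basic idea behind grouping duplicates like the ones above can be sketched in a few lines of Python. This is only a simplified illustration, assuming that two postings are the same job when employer, title and location match after normalization; the field names, the locations and the "Source X" label are placeholders I made up, and a real detector needs fuzzier matching than this.

```python
from collections import defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so trivial differences don't hide a duplicate.
    return " ".join(text.lower().split())


def group_duplicates(postings):
    # Map each (employer, title, location) key to the set of sources listing it.
    groups = defaultdict(set)
    for posting in postings:
        key = (
            normalize(posting["employer"]),
            normalize(posting["title"]),
            normalize(posting["location"]),
        )
        groups[key].add(posting["source"])
    return groups


# Sample rows modeled on the list above; locations and "Source X" are placeholders.
postings = [
    {"employer": "Greenpeace", "title": "IT Solution team leader", "location": "Location A", "source": "SustainLane"},
    {"employer": "Greenpeace", "title": "IT Solution Team Leader", "location": "Location A", "source": "Green Jobs Free"},
    {"employer": "Greenpeace", "title": "IT Solution team leader ", "location": "location a", "source": "GenGreenLife"},
    {"employer": "Greenpeace", "title": "Online Director", "location": "Location A", "source": "Source X"},
    {"employer": "Greenpeace", "title": "Online Director", "location": "Location B", "source": "Source X"},
]

groups = group_duplicates(postings)
for (employer, title, location), sources in groups.items():
    print(title, "|", location, "|", ", ".join(sorted(sources)))
```

The three IT Solution team leader rows collapse into one job with three sources, which is exactly what the "from" field should show, while the two Online Director rows stay separate because the locations differ.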

So the 24 jobs listed for Greenpeace are actually only 14!