A better way to spot duplicates

Yesterday I changed the way duplicate postings are detected. I simplified it by removing the city from the comparison criteria.

My policy before was that if the same company advertised the same job title in two different cities of the same state, those were two different jobs.

However, because many of the sources we get our jobs from don't include the city, we ended up with a lot of duplicates.

For example, a Wind Technician at Vestas in Golden, CO was treated as a different job than a Wind Technician at Vestas in CO (no city).

So I chose to remove the city from the duplicate detection algorithm. The net result is that we have fewer jobs (we lost about 400), but most of those were duplicates, so we feel good anyway.
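
To make the change concrete, here is a minimal Python sketch of the idea. It is not the actual code behind the site: the field names (company, title, city, state), the helper names, and the lowercase/strip normalization are all assumptions made for illustration.

```python
def dedup_key(posting, use_city=False):
    """Build the comparison key for a posting.

    Field names are hypothetical. With use_city=False the key is
    (company, title, state); with use_city=True the city is appended.
    """
    parts = [posting["company"], posting["title"], posting["state"]]
    if use_city:
        # A missing city becomes an empty string, so it never matches "Golden".
        parts.append(posting.get("city") or "")
    return tuple(p.strip().lower() for p in parts)


def remove_duplicates(postings, use_city=False):
    """Keep only the first posting seen for each comparison key."""
    seen = set()
    unique = []
    for posting in postings:
        key = dedup_key(posting, use_city)
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique


postings = [
    {"company": "Vestas", "title": "Wind Technician", "city": "Golden", "state": "CO"},
    {"company": "Vestas", "title": "Wind Technician", "city": None, "state": "CO"},
]

old_result = remove_duplicates(postings, use_city=True)   # old policy: city is compared
new_result = remove_duplicates(postings, use_city=False)  # new policy: city is ignored
```

With the city in the key, the two Vestas postings stay separate; without it, they collapse into one. The trade-off is that two genuinely distinct openings for the same title in different cities of the same state would also be merged.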


