Sorting Candidate CVs using the anti-spam algorithm...

Simon Vans-Colina's Blog | November 5, 2006  12:02 PM | Comments (5)


It's 16:44 in the afternoon. For the past 4 hours I've been head down, staring at a WebLogic Server log, watching stack traces fly past like leaves on a waterfall.

The ticket queue is growing hour by hour and the deadline, while still weeks away, is starting to look slightly transparent.

"Simon, can you have a look at this CV for me and let me know what you think?" says the Boss.

"Oh, and Simon, I need to know by 5!" he adds.

Thanks, boss. I've devoted my life to understanding computers for the express reason that They're Easier To Understand Than People.
And you've just given me 15 minutes to read between the lines of a 6-page CV that is almost certainly 60% lies.

15 minutes to absorb the subtle nuance, the hints of hesitation or confidence. 15 minutes to....
Oh sod it. Most CVs have the same level of truthiness as spam anyway. I'll just spend a couple of hours adapting the learning algorithm from a spam filter to work on CVs and I'll never have to worry about this again.

So, ladies and gentlemen, I present to you: the world's first Bayesian CV sorting application.

Online to play with at http://londonmiddleware.org/chaff


*

My tool is based on the excellent "Learning Grep" called CRM114. In the coming weeks I plan on writing an article on how to use it to monitor log files for new problems.
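
For the curious, here is a minimal sketch of the general idea in Python. It is not the code behind the tool and it is not how CRM114 works internally; it is a plain naive Bayes classifier with add-one smoothing, written under my own assumptions. The class name CvSorter, the "good"/"bad" labels, the tokenising regex and the 12-character SHA-1 prefix are all made up for illustration. The one detail borrowed from the tool is that only hashed tokens and their counts are kept, never the CV text itself.

```python
import hashlib
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase word tokens and hash each one.

    Only the hash prefixes are returned, so the raw words are never stored.
    The regex and the 12-character prefix are illustrative assumptions.
    """
    words = re.findall(r"[a-z0-9+#.]+", text.lower())
    return [hashlib.sha1(w.encode("utf-8")).hexdigest()[:12] for w in words]


class CvSorter:
    """Hypothetical two-class naive Bayes sorter over hashed tokens."""

    LABELS = ("good", "bad")

    def __init__(self):
        self.counts = {label: Counter() for label in self.LABELS}
        self.docs = {label: 0 for label in self.LABELS}

    def learn(self, text, label):
        """Add one labelled CV to the token frequency tables."""
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def score(self, text):
        """Return the estimated probability that the CV is 'good'."""
        vocab = len(set(self.counts["good"]) | set(self.counts["bad"]))
        total_docs = sum(self.docs.values())
        hashed = tokens(text)
        log_p = {}
        for label in self.LABELS:
            total = sum(self.counts[label].values())
            # Smoothed class prior, then smoothed per-token likelihoods.
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for t in hashed:
                lp += math.log((self.counts[label][t] + 1) / (total + vocab + 1))
            log_p[label] = lp
        # Turn the log-odds into a probability; clamp to avoid overflow.
        diff = max(min(log_p["bad"] - log_p["good"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

You would call learn() once for each CV you have already judged, then rank new CVs by score(). With only a handful of training CVs the scores mean very little, which is equally true of a spam filter on its first day.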

*Choom, Cheef and Chaff: see a pattern here, Hoos?


Comments

Comments are listed in date ascending order (oldest first)

  • Lol!!! This is so great. Thank you very much for the pointer to CRM114, that looks very promising. I am looking forward to your article on log file filtering. For the CV, I don't think it is a good idea to post actual data to a hosted application, but I am really curious whether it would work with my set of candidates and requirements. :) Gruss Bernd

    Posted by: b.eckenfels@seeburger.de on November 5, 2006 at 7:47 PM

  • Hi Bernd, I don't actually store the CVs in any way. I just take a hash of each token and store those, along with how frequent they are.
    If you're worried about the privacy implications, feel free to delete the candidate name and any other personally identifiable data.

    Posted by: simonvc on November 6, 2006 at 1:04 AM

  • Lol, a couple of points for you, simonVC. Firstly, my rule of thumb is that any CV more than 2 pages long will find its way to the recycle bin, no questions asked. If you apply this rule to your tool you will save valuable computer processing time and probably improve the quality of candidates. Second, it had better filter your CV out, VC (sorry, but I couldn't resist the pun).

    Posted by: hoos on November 6, 2006 at 10:31 AM

  • Great idea! If I may make a suggestion... the same principle is often true of recruiter/headhunter emails. You know the story: "fantastic opportunity to join this major blue chip yada yada". Oh yeah, if they're so fantastic why can't you tell me who they are? And why are they only offering 15k per year for a senior Java/C++/C/C#/.NET/Ruby/Python/Haskell demigod?

    Looking forward to the article :-)

    Posted by: njbartlett on November 6, 2006 at 5:22 PM

  • A senior Haskell demigod? lol :-)

    Posted by: jonmountjoy on November 8, 2006 at 2:45 AM


