Tokenizing IT jobs

One size does not fit all when it comes to building search applications - it is important to think about the business domain and user expectations. Here's a classic example from recruitment search (a domain which has absorbed six years of my life already...) - imagine you are a candidate searching for IT jobs on your favourite job board. Recall how a full-text index works as implemented in Solr or Elasticsearch - the job posting documents are treated as a bag of words (i.e. the order of the words doesn't matter in the first instance). When indexing each job, the search engine tokenizes the document to get a list of the words it contains. Then, for each individual word, it builds a list of the documents that include that word. ...
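As a rough illustration of those two indexing steps (tokenize each document, then map each word to the documents containing it), here is a minimal Python sketch of an inverted index. The tokenizer is an assumption for illustration only: it lowercases the text and splits on anything that is not a letter or digit. Solr and Elasticsearch instead use configurable analyzers, and the names `tokenize`, `build_inverted_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Hypothetical naive tokenizer: lowercase, then split on any run of
    characters that are not ASCII letters or digits, dropping empty tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it.
    Word order is discarded: each document is treated as a bag of words."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings)


jobs = {
    1: "Senior Java Developer",
    2: "C++ Developer (Embedded)",
    3: "C# .NET Developer",
}
index = build_inverted_index(jobs)
print(sorted(index["developer"]))              # [1, 2, 3]
print(sorted(search(index, "C++ developer")))  # [2, 3]
```

Because this particular tokenizer strips punctuation, "C++" and "C#" both reduce to the single token "c", so a query for "C++ developer" also matches the C# job.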

May 29, 2018 · Tim Retout

How not to parse search queries

While I remember, I have uploaded the slides from my talk about Solr and Perl at the London Perl Workshop. This talk was inspired by having seen and contributed to at least five different sets of Solr search code at my current job, all of which (I now believe) were doing it wrong. I distilled this hard-won knowledge into a 20-minute talk, which - funny story - I actually delivered twice to work around a cock-up in the printed schedule. I don't believe any video was successfully recorded, but I may be proved wrong later. ...

December 2, 2013 · Tim Retout