Tokenizing IT jobs
One size does not fit all when it comes to building search applications - it is important to think about the business domain and user expectations. Here's a classic example from recruitment search (a domain which has absorbed six years of my life already...) - imagine you are a candidate searching for IT jobs on your favourite job board. Recall how a full-text index works, as implemented in Solr or Elasticsearch: the job posting documents are treated as a bag of words (i.e. the order of the words doesn't matter in the first instance). When indexing each job, the search engine tokenizes the document to get a list of the words it contains. Then, for each individual word, it builds a list of the documents that contain that word. ...
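The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not how Solr or Elasticsearch actually work internally. It assumes a naive tokenizer that lowercases the text and splits on runs of non-word characters, whereas the real engines use configurable analyzers. The helper names (`tokenize`, `build_inverted_index`, `search`) and the sample job postings are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Naive tokenizer (an assumption for illustration, not Solr's or
    Elasticsearch's actual analyzer): lowercase the text, then keep
    runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it.

    Each document is treated as a bag of words: word order is discarded,
    and a repeated word is recorded only once per document."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    # Hypothetical job postings, keyed by made-up document ids.
    jobs = {
        1: "Senior Java Developer",
        2: "Java and Python developer wanted",
        3: "Python data engineer",
    }
    index = build_inverted_index(jobs)
    print(sorted(index["python"]))                    # [2, 3]
    print(sorted(search(index, "python developer")))  # [2]
    print(sorted(search(index, "developer python")))  # [2]
```

Because only the set of tokens per posting is stored, the queries "python developer" and "developer python" match exactly the same document: the bag-of-words model has thrown away word order.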