Thursday, May 23, 2013

SharePoint 2013: Consistent Search Results with Synonyms and Query Rules

When I took a class called "Information Retrieval Systems" back in the '90s at Drexel University, I never thought I would have to deal directly with what we learned. While the course covered things like stop words (noise words), linguistics, synonyms, and basic retrieval of information, it really was the foundation for what search engines are built around today.

My professor's classic example was "Time Flies Like an Arrow" and "Fruit Flies Like an Apple". In the first phrase we want "flies" to be a verb, but in the second phrase we want "flies" to be a plural noun. Also, "like" means "similar to" in the first phrase but "enjoy" in the second. To a computer-based search engine, they are just words that will be searched against a content index, and there really is no context around the meanings.

The same problem exists today when users expect a search to produce certain results but the search engine finds and deems relevant something else. The specific problem I solved in SharePoint 2013 Search dealt with searching for "401K".

There are several ways someone may search for 401K: 401K, 401 K, 401-K, and 401(K). Without any tinkering, each of these phrases will produce a different set of results. I found that 401K and 401(K) produced essentially the same number of results because the parentheses are ignored. So at least I had that covered.

The main problem is "401 K" because of the space. This produces results where there is just a 401 by itself or maybe a K by itself (as a middle initial for example). The "401-K" also produced different results as some content may actually contain the dash.

I originally thought this could all be solved just by using synonyms in my SharePoint 2013 Search thesaurus file (which you upload into Search using PowerShell). That was not the case. Then I thought I could solve the problem with a query rule - also not the case.

It turns out I needed to use a combination of thesaurus entries and a query rule to get the exact same number of results for any 401K combination. The ranking may change based on which version the user enters, but as long as they get all of the possible results for each version, we are making fruit fly.

Since 401K and 401(K) are treated the same, my thesaurus file contained the following:

401 K,401K
401 K,401-K
401K,401 K
401-K,401 K
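
For anyone who would rather generate the file than type it by hand, here is a minimal sketch in Python. It assumes the thesaurus is a CSV file whose first row is the header Key,Synonym,Language, with the language column left out of rows that are not tied to one language; that is my understanding of the documented format, so check it against your own environment. The function name build_thesaurus_csv is made up for this illustration.

```python
import csv
import io

# The four key/synonym pairs listed above.
PAIRS = [
    ("401 K", "401K"),
    ("401 K", "401-K"),
    ("401K", "401 K"),
    ("401-K", "401 K"),
]


def build_thesaurus_csv(pairs):
    """Return thesaurus CSV text (hypothetical helper, not a SharePoint API)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header row for the thesaurus file.
    writer.writerow(["Key", "Synonym", "Language"])
    for key, synonym in pairs:
        # No language value is written, so the row has only two fields.
        writer.writerow([key, synonym])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_thesaurus_csv(PAIRS), end="")
```

The resulting text would be saved to a file and then uploaded into Search with PowerShell as usual.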

The condition for my query rule matches when any version of 401K is entered.

The action for the query rule was to change the ranked results by changing the query.

Essentially, I am using quotes to produce exact matches in the content, which eliminates any extra results, and by combining the quoted phrases with OR statements I am ensuring I get everything out there no matter how 401K is represented.
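
To make the shape of that rewritten query concrete, here is a small Python sketch that quotes each variant as an exact phrase and joins the phrases with OR. The list of variants and the function name build_or_query are assumptions for illustration, not the exact text configured in my query rule.

```python
# Variants a user might type; which ones to include is an assumption.
VARIANTS = ["401K", "401 K", "401-K", "401(K)"]


def build_or_query(variants):
    """Quote each variant as an exact phrase and join the phrases with OR."""
    # Strip any embedded double quotes so each phrase stays well-formed.
    quoted = ['"{}"'.format(v.replace('"', "")) for v in variants]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(build_or_query(VARIANTS))
```

The string it builds is the kind of text that would go into the changed query of the query rule action, so every variant the user types is expanded to the same set of exact-phrase matches.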

I really thought the query rule by itself would solve the problem, but the synonyms helped seal the deal.

My SharePoint 2013 Search instance is now displaying the same number of results with each version of the search term.

