
Locayta Search - Technical Overview
It is not an exaggeration to say that site search on many websites can be a poor experience for the user. There are a number of well-known problems associated with search engines, which include the following:

The user's lexicon differs from the data

Locayta differs from most search engines in that it is a statistical model that uses formal mathematical definitions to index data. So whilst Locayta doesn't pretend to understand the actual meaning of the words, it does understand the statistical importance of the words used in the search query. This means that the user doesn't need to know how to structure their search query, as Locayta works out which terms in the query are statistically important.

Boolean dilemma (AND vs. OR)

The problem is that if you search for “olive oil”, for example, and the Boolean connector is set to “AND”, then you may only find products that contain both “olive” and “oil”. You may not find “extra virgin oil”, for example, yet extra virgin oil is olive oil. To try to overcome this problem, most websites set the Boolean connector to “OR”. But with the connector set to “OR”, a search for “olive oil” will find anything with “olive” and anything with “oil”. This is the dilemma: if the operator is set to “AND” you will get too few results and miss things, but if it is set to “OR” you will get too many results, many of which aren't strictly relevant to what you searched for.

Spelling mistakes and typos

Research on eCommerce sites suggests that a lot of misspelling is actually miskeying. In other words, the user knows how to spell the product name; they've just miskeyed the word. Traditionally there are two approaches to the misspelling problem. One of them is to use a dictionary, which requires creating and maintaining a dictionary specifically for the site.
The problem with this approach, apart from the maintenance headache, is that you have to assume that the first few characters of the misspelled word are correct and that any misspelling occurs further along the word (note: if you don’t make this assumption, then any misspelled 5-letter word could actually be any 5-letter word in the entire dictionary). However, if a lot of misspelling is actually miskeying, the dictionary approach won’t help: miskeying often occurs with the first strike of the keyboard, so you can’t assume that the first few characters are correct.
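The limitation described above can be illustrated with a minimal sketch. It assumes a hypothetical site dictionary that is searched only among words sharing the query's first two characters; the word list, the `PREFIX_LEN` value and the function names are all illustrative and are not taken from any real product.

```python
from collections import defaultdict

PREFIX_LEN = 2  # assumption: how many leading characters the lookup trusts

# Hypothetical site dictionary; a real one would be built from product data.
SITE_WORDS = ["dress", "drill", "press", "cocktail", "olive"]


def build_prefix_buckets(words, prefix_len=PREFIX_LEN):
    """Group dictionary words by their first `prefix_len` characters."""
    buckets = defaultdict(list)
    for word in words:
        buckets[word[:prefix_len]].append(word)
    return buckets


def prefix_candidates(query, buckets, prefix_len=PREFIX_LEN):
    """Return only the words that share the query's leading characters."""
    return buckets.get(query[:prefix_len], [])


buckets = build_prefix_buckets(SITE_WORDS)

# Error late in the word: the trusted prefix "dr" is intact,
# so "dress" is still among the candidates.
late_error = prefix_candidates("drees", buckets)

# Miskeyed first character ("f" sits next to "d" on a QWERTY keyboard):
# the prefix "fr" matches no bucket, so "dress" is never considered.
first_key_error = prefix_candidates("fress", buckets)

print(late_error)
print(first_key_error)
```

With this word list, the late error still yields candidates to compare against, while the first-key error yields none at all, which is the failure mode the paragraph above describes.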
To solve the misspelling problem, Locayta uses two algorithms: trigram analysis and Levenshtein edit distance. Trigram analysis breaks the misspelled word into blocks of characters and tries to work out how to correct it in relation to what it knows about the words in the index, using edit distance as a measure of how badly misspelled the word is. Combined, the two algorithms provide a dynamic spell-correction capability.

Naïve search results

However, there is no reason why word frequency or word repetition should equate to relevancy, because neither takes into account the context in which the data is being searched. For example, a search for “black cocktail dress” might find “wedding dress” if that item has “dress” repeated many times, or it might find “black cocktail bag” because “black” and “cocktail” appear close together.

Field weightings

Locayta's many years of experience have shown that the relevancy of search results can be significantly improved by using field weightings. Weights can be applied to the different fields within the product data indexed by Locayta. Sometimes it is also useful to include, within the product index, information about the navigational position of the product in the website.

Optimisation and Control

To address this concern, Locayta has surfaced most of the configurable elements of the search engine in a control panel. This allows the client to configure those elements themselves.
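Returning to the spell-correction approach described earlier, the combination of trigram analysis and Levenshtein edit distance can be sketched roughly as follows. This is an illustrative sketch only and not Locayta's implementation: the padding scheme, the `max_distance` threshold, the sample word list and all function names are assumptions.

```python
from collections import defaultdict


def trigrams(word):
    """Return the set of 3-character blocks in a word, padded at both ends."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def build_trigram_index(words):
    """Map each trigram to the set of indexed words containing it."""
    index = defaultdict(set)
    for word in words:
        for gram in trigrams(word):
            index[gram].add(word)
    return index


def suggest(query, index, max_distance=2):
    """Collect words sharing any trigram, then rank them by edit distance."""
    candidates = set()
    for gram in trigrams(query):
        candidates |= index.get(gram, set())
    scored = [(levenshtein(query.lower(), word), word) for word in candidates]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


# Hypothetical index vocabulary.
index = build_trigram_index(["dress", "press", "cocktail", "olive", "oil"])

print(suggest("fress", index))     # first character miskeyed
print(suggest("cocktial", index))  # two letters transposed
```

Because candidates are gathered from trigrams taken across the whole word, a query with a miskeyed first character can still reach the intended term, and the edit distance then orders the candidates by how far each is from what was typed.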