Locayta Search Mobile includes the following functionality:
Locayta's text relevance algorithm is derived from the powerful BM25 free-text search algorithm to return the best result for a user's search terms. For further details, refer to our Ranking Algorithm page.
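As a rough illustration of the kind of scoring BM25 performs, here is a minimal sketch of the textbook Okapi BM25 formula. It is not Locayta's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with textbook BM25.

    docs is a non-empty list of token lists. k1 and b are the usual
    BM25 tuning parameters (assumed defaults, not Locayta's settings).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # A common non-negative IDF variant.
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            tf = term_freq[term]
            # Long documents are penalised through the length ratio.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "red summer dress".split(),
    "blue dress with red belt and red buttons".split(),
    "leather belt".split(),
]
scores = bm25_scores(["red", "dress"], docs)
```

The third document contains neither query term, so it scores zero; the other two receive positive scores that depend on term frequency and document length.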
In most languages, the same concept can be written in several different ways; the most obvious being pluralisation of words, but also different verb endings such as 'ed' for past tense, or 'ing' for present tense, eg 'play', 'played', 'playing', 'player'.
Word-stemming is the process of removing these suffixes and pluralisations to determine the 'stem' of each word in the record. The same process is also applied to the user's query words. This allows the user to search, for example, for 'dresses' and still find products that are described as a 'dress' (and vice versa).
Word-stemming for different languages is often required too; for example in German, pluralisation is often more complicated than simply adding an 's' onto the end of the word. Locayta has stemming algorithms available for 14 languages.
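The idea can be sketched with a deliberately naive English suffix stripper. Production stemmers (including whichever algorithms Locayta uses, which are not described here) are considerably more careful; the suffix list and the `naive_stem` function are assumptions made for illustration only.

```python
SUFFIXES = ("ing", "ed", "er", "es", "s")  # illustrative, English only

def naive_stem(word, min_stem=3):
    """Strip one common suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "dress" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

# The same function is applied to indexed words and to query words,
# so "dresses" in a query matches "dress" in a record.
stems = [naive_stem(w) for w in ("play", "played", "playing", "player")]
```

All four forms of 'play' reduce to the same stem, and 'dresses' and 'dress' both reduce to 'dress'.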
Users often misspell a word or make a typographical error while typing their query into a website's search box. Spelling correction allows the search engine to still find results, so that users who have not noticed the mistake are not confused by an empty results page.
Locayta performs spelling correction using an algorithm called tri-gram analysis. This uses the concept of edit-distance to determine which of the words available in the search index most closely matches the user's term, and it tolerates mistakes at the start of the word as well as in the middle and at the end. The leniency of the spell-corrector can also be configured.
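One common way to compare words by their tri-grams is sketched below. It assumes a Dice-coefficient overlap between padded character tri-grams and a hypothetical `threshold` parameter standing in for the configurable leniency; Locayta's actual measure is not documented here and may differ.

```python
def trigrams(word):
    """Return the set of character tri-grams of a padded, lower-cased word."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Dice coefficient over tri-gram sets: 0.0 (disjoint) to 1.0 (identical)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))

def suggest(term, vocabulary, threshold=0.4):
    """Return the closest indexed word, or None if nothing is close enough."""
    if not vocabulary:
        return None
    best = max(vocabulary, key=lambda w: similarity(term, w))
    return best if similarity(term, best) >= threshold else None

vocabulary = ["clothing", "christmas", "dress"]
```

Because most tri-grams of a word survive a single mistake wherever it occurs, `suggest("xlothing", vocabulary)` and `suggest("clohting", vocabulary)` both return "clothing". Raising `threshold` makes the corrector stricter.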
This allows the term the user is currently typing to be guessed and suggested as a completion, based on the words known to be in the search index. For example, if the user is typing "clo..." the suggestions might include "clothing".
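A minimal sketch of prefix completion over a sorted list of indexed terms follows. The `complete` function and the sample terms are hypothetical, and a real index would typically also rank suggestions by popularity.

```python
from bisect import bisect_left

def complete(prefix, sorted_terms, limit=5):
    """Return up to `limit` indexed terms that start with `prefix`."""
    prefix = prefix.lower()
    # Binary search finds the first term that could match the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(matches) == limit:
            break
        matches.append(term)
    return matches

index_terms = sorted(["cloak", "clock", "clothes", "clothing", "coat", "dress"])
suggestions = complete("clo", index_terms)
```

Here `suggestions` holds the four terms beginning with "clo", including "clothing".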
Users familiar with search engines such as Google will be used to entering queries containing AND and OR to perform an advanced search. Locayta supports common operators (AND, OR, NOT, double-quotation marks to denote a phrase, and a few others) to accommodate these users.
For simple queries, a default operator can be chosen. This allows a default of “matching all words” or “matching any words” to be set. The ESP platform also allows a staggered approach: for example, use AND unless fewer than five results are found, in which case use OR. This behaviour can be customised.
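The staggered approach can be sketched as follows, assuming a simple inverted index that maps each term to the set of record IDs containing it. The `staggered_search` function and its `min_results` parameter are hypothetical names, not part of the ESP platform's API.

```python
def staggered_search(query_words, index, min_results=5):
    """Match all words (AND); fall back to any word (OR) if too few results."""
    postings = [index.get(word, set()) for word in query_words]
    if not postings:
        return set()
    all_match = set.intersection(*postings)
    if len(all_match) >= min_results:
        return all_match
    # Too few records contain every word, so accept records with any word.
    return set.union(*postings)

index = {"red": {1, 2, 3}, "dress": {2, 3, 4}}
strict = staggered_search(["red", "dress"], index, min_results=2)
relaxed = staggered_search(["red", "dress"], index, min_results=5)
```

With a minimum of two results the AND match `{2, 3}` is kept; with a minimum of five the search relaxes to OR and returns all four records.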
Results can be sorted using any type of field (or by text relevance as described in the section above), provided that the index has been configured to be able to sort on that field. Sorting on text fields can optionally be case-sensitive, where this matters.
Synonyms are a way of solving any differences between the user's lexicon and the data's. For example, a user might search for 'xmas gifts', while the product data tends to describe products as 'a great present for Christmas'. Spelling correction would not be able to correct 'xmas' to 'christmas' or 'gifts' to 'presents', so the user's search would return few results here.
However, synonyms can be set up so that this does happen, allowing the user to find more relevant results despite not knowing how the terms are phrased in the product data. Synonyms are the easiest way to solve low-result searches.
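One simple way to apply synonyms is to expand each query word into a group of alternatives before searching. The sketch below assumes a hypothetical `SYNONYMS` table and `expand_query` function; how Locayta stores and applies synonyms is not specified here.

```python
# Hypothetical synonym table: query word -> equivalent words in the data.
SYNONYMS = {
    "xmas": {"christmas"},
    "gifts": {"presents"},
}

def expand_query(words, synonyms=SYNONYMS):
    """Turn each query word into the set of terms that should match it."""
    return [{word} | synonyms.get(word, set()) for word in words]

expanded = expand_query(["xmas", "gifts"])
```

Each position in `expanded` is then searched as "any of these terms", so 'xmas gifts' also matches records that mention 'christmas' and 'presents'.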
Field weighting allows a greater importance to be placed on certain fields compared to others. This is especially useful in the case of product titles vs product descriptions. If a user searches for a term that appears in the title of one product and the description of another one, the product whose title matches should be returned higher in the results than the one that only matches the description.
This logic can also be applied to other parts of the products' information, if it is available, such as the brand name or manufacturer.
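A minimal sketch of field weighting, assuming each field contributes its term count multiplied by a per-field weight. The weights and the `weighted_score` function are illustrative assumptions; in practice the weights would be combined with the relevance algorithm rather than raw counts.

```python
# Illustrative weights: a title match counts three times a description match.
FIELD_WEIGHTS = {"title": 3.0, "brand": 2.0, "description": 1.0}

def weighted_score(term, record, weights=FIELD_WEIGHTS):
    """Sum the term's occurrences in each field, scaled by the field weight."""
    term = term.lower()
    return sum(
        weight * record.get(field, "").lower().split().count(term)
        for field, weight in weights.items()
    )

jacket = {"title": "Leather jacket", "description": "Warm and stylish"}
coat = {"title": "Winter coat", "description": "Pairs well with a leather jacket"}
ranked = sorted([coat, jacket], key=lambda r: weighted_score("leather", r), reverse=True)
```

The product whose title contains 'leather' scores 3.0 and is ranked above the one that only mentions it in the description, which scores 1.0.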
In addition to probabilistic terms (natural language words), Locayta can also search for boolean terms, which allow results to be filtered by their particular properties. For example, products can be filtered by a particular brand, if that data is available.
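Unlike probabilistic terms, a boolean term does not affect a record's score: a record either has the property or it does not. A minimal sketch, using a hypothetical `filter_results` helper and made-up product data:

```python
def filter_results(results, **required):
    """Keep only records whose fields equal every required value."""
    return [
        record for record in results
        if all(record.get(field) == value for field, value in required.items())
    ]

products = [
    {"id": 1, "title": "Running shoes", "brand": "Acme"},
    {"id": 2, "title": "Trail shoes", "brand": "Zenith"},
    {"id": 3, "title": "Walking shoes", "brand": "Acme"},
]
acme_only = filter_results(products, brand="Acme")
```

Records without the field, or with a different value, are simply excluded.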
In certain situations, neither text relevance nor sorting by a field is the best solution. For these cases, Locayta has developed a technology called balancing that allows the results to be skewed by the value of a field rather than completely sorted.
The amount by which the results are skewed can be controlled using a balance factor, which is a percentage value between 0% (the results are not skewed) and 100% (the results are skewed so far that their order is the same as if they had simply been sorted by the field).
This is especially useful when searching for products in a database that also contains accessories. Searching for a product will often return its accessories first in the results, so a balance factor can be set such that non-accessories are boosted to the top of the results, while still allowing accessories to be found if the user searches for them specifically.
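The balancing algorithm itself is not described here, but its two end points can be reproduced with a simple linear blend, which is the assumption made in this sketch. The `balance` function, the `is_main_product` field, and the normalisation of both values to the range 0 to 1 are all hypothetical.

```python
def balance(results, field, factor):
    """Order results by a blend of text relevance and a field value.

    factor is the balance factor as a fraction: 0.0 orders purely by
    relevance, 1.0 orders purely by the field. Both the 'relevance'
    value and the field value are assumed to lie between 0 and 1.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0.0 and 1.0")

    def blended(record):
        return (1 - factor) * record["relevance"] + factor * record[field]

    return sorted(results, key=blended, reverse=True)

results = [
    {"id": "phone case", "relevance": 0.9, "is_main_product": 0.0},
    {"id": "phone", "relevance": 0.7, "is_main_product": 1.0},
]
unskewed = balance(results, "is_main_product", 0.0)
skewed = balance(results, "is_main_product", 0.5)
```

At a factor of 0.0 the accessory stays first because it is the better text match; at 0.5 the main product is boosted above it, while the accessory remains in the results.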
Locayta Search Mobile can be used to sort results in order of their distance from a known central point, for instance the user's current location.
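Distance sorting of this kind can be sketched with the haversine great-circle formula. The function names, the `lat` and `lon` field names, and the sample coordinates are assumptions for illustration; the source does not say which distance calculation Locayta uses.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def sort_by_distance(results, centre):
    """Order results by distance from centre, given as (latitude, longitude)."""
    return sorted(
        results,
        key=lambda r: haversine_km(centre[0], centre[1], r["lat"], r["lon"]),
    )

stores = [
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
    {"name": "Edinburgh", "lat": 55.9533, "lon": -3.1883},
    {"name": "Brighton", "lat": 50.8225, "lon": -0.1372},
]
user_location = (51.5074, -0.1278)  # central London
nearest_first = sort_by_distance(stores, user_location)
```

For a user in central London the stores are returned in the order Brighton, Paris, Edinburgh.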
This can be used to generate a context-based list of refinement options based on the contents of the current results set (rather than a pre-determined list). On websites these lists generally appear at the side of the results, and the user can choose to narrow their search to just one (or several) of the options generated. This is also known as Guided Navigation.
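Generating refinement options from the current results amounts to counting the distinct values of a field across those results. A minimal sketch, with a hypothetical `refinement_options` function and sample data:

```python
from collections import Counter

def refinement_options(results, field):
    """Count each value of `field` in the current results, most common first."""
    counts = Counter(record[field] for record in results if field in record)
    return counts.most_common()

current_results = [
    {"title": "Red dress", "brand": "Acme"},
    {"title": "Blue dress", "brand": "Acme"},
    {"title": "Green dress", "brand": "Zenith"},
]
brand_options = refinement_options(current_results, "brand")
```

Here `brand_options` is `[("Acme", 2), ("Zenith", 1)]`, which a site could render as "Acme (2)" and "Zenith (1)" beside the results.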
Each result found by Locayta is given a percentage based upon how well it matches the user's search terms. This is governed by the BM25 algorithm mentioned above. This means that results can be ordered according to their relevance, and less relevant results are placed lower down in the results list.
However, when sorting or balancing results, textually irrelevant results can end up being presented on the first page of results, which is often undesirable. To avoid this happening, Locayta can be configured to ignore results that are less relevant than a certain percentage.
This also makes the number of results returned smaller, which can be less daunting to a user who would otherwise be presented with dozens or hundreds of pages of results to look through. Adding a threshold also gives a small improvement in the speed at which results are returned, as less data has to be fetched.
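A relevance threshold can be sketched as follows. The source does not say how the percentage is derived, so this example assumes it is each result's raw score relative to the best score in the set; `apply_threshold` and `min_percent` are hypothetical names.

```python
def apply_threshold(scored, min_percent):
    """Drop results whose relevance percentage is below min_percent.

    scored is a list of (record_id, raw_score) pairs. The percentage is
    assumed to be relative to the best raw score in the list.
    """
    if not scored:
        return []
    top = max(score for _, score in scored)
    if top <= 0:
        return []
    with_percent = [(rid, 100 * score / top) for rid, score in scored]
    kept = [pair for pair in with_percent if pair[1] >= min_percent]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

scored = [("a", 4.0), ("b", 2.0), ("c", 0.4)]
relevant = apply_threshold(scored, min_percent=25)
```

With a 25% threshold, result "c" (10% as relevant as the best match) is dropped before any sorting or balancing is applied, leaving "a" at 100% and "b" at 50%.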
Future releases will include additional functionality, such as the following:
Stop words are another useful tool in making sure that the results match what the user is looking for. They are designed to strip words out of the user's query that a lot of documents will match, such as 'to', 'the', 'a', or 'in'. A lot of product descriptions will contain these words despite having nothing to do with the user's intention.
Stop words can also be used to speed up the results a little, by removing words that often return a lot of generic results that aren't very useful. For example, in a clothing search, 'clothes' might be a stop word because if the user searches for 'red clothes', we can assume they will want to search for 'red' in every product, rather than just products containing the word 'clothes'.
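Stop-word removal can be sketched as a filter applied to the query before searching. The word list and the `strip_stop_words` function are illustrative assumptions, as is the choice to keep the original words when every word in the query is a stop word.

```python
STOP_WORDS = frozenset({"to", "the", "a", "in"})  # illustrative list

def strip_stop_words(query, stop_words=STOP_WORDS):
    """Remove stop words from a query, unless that would leave it empty."""
    words = query.lower().split()
    kept = [word for word in words if word not in stop_words]
    # Assumption: a query made up only of stop words is searched as typed.
    return kept or words

general = strip_stop_words("a dress to wear in the summer")
# A clothing site might add domain-specific stop words such as 'clothes'.
clothing = strip_stop_words("red clothes", STOP_WORDS | {"clothes"})
```

The first query is reduced to 'dress', 'wear' and 'summer'; the second is reduced to 'red' alone.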
More like this
When browsing search results or individual products, users may want to view products that are described similarly to the one they are currently viewing. Locayta can perform a search based on the ID of a product, and automatically calculates search terms that will return similar products in results. This is done by examining the words used to describe the product (its title, description, etc) and choosing ones which are fairly specific to that product. It then searches for these terms in the index.
This is a different approach to behavioural recommendations because it looks only at the text describing a product, rather than examining user behaviour. That makes it particularly useful on sites where behavioural information is unavailable (for example, on new sites where no data has been accumulated yet), or for searches on data where user behaviour isn't considered useful.
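Choosing terms that are "fairly specific" to a product can be sketched with TF-IDF weighting: words that are frequent in the product's text but rare across the index score highest. This is an assumed technique for illustration, since the source does not name the method Locayta uses, and `distinctive_terms` is a hypothetical function.

```python
import math
from collections import Counter

def distinctive_terms(product_id, docs, top_n=3):
    """Pick the words most specific to one product's text.

    docs maps each product ID to the list of words describing it.
    """
    n_docs = len(docs)
    # Document frequency: the number of products that use each word.
    doc_freq = Counter(term for words in docs.values() for term in set(words))
    term_freq = Counter(docs[product_id])

    def weight(term):
        return term_freq[term] * math.log(n_docs / doc_freq[term])

    # Highest weight first; ties are broken alphabetically.
    return sorted(term_freq, key=lambda term: (-weight(term), term))[:top_n]

docs = {
    1: "red silk evening dress".split(),
    2: "red cotton summer dress".split(),
    3: "red leather belt".split(),
}
terms = distinctive_terms(1, docs, top_n=2)
```

For product 1 the chosen terms are 'evening' and 'silk'; 'red' appears in every product and so carries no weight. The selected terms would then be run as an ordinary search, excluding the original product from the results.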