The interface between users and the documents they seek is evolving rapidly. At a basic level, search engines look at the terms in the query, then consider a group of documents and bring back those that are the most relevant. For the user, the most relevant document is the one that contains the information they seek. But for the search software, the query is all it knows about the user’s intent, and the documents are essentially bags of words. The most relevant document is the one that scores highest. The question for the engine is which bag has the best chance of containing the answer.
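To make the "bag of words" idea concrete, here is a minimal sketch of how an engine might score documents. Each document is reduced to term counts, and a simple TF-IDF sum over the query terms decides which bag scores highest. This is a simplified illustration, not any particular engine's formula; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters: word order is discarded,
    # which is exactly what makes this a "bag of words"
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by a simple TF-IDF score against the query (illustrative only)."""
    bags = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(bags)
    scores = {}
    for doc_id, bag in bags.items():
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many bags contain this term at all
            df = sum(1 for other in bags.values() if term in other)
            if df == 0:
                continue
            idf = math.log(n_docs / df) + 1.0  # rarer terms weigh more
            score += bag[term] * idf           # term frequency times rarity
        scores[doc_id] = score
    # Highest-scoring bag first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a": "Cloud migration reduces storage costs",
    "b": "Recruitment agencies match CVs to job descriptions",
    "c": "Storage analytics for cloud storage planning",
}
print(tfidf_rank("cloud storage", docs))
```

Document "c" ranks first because it mentions "storage" twice, "a" second, and "b" scores zero. Note that the engine has no idea what any of these documents is actually about; it only counts words.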
In the past, Enterprise Search vendors argued about which of them had the best relevance algorithm, but today most commercial and open source search engines take a very similar approach to relevance. In fact, many of them, including Elasticsearch and Apache Solr, rely on the open source Lucene library for relevance ranking. The bag of words has become more sophisticated: engines now consider the order of the words, which words appear near each other, whether they have synonyms, and so on. But they still know very little about what lives inside these documents, or why.
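Lucene's default similarity (BM25Similarity, with defaults k1 = 1.2 and b = 0.75) belongs to the Okapi BM25 family. The sketch below implements the textbook BM25 formula, using the same IDF form as Lucene. It is an illustration only: Lucene's own implementation differs in details such as how document length is encoded, and the proximity and synonym features mentioned above are not modelled. Function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Same naive bag-of-words tokenizer: lowercase alphanumeric runs
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document with textbook Okapi BM25 (illustrative, not Lucene's exact code)."""
    bags = {d: Counter(tokenize(t)) for d, t in docs.items()}
    lengths = {d: sum(bag.values()) for d, bag in bags.items()}
    n_docs = len(bags)
    avgdl = sum(lengths.values()) / n_docs  # average document length in tokens
    scores = {}
    for d, bag in bags.items():
        score = 0.0
        for term in set(tokenize(query)):
            n = sum(1 for other in bags.values() if term in other)
            if n == 0:
                continue
            # IDF in the form Lucene uses: always positive
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            tf = bag[term]
            # Length normalisation: long documents need more occurrences to score well
            norm = k1 * (1 - b + b * lengths[d] / avgdl)
            # Term frequency saturates: each extra occurrence adds less
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores[d] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "x": "Lucene search",
    "y": "Lucene search engine library for Java applications and more",
    "z": "Gardening tips",
}
print(bm25_scores("lucene", docs))
```

Both "x" and "y" mention Lucene once, but the shorter "x" ranks higher because of length normalisation. This is the kind of statistical refinement that makes the bag of words smarter without telling the engine anything about meaning.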
To truly improve relevance, we need to go one big step further. We need to start understanding what documents are actually about: getting inside each document and working out what its author is trying to convey. The good news is that by coupling modern AI tools (Natural Language Processing, Machine Learning, Knowledge Graphs and cloud-based services) to a search engine, this can be achieved at a fraction of the cost and time it would have taken a few years ago. By applying sophisticated NLP and Machine Learning as documents are processed and indexed, we can teach the computer to get inside the content and extract insights. By applying these and other tools to establish user intent, we can get a more precise view of what each person is seeking. As a result, we can create applications that understand documents, create value from them, and give users a huge efficiency advantage. Example applications include:
- Market Intelligence – harvesting and analysing customer and competitor data
- Matching job descriptions to CVs within the recruitment/staffing industry
- Risk Analysis – analysing legal documents to identify areas of risk, for example due to changes in legislation
- Identifying Personal / Private Information
- Storage Analytics – scanning and categorising internal data to reduce storage costs or assist with cloud migrations
- Internal Threat Detection – analysing communication and event data within and across an organisation
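Several of these applications, identifying personal or private information in particular, depend on enriching each document as it is processed and indexed, so that the search engine can filter and facet on what the document contains rather than only on its words. The sketch below shows the shape of such an index-time enrichment step. It assumes deliberately simple regular expressions as a stand-in for real NLP entity extraction; the patterns, the `enrich()` function and the output field names are all hypothetical and would miss many real-world formats.

```python
import re

# Hypothetical, deliberately simple patterns. A production pipeline would use
# trained NLP entity extraction rather than regular expressions.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    # Simplified UK-style landline: 0 + 4 digits, optional space, 6 digits
    "uk_phone": re.compile(r"\b0\d{4}\s?\d{6}\b"),
}


def enrich(doc_id, text):
    """Return an index-ready record with extracted PII entities as extra fields."""
    entities = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            entities[label] = sorted(set(matches))
    return {
        "id": doc_id,
        "body": text,
        "pii_types": sorted(entities),   # e.g. usable as a facet in the search UI
        "pii_entities": entities,
        "contains_pii": bool(entities),  # e.g. usable as a filter at query time
    }


record = enrich("doc-1", "Contact jane.doe@example.com or call 01234 567890 today.")
print(record["pii_types"], record["contains_pii"])
```

Because the extracted fields are stored alongside the text, a query such as "all documents containing personal data" becomes a simple filter on `contains_pii` rather than a keyword guess. Swapping the regular expressions for an NLP model changes the quality of the extraction, not the shape of the pipeline.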