SmarterData.ca: January 2017

Monday, January 23, 2017

Natural Language Processing

Gartner reported Natural Language Processing (NLP) as a mature and usable technology as early as 2014. Since then, the focus has been on vertical integration. Products like Amazon Alexa do a remarkable job of interpreting spoken phrases and delivering value through skills.

An area of slower advancement (and therefore opportunity!) is Natural Language Question and Answering (NLQA).

Links:
Gartner 2016 Emerging Technology Key Trends
Forbes Summary

NLQA is currently in the dreaded Trough of Disillusionment (i.e., not meeting user expectations). Vendors like ThoughtSpot are defining the market, but so far no one has delivered on the promise (think HAL in 2001: A Space Odyssey ♜ 😎).

The technology behind NLP is mature. If you studied it in school be prepared for a bit of a shock. The days of building and connecting your own parsers, lemma tools and Named Entity Recognizers (NER) are gone. NLP-as-a-Service (NaaS) is here. Driven by machine learning, modern NaaS easily beats the best academic products, like the venerable Stanford NLP Parser, that were state-of-the-art just a few years ago.

If you want to jump to the head of the NLP class, consider using services like the Google Cloud Natural Language API. It works remarkably well without custom training for both text ingestion and question parsing. It implements several high-level functions that help developers execute commands and answer questions with greater accuracy and precision. One of its best features: dependency graphs that link verb (action) and noun (thing) phrases to create powerful interpretations without custom programming.

Here is an example of dependency parsing using the Google API:
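As a rough sketch of how a dependency graph links an action to its things, the Python below walks a small token list and pulls out the verb, its direct object and any prepositional modifiers. The token layout (text.content, partOfSpeech.tag, dependencyEdge.headTokenIndex and label) is modeled on the syntax analysis output of the Google Cloud Natural Language API, but that mapping is an assumption here. The tokens are hand-written for illustration, not captured from the service, and a real parse of the same phrase may differ. The helper names are hypothetical.

```python
from typing import Dict, List


def _tok(text: str, tag: str, head: int, label: str) -> Dict[str, dict]:
    """Build one token in an assumed syntax-analysis shape (illustrative only)."""
    return {
        "text": {"content": text},
        "partOfSpeech": {"tag": tag},
        "dependencyEdge": {"headTokenIndex": head, "label": label},
    }


# Hand-written parse of "Show sales by region". NOT real API output.
SAMPLE_TOKENS = [
    _tok("Show", "VERB", 0, "ROOT"),    # the root points at itself
    _tok("sales", "NOUN", 0, "DOBJ"),   # direct object of "Show"
    _tok("by", "ADP", 0, "PREP"),       # preposition attached to "Show"
    _tok("region", "NOUN", 2, "POBJ"),  # object of the preposition "by"
]


def children_of(tokens: List[dict], head_index: int, label: str) -> List[int]:
    """Indexes of tokens attached to head_index by an edge with the given label."""
    return [
        i
        for i, t in enumerate(tokens)
        if i != head_index
        and t["dependencyEdge"]["headTokenIndex"] == head_index
        and t["dependencyEdge"]["label"] == label
    ]


def interpret(tokens: List[dict]) -> dict:
    """Turn a dependency parse into an action, its objects and its modifiers."""
    root = next(
        i for i, t in enumerate(tokens) if t["dependencyEdge"]["label"] == "ROOT"
    )

    def word(i: int) -> str:
        return tokens[i]["text"]["content"]

    objects = [word(i) for i in children_of(tokens, root, "DOBJ")]
    modifiers = [
        (word(p), word(o))
        for p in children_of(tokens, root, "PREP")
        for o in children_of(tokens, p, "POBJ")
    ]
    return {"action": word(root), "objects": objects, "modifiers": modifiers}


if __name__ == "__main__":
    print(interpret(SAMPLE_TOKENS))
```

The point of the sketch is that the interpretation comes from following labeled edges rather than from custom grammar rules: the same few lines would handle any phrase with a root verb, a direct object and prepositional phrases.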

The moral: NLP is more mature than you might think. Start at the top using NLP services and you can meet NLQA challenges faster than you (and your boss) may have thought possible.

Sunday, January 8, 2017

The Search Mindset

Consider your intent when using a search function. I like to think of two possible expectations:

Discovery - finding data and concepts related to search terms. In other words, "I am not sure what is out there. I want to learn more". Search results typically get wider when showing something like a "related concepts" panel.
Refinement - filtering to reduce results according to given search criteria. In other words, "I want to see only related data and concepts". Search results typically get narrower when showing something like a faceted result.

This may not seem like a big distinction but discovery and refinement generally focus on different outcomes. Mixing both in a single result list can be confusing.

With that said, the ability to "pivot" and let a user seamlessly switch from "narrowing" to "widening" activities is a critical feature that distinguishes awesome search engines from the also-rans.
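To make the two mindsets concrete, here is a minimal Python sketch over a made-up, four-document catalog. The documents, field names and function names are all hypothetical. refine narrows a result list by a facet value, related_concepts widens it by listing tags that co-occur with the search term, and a pivot is simply a new search on one of those related tags.

```python
from collections import Counter
from typing import Dict, List

# Toy catalog (invented data): each document has one facet field and a tag set.
DOCS: List[Dict] = [
    {"title": "Q3 sales by region", "dept": "sales",
     "tags": {"sales", "region", "revenue"}},
    {"title": "Sales pipeline forecast", "dept": "sales",
     "tags": {"sales", "forecast"}},
    {"title": "Regional headcount", "dept": "hr",
     "tags": {"region", "headcount"}},
    {"title": "Revenue forecast model", "dept": "finance",
     "tags": {"revenue", "forecast"}},
]


def search(docs: List[Dict], term: str) -> List[Dict]:
    """Baseline search: every document tagged with the term."""
    return [d for d in docs if term in d["tags"]]


def refine(results: List[Dict], field: str, value: str) -> List[Dict]:
    """Refinement: keep only results whose facet field matches (narrower)."""
    return [d for d in results if d[field] == value]


def related_concepts(results: List[Dict], term: str) -> List[str]:
    """Discovery: tags that co-occur with the term, most frequent first (wider)."""
    counts = Counter(
        tag for d in results for tag in d["tags"] if tag != term
    )
    return sorted(counts, key=lambda t: (-counts[t], t))


if __name__ == "__main__":
    hits = search(DOCS, "forecast")
    print([d["title"] for d in refine(hits, "dept", "finance")])  # narrowing
    related = related_concepts(hits, "forecast")                  # widening
    print(related)
    # Pivot: follow a related concept into a fresh, wider search.
    print([d["title"] for d in search(DOCS, related[0])])
```

Keeping the two operations as separate functions mirrors the point above: each produces its own kind of result, and the interface decides when to show which rather than blending them into one list.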

Thursday, January 5, 2017

Relevance, Relevance, Relevance

Question: What do you think is the NUMBER 1 capability in every great search engine?

For me, the answer never changes.
It is: RELEVANCE!

Finding data with search is easy. Finding too much data is unavoidable. Filtering, sorting, prioritizing and ranking to create the most relevant results have been primary search goals for decades.

How many times have you found the best answer on the 4th page of Google results? If this were common, I am sure you would have switched to something else long ago. Getting relevant results - with the best answers first - is what brings you back for more :).

Irrelevant results rapidly erode user confidence. Search engines that provide poor answers take users from hope to despair in only a few clicks. Once trust is lost, it can be very hard to get back.

Relevance on the Web - Learning from Google

Google has always been the standard bearer for good sets of ranked results. But frankly, they have had an easier job than most enterprise search products. Web content can be sorted and prioritized using straightforward statistics - some as simple as counting the number of sites that point to a given page. Google also gets a boost from hyperlink text that describes a link target. The text in a link literally offers a curated description of the page it points to. Finally, web content is routinely stored with relatively large passages of unstructured text that make context and meaning easier to determine.
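The two web signals described above can be sketched in a few lines of Python. This is a deliberately naive illustration on invented link data: it counts distinct inbound sites and indexes link text by target, and it is not PageRank or anything Google actually runs. The site names and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (source site, target site, link text). Invented data for illustration.
LINKS: List[Tuple[str, str, str]] = [
    ("a.example", "c.example", "mercury planet facts"),
    ("b.example", "c.example", "planet mercury"),
    ("a.example", "d.example", "mercury element"),
    ("c.example", "d.example", "periodic table"),
    ("d.example", "b.example", "astronomy blog"),
]


def inbound_counts(links: List[Tuple[str, str, str]]) -> Dict[str, int]:
    """Number of distinct sites that point to each target."""
    sources = defaultdict(set)
    for src, dst, _ in links:
        sources[dst].add(src)
    return {dst: len(srcs) for dst, srcs in sources.items()}


def anchor_terms(links: List[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Words used in link text, grouped by the page the links point to."""
    terms: Dict[str, Counter] = defaultdict(Counter)
    for _, dst, text in links:
        terms[dst].update(text.lower().split())
    return terms


def rank(links: List[Tuple[str, str, str]], query: str) -> List[str]:
    """Pages whose inbound link text mentions the query, most-linked first."""
    counts = inbound_counts(links)
    terms = anchor_terms(links)
    q = query.lower()
    matches = [page for page in counts if terms[page][q] > 0]
    # Sort by inbound sites, then by how often the link text uses the term.
    return sorted(matches, key=lambda p: (-counts[p], -terms[p][q], p))


if __name__ == "__main__":
    print(rank(LINKS, "mercury"))
```

Notice that neither signal looks at the page content itself. Enterprise data rarely has this kind of link structure around it, which is why these statistics do not transfer well.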

Enterprise Search Analytics can learn from Google (again)

Modern enterprise search systems need to find alternative ways to filter and rank their results. Page popularity and link text aren't nearly enough. Once again, Google shows us some of the options because they no longer exclusively rely on web ranking methods. Let's review some of the recent advancements we have all seen but don't necessarily think of as "relevance builders".

Consider a Google search for the single term "mercury". You get results like:

Google features now include:

1. Disambiguation of search terms. Note how this is not simply dictionary autocomplete (which is also good). It is a proactive listing of related concepts that answer the question: "Did you mean X?"
2. Knowledge graph of attributes related to the default concept. It helps you understand that you are asking about the right thing before even looking at the results.
3. List of related topics. In other words, "Did you also know X?"
4. What other people searched for. Learning from colleagues is often the quickest route to an answer - especially a high-value curated answer.
5. And finally, a "feedback" link to make sure items 1-4 are correct. Identifying inaccurate outcomes is crucial to user happiness (and to the underlying machine learning algorithms too).

Oh yeah: there are the search results too. But you already expected that :)

It used to be that enterprise search solutions needed to be different from Google. Nowadays, being like Google is a good place to start.

Tuesday, January 3, 2017

What's new?

Back in the Saddle

No blog posts for 3 and a half years. Time to fix that! 🔛

Today I embark on a new career at qlik.com focusing on search driven analytics. What's with that? I did some of the first commercial work in search driven BI 15 years ago. Isn't this a step backward 🔁?

Search Driven Analytics - Ready for Prime Time

So much has happened in the Business Intelligence world. Innovators like Qlik and Tableau have brought BI and analytics to more people than ever before. In the meantime, big data, Hadoop/Spark, NoSQL, Natural Language Processing (NLP) and Machine Learning / Artificial Intelligence have become commonplace. Open Source has also moved to the forefront. Facets, dimensions and clustering of terms are implemented in several world-class open source packages.

Search technology is finally positioned to leverage these advancements in ways that we never envisioned. Conveniently (for me at least), I have spent the last ten years researching and developing apps in all these areas. I am excited at the prospect and can't wait to share some new insights about the current state-of-the-art and where I think we can go.