Azure Text Analytics for simple Natural Language Processing

(cover image from Digital Trends)

Among all the Azure Cognitive Services one can consider when implementing a scenario that requires natural language processing, LUIS is the obvious choice. Short for "Language Understanding Intelligent Service", LUIS has been around for many years now and is the actual engine that makes Cortana understand you.

LUIS lets you design your domain by defining intents and entities. You can think of intents as the verbs and entities as the nouns of the sentences you expect it to understand. So when you fire up Cortana and say "Wake me up tomorrow at 6", "set an alarm" might be the intent and "next day at 6am" the entity. Once your model is designed, you train it by providing utterances (examples of inputs), reviewing how LUIS breaks them down into intents and entities, and correcting the classification when it's wrong.
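To make the intent/entity split concrete, here is a minimal sketch of how one labeled utterance could be represented. The class names and the "SetAlarm" / "AlarmTime" labels are made up for illustration; this is not the format LUIS itself uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    kind: str  # hypothetical entity type, e.g. "AlarmTime"
    text: str  # the part of the utterance that carries the entity


@dataclass
class LabeledUtterance:
    text: str    # the raw example input
    intent: str  # the "verb": what the user wants to do
    entities: List[Entity] = field(default_factory=list)  # the "nouns"


# The Cortana example from above, labeled by hand.
example = LabeledUtterance(
    text="Wake me up tomorrow at 6",
    intent="SetAlarm",
    entities=[Entity(kind="AlarmTime", text="tomorrow at 6")],
)
```

Training then amounts to collecting many such examples and correcting the labels the service gets wrong.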

LUIS is undeniably very powerful and once you get comfortable with the process of domain modeling, you can design extensive and complex models. But there's a significant upfront investment when tackling a new project: you have to figure out the best way to define your domain in terms of intents and entities, and then dedicate enough time to training your model to achieve acceptable performance.

Looking at Azure Text Analytics

Azure Text Analytics is part of the same Cognitive Services family as LUIS. When I present the spectrum of services available on the Microsoft AI stack, I usually classify them along 2 dimensions: customization vs. ease of integration.

(chart: MS-AI-spectrum, customization vs. ease of integration)

In this chart, Text Analytics sits one layer below LUIS because it doesn't require you to train it with your own data. It is already trained by Microsoft on stock data, which makes it ready to use.

Text Analytics can extract 3 kinds of information from the text you send to its API:

  • language detection: which language is predominantly used
  • sentiment analysis: as a single number ranging from 0 (negative) to 1 (positive)
  • key phrases: words or groups of words describing the main points
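As a rough sketch of what talking to the API involves, the snippet below builds a request body and reads key phrases out of a response. The JSON shapes (a "documents" list with "id", "language", "text", and a "keyPhrases" list in the reply) are my recollection of the v2.0 REST API and should be treated as assumptions to check against the official reference. The sample response is hand-written from the test results reported in this post, not captured from the service, and no network call is made.

```python
def build_request(texts, language="en"):
    """Wrap raw strings into the assumed 'documents' request body."""
    return {
        "documents": [
            {"id": str(i), "language": language, "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }


def key_phrases_by_id(response):
    """Map each document id to its list of key phrases (assumed reply shape)."""
    return {
        doc["id"]: doc.get("keyPhrases", [])
        for doc in response.get("documents", [])
    }


body = build_request(["Set an alarm tomorrow at 6", "Call a taxi"])

# Hand-written stand-in for the service's reply (assumption, see above).
sample_response = {
    "documents": [
        {"id": "1", "keyPhrases": ["alarm"]},
        {"id": "2", "keyPhrases": ["taxi"]},
    ],
    "errors": [],
}

phrases = key_phrases_by_id(sample_response)
```

In a real integration, the body would be sent as JSON to the key phrases endpoint of your Cognitive Services resource, authenticated with your subscription key.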

The main use-case for key phrase extraction is to identify general ideas from large, unstructured text. But let's see how it behaves with commands one would typically send to LUIS to analyse their meaning (note that this test was run in January 2018 and only reflects the service's performance at that point in time).

Wake me up tomorrow at 6

No luck here: the service didn't identify any key phrase.

Set an alarm tomorrow at 6

Key phrase identified: "alarm"

Turn on the oven

Key phrase identified: "oven"

Where is the nearest library?

Key phrase identified: "nearest library"

How will the weather be this weekend?

Key phrases identified: "weather", "weekend"

Call a taxi

Key phrase identified: "taxi"

These examples show that most of the time, Text Analytics successfully extracts the entities, i.e. the most important "things", but gives no clue about the intent of the sentences, as the verbs seem to be ignored.

Good enough for simple scenarios or quick prototypes

It comes as no surprise that Text Analytics is not a replacement for LUIS. But there is certainly a class of use-cases where it can be used in place of LUIS, especially if you're dealing with situations where intents are implicitly carried by the entity: "taxi" most probably means that the user wants to call a taxi; "nearest library" means that the user is looking for such a place.
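The idea of an intent implicitly carried by the entity can be sketched as a plain lookup table. The intent names and the table below are hypothetical; in a real bot you would fill it with the phrases your own scenario cares about.

```python
# Hypothetical mapping from a key phrase to the intent it implies.
IMPLIED_INTENTS = {
    "taxi": "CallTaxi",
    "nearest library": "FindPlace",
}


def infer_intent(key_phrases):
    """Return the first implied intent found among the key phrases, or None."""
    for phrase in key_phrases:
        intent = IMPLIED_INTENTS.get(phrase.lower())
        if intent is not None:
            return intent
    return None


taxi_intent = infer_intent(["taxi"])                   # matches the table
weather_intent = infer_intent(["weather", "weekend"])  # no entry, so None
```

This only holds up while each phrase maps to a single obvious action; as soon as "taxi" could mean booking, cancelling or tracking a ride, you are back to needing a proper intent model.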

Using Text Analytics can also help to kickstart a prototype where you don't need to exhibit comprehensive NLP, but just want to use it as a placeholder that's readily functional. You may replace it with LUIS later down the road, once you have some time to dedicate to building and training a LUIS model.

Combine them to get the best of both

It's also important to note that a LUIS model is built for a specific language. LUIS does not perform any kind of language identification and will assume the text input to be in a specific language. So for multi-language interfaces, like chatbots exposed to a broad multi-national audience for example, you need to identify the language used by the user first.

By submitting the input to Text Analytics, not only do you get language identification in return, but you can also:

  • use key phrase extraction to potentially disambiguate LUIS' intent/entities analysis
  • use sentiment analysis to adapt the response to the user
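Put together, the combination could look like the sketch below. Everything here is an assumption made for illustration: the service calls are passed in as plain callables, the stand-ins at the bottom replace real calls to Text Analytics and LUIS, and the 0.3 sentiment threshold is an arbitrary value.

```python
def route(text, detect_language, sentiment_score, luis_models, negative_below=0.3):
    """Detect the language first, then query the LUIS model built for it."""
    language = detect_language(text)
    model = luis_models.get(language)
    if model is None:
        # No LUIS model was built for this language.
        return {"language": language, "intent": None, "tone": "neutral"}
    # Sentiment is a number from 0 (negative) to 1 (positive).
    score = sentiment_score(text, language)
    tone = "apologetic" if score < negative_below else "neutral"
    return {"language": language, "intent": model(text), "tone": tone}


# Stand-ins for the real services, so the sketch runs on its own.
def fake_detect_language(text):
    return "fr" if "bonjour" in text.lower() else "en"


def fake_sentiment(text, language):
    return 0.1 if "terrible" in text.lower() else 0.8


fake_models = {"en": lambda text: "CallTaxi"}  # one model per language code

result = route(
    "This is terrible, call a taxi",
    fake_detect_language,
    fake_sentiment,
    fake_models,
)
```

Passing the services in as callables keeps the routing logic testable without network access, and makes it easy to swap a placeholder for a trained LUIS model later.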

Visit the Text Analytics documentation to learn more about this service and feel free to share your feedback in the comments below!
