Figure 6 Results for the “What Are Annual Rates for Savings Accounts?” Query

{
  "query": "what are annual rates for savings accounts",
  "topScoringIntent": {
    "intent": "OtherServicesIntent",
    "score": 0.577525139
  },
  "intents": [
    {
      "intent": "OtherServicesIntent",
      "score": 0.577525139
    },
    {
      "intent": "PersonalAccountsIntent",
      "score": 0.267547846
    },
    {
      "intent": "None",
      "score": 0.00754897855
    }
  ],
  "entities": []
}
Figure 7 Results for the “What Are Rates for My Savings Accounts?” Query

{
  "query": "what are rates for my savings accounts",
  "topScoringIntent": {
    "intent": "PersonalAccountsIntent",
    "score": 0.71332705
  },
  "intents": [
    {
      "intent": "PersonalAccountsIntent",
      "score": 0.71332705
    },
    {
      "intent": "OtherServicesIntent",
      "score": 0.18973498
    },
    {
      "intent": "None",
      "score": 0.007595492
    }
  ],
  "entities": []
}
LIME expects the classifier’s confidence scores for the different categories to be treated as a probability distribution, and therefore to be in the range [0,1] and sum to 1. LUIS outputs confidence scores in that range for the defined intents and the additional None intent, but those aren’t guaranteed to sum to 1. Therefore, when using LIME, I’ll normalize the LUIS scores to sum to 1. (This is done in the function call_with_utterance.)
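The call_with_utterance function itself isn’t listed on this page, but here is a minimal sketch of what such a wrapper can look like, assuming the LUIS v2.0 REST endpoint. The region, app ID and key placeholders, and the intent ordering are assumptions you’d replace with your own values:

import numpy as np
import requests

# Placeholders -- substitute your own region, LUIS app ID and subscription key.
LUIS_URL = 'https://westus.api.cognitive.microsoft.com/luis/v2.0/apps/<app-id>'
LUIS_KEY = '<subscription-key>'
INTENTS = ['PersonalAccountsIntent', 'OtherServicesIntent', 'None']

def call_with_utterance(utterances):
    """Query LUIS for each utterance and return normalized intent scores.

    LIME expects a classifier function mapping a list of texts to an
    array of class probabilities, so the LUIS confidence scores are
    rescaled here to sum to 1.
    """
    results = []
    for utterance in utterances:
        response = requests.get(
            LUIS_URL,
            params={'subscription-key': LUIS_KEY,
                    'q': utterance,
                    'verbose': 'true'}  # verbose returns all intents, not just the top one
        ).json()
        scores = {i['intent']: i['score'] for i in response['intents']}
        raw = np.array([scores.get(name, 0.0) for name in INTENTS])
        results.append(raw / raw.sum())  # normalize to a probability distribution
    return np.array(results)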
The code listed in Figure 4 uses LIME to produce an explanation about the prediction for the utterance, “what are annual rates for my savings accounts?” It then generates an HTML visualization, which is presented in Figure 5.
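As a rough sketch of what the Figure 4 listing does, the core LIME call has the following shape. Here call_with_utterance is the wrapper sketched above, and the class ordering and output file name are assumptions:

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=INTENTS)

# Explain the prediction for the utterance; label 0 corresponds to
# PersonalAccountsIntent in the INTENTS ordering assumed above.
explanation = explainer.explain_instance(
    'what are annual rates for my savings accounts',
    call_with_utterance,
    labels=(0,),
    num_features=10)

explanation.save_to_file('explanation.html')  # the visualization shown in Figure 5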
In Figure 5 you can see the predicted probabilities for the utterance, focused here on PersonalAccountsIntent rather than the two other intents, OtherServicesIntent and None. (Note that the probabilities are very close to, but not exactly the same as, the confidence scores output by LUIS, due to normalization.) You can also see the most significant words for classifying the intent as PersonalAccountsIntent (the words atop the blue bars, which are also highlighted in blue in the utterance text). The weight of each bar indicates the effect on the classification confidence score if that word were removed from the utterance. So, for example, “my” is the word with the most significant effect for detecting the utterance’s intent in this case. If I were to remove it from the utterance, the confidence score would be expected to drop by 0.30, from 0.56 to 0.26. This is an estimate generated by LIME. In fact, when removing the word and feeding the “what are annual rates for savings accounts?” utterance into LUIS, the confidence score for PersonalAccountsIntent is 0.26 and the intent is now classified as OtherServicesIntent, with a confidence score of about 0.577 (see Figure 6).
Other significant words are “accounts” and “savings,” which together with “my” provide insights similar to those provided by Scattertext. Two important words with significant negative weights are “annual” and “rates.” This means that removing them from the utterance would increase the confidence score for the utterance to be classified as PersonalAccountsIntent. Scattertext showed that “rates” is more common in utterance examples for OtherServicesIntent, so this isn’t a big surprise.
However, there is something new to be learned from LIME: the word “annual” is significant for LUIS in determining that the intent in this case isn’t PersonalAccountsIntent, and removing it is expected to increase the confidence score for PersonalAccountsIntent by 0.27. Indeed, when I remove “annual” before feeding the utterance to LUIS, I get a higher confidence score for PersonalAccountsIntent, namely 0.71 (see Figure 7).
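To reproduce this kind of check yourself, you can re-query LUIS with and without the word, for example via the call_with_utterance sketch from earlier (a hypothetical helper, not the article’s exact code):

# Compare normalized LUIS scores with and without the word 'annual'
for text in ['what are annual rates for my savings accounts',
             'what are rates for my savings accounts']:
    probs = call_with_utterance([text])[0]
    print(text, '->', dict(zip(INTENTS, probs.round(3))))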
In this way, LIME helps you identify significant words that drive classification confidence scores. It can thus provide insights that help you fine-tune your utterance examples to improve intent classification accuracy.
Wrapping Up
I have shown that when developing an NLU-based application, intent prediction for some utterances can be rather challenging, and that a better understanding of how to fine-tune utterance examples can improve classification accuracy.
The task of understanding word-level differences and similarities among utterances can yield concrete guidance in the fine-tuning process. I’ve presented two open source tools, Scattertext and LIME, that provide word-level guidance by identifying significant words that affect intent prediction. Scattertext visualizes differences and similarities of word frequencies in utterance examples, while LIME identifies significant words affecting intent classification confidence scores.

I hope these tools will help you build better NLU-based products using LUIS.
Zvi Topol has been working as a data scientist in various industry verticals, including marketing analytics, media and entertainment, and the Industrial Internet of Things. He has delivered and led multiple machine learning and analytics projects, including natural language and voice interfaces, cognitive search, video analysis, recommender systems and marketing decision support systems. He can be contacted at zvitop@gmail.com.
Thanks to the following Microsoft technical expert who reviewed this article: Ashish Sahu