Page 31 - MSDN Magazine, November 2017
and used the conjunction word “but” to combine it with the third sentence. You’ll see that I have a purpose in doing so.
I’ll now call the Analyze Text operation of the Linguistic Analysis API with the following request body:
{
  "language" : "en",
  "analyzerIds" : ["22a6b758-420f-4745-8a3c-46835a67c0d2", "08ea174b-bfdb-4e64-987e-602f85da7f72"],
  "text" : "This phone has a great battery. The display is sharp and bright but the store does not have the apps I need."
}
I’m sending this request with the Constituency Tree and Tokens analyzers only. Figure 5 shows the result.
As you can see, the Constituency Tree analyzer broke the text record into two sentences and indicated the conjunction word “but” with the tag CC in the result. As noted, I purposely removed the period character at the end of the second sentence in the original text in order to highlight this capability of the Linguistic Analysis API.
Better Together
This little experiment demonstrates an extremely reliable way to break the original example into three separate sentences, which can now be processed using the Text Analytics API to gain deeper insight about the customer input. You might think that simple string parsing and splitting would provide a similar result offline, but that’s easier said than done; take my word for that.
Now, in the same manner, I can take the output of the Linguistic Analysis API and, with a simple string-manipulation routine, in any programming language, separate the review into individual parts. The original example can be broken down into the following three parts:
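As a rough illustration of such a string-manipulation routine, the Python sketch below parses the bracketed tree strings that the Constituency Tree analyzer returns in Figure 5 and splits a sentence wherever two S clauses are joined by a CC tag. The helper names (parse_tree, leaves, split_clauses) are hypothetical and not part of the Linguistic Analysis API, and the logic assumes only the simple "clause CC clause" pattern seen in this example.

```python
# Tree strings copied from the Constituency Tree result in Figure 5.
TREE_1 = ("(TOP (S (NP (DT This) (NN phone)) "
          "(VP (VBZ has) (NP (DT a) (JJ great) (NN battery))) (. .)))")
TREE_2 = (
    "(TOP (S "
    "(S (NP (DT The) (NN display)) (VP (VBZ is) (ADJP (JJ sharp) (CC and) (JJ bright)))) "
    "(CC but) "
    "(S (NP (DT the) (NN store)) (VP (VBZ does) (RB not) "
    "(VP (VB have) (NP (NP (DT the) (NNS apps)) (SBAR (S (NP (PRP I)) (VP (VBP need)))))))) "
    "(. .)))"
)


def parse_tree(text):
    """Parse one bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(node):
    """Return the words under a node, in sentence order."""
    words = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def split_clauses(tree_text):
    """Split a sentence into its S clauses when they are joined by a CC tag."""
    sentence = parse_tree(tree_text)[1][0]        # the S node under TOP
    kids = sentence[1]
    clauses = [k for k in kids if k[0] == "S"]
    if len(clauses) >= 2 and any(k[0] == "CC" for k in kids):
        return [" ".join(leaves(c)) for c in clauses]
    # No coordination found: keep the sentence whole, minus the final period.
    return [" ".join(w for k in kids if k[0] != "." for w in leaves(k))]


parts = split_clauses(TREE_1) + split_clauses(TREE_2)
for part in parts:
    print(part)
```

The nested "(CC and)" inside the ADJP node is deliberately ignored, because only CC tags that sit directly between two S clauses mark a boundary between separate statements.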
Figure 5 Using the Analyze Text Operation of the Linguistic Analysis API
[
  {
    "analyzerId": "22a6b758-420f-4745-8a3c-46835a67c0d2",
    "result": [
      "(TOP (S (NP (DT This) (NN phone)) (VP (VBZ has) (NP (DT a) (JJ great) (NN battery))) (. .)))",
      "(TOP (S (S (NP (DT The) (NN display)) (VP (VBZ is) (ADJP (JJ sharp) (CC and) (JJ bright)))) (CC but) (S (NP (DT the) (NN store)) (VP (VBZ does) (RB not) (VP (VB have) (NP (NP (DT the) (NNS apps)) (SBAR (S (NP (PRP I)) (VP (VBP need)))))))) (. .)))"
    ]
  },
  {
    "analyzerId": "08ea174b-bfdb-4e64-987e-602f85da7f72",
    "result": [
      {
        "Len": 31, "Offset": 0,
        "Tokens": [
          { "Len": 4, "NormalizedToken": "This", "Offset": 0, "RawToken": "This" },
          { "Len": 5, "NormalizedToken": "phone", "Offset": 5, "RawToken": "phone" },
          { "Len": 3, "NormalizedToken": "has", "Offset": 11, "RawToken": "has" },
          { "Len": 1, "NormalizedToken": "a", "Offset": 15, "RawToken": "a" },
          { "Len": 5, "NormalizedToken": "great", "Offset": 17, "RawToken": "great" },
          { "Len": 7, "NormalizedToken": "battery", "Offset": 23, "RawToken": "battery" },
          { "Len": 1, "NormalizedToken": ".", "Offset": 30, "RawToken": "." }
        ]
      },
      {
        "Len": 76, "Offset": 32,
        "Tokens": [
          { "Len": 3, "NormalizedToken": "The", "Offset": 32, "RawToken": "The" },
          { "Len": 7, "NormalizedToken": "display", "Offset": 36, "RawToken": "display" },
          { "Len": 2, "NormalizedToken": "is", "Offset": 44, "RawToken": "is" },
          { "Len": 5, "NormalizedToken": "sharp", "Offset": 47, "RawToken": "sharp" },
          { "Len": 3, "NormalizedToken": "and", "Offset": 53, "RawToken": "and" },
          { "Len": 6, "NormalizedToken": "bright", "Offset": 57, "RawToken": "bright" },
          { "Len": 3, "NormalizedToken": "but", "Offset": 64, "RawToken": "but" },
          { "Len": 3, "NormalizedToken": "the", "Offset": 68, "RawToken": "the" },
          { "Len": 5, "NormalizedToken": "store", "Offset": 72, "RawToken": "store" },
          { "Len": 4, "NormalizedToken": "does", "Offset": 78, "RawToken": "does" },
          { "Len": 3, "NormalizedToken": "not", "Offset": 83, "RawToken": "not" },
          { "Len": 4, "NormalizedToken": "have", "Offset": 87, "RawToken": "have" },
          { "Len": 3, "NormalizedToken": "the", "Offset": 92, "RawToken": "the" },
          { "Len": 4, "NormalizedToken": "apps", "Offset": 96, "RawToken": "apps" },
          { "Len": 1, "NormalizedToken": "I", "Offset": 101, "RawToken": "I" },
          { "Len": 4, "NormalizedToken": "need", "Offset": 103, "RawToken": "need" },
          { "Len": 1, "NormalizedToken": ".", "Offset": 107, "RawToken": "." }
        ]
      }
    ]
  }
]
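The Len and Offset values from the Tokens analyzer can also be used to cut the original string directly. The Python sketch below is a minimal illustration under stated assumptions: the spans are copied by hand from Figure 5 (trimmed to the fields it needs), the request text is assumed to join its two sentences with a single space, and the function names slice_span and split_at_token are hypothetical.

```python
# The request text, with the two sentences separated by one space (assumed).
TEXT = ("This phone has a great battery. "
        "The display is sharp and bright but the store does not have the apps I need.")

# Sentence spans and the "but" token span, copied from Figure 5.
SENTENCES = [
    {"Offset": 0, "Len": 31},
    {"Offset": 32, "Len": 76},
]
BUT_TOKEN = {"Offset": 64, "Len": 3, "RawToken": "but"}


def slice_span(text, span):
    """Return the substring covered by an Offset/Len pair."""
    return text[span["Offset"]:span["Offset"] + span["Len"]]


def split_at_token(text, sentence, token):
    """Split one sentence span into the text before and after a token."""
    start = sentence["Offset"]
    end = start + sentence["Len"]
    before = text[start:token["Offset"]].strip()
    after = text[token["Offset"] + token["Len"]:end].strip()
    return before, after


first = slice_span(TEXT, SENTENCES[0])
second, third = split_at_token(TEXT, SENTENCES[1], BUT_TOKEN)
for part in (first, second, third):
    print(part)
```

Slicing by offset keeps the original casing and punctuation of each part, which may matter if the parts are later sent to another service exactly as the customer wrote them.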