
Stemming for raw text
As we saw in Chapter 3, Understanding Structure of Sentences, stemming is the process of converting each word of the sentence to its root form by deleting or replacing suffixes.
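As a quick illustration of suffix deletion and replacement, here is a minimal sketch that stems a few individual words with the PorterStemmer from nltk. The word list and the variable names are chosen only for this illustration, and the sketch assumes a recent nltk release is installed.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words (an assumption for this sketch, not a fixed list).
words = ["wants", "loving", "really"]

# Map each word to the stem that the Porter algorithm produces for it.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word, "->", stem)
```

Note that a stem does not have to be a dictionary word: the stemmer only rewrites suffixes according to its rules.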
In this section, we will apply stemming to raw text.
The code uses the PorterStemmer available in nltk. Refer to Figure 4.5:

Figure 4.5: PorterStemmer code for raw text
The output of the preceding code is:
stem is funnier than a bummer say the sushi love comput scientist. she realli want to buy cars. she told me angrily.
Compare the preceding output with the original text, which is as follows:
Stemming is funnier than a bummer says the sushi loving computer scientist. She really wants to buy cars. She told me angrily.
The words that differ are Stemming (stem), says (say), loving (love), computer (comput), really (realli), and wants (want). The stemmed output is also entirely lowercase.
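The comparison can also be done programmatically. The following is a minimal sketch, not the code from Figure 4.5. It assumes the raw text is split on whitespace, which is consistent with tokens such as cars. keeping their trailing period in the output. The function name stem_raw_text and the variable names are hypothetical, and a recent nltk release is assumed.

```python
from nltk.stem import PorterStemmer

text = ("Stemming is funnier than a bummer says the sushi loving computer "
        "scientist. She really wants to buy cars. She told me angrily.")


def stem_raw_text(raw_text):
    """Stem every whitespace-separated token and rejoin with single spaces."""
    stemmer = PorterStemmer()
    return " ".join(stemmer.stem(token) for token in raw_text.split())


stemmed = stem_raw_text(text)
print(stemmed)

# Collect the tokens whose stem differs from the original by more than case.
changed = [(original, stem)
           for original, stem in zip(text.split(), stemmed.split())
           if original.lower() != stem]

for original, stem in changed:
    print(original, "->", stem)
```

Splitting on whitespace keeps punctuation attached to the neighboring word, so a tokenizer that separates punctuation would likely give different stems for those tokens.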