The analysis of text data in social media is becoming more important every day. Knowing what people think and want is key for companies deciding where to invest so that they can give customers what they want. Early approaches to text analysis were mainly statistical, but adding linguistic information has been shown to improve the results.
One of the problems you need to address when analyzing social media is time. People exchange information constantly, and every day users post comments about what they think of a product, what they do, or the places they visit, so it is difficult to keep track of everything that happens. Moreover, information is often expressed in short sentences, keywords, or isolated ideas, as in tweets. As a result, the language is usually unstructured: it consists of isolated ideas, often without context.
I will talk about the problem of text analysis in social media. I will briefly explain Naïve Bayes classifiers and how you can easily use them to analyze sentiment in social media, and I will use an example to show how linguistic information can help improve the results. I will also compare the pros and cons of supervised and unsupervised learning.
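The following is a rough illustration of the Naïve Bayes idea and of how linguistic information can help. It is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. The training sentences are invented toy data. The linguistic feature shown here is one common choice, assumed for illustration: every word after a negation cue gets the prefix `NOT_`. It is not necessarily the example used in the talk. The names `tokenize`, `NaiveBayes` and `NEGATIONS` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed negation cue words; a real system would use a larger list.
NEGATIONS = {"not", "no", "never"}


def tokenize(text, mark_negation=False):
    """Lowercase, split on whitespace, and optionally mark negated words."""
    tokens = text.lower().split()
    if not mark_negation:
        return tokens
    marked, negated = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negated = True  # simplification: negation scope runs to the end of the text
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, mark_negation=False):
        self.mark_negation = mark_negation

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc, self.mark_negation)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum of log P(word | c); words never seen in training are ignored
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(doc, self.mark_negation):
                if tok in self.vocab:
                    score += math.log((self.word_counts[c][tok] + 1) /
                                      (self.total_words[c] + vocab_size))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy training data.
docs = ["great phone", "great screen", "not bad",
        "not great", "bad battery", "bad screen"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

plain = NaiveBayes().fit(docs, labels)
negation_aware = NaiveBayes(mark_negation=True).fit(docs, labels)
for text in ["not great", "not bad"]:
    print(text, plain.predict(text), negation_aware.predict(text))
# not great pos neg
# not bad neg pos
```

On this toy data, the plain bag-of-words model misreads "not great" as positive because "great" occurs mostly in positive examples. Marking negation turns that word into a separate feature, `NOT_great`, which the model has only seen in a negative example, so the prediction flips.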
Finally, I will introduce opinion lexicons, both dictionary-based and corpus-based, and show how lexicons can be used in semi-supervised and supervised learning. If time allows, I will also cover other use cases of text analysis.
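One common way to build a dictionary-based opinion lexicon is to start from a few seed words of known polarity. You then expand them through a dictionary's synonym and antonym relations. The sketch below assumes a tiny hand-written thesaurus (`SYNONYMS`, `ANTONYMS`) standing in for a real resource such as WordNet. All names and word lists here are hypothetical. The resulting lexicon can score a text without any labelled training data.

```python
from collections import deque

# Hypothetical miniature thesaurus standing in for a real dictionary such as WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "great": ["excellent"],
    "bad": ["poor", "awful"],
}
ANTONYMS = {
    "good": ["bad"],
}


def expand_lexicon(seeds):
    """Breadth-first propagation of seed polarities: synonyms keep the
    polarity, antonyms flip it. The first polarity assigned to a word wins."""
    lexicon = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        neighbours = [(w, polarity) for w in SYNONYMS.get(word, [])]
        neighbours += [(w, -polarity) for w in ANTONYMS.get(word, [])]
        for w, p in neighbours:
            if w not in lexicon:
                lexicon[w] = p
                queue.append(w)
    return lexicon


def lexicon_score(text, lexicon):
    """Sum of word polarities; > 0 leans positive, < 0 leans negative."""
    return sum(lexicon.get(tok, 0) for tok in text.lower().split())


lexicon = expand_lexicon({"good": 1})
print(lexicon_score("excellent screen", lexicon))              # 1
print(lexicon_score("awful battery and poor screen", lexicon))  # -2
```

In a supervised setting, the same lexicon score could instead be added as an extra feature alongside the word counts. A corpus-based lexicon would derive the polarities from patterns in a domain corpus rather than from a dictionary.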