Twitter and Machine Learning

Researchers, analysts and social scientists are only just beginning to appreciate the value of Twitter’s database of human networks and communication.

One statistic sheds light on the unseen revolution that will shape the human experience of the future as much as any war or disaster: in just the past two years, humans have created more than double the amount of data that had ever existed before.

The popularity of social media sites, and the ease with which their data can be accessed, means these platforms are increasingly becoming primary sources for social research. I’ve spent the last two years learning as much as I can about this fascinating well of information and how to extract and analyse it.

Why Twitter?

Twitter makes it easy to find and follow conversations, both through its own search feature and because Tweets can be found through Google.

Twitter has the hashtag convention, a particularly useful way of gathering, sorting and expanding searches when collecting data, as well as of classifying users.

Twitter data is easy to retrieve, as major incidents, news stories and events on Twitter tend to be centred around a hashtag.

It is also important to note that the Twitter API is more open and accessible than those of other social media platforms, which makes collecting data much faster.
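To make the hashtag convention concrete, here is a minimal sketch of grouping already collected tweets by hashtag. It assumes the tweet texts are available as plain strings; the sample tweets and the function name `group_by_hashtag` are made up for illustration, and no Twitter API call is involved.

```python
import re
from collections import defaultdict

# A '#' followed by letters, digits or underscores (a simplification
# of Twitter's real hashtag rules).
HASHTAG_RE = re.compile(r"#(\w+)")

def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that contain it."""
    groups = defaultdict(list)
    for text in tweets:
        # A set ensures a tweet repeating a tag is only listed once.
        for tag in {t.lower() for t in HASHTAG_RE.findall(text)}:
            groups[tag].append(text)
    return dict(groups)

# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Polls open at 7am #Election2020",
    "Long queues downtown #election2020 #vote",
    "Great match tonight #football",
]

groups = group_by_hashtag(sample_tweets)
for tag in sorted(groups):
    print(tag, len(groups[tag]))
```

Lowercasing the tags means that #Election2020 and #election2020 are treated as the same conversation, which is usually what you want when collecting data around an event.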

So what can you research with Twitter?

Sentiment analysis works well with Twitter data, as tweets are short and consistent in length.

In my own case, I was able to build a Twitter sentiment analyser to assess attitudes towards topics and conversations centred around a figure or event, and to output valuable semantic analysis and insight.

Twitter’s treasure trove of reactions and opinions on every topic under the sun acts as a big psychological database, one that can be accessed through machine learning.

Twitter is also a rich source of information about user interests, in the form of the public biography, people followed, logged Retweets and favorites, and more. It’s an exciting proposition for a researcher with a lot of information to process.

Using the Twitter API, it’s fairly simple to classify and extract relevant information from thousands of Tweets, and together they offer incredible insights into social media usage, with a wide variety of applications:

The potential for news or content sites to harness their Followers’ interests is huge, and makes it possible to tailor content to a Twitter audience.

Time series analysis is a common use: examining tweets over a period of time shows when peaks in activity occur, which is valuable for predictive modelling.
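As a rough illustration of the time series idea, the sketch below counts tweets per hour and picks out the busiest hour. It assumes each tweet’s creation time has already been parsed into a Python `datetime`; the timestamps and the function names `tweets_per_hour` and `peak_hour` are invented for the example.

```python
from collections import Counter
from datetime import datetime

def tweets_per_hour(timestamps):
    """Count how many tweets fall into each hourly bucket."""
    counts = Counter()
    for ts in timestamps:
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[bucket] += 1
    return counts

def peak_hour(counts):
    """Return the (hour, count) pair with the most tweets."""
    return max(counts.items(), key=lambda item: item[1])

# Made-up timestamps, for illustration only.
sample_timestamps = [
    datetime(2020, 1, 1, 9, 5),
    datetime(2020, 1, 1, 9, 40),
    datetime(2020, 1, 1, 10, 15),
    datetime(2020, 1, 1, 9, 59),
    datetime(2020, 1, 1, 11, 30),
]

counts = tweets_per_hour(sample_timestamps)
hour, count = peak_hour(counts)
print(hour, count)
```

The hourly counts are the raw series; a predictive model would be fitted on top of them, which is beyond this sketch.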

Network Analysis

The final application I’d like to highlight is the capacity for network analysis with Twitter, visualizing the connections between people to better understand the structure of the conversation surrounding a person, place or event.
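One simple way to get at this structure is a mention graph, where an edge from one user to another means the first mentioned the second. The sketch below assumes tweets are available as (author, text) pairs; the sample data and the names `build_mention_graph` and `in_degree` are hypothetical, and a real project would probably hand the edges to a dedicated graph library for visualisation.

```python
import re
from collections import defaultdict

# An '@' followed by letters, digits or underscores.
MENTION_RE = re.compile(r"@(\w+)")

def build_mention_graph(tweets):
    """Build a directed graph: author -> set of users they mention."""
    graph = defaultdict(set)
    for author, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            # Ignore self-mentions; compare names case-insensitively.
            if mentioned.lower() != author.lower():
                graph[author.lower()].add(mentioned.lower())
    return graph

def in_degree(graph):
    """Count how many distinct users mention each user."""
    counts = defaultdict(int)
    for mentioned_users in graph.values():
        for user in mentioned_users:
            counts[user] += 1
    return dict(counts)

# Made-up (author, text) pairs, for illustration only.
sample = [
    ("alice", "Agree with @bob on this"),
    ("carol", "@bob @alice what do you think?"),
    ("bob", "Thanks @alice"),
]

graph = build_mention_graph(sample)
mention_counts = in_degree(graph)
print(mention_counts)
```

Users who are mentioned by many distinct accounts sit at the centre of the conversation, which is often the first thing a network visualisation reveals.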

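The simplest way to build a sentiment analyser is with a bag-of-words model, which treats each tweet as an unordered collection of words.

The sentiment value of each word is looked up in a pre-recorded lexicon, and those values are combined to assign a sentiment to the tweet.

With that in place, you can write the sentiment analyser script itself.

The lexicon and any supporting tools are typically downloaded as a dependency or package.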
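The bag-of-words and lexicon lookup steps can be sketched in a few lines. This is a toy version rather than my actual script: the `LEXICON` below is a tiny hand-made stand-in for a real downloaded lexicon, and the function names are invented for the example.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon: word -> sentiment value.
# A real analyser would load a much larger, pre-recorded lexicon.
LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "terrible": -2, "hate": -2,
}

def bag_of_words(text):
    """Reduce a tweet to word counts, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def sentiment_score(text):
    """Sum the lexicon value of every word, weighted by its count."""
    bag = bag_of_words(text)
    return sum(LEXICON.get(word, 0) * count for word, count in bag.items())

def sentiment_label(score):
    """Turn a numeric score into a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up sample tweets, for illustration only.
for tweet in [
    "I love this, great great work",
    "Terrible service, I hate waiting",
    "The train leaves at noon",
]:
    score = sentiment_score(tweet)
    print(score, sentiment_label(score), tweet)
```

Because a bag of words throws away word order, this approach cannot handle negation ("not good" still scores as positive), which is one reason more sophisticated models exist.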


The possibilities in terms of analytics applications are endless.

Twitter is ripe for empirically informed sociological scholarship.