Angry Rant Analysis: Creating Word Clouds in Tableau from Twitter Data

In analytics, one of the most exciting areas of research is sentiment analysis.  Because even the simplest statements can carry complex meanings (we've all had that "what did you really mean" discussion with a loved one), people's sentiments can be difficult to ascertain without a human analyzing them.

One of the simpler tools used to understand how people are feeling is the word cloud.  A word cloud is essentially a display of the most common words in a text, with each word's color and size scaled to reflect its frequency.  An example of a word cloud from a popular Beach Boys song is below.

Ha!  We have fun here at AE Business Solutions.  How about this one from Surfin' USA?


Anyway, as anyone who visited Tableauza during the unfortunate period of the NFL Draft series can attest, I'm a bit of a sports nut.  So, I thought it would be fun to do some Tableau sentiment analysis on the Twitter feeds of the two most obnoxious sports personalities I could think of, Skip Bayless and Stephen A. Smith. 

To begin, I collected every tweet from both Stephen A. and Skip from the past month and a half (April 2017 - mid-May 2017).  The tweets were then cleaned and put into two Excel sheets.

Once the data was in Excel, I loaded it into a software package called Alteryx, a data blending and preparation tool that can load and modify data sets and save the result as a Tableau Data Extract (.tde).  Screenshots of the Excel set-up and the Alteryx drag-and-drop interface are below.

Raw tweet data put into Excel


Alteryx workflow


The workflow shown above is used to "tokenize" the tweets, which puts every single word into its own row for each pundit.  An example of this formatting is below.

In order for this word cloud example to work, each word must be in its own row, hence the formatting.
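For readers without Alteryx, the same one-word-per-row shape can be approximated in a few lines of Python.  This is only a sketch of the idea, not the Alteryx workflow itself: the sample rows, the `tokenize` function name, and the regular expression used to split words are all assumptions made for illustration.

```python
import re

# Made-up placeholder rows: (pundit, tweet text). Not real tweets.
tweets = [
    ("Pundit A", "Example tweet one!"),
    ("Pundit B", "Another example tweet. Done."),
]

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def tokenize(rows):
    """Return one (pundit, word) row per word, lowercased."""
    out = []
    for pundit, text in rows:
        for word in WORD_RE.findall(text.lower()):
            out.append((pundit, word))
    return out

word_rows = tokenize(tweets)
for pundit, word in word_rows:
    print(pundit, word)
```

Each output row carries the pundit alongside a single word, which is the layout the Tableau steps rely on.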

Now that our data is processed, we can make the viz.  To begin, drag your word field (Tweets in my case) to Text on the Marks card.  Then, drag the same field to Size on the Marks card and change the measure for that field to "Count".  This will change the viz to a treemap, which we will fix next: change the mark type from Automatic to Text.  Finally, duplicate the "Count Tweets" pill used for Size and drag it to Color.  This progression is shown below.

This will generate a word cloud! 

However, it isn't quite fully formed yet.  Typically word clouds filter out uninteresting words like "the", "an", or "he".  We can do that with a filter.  Drag the Tweets pill onto the Filters shelf and exclude all of the words you find uninteresting (note: if you wish to do a more formal analysis, I would recommend doing more thorough data prep in a tool like Alteryx first to make this easier).
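If you would rather strip these words during data prep than click through them in the filter, the idea looks roughly like this in Python.  The stop-word list and the `drop_stop_words` helper are illustrative assumptions, not the exclusions actually used for this viz.

```python
# Illustrative stop-word list (an assumption; extend it to taste).
STOP_WORDS = {"the", "an", "he", "a", "and", "to", "of"}

def drop_stop_words(rows, stop_words=STOP_WORDS):
    """Keep only (pundit, word) rows whose word is not a stop word."""
    return [(pundit, word) for pundit, word in rows
            if word.lower() not in stop_words]

# Made-up placeholder rows in the one-word-per-row format.
rows = [
    ("Pundit A", "the"),
    ("Pundit A", "game"),
    ("Pundit B", "he"),
    ("Pundit B", "won"),
]

kept = drop_stop_words(rows)
print(kept)
```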

Better, but still overwhelming!  Try limiting the filter to only show the top 500 words by count (in Tableau's filter dialog, this option is on the Top tab).
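The top-N idea is easy to see outside Tableau as well.  Here is a minimal sketch using Python's standard `collections.Counter`; the `top_words` helper and the sample word list are made up for illustration, and Tableau may handle ties differently.

```python
from collections import Counter

def top_words(words, n=500):
    """Return the n most frequent words as (word, count) pairs."""
    return Counter(words).most_common(n)

# Made-up placeholder words, already tokenized and filtered.
sample = ["nba", "cowboys", "nba", "live", "nba", "cowboys"]

top_two = top_words(sample, n=2)
print(top_two)
```

The counts returned here play the same role as the count measure that drives size and color in the viz.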


That's better!  I also added a filter to show Stephen A. or Skip on their own (to the right), and with that, our viz is complete!

As we can see, the two definitely have an obsession with a few different topics, and to the surprise of no one, Skip Bayless is a shameless Twitter self-promoter.

Back next week with another viz!

Author: Chris Bick