Feed the Meter! Part Two's Long-Awaited Conclusion

Hello all!  Sorry for the delay in blog postings, but I'm excited to say I'm back to finish this post!

To refresh you on the last post, we are doing a deep dive into parking tickets by neighborhood.  Last time we defined the neighborhoods; now we are going to analyze the actual tickets.

The first step is determining which neighborhood each parking ticket falls in.  To do this we are going to utilize a software package called Alteryx, which has a bunch of cool capabilities for data analysis and blending.  Fortunately for us, one of the things it does well is spatial analysis.

For our purposes we are going to use Alteryx to tell us which neighborhood each parking ticket occurred in.  Since this isn't an Alteryx blog I will just discuss the workflow below at a high level.

Essentially what we are doing here is assigning a point location to each parking ticket in the upper workflow and defining boundaries for each neighborhood in the lower workflow, then comparing the two to see which tickets fall within which neighborhoods and throwing out the tickets that don't fit anywhere.  If you'd like to know more about how this works, drop me a line at chris.bick@aebs.com or check out alteryx.com.
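(If you don't have Alteryx handy, the same point-in-polygon idea can be sketched in Python with geopandas.  This is not the workflow from the post itself, and the file and column names below are assumptions.)

  import pandas as pd
  import geopandas as gpd

  # Hypothetical inputs: a ticket file with Latitude/Longitude columns and a
  # boundary file with one polygon per neighborhood.
  tickets = pd.read_csv("tickets.csv")
  points = gpd.GeoDataFrame(
      tickets,
      geometry=gpd.points_from_xy(tickets["Longitude"], tickets["Latitude"]),
      crs="EPSG:4326",
  )
  hoods = gpd.read_file("neighborhoods.geojson").to_crs("EPSG:4326")

  # Point-in-polygon join: keep only the tickets that land inside a neighborhood.
  ticketed = gpd.sjoin(points, hoods, how="inner", predicate="within")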

Once this is done, we have assigned neighborhoods for all of our parking tickets, which we can upload into Tableau.

The first thing we need to do is blend the previously created visualization with this one, which we can do by linking on Shape ID (see below).

Once this is complete we can change our color coding to reflect a count of tickets (see view options in the viz below) to see where the bulk of our parking tickets are occurring.


As can be seen, the East Side has by far the most parking tickets issued, with over 145,000!  Those poor college kids...

After adding some polish to the viz, we come up with the following result.

As you can see, the East Side does get the most parking tickets, with downtown being a close second. 

Back again soon with another viz!

Author: Chris Bick

 

Feed the Meter! A Visual Exploration of Parking Tickets in Milwaukee

Part One: Defining Neighborhoods

Regardless of what metropolitan area you may be most familiar with, one common nuisance is the dreaded parking ticket.  Whether you don't feed the meter in time, pick the wrong space, or flip the bird to a parking enforcement officer, parking tickets are very easy to get.  In fact, in the city of Milwaukee hundreds of thousands of parking tickets are issued every year!

Over the next couple of weeks we are going to do a deep dive into this issue for the good land, Milwaukee. 

The first step that we will complete this week is to define Milwaukee neighborhoods for aggregation.  In order to do this we have to create custom polygons for Tableau's map.  This is an arduous process, but a cool feature to use! 

The first step is to set up an Excel sheet with the proper columns for the map.  In our case, we need to capture the neighborhood (the name of the polygon), the order of the points (think of a connect-the-dots image), the shape ID (only needed if you have non-contiguous shapes, which we do), the latitude, and the longitude.
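For illustration, the first few rows of such a sheet might look like this (the coordinates below are made up and only meant to show the layout):

  Neighborhood   Point Order   Shape ID   Latitude   Longitude
  East Side      1             1          43.06      -87.88
  East Side      2             1          43.07      -87.88
  East Side      3             1          43.07      -87.90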

Once this is done, we need to find a map of neighborhoods (unless you know them already).  In my case, I used the map below to approximate where neighborhoods start and stop (note: I did take some liberties, but my end product is very similar).

After obtaining the map, I used Google Maps to approximate the points for each neighborhood's boundaries.  This is a tedious task, so if you would like to get approximate Milwaukee neighborhood points please contact me at chris.bick@aebs.com and I would be happy to e-mail you the raw data!

After all of this is done, we are ready for Tableau!  To begin, make sure that all of your fields other than longitude and latitude are dimensions.  You will probably have to make some changes for this to happen.  Once your data is categorized properly, conduct the following steps.

  1. Set the mark type to "Polygon"
  2. Drag the field with the sequence of your points (in my case it is Point Order) to the path card
  3. Drag the shape ID (or the neighborhood if yours are contiguous) to the detail card
  4. Drag the neighborhood to the color card
  5. Double-click on longitude and latitude to bring them into the viz

And there you have it, we have our neighborhoods defined!  Next week we will use this map to analyze our parking tickets to draw some conclusions. 

Author: Chris Bick

Angry Rant Analysis: Creating Word Clouds in Tableau from Twitter Data

In the field of analytics, one of the areas with the most exciting research is sentiment analysis.  Due to the potential for complex meanings behind even the simplest of statements (we've all had that "what did you really mean" discussion with a loved one), the sentiments of people can be difficult to ascertain without a human analyzing them.

One of the simpler tools currently utilized to understand how people are feeling is called a Word Cloud.  A word cloud is essentially a list of the most common words used, with color and size changed to reflect the frequency.  An example of a word cloud from a popular Beach Boys song is below.

Ha!  We have fun here at AE Business Solutions.  How about this one from Surfin' USA?


Anyway, as anyone who visited Tableauza during the unfortunate period of the NFL Draft series can attest, I'm a bit of a sports nut.  So, I thought it would be fun to do some Tableau sentiment analysis on the Twitter feeds of the two most obnoxious sports personalities I could think of, Skip Bayless and Stephen A. Smith. 

To begin, I collected every tweet from both Stephen A. and Skip from the past month and a half (April 2017 - mid-May 2017).  This was then cleaned and put into two Excel sheets.

Once the data was in Excel I loaded it into a software package called Alteryx, which is a data blending and preparation tool that can load and modify data sets and save the result as a Tableau Data Extract (.tde).  Screenshots of the Excel set-up and the Alteryx drag-and-drop interface are below.

Raw tweet data put into Excel

Alteryx workflow

The workflow shown above is used to "tokenize" the tweets, which puts every single word into its own row for each pundit.  An example of this formatting is below.

In order for this word cloud example to work, each word must be in its own row, hence the formatting.
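(Since not everyone has Alteryx, here is a rough Python sketch of the same tokenizing step.  The original post used Alteryx, and the file and column names below are assumptions.)

  import pandas as pd

  # Hypothetical input: one row per tweet, with Pundit and Tweet columns.
  tweets = pd.read_excel("tweets.xlsx")

  # "Tokenize": split each tweet on whitespace and explode to one row per word.
  words = (
      tweets.assign(Word=tweets["Tweet"].str.lower().str.split())
            .explode("Word")
            .dropna(subset=["Word"])
  )
  # (Common stop words like "the" or "an" could also be dropped here before loading into Tableau.)
  words[["Pundit", "Word"]].to_csv("tweet_words.csv", index=False)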

Now that our data is processed, we can make the viz.  To begin, drag your word field (Tweets in my case) to the text card.  Then, drag the same field to the size card and change the measure for that field to "Count".  This will change the viz to a tree map, which we will fix next.  Change the automatic mark selection to text.  Finally, duplicate the "Count Tweets" pill used for the size card and drag it to the color card.  This progression is shown below.

This will generate a word cloud! 

However, it isn't quite fully formed yet.  Typically word clouds filter out uninteresting words like "the", "an", or "he".  We can do that with a filter.  Drag the Tweets pill onto the Filters shelf and remove all of the words you find uninteresting (note: if you wish to do a more formal analysis, I would recommend obtaining Alteryx and doing more thorough data prep first to make this easier).

Better, but still overwhelming!  Try adding a condition to the filter to only show the top 500 results.


That's better!  I also added a filter to show Stephen A. or Skip on their own (to the right), but our viz is now complete!

As we can see, the two definitely have an obsession with a few different topics, and to the surprise of no one, Skip Bayless is a shameless Twitter self-promoter.

Back next week with another viz!

Author: Chris Bick

Planes, Trains, and Automobiles! Part Three: Automobiles

Loyal fans of the Tableauza! blog, today we will put a bow on another multi-part series.  It's car day!

For our automobiles question, we are going to take a look at the top-selling automobiles in 2016 with a cool viz!  After all, I am supposed to write about how to do things in Tableau.  Probably should do that every now and then...

The first step is to compile the data in Excel.  Pretty easy; the data set is below.

Once we have our data set, we import it into Tableau.  A few short clicks later, and we have the following:


Meh.  Not very exciting, right?  We can color code the chart by the makers, but that doesn't do a ton either.  But what if we incorporate pictures of the car into the viz?  I can hear your gasps of excitement through the computer screen (and in the future), so I will continue.

First things first, we need to go online to get pictures of the cars.  Once we have the pictures we will want to get them into Tableau as shapes.  To do this, we add them to the Tableau shapes repository, which can be found in the Documents folder under My Tableau Repository\Shapes (see below).

Now that the pictures can be accessed in Tableau, we can begin the process of utilizing the pictures as points on a plot.

  • Change the mark type in your view to Shape
  • Drag the Make and Model (combined into one field) onto the Shape card (see below)
  • Click on the Shape card and assign the shapes from the folder we just added (see below)

 

  • Adjust the Size Card to make the images larger (see below)

Whoops, we have an issue.  The pictures did not come in the right order!  To correct this, rename the pictures in the file with the order they should appear (see below). 

Add in a label and we're all done!

Back next week with a new exploration!

Author: Chris Bick

Planes, Trains, and Automobiles! Part Two: Trains

Welcome readers!  Today we will continue our foray into commuter transportation options, this time with commuter rail. 

For this analysis, we are going to look at the spread (or stagnancy) of high-speed rail.  High-speed rail is a much-ballyhooed form of transportation that is gaining in popularity, particularly in Europe and Asia.  Due to its high speeds (upwards of 320 km/hr, or about 200 miles/hr) and high efficiency (once constructed, high-speed rail uses fewer BTUs per passenger mile than airplane or automobile travel), it is becoming more prevalent throughout the world.

But who is the leader in this phenomenon?  Let's take a look at the top nations in high speed rail constructed through 2016.

Whoa, that is stark.  As it turns out, China has almost eight times as much high speed rail track constructed as the next closest nation, Spain.  Let's try that again without the outlier.

This is a little more revealing.  As it turns out, Western Europe has several countries leading the way in terms of high-speed rail (Spain, Germany, France, and Sweden jump out), and Japan has a fair amount as well. 

What happens when we bring the size of the countries into the analysis?  If we look at density of track by size of the country, we get a different picture.

When size of the country is brought into play, some of our mid-table countries in terms of total mileage look much better.  The number one country in this metric is actually Taiwan, with South Korea, Germany, and Japan close behind.  Given the size of China (third largest country in the world), the fact that it ranks 13th is fairly impressive.

So why so little high speed rail in our very own USA?  Well, there have been numerous studies done to look into the impact of high speed rail on carbon emissions, the most common reason cited for implementation, and it is not as clear cut as one might think.  While the actual transportation itself is more efficient than auto or plane, the construction is somewhat of an ecological nightmare. 

In order to secure the structure for high-speed rail, most design plans utilize a great deal of concrete, the production of which is a major source of carbon emissions.  There is also a feeling that Americans would be hesitant to embrace high-speed rail, due to the stigma of public transit.  Will the calculus change?  Only time will tell, but it is certainly well on the way for other countries.  Below is a table showing planned high-speed rail mileage by country.

The future is here...for some.

Author: Chris Bick

Planes, Trains, and Automobiles! Part One: Planes

Greetings Tableau enthusiasts!

This week we are going to begin a new three part series, this one inspired by John Candy and Steve Martin.  That's right, it is Planes, Trains, and Automobiles!

Part one will look into a plane-related topic, in this case the busiest airports in 2016.

To analyze this question I acquired data on passengers through airports from the most reliable source out there, Wikipedia.  From there I imported it into Tableau and mapped the 50 busiest airports.

Unfortunately, my knowledge of world geography is suspect, so many of the Chinese cities and provinces took some time to get straight.  However, at the end of the day the following visual was produced.

Two things stuck out from this visual.  First and foremost, Hotlanta!  The US has the busiest airport in the world, Atlanta, which had over 100,000,000 passengers go through it in 2016.  Wow.

But, it does appear that the US rivalry with China is going strong!  China has the second busiest airport, Beijing, and seven airports on the list.  So, which country is #1?  Let's go to the viz to find out!

When looking at the top 50 airports only, it appears the US lead is safe.  Looking at the top 16 airports in the US, passenger traffic was over 880,000,000!  Holy cow. 

Back next week with trains!  Whatever could that viz be...

Author: Chris Bick

What Else is On? The NFL Draft and TV Ratings

Well, we have finally reached the end of the ill-conceived draft series.  To put a bow on things, today we will look at the entertainment side of the NFL draft to see where it is most popular.

To do this, I looked up the top ten TV markets for the first round of the draft in 2014, 2015, and 2016, as well as overall draft TV ratings for the US.  Let's check it out!

Below is a map of the top ten TV markets for 2014, 2015, and 2016 (via Tableau, of course...)

Those of you who have been following this series likely won't be too surprised by this, but there is a heavy concentration in a few key areas.  The first of these, and the number one market by percentage for three years straight, is good ol' Cleveland, Ohio (Columbus was in second each year as well, with Dayton added in two of the three years).  There was also a consistent showing from New Orleans, which was a bit of a surprise.

But what about the national numbers?  Let's take a look!

With the exception of a spike in 2014 (Johnny Football!) the ratings are pretty flat.  Will this year be different?  Who knows. 

Back next week with something more interesting (who could have predicted it would be difficult to think up topics for a seven-part series?).

Author: Chris Bick

The More the Merrier? Correlating Success with Quantity in the NFL Draft

As a long-suffering fan of the Cleveland Browns, I am no stranger to just about every NFL Draft strategy out there.  Trade up for a QB?  Been there.  Trade picks for players?  Yep.  How about players for picks?  Of course. 

For the past couple of years, the new-look, analytics-driven Browns have been amassing picks under the "acquire assets" model championed by the Patriots.  But does it have a basis in fact?  Let's dive in!

We will start with a simple Tableau scatter plot of win percentage vs. number of picks for the year.  What do we get?

Uninspiring to say the least.  The complete lack of a correlation is backed up by adding a regression line.

The lowly R-Squared value of 0.02% tells the tale.  Yeesh.

But what if we look at the teams a year or two in the future?  To do this, I created a lagged field by joining on a calculated field that adds a year (and later two and three years) to the draft season.  See below.
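(The join screenshot isn't reproduced here; as a rough illustration of the same lag, this is what it would look like in Python, with made-up table and column names.)

  import pandas as pd

  # Hypothetical tables: draft pick counts and season records per team.
  picks = pd.read_csv("draft_picks.csv")    # columns: Team, Season, Picks
  records = pd.read_csv("records.csv")      # columns: Team, Season, WinPct

  # Shift the pick counts forward one season so 2015 picks pair with 2016 records.
  lag = 1
  picks_lagged = picks.assign(Season=picks["Season"] + lag)
  joined = records.merge(picks_lagged, on=["Team", "Season"], how="inner")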

This join will show us win percentages correlated with number of picks from the prior year (2016 record vs. 2015 pick count).  The results are below.

Still nothing great, but the R-Squared percentage is up to 1.5%.  However, it actually shows a negative effect! 

What about two and three years down the road?  Those results are below.

As we can see, while the trend moves back to a positive one as we would expect, it is still not a significant predictor of success (R-Squared values of 2.1% for a two-year lag and 0.4% for a three-year lag).

Well, once again Browns fans' hopes are dashed.  Oh well.

Back next week with the conclusion of the NFL Draft series!

Author: Chris Bick

 

The Pretentiousness Index: A Quest to Quantify the Most and Least Pretentious States in the U.S.

Devoted readers of this blog, I have decided to take a break from football for the week, so hopefully you enjoy this change of pace!

As a proud driver of a Toyota Prius who has no qualms with using words like pedantic and kerfuffle in everyday speech I can safely say that pretentiousness is in my blood.  Whether it is saying that American Football isn't as enjoyable as real football (you know, the kind where you kick the ball) or simply rolling my eyes at someone who says the word literally before making a wildly unrealistic claim (OMG, those french fries are literally the best thing I have ever eaten), pretentiousness can be a blast!

But which states do it best?  To answer this question I set out to define a "Pretentiousness Index" based off of some search terms using Google Trends.

To obtain data, I settled on four search terms, two representing pretentiousness and two representing "average joes".  For the first group, I selected "Kale" and "Yo-Yo Ma", and for the second group I selected "Monster Trucks" and "Cheetos".  The resulting data set, along with the index calculations, is below.

As you can see, the Pretentiousness Index (PI) is a simple calculation.  Now let's throw it into our old friend Tableau to see it in color!
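(The screenshot with the exact index math isn't reproduced here, so treat this as a guess at one plausible formulation rather than the calculation actually used: pretentious search interest divided by average-joe search interest.)

  # Hypothetical illustration only -- the exact formula from the screenshot is not shown here.
  def pretentiousness_index(kale, yo_yo_ma, monster_trucks, cheetos):
      # Ratio of "pretentious" to "average joe" Google Trends interest, as a percentage.
      return 100 * (kale + yo_yo_ma) / (monster_trucks + cheetos)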

To the surprise of very few (presumably), the Northeast and Northwest were clear winners in the most pretentious rankings.  Vermont came in at a staggering 240% (the runner-up, Massachusetts, was way behind at 190%).  On the other end of the spectrum Mississippi cleared the field comfortably at 32%, while the Mountain Momma herself, West Virginia, came in second at 40%.

Well, I'm going to cut this post off here before someone takes offense to this nonsense; I'll be back next week!

Author: Chris Bick

How Do 40 Times Change From Round to Round? A Study Using a Flimsy Question to Show Off a Cool Tableau Feature

In the months leading up to the NFL Draft, potential players are poked, prodded, and tested in just about every imaginable way.  Perhaps the most well-known of all of these measurements taken by teams is the 40-yard dash, which is the agreed-upon way to test speed of players.

Today we are going to take a look at how the average 40 times change by round.  Is there a direct relationship between draft round and 40 time?  Let's find out!

To begin, I have connected Tableau to a table showing 40 times for all draft picks from 1999-2015. 

Once the data was uploaded, I created a simple bar chart showing average 40 yard dash times by position.

Once the basics were there, I color-coded the positions and grouped them to clean up the view (Offensive Linemen, Defensive Linemen, Defensive Backs, and Linebackers were put into their own categories, and special teams players were removed).  This resulted in the view below.

Ok, it at least looks better than Excel now.  But this doesn't answer the question we originally posed, and it certainly is not a cool Tableau feature!  So let's dive deeper.  How can we show the round-by-round comparison? 

Pages!  Pages is a new-ish feature for Tableau that allows us to animate our graphs based on an attribute.  In this case, I will use the round of the draft.

To do this, I just drag round onto the Pages card as is shown below.

By doing this there is now a new control under our color legend that allows us to scroll through the graph by draft round, and even to animate it!  The control, and an explanation of its features, is below.

These controls allow us not only to scroll from round to round, but to let Tableau do it automatically for us!  Let's take a look at our updated graph (first round and seventh round).

If you look closely you can see some differences, but it doesn't really "pop".  Let's make a few more small changes (fix the axis, clean up the titles). 

There you have it!  We can now clearly see the rise in times from the first round to the seventh.  This is even more apparent when you animate the graph using the page feature.

I hope you enjoyed this demo of the page feature thinly veiled as draft analysis! 

Author: Chris Bick

Tableau and Modified Paretos of NFL Injuries (or Why I am Okay with a Desk Job)

It is no secret that playing in the NFL is a physically demanding job, and along with that comes the unfortunate side effect of injuries.

Today we will take a bit of a break from the NFL Draft and will look into creating a cool-looking Pareto Chart of the current list of NFL injuries, including some key prospects.  At least I think it is cool-looking.  Don't judge.

In order to map injuries, the first step is to obtain data on who is injured.  This was pulled from RotoWorld (thanks guys!) and compiled in Excel.  It looks like this.

Exciting, right?  And that is just 20% of it!

Once the data was collected, the next step in making the Pareto is to map the various body parts listed in the injury report onto an image.  This is where the Tableau magic happens. 

The next step is to create a new tab in Excel that you can use to map the body parts.  Get the size of the image you're using (in my case it is 289x525) into the table as well (this is done to force Tableau into the correct data conventions; you don't need it otherwise).
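As a rough sketch, the mapping tab might look something like this (the column names and values here are my own placeholders; the real X and Y values get copied in from the point annotations described below):

  Body Part    X     Y
  Image Size   289   525
  Abdomen      145   260
  Knee         120   410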

Now, we can open Tableau and load in the data set.  The tables created will need to be joined, so do that now.

The next step is to load our player image in as a map.  First, make sure your X and Y columns from the mapping table come in as measures.


Then, you will follow the path shown below, and fill in the screen with your corresponding picture size (again, mine is 289x525, but yours will likely differ).


Once this is all done, click OK.  Then, right click and drag X to columns and Y to rows as unique values.  This should load your picture onto the screen.

Now, you will need to plot your data points.  For every point to be mapped (in my case, for all body parts with an injury), right click on the area and select Annotate-->Point. 

Example: point annotation for abdomen injuries

Copy the corresponding X and Y values into your table.  Repeat until all points are plotted, then save your data file.  It should now look comparable to this.

Now we're on the home stretch!  Add the item being counted in the Pareto to the Size Card, adjust coloring and shapes as you desire, and you're all set.


Hmm, our football player appears to have lost an arm.  Let's fix that.  Go back to the Map-->Background Images menu and edit the background image you created earlier.  On the second tab, there is a check box that forces the entire image to be shown.  Select that to correct the issue.

Voila!  The entire Pareto Chart is now shown.  The only steps remaining are to adjust the title and Tooltip, add a filter by position, and hide the axes.  Let's have a look at the finished product.

I'll be back next week with another Tableau nugget!

Author: Chris Bick

Baby It's Cold Outside: Do Warm-Weather School Players Handle the Cold Worse than Cold-Weather School Players?

As those of us that have lived in the North are well aware, it gets pretty damn cold out in the winter time.  Every person who grew up in a state with regular snowfall knows that winter weather can have a dramatic effect on their life and careers:

  • Commuting time increases
  • Illnesses
  • Walking to school in the snow, to and from, uphill both ways

Often when these things are discussed in the north, they are accompanied with a smugness about the abilities of folks down south to handle it.  And of course, this stereotype has extended into the realm of football, via the old "southerners can't handle the cold!" argument.

Today I am going to investigate if this is indeed true by looking into the effectiveness of a single position relative to expected performance in 2016: Running Backs.  Take notes, AFC/NFC North teams!

To begin, I pulled data game-by-game for all running backs in games that had temperatures below freezing.  Once this was complete, the running backs' schools were added, along with a code for whether or not it was a "cold weather" school (this was a judgment call).

Once this was completed it resulted in 76 data points.  Even though in a perfect world we would acquire more data, this was deemed enough to proceed with.  The data was loaded into Tableau, at which point I started playing around with graphs.

The first data points analyzed were yards per carry, divided by warm and cold weather schools.

As is evident from the box plots, there did not appear to be a difference between cold weather schools (Yes) and warm weather schools (No).  To confirm this, I analyzed the data two additional ways.  The first was total yards per game.

In this case, there appears to be a slight advantage for the Northern schools, but this is largely due to Le'Veon Bell being incredible (see the points to the far right).

What about fatigue?  Do Southern players have fewer carries in cold weather?

Here we see a similar pattern to what appeared in the total yards, as would be expected.  Slight advantage to the North in terms of skew, but nothing statistically significant.

As a northerner at this point, I was getting desperate!  The last hail mary test dealt with touchdowns.  Do Northerners score more touchdowns per carry?  Come on, pull through Big Ten!

To test this, I created a new calculated field.  See below for the complex formulation if you'd like to try this at home.
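(The calculated field screenshot isn't reproduced here, but the "complex formulation" is presumably just touchdowns divided by carries.  A rough Python version of the same per-back calculation, with made-up column names, is below.)

  import pandas as pd

  # Hypothetical columns: RB, ColdWeatherSchool, Carries, Touchdowns (one row per game).
  games = pd.read_csv("cold_weather_games.csv")

  # Group by running back (per-game rates would mostly be zero), then take touchdowns per carry.
  per_back = games.groupby(["RB", "ColdWeatherSchool"], as_index=False)[["Carries", "Touchdowns"]].sum()
  per_back["TDPerCarry"] = per_back["Touchdowns"] / per_back["Carries"]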

Then, I created the graph by grouping by RB (doing this per game would mostly yield zeros).

Once this was created, I made a final box plot and crossed my fingers.

Northerners, it appears we must admit that our feelings of football superiority in cold weather are in our heads and nowhere else.  In 2016, anyway.  For running backs, anyway.

Back next week with another draft-related analysis!

Author: Chris Bick

 

There's No Place Like Home: A Study of NFL Draft Location Bias

The NFL Draft is coming up next month, and with it come a wide variety of questions from fans:

  • Will my team get the long snapper it so desperately needs?
  • Do I have enough beer in the fridge to make the NFL Draft interesting to watch?
  • How can I pretend that the glazed look in my friends' eyes is genuine interest when I tell them about my Fantasy Football draft?

However, for the truest of NFL fans, other more in-depth questions often arise.  Over the next several weeks we are going to dive into some more complex subjects. 

Today's topic deals with NFL Draft Location Bias.  In other words, do teams prefer to draft players close to or far away from home?

To begin answering this question, a data set was required that could show what teams players were drafted from, as well as where the various college and pro teams were.  To do this, Google Maps was utilized to generate longitudes and latitudes for all pro and 1-A college teams, and draft data from the past ten years (2007-2016) was acquired from the web.  The resulting data sets looked like this:

Once the raw data was acquired, a master distance table was calculated for every NFL team from every college team.  This was important so that expected distances for each draft could be calculated.  Luckily, a very smart guy named Josef de Mendoza y Rios created a very cool table to allow for these very calculations.  Even better, in the late '80s there was a woman named Jenny Excel who created a software package that now makes using Joe's table a breeze (Jenny may be made up, but the software does indeed exist!).
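(The spreadsheet itself isn't shown, but for the curious, the great-circle math behind those distances looks roughly like this in Python.)

  from math import radians, sin, cos, acos

  def great_circle_miles(lat1, lon1, lat2, lon2):
      # Spherical law of cosines with an Earth radius of ~3,959 miles.
      lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
      cos_angle = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon2 - lon1)
      return 3959 * acos(min(1.0, max(-1.0, cos_angle)))  # clamp guards against round-off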

After compiling the distance table, an expected distance table was created.  This was done by summing up the distances from each team to every player drafted that year and dividing by the total number of players.  This process was repeated for each year's draft. 

After completing the expected distance table, the actual draft distances for each team by year were calculated and the data set was loaded into Tableau. 

To utilize the data, the graph below was created to show the average deviation from expected distance by team over the ten year period.

The data seemed to be fairly random, but one team in particular stuck out above all the rest: the Cleveland Browns, whose deviation was almost double that of any other NFL team (~320 miles farther than expected; the next-largest deviation belonged to the New York Jets, at ~175 miles closer than expected).

As an Ohio native and Cleveland Browns fan, this data was quite startling.  After digging a little deeper, I noticed that several Browns players were drafted from Hawai'i.  Could those outliers be the cause of the trend?  To find out, I removed the Hawai'i players from the data set, re-calculated the expected distances, and re-created the chart.

The removal of Hawai'i helps the Browns come closer to the mean, but they are clearly still an outlier.  Why?  To dig deeper, I created a map of where the Browns players were drafted from (without Hawai'i).

Per the map above, the Browns drafted very few Midwest players and many from the South and West (not quite neighbors of Cleveland...)

Contrast this with the other end of the spectrum, the New York Jets.

The Jets have a large quantity of their draftees coming from the Midwest, the Mason-Dixon line region, and Mid-Atlantic region, all fairly close to New York City.  Not to mention zero players from Florida, Washington, or Oregon, and only a few from California.

Now let's take a look at a team that is exceptionally close to its expected distance (~14 miles farther than expected), the Tennessee Titans.

As you can see, the Titans have a smattering of players all across the US, and do not seem to favor any specific region.

To put a bow on this discussion, let's take a look at some statistics.  The graph below shows the 95% confidence interval for the median of the population.

As we can see, many teams fall outside of the confidence interval, with the Browns and Jets leading the way.  But is this a statistically significant relationship between Teams and distance?  To find out, I plotted each individual data point and looked at confidence intervals for the median values, which the Tableau Analytics pane makes very easy!

As you can see, the majority of the teams fall within the confidence bands of each other, meaning there is not a statistically significant difference in the averages (this is in effect an ANOVA statistical test, just done at a remedial level).  However, the Cleveland Browns do fall outside of the typical range.  But when we look closely, there are quite a few outliers.  Might a median test work better?  Let's try!

The median comparison tells a bit of a different story.  The Browns have come back to the pack (although still on the high end), while other teams have moved up substantially due to the fact that most players are not located near them (four of the top five medians are West Coast teams, the fifth of which is the Browns).  In fact, when we look at the five West Coast teams (the Rams were not included since a large % of their data set comes from their days in St. Louis) we see that all five median values are above the expected distance (0).


So where does this all leave us?  Without using some more rigorous statistical tests not available in Tableau (Mood's Median Comparison, ANOVA) it is difficult to quantify exact relationships, but our Tableau analysis can lead us to a few definitive answers:

  • 30 of 32 NFL Teams' average deviation from expected distance can't be statistically shown to be greater than or less than zero with 95% confidence (Browns and Ravens being the exceptions).
  • 17 of 32 NFL Teams' median deviation from expected distance can't be statistically shown to be greater than or less than zero with 95% confidence.  However, a noticeable geographic effect is evident, as all West Coast Teams were skewed to one side, and only 8 teams had a median above 0, 5 of which were West Coast-based.

To wrap it all up, there are three major conclusions to take from this data.

  • There does not appear to be a correlation for most NFL teams between average expected distance and actual distance from their draft pick's schools to the NFL teams themselves.
  • There may be a relationship for median distances, but the lack of players in the middle of the country creates bimodal distributions, adversely affecting West Coast teams.
  • Never underestimate the ability of the Cleveland Browns to do their own thing, and to do it stupidly.

That's all for now, we'll be back with another NFL Draft question next week!

Author: Chris Bick

Stop the Pop-Ups! Please!

A common concern with linking dashboards together in Tableau Server is that new web browser tabs are created for every new dashboard that is opened.  Well...for those of you that wish to stop this behavior, read on!  If you like the behavior, please enjoy this video of puppies and kittens instead of the rest of the post.  Pretty cute, right? 

1.  Open the URL action for the associated link.

2.  Go to Dashboard-->Actions and open the URL action used to take the user from their current dashboard to the destination dashboard. 

3.  At the end of the URL, add in the following:

?:embed=y&:linktarget=_self

The link should look like the screen shot below:

[Screenshot: the URL action with ?:embed=y&:linktarget=_self appended to the link]
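For example (the server, workbook, and dashboard names here are made up), the edited link might end up looking like this:

  http://myserver/views/SalesWorkbook/DetailDashboard?:embed=y&:linktarget=_self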

4.  Publish the updated dashboard.  Note that if anyone was using a link to access the dashboard it needs to be updated to include the addition shown above.

Happy Tableau-ing!  And yes, that is now a verb. 

Author: Chris Bick

Utilizing REGEX in Tableau

 

During the Tableau Conference in Las Vegas a few weeks ago, my coworkers and I learned some exciting tips and tricks in Tableau, including answers to some common issues we have been trying to solve.  While all of the seminars were useful and handy, one particular topic stood out to me as a lifesaver for Twitter feed-analyzing enthusiasts. 

ISSUE

You have a field that contains a string of text, such as a URL or an email address, and you want to retrieve part of that string to analyze or visualize.  For example, you would like to retrieve names from email addresses. 

ANSWER

Regular expressions!  These special text strings work phenomenally well for matching, replacing, and extracting particular phrases from a string.  In this article I will give a basic introduction to regular expressions and an example utilizing the regex functions available in Tableau.

INTRO TO REGEX: THE ALL-POWERFUL PARSING TOOL

To begin, let's start with a simple example.  Say we are looking at data concerning shootings that happened in America from 2013 to 2015.  There are some very interesting fields in this data set, particularly a list of articles that reported these shootings across America:

The problem is, we can't really use this data because these links are very complicated and have no use to us in their current form.  If, however, we could extract the domain name of the actual website and utilize that for our visualizations, we'd be in business!  This is where a regular expression comes in handy. 

To utilize regex, one needs to create a calculated field. There are three functions you can choose from to utilize regular expressions. These functions are shown below:

 

  • REGEXP_EXTRACT: This function lets one extract a particular pattern from a string variable.
  • REGEXP_MATCH: This function checks whether a string contains a match for a pattern and returns a boolean.
  • REGEXP_REPLACE: Replace a pattern with a set of characters. 
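If you'd like to experiment with a pattern outside of Tableau before building the calculated field, the same three operations look roughly like this in Python's re module (the example strings and patterns below are my own):

  import re

  email = "jane.doe@example.com"
  re.search(r"(\w+)\.\w+@", email).group(1)   # extract -> 'jane'
  bool(re.search(r"@example\.com$", email))   # match   -> True
  re.sub(r"@example\.com$", "", email)        # replace -> 'jane.doe'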

 

Before I go through retrieving the website name from the links within this field, let me go over some of the basic syntax of regex. Let us say we have the word cat that we want to extract from a string field.

  • (cat) will extract the word cat and drop everything else in the string
  • ^cat will extract cat only if it's at the beginning of the string
  • cat$ will extract cat only if it's at the end of the string
  • ^cat$ will only extract cat if it is alone in a string
  • [cat] will only extract the specific characters c, a, t from a string
  • [c-t] extracts any characters from c to t
  • [^c-t] will extract any character except those between c and t
  • [cat]+ will give back one or more characters that are c, a, or t
  • (cat|dog) will extract the word cat OR the word dog
  • (cat){2,4} extracts cat when it is repeated 2 to 4 times
  • c.*t extracts any pattern that starts with c and ends with t
  • ca?t extracts cat but will also extract ct
  • ca\st will extract ca t (c, a, a whitespace character, then t)
  • cat\d{3} will extract cat012, cat111, cat356, or any pattern with cat and 3 digits afterwards
  • cat\w{3} is the same as above but with word characters (letters, digits, or underscores) instead of digits
  • cat. will match any single character after cat (i.e. catt, cata, cats, ...)

NOW BACK TO OUR EXAMPLE

What we want from the URLs is the actual website that posted these articles.  I am interested in seeing which website posts the most about the shootings in America.  What website is most interested in enlightening the public about the immense issue of gun control in our country?

The expression we want to use is the following:

  REGEXP_EXTRACT([Article1], 'http://(\w.*.com)/')

Within the function REGEXP_EXTRACT, I give the function the field (in this case Article1), and the expression 'http://(\w.*.com)/'. I will break this down step by step.

  • We first need to express that we want to leave out the beginning of the URL.  We type this part out WITHOUT any parentheses, which tells regex to drop it from the result.
  • We basically want what is in between http:// and the next forward slash.  We are not sure what or how many characters we need to retrieve, but we do know the next delimiter and what is at the end of the expression (/ is the next delimiter).  Thus we utilize \w (a word character) and then the expression .* afterwards, meaning any character zero or more times. 
  • Putting .com at the end of the expression, inside the parentheses, tells regex "hey, we want whatever sits between http:// and .com, including the .com." 
  • Placing a / outside the parentheses lets the function know to drop the forward slash afterwards...just in case. 

And voila!  We have our websites:

Let's make a dashboard out of this, just for funsies:

Click on a particular location and you will be able to see the number of victims from the shootings over time, as well as the top websites reporting shootings in that area.  One interesting observation is that summer seems to be a popular time for shootings to occur.  Why would that be? 

Anyways, check back to our blog in December for another visually enlightening post. Hope everyone has a great weekend!

AUTHOR AND VISUALIZATIONS BY ERIN MIDDLEMAS

Navigation Buttons

ISSUE

Sometimes you want to be able to click somewhere within a dashboard in order to get to a specific dashboard. This comes in handy if your workbook has several dashboards, making it hard to navigate through the top tabs for the particular dashboard you want to view next.

ANSWER

Navigation buttons on your dashboards are your solution.  They are quick, easy to use, and make navigation much simpler in comparison to utilizing tabs.  This article will give you the lowdown on how to create these handy mechanisms.

 

OVERVIEW: Create a button that goes back and forth between the dashboards               

 

Step 1:

Create a new sheet and a calculated field that contains some constant value.  I used the constant 1 (See Fig. 1).

Fig. 1: Create a Calculated Field.

 

Step 2:

Throw the calculated field on the Columns Shelf and change the field to a continuous dimension (See Fig. 2).

Fig. 2: Make the calculated field a continuous dimension.

 

Step 3:

Change the mark type to Shape and choose an arrow.  To get to the arrow selection, click Shape >> More Shapes >> Select Shape Palette.  You can edit the size of the arrow to your liking.

 

Step 4:

Now we are going to work with the position of the button on the sheet.  Right click on the axis and hit Edit Axis.  Change the axis range to a fixed range (See Fig. 3).  Then right click on the axis again and uncheck Show Header.

Fig. 3: Change the Range to Fixed.

 

Step 5:

We now bring the calculated field to the Label card.  Click on the Label card, hit the ellipsis, and delete what is there for the label.  Write anything you wish to direct your users to the next dashboard.

 

Step 6:

This step is for actually creating the navigation feature.  Bring the sheet with the button we created onto your dashboard of interest (I usually like to put the navigation button at the top right of the dashboard, but it's your prerogative).  Go to Dashboard >> Actions >> Add Action >> Filter.  Check only the navigation button sheet as the source sheet and change the target sheet to the dashboard you want to go to (See Fig. 4).

Step 7:

Under Target Fields select Selected Fields but don’t add in a field.

Step 8:

That’s it! It’s that easy! Test it out. Yours should do something similar to mine below:

AUTHOR AND VISUALIZATIONS BY ERIN MIDDLEMAS

Custom Dynamic Date Views

Issue

What do you do if you always have to show the same number of date records, but you need to be able to change the date level: day, week, month, quarter, or year?  How do you accomplish this in Tableau?  For example: I want to be able to dynamically switch between 24 months and 24 years in my visualization.

Answer

This can be fairly easily accomplished with a mixture of calculated fields, parameters, and filters.
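(The original post's screenshots aren't reproduced here.  In Tableau, the usual pattern is a string parameter holding the date level, a calculated field, for example a CASE statement wrapping DATETRUNC, that truncates the date to that level, and a filter that keeps only the most recent 24 periods.  Purely as a sketch of the same logic in Python, with made-up file and field names, it might look like the snippet below.)

  import pandas as pd

  # Hypothetical data: one row per order with an Order Date column.
  sales = pd.read_csv("sales.csv", parse_dates=["Order Date"])
  date_level = "M"    # the "parameter": 'D', 'W', 'M', 'Q', or 'Y'

  # Truncate each date to the chosen level, then keep the 24 most recent periods.
  sales["Period"] = sales["Order Date"].dt.to_period(date_level)
  recent = sorted(sales["Period"].unique())[-24:]
  view = sales[sales["Period"].isin(recent)]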

Cigarette Use Percentage Per State

Which state has the worst smoking habit? Now we have the Viz!

View, interact, and download this visualization here.

This visualization displays the smoking percentage per state, shown as a label and in a tooltip.  Dark blue indicates states with the best (lowest) smoking percentage, with dark orange indicating the worst (highest).

The data is from a 2012 CDC report.

Visualization and post by Alex Christensen.