And the most loyal fans in the NBA are…

NBA basketball is one of the sports I enjoy watching the most. As I was ordering my (undisclosed amount)th beer while watching a game after work, it occurred to me how often I had seen sparsely populated arenas during games, with large areas of seats going unoccupied. This got me thinking about the average fan attendance for NBA teams, what factors might influence attendance, and ultimately, which NBA team has the most loyal fans.

After some online browsing, Python scraping and data cleansing, I was able to obtain a good amount of data from the awesome guys at basketball-reference.com. Unfortunately, I could not find any records of fan attendance before 1981, so this analysis is restricted to the period from 1981 to 2013 (with records for 2002-2006 also missing). First, I wanted to see if there were any trends in NBA fan attendance per season.

fig_League_attendance_by_year

Fan attendance for each NBA team during the seasons 1981 to 2013. Years marked with a red asterisk represent seasons shortened by a lockout. Data for the years 2002 to 2006 were not available.

The two most striking features of the plot above are the obvious increase in fan attendance from 1981 to 1995, and the stagnation thereafter. This makes sense, since this period is widely regarded as the golden era and renaissance of basketball, full of rivalries and Hall of Fame players in their prime. Unsurprisingly, the years 1999 and 2012, which were both shortened by ~4 months due to a lockout, saw a drop in total fan attendance (purely as a result of fewer games being played; if I were more rigorous, I would normalize for this and also for the overall US population, but I wanted to visualize the raw numbers).
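For the curious, the games-played normalization mentioned above could look something like the sketch below. The attendance totals are made-up placeholders (not scraped data), the function name `attendance_per_home_game` is hypothetical, and I am assuming seasons are labeled by the year in which they ended. The season lengths are the real ones: 82 games in a full season, 50 in the 1998-99 lockout season and 66 in 2011-12, with half of them played at home.

```python
# Regular-season games per team, keyed by the year the season ended (assumed labeling).
GAMES_PER_SEASON = {1998: 82, 1999: 50, 2011: 82, 2012: 66}


def attendance_per_home_game(total_home_attendance, season):
    """Average crowd per home game for one team in one season."""
    home_games = GAMES_PER_SEASON[season] // 2  # half of all games are at home
    return total_home_attendance / home_games


# Placeholder totals for a hypothetical team (NOT real data).
totals = {1998: 697_000, 1999: 425_000}
for season, total in totals.items():
    print(season, attendance_per_home_game(total, season))
```

With these placeholder numbers the raw total drops sharply in the lockout year, while the per-game figure stays the same, which is exactly the effect the normalization is meant to remove.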

Next, I investigated whether team success (the net number of wins per season) could be an indicator of fan attendance. Not surprisingly, teams that won more also attracted more fans (doh!). This was true regardless of the team's conference (East or West).

fig_attendance_team_success

Fan attendance as a function of number of wins for all NBA teams during the period of 1981-2013

I also looked at whether fans were more attracted by teams that scored a lot, or by teams that put an emphasis on defense. However, I had to consider historical trends in scoring, and adjust for the fact that defenses/offenses have gotten more sophisticated over time. Therefore, I decided to look at the fan attendance numbers of each NBA team during a given season, and plot them as a function of the team’s deviation from the median number of points scored by all teams during that season. The plot below shows the aggregate of all points after considering each individual season between 1981 and 2013. Interestingly, although teams that score more attract more fans, it seems that good defense is even more likely to attract crowds.

fig_correlation_points_attendance

Fan attendance as a function of the number of points scored for and against the home team. To adjust for the variability in offensive/defensive points scored in each season, the attendance numbers are plotted against the home team’s deviation from the season average.
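The per-season adjustment described above can be sketched roughly as follows. This is only an illustration using the median as the reference point: the function name `deviation_from_season_median`, the tuple layout and the team names and scores are all assumptions of mine, not the original scraping code or data.

```python
from collections import defaultdict
from statistics import median


def deviation_from_season_median(rows):
    """rows: list of (season, team, points_per_game) tuples.

    Returns {(season, team): points minus the median of that season}.
    """
    by_season = defaultdict(list)
    for season, _team, points in rows:
        by_season[season].append(points)
    medians = {season: median(points) for season, points in by_season.items()}
    return {(s, t): p - medians[s] for s, t, p in rows}


# Placeholder teams and scores (NOT real data).
rows = [
    (1985, "Team A", 110.0), (1985, "Team B", 114.0), (1985, "Team C", 106.0),
    (2010, "Team A", 98.0), (2010, "Team B", 102.0), (2010, "Team C", 100.0),
]
deviations = deviation_from_season_median(rows)
print(deviations[(1985, "Team B")], deviations[(2010, "Team B")])
```

Because every team is compared only to its own season, a 114-point team in a high-scoring year and a 102-point team in a low-scoring year can end up on a comparable footing.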

Of course, the caveat of the above plot is that teams that score a lot and/or defend well are more likely to win, and thus attract more fans. Indeed, winning teams usually develop bandwagon fans and thus inflate their attendance numbers. Therefore, I sought to find out who were the most loyal fans in the NBA. In my mind, the mark of a truly loyal fanbase is one that shows up to support its team regardless of win/loss ratio. For these reasons, I plotted the fan attendance of each NBA team normalized per number of wins.

fig_attendance_per_win

And so the most loyal fanbases are the good people of Memphis, Minnesota and Toronto!
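Here is a minimal sketch of the attendance-per-win idea, assuming it is computed as a team's total home attendance divided by its total wins over the whole period (averaging a per-season ratio would be another reasonable reading). The function name `attendance_per_win`, the team names and the numbers are placeholders, not real data.

```python
def attendance_per_win(seasons):
    """seasons: list of (home_attendance, wins) pairs for one team."""
    total_attendance = sum(attendance for attendance, _ in seasons)
    total_wins = sum(wins for _, wins in seasons)
    if total_wins == 0:
        raise ValueError("cannot normalize by zero wins")
    return total_attendance / total_wins


# Placeholder teams (NOT real data).
teams = {
    "Team A": [(700_000, 50), (720_000, 60)],  # wins a lot
    "Team B": [(600_000, 20), (620_000, 30)],  # loses a lot, still draws crowds
}
ranking = sorted(teams, key=lambda t: attendance_per_win(teams[t]), reverse=True)
print(ranking)
```

One thing to keep in mind with this kind of ratio is that it grows as the win total shrinks, so it rewards crowds that keep showing up for struggling teams.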

I will add all the relevant code to my GitHub account soon (basically as soon as I’ve commented it!)

Presidential Approval Ratings from Roosevelt to Obama

I have been watching the awesome Netflix show “House of Cards” and have been fascinated by the devious schemes that Underwood is constantly plotting. The show often mentions approval ratings, and it got me wondering what Obama’s ratings currently were, and those of all past US presidents for that matter. However, I didn’t have much luck finding publicly available data that was a) easily accessible and b) free (granted, I was quite lazy in my search).

Ultimately, I resorted to scraping the Roper Center website for the data that I needed. Below is the distribution of approval ratings for each president from Roosevelt (when records began) to Obama.

Image

JFK, Bush-Sr and Eisenhower rank as the top three presidents with the highest approval ratings during their tenure in office. However, the variation in ratings for Bush-Sr was considerably larger. Similarly, Truman and Bush-Jr had large variance in their ratings, but were also the two most unpopular presidents. As we can also see, Obama does not rank very high in presidential approval ratings, with only four other presidents having lower ratings (although it should be noted that Obama still has three remaining years to bump up his average).

Below is the breakdown of approval ratings for each individual president. Note some of the sharp peaks that we see for some presidents, like the spike in approval ratings for Bush-Jr after the 9/11 tragedy; or the drop in ratings for Bush-Sr and Nixon after the start of the Iraq war and Watergate, respectively.

generate_image

Which states are the most concerned by gun crime?

I recently discovered the Capitol Words API and have had some fun playing around with it. One of the categories in the API allows you to search for the words spoken by the senators of each state in the USA, and I was interested in finding out the number of times the word “gun” was recorded in a state bill between January 2012 and December 2013.

gun_reference_US_map

As we can see, the most densely populated states of New York, California, Illinois and, to a lesser extent, Texas, mention the word “gun” the most often. It is interesting (but not surprising) to note that the more Republican and pro-gun Midwestern states are conspicuously quiet about mentioning guns. We can also track the monthly frequency with which the word “gun” was mentioned in state bills between January 2012 and December 2013:

gun_reference

The sharp peak we observe across many states in April 2013 illustrates the national response and outrage that followed the tragic Boston marathon bombing and subsequent shootings. We can also see that the state of California shows some peaks in February, June and November 2013, which can be associated with the Christopher Dorner shooting, the June 7 Santa Monica shooting and the November 1 LAX shooting.

Finally, we can explore the underlying relationship between references to “education” in state bills and references to “gun” and “shooting”. Again, the obvious outliers are Connecticut, California and Illinois, which all refer to education an unusually large number of times. Interestingly, if these three outliers were removed, we could argue that a decent linear fit (with positive coefficient) could be achieved between the number of times the word “education” is stated in a bill and the number of times “gun” and “shooting” are. In that case, we could interpret this as education being mentioned as a result of gun crime and shootings (a causal analysis will be in order for future work, namely finding the average lag time between shooting events and the reaction of statesmen).

Education_shooting

Relationship between the number of times the words “shooting” and “education” were mentioned in state bills between January 2012 and December 2013

Education_gun

Relationship between the number of times the words “gun” and “education” were mentioned in state bills between January 2012 and December 2013
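The "drop the outliers, then fit a line" idea discussed above can be sketched with a plain ordinary least-squares fit. This is only an illustration: the state labels and counts are invented placeholders, `least_squares` is a helper I wrote for this sketch, and treating “education” mentions as a function of “gun” mentions is my assumption about which variable goes on which axis.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# state label: (mentions of "gun", mentions of "education"), placeholder numbers
counts = {
    "S1": (10, 25), "S2": (20, 45), "S3": (30, 65), "S4": (40, 85),
    "Outlier": (15, 400),
}
outliers = {"Outlier"}  # states excluded from the fit

kept = [pair for state, pair in counts.items() if state not in outliers]
slope, intercept = least_squares([g for g, _ in kept], [e for _, e in kept])
print(slope, intercept)
```

A positive slope here only says the two counts rise together; it says nothing about which one drives the other, which is why the lag-time analysis would still be needed.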

New York crime rates

While browsing through different sites, I randomly came across the ominous-sounding disaster center website. There is a fair amount of data that could be analyzed there, but my attention was caught by an entry stating that they had just updated the “1965 to 2012 State Crime Pages”. From there, I chose the completely biased option of analyzing crime rates in my hometown (NYC) from 1965 to 2012.

 

Number of crimes per 100,000 inhabitants in NYC from 1965 to 2012.


You will notice that I have also added the periods during which various NYC mayors were in office, color-coding each period by the mayor's party. It’s already a well-known anecdote, but it is remarkable to see the drop-off in crime rate after Giuliani took office, and how Bloomberg was able to maintain that.

However, it does make us consider the slightly deeper question of whether the crime rate in NYC has converged to a minimum, and whether human nature would ever be capable of reducing that to zero? There’s an interesting prediction problem…

Mapping the taste profile of Scottish whiskeys

Recently, I came across this interesting blog post http://blog.revolutionanalytics.com/2013/12/k-means-clustering-86-single-malt-scotch-whiskies.html by the Revolutions blog poster Luba Gloukhov. This post initially caught my attention because of the originality of the dataset: 86 Scottish whiskeys marked on a scale of 0-4 in 12 different taste profiles (source data is here). Now I know what I like, and I like my whiskey, so I liked what I saw.

For these reasons, I set out to analyze the data a little bit more. Since Luba had already addressed most of what could be done in terms of clustering analysis, I restricted myself to mostly visualization of the data. First, I simply set out to plot a straightforward infographic of the data (the full version with all 86 whiskeys is available here Whiskey_taste_profile_infographic):

radar_chart_snapshot

I then geotagged each distillery to the Scottish territory and color coded them according to the marks given for each taste profile. The most distinctive pattern we see is the clear separation between the land-based whiskeys and those from the Isles of Argyll.

Geo-tagging of Scottish whiskeys in 12 different taste profiles


We can then look at how geographical position (i.e. longitude and latitude) correlates to each taste profile.

Correlation between geographical position (longitude and latitude) of whiskeys and their taste profile score.
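As a rough sketch of how such a correlation can be computed, here is a hand-rolled Pearson coefficient applied to a few placeholder distilleries. The coordinates and scores are invented for illustration, and using a single "smoky" score per distillery is my assumption about the layout; the real dataset has 12 taste scores for each whiskey.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (latitude, longitude, smoky score on the 0-4 scale), placeholder values
distilleries = [
    (55.6, -6.2, 4), (55.7, -6.1, 4),
    (57.5, -3.2, 1), (57.4, -3.3, 1), (57.6, -3.1, 0),
]
latitudes = [d[0] for d in distilleries]
longitudes = [d[1] for d in distilleries]
smoky = [d[2] for d in distilleries]

r_latitude = pearson(latitudes, smoky)
r_longitude = pearson(longitudes, smoky)
print(r_latitude, r_longitude)
```

Repeating this for each of the 12 taste profiles gives one latitude coefficient and one longitude coefficient per profile.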


Better yet, we can look at similarities between different whiskeys and how these are affected by geographical location.

Correlation matrix of Scottish whiskeys constructed on the basis of their similarities in 12 different taste profiles
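A whiskey-by-whiskey correlation matrix like this one can be sketched as below, assuming the similarity measure is the Pearson correlation between taste-score vectors. The whiskey names and scores are placeholders, and the vectors are cut down to 4 scores instead of 12 to keep the example short.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder taste scores on the 0-4 scale (4 profiles instead of 12).
profiles = {
    "Whiskey A": [4, 3, 0, 1],
    "Whiskey B": [4, 4, 0, 0],
    "Whiskey C": [0, 1, 4, 3],
}

# matrix[a][b] holds the correlation between the profiles of whiskeys a and b.
matrix = {
    a: {b: pearson(profiles[a], profiles[b]) for b in profiles}
    for a in profiles
}
for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

Note that a whiskey with identical scores in every profile would have zero variance and make the division fail, so real data may need a guard for that case.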
