President Approval Ratings from Roosevelt to Obama

I have been watching the awesome Netflix show “House of Cards” and have been fascinated by the devious schemes that Underwood is constantly plotting. The show often mentions approval ratings, and it got me wondering what Obama’s ratings currently were, and those of all other past US presidents for that matter. However, I didn’t have much luck finding publicly available data that was a) easily accessible and b) free (granted, I was quite lazy in my search).

Ultimately, I resorted to scraping the Roper Center website for the data that I needed. Below is the distribution of approval ratings for each president from Roosevelt (when records began) to Obama.
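
For what it is worth, the scraping itself amounted to little more than pulling the rating tables out of the HTML with requests and BeautifulSoup. A rough sketch of the approach is below; the URL and the table layout are placeholders, since I have not kept the exact page structure:

import requests
from bs4 import BeautifulSoup

# Placeholder URL -- the real Roper Center pages are organised per president
url = 'http://www.ropercenter.uconn.edu/XXXXXXXX'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

# Assume each poll is a table row of the form: date, approve %, disapprove %
ratings = []
for row in soup.find_all('tr'):
  cells = [td.get_text(strip=True) for td in row.find_all('td')]
  if len(cells) >= 2:
    ratings.append((cells[0], cells[1]))

print(ratings[:5])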

[Figure: distribution of approval ratings for each president, Roosevelt to Obama]

JFK, Bush-Sr and Eisenhower rank as the top three presidents with the highest approval ratings during their time in office. However, the variation in ratings for Bush-Sr was considerably larger. Similarly, Truman and Bush-Jr had large variance in their ratings, but were also the two most unpopular presidents. We can also see that Obama does not rank very high among presidents, with only four rated lower (although it should be noted that Obama still has three years remaining in which to bump up his average).

Below is the breakdown of approval ratings for each individual president. Note some of the sharp peaks that we see for some presidents, like the spike in approval ratings for Bush-Jr after the 9/11 tragedy, or the drop in ratings for Bush-Sr in the aftermath of the Gulf War and for Nixon as Watergate unfolded.

[Figure: approval ratings over time for each individual president]


Which states are the most concerned about gun crime?

I recently discovered the Capitol Words API and have had some fun playing around with it. One of the categories in the API allows you to search for the words spoken by the senators of each state in the USA, and I was interested in finding out the number of times the word “gun” was recorded in state bills between January 2012 and December 2013.
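
The query itself is short. The snippet below mirrors the fuller example in the API post further down, with the phrase set to “gun” and the API key replaced by a placeholder:

import requests

# Same phrases/state.json endpoint as in the API example further down
query_params = { 'apikey': 'XXXXXXXXXXXXXX',   # your own Capitol Words API key
  'phrase': 'gun',
  'start_date': '2012-01-01',
  'end_date': '2013-12-31'
}

response = requests.get('http://capitolwords.org/api/phrases/state.json', params=query_params)
for result in response.json()['results']:
  print('%s,%s' % (result['state'], result['count']))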

gun_reference_US_map

As we can see, the most populous states of New York, California, Illinois and, to a lesser extent, Texas mention the word “gun” most often. It is interesting (but not surprising) to note that the more Republican and pro-gun Midwestern states are conspicuously quiet when it comes to mentioning guns. We can also track the monthly frequency with which the word “gun” was mentioned in state bills between January 2012 and December 2013:

gun_reference

The sharp peak we observe across many states in April 2013 illustrates the national response and outrage that followed the tragic Boston Marathon bombing and the subsequent shootings. We can also see that the state of California shows peaks in February, June and November 2013, which can be associated with the Christopher Dorner shootings, the June 7 Santa Monica shooting and the November 1 LAX shooting.
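
For the monthly series above, I believe the Capitol Words dates.json endpoint is the natural fit; the sketch below assumes its documented phrase, granularity and entity_type/entity_value parameters (I have not re-verified the exact names, so treat them as an assumption and check the API docs):

import requests

# Assumed endpoint and parameter names -- verify against the Capitol Words docs
query_params = { 'apikey': 'XXXXXXXXXXXXXX',
  'phrase': 'gun',
  'granularity': 'month',     # aggregate counts by month
  'entity_type': 'state',     # restrict to legislators from one state
  'entity_value': 'CA',
  'start_date': '2012-01-01',
  'end_date': '2013-12-31'
}

response = requests.get('http://capitolwords.org/api/dates.json', params=query_params)
for result in response.json()['results']:
  print(result)               # inspect the returned fields (count, month, ...) before parsing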

Finally, we can explore the underlying relationship between references to “education” in state bills and references to “gun” and “shooting”. Again, the obvious outliers are Connecticut, California and Illinois, which all refer to education an extraordinary number of times. Interestingly, if these three outliers were removed, a decent linear fit (with a positive coefficient) could be achieved between the number of times the word “education” is stated in a bill and the number of times “gun” or “shooting” is (a sketch of such a fit follows the two figures below). In that case, we could interpret this as education being mentioned in response to gun crime and shootings (a causal analysis will be in order for future work, namely finding the average lag time between shooting events and the reaction of statesmen).

Education_shooting

Relationship between the number of times the words “shooting” and “education” were mentioned in state bills between January 2012 and December 2013

Education_gun

Relationship between the number of times the words “gun” and “education” were mentioned in state bills between January 2012 and December 2013
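
For completeness, the fit I have in mind is nothing fancier than ordinary least squares on the per-state counts with the three outliers removed. A minimal sketch, with made-up counts standing in for the real Capitol Words numbers:

import numpy as np

# Hypothetical per-state counts standing in for the real Capitol Words numbers
states    = ['CT', 'CA', 'IL', 'TX', 'NY', 'OH']
education = np.array([900., 850., 800., 300., 280., 150.])
gun       = np.array([120., 400., 350., 200., 260., 60.])

# Drop the three outlying states before fitting
keep = [i for i, s in enumerate(states) if s not in ('CT', 'CA', 'IL')]
slope, intercept = np.polyfit(gun[keep], education[keep], 1)
print('education ~ %.2f * gun + %.2f' % (slope, intercept))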

Dynamic arrays in R, Python and Julia

Although I have a heavy background in statistics (and am therefore primarily an R user), I find that my overall knowledge of computer science is generally lacking. I have therefore recently delved deeper into data structures, their associated ADTs and how they can be implemented. At the same time, I am using this as an opportunity to play around with less familiar languages like Julia and, to a lesser extent, Python.

In Part 1 of 2 of this series, I investigated some of the properties of dynamic arrays in R, Python and Julia. In particular, I was interested in exploring the relationship between the length of an array and its size in bytes, and how this is handled by the different languages. For this purpose, I wrote extremely simple functions that repeatedly append an integer (here, the number 1) to a vector (or one-dimensional array) and report its size in bytes at each step.

In Python:

from sys import getsizeof

# Append n integers one at a time, printing the length and size in bytes at each step
def size(n):
  data = []
  for i in range(n):
    data.append(1)
    print('%s,%s' % (len(data), getsizeof(data)))

In Julia:

using Printf   # @printf lives in the Printf stdlib on Julia 0.7 and later

# Append n integers one at a time, printing the index and size in bytes at each step
function size(n)
  data = Int64[]
  for i = 1:n
    push!(data, 1)
    @printf "%d,%d\n" i sizeof(data)
  end
end

In R:

# Append n values one at a time, printing the index and size in bytes at each step
size <- function(n)
{
  data <- c()
  for(i in 1:n)
  {
    data <- c(data, 1)
    print(sprintf('%s,%s', i, object.size(data)))
  }
}

A call to each of these functions with n = 100 yields the following plot:
amortization_vector
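
For reference, each function prints plain “index,bytes” pairs, so the curves can be redrawn by redirecting that output to a file and plotting it. A minimal matplotlib sketch for one language’s output, with python_sizes.csv as a hypothetical filename:

import matplotlib.pyplot as plt

# Hypothetical file holding the 'length,bytes' pairs printed by size(100)
lengths, nbytes = [], []
with open('python_sizes.csv') as f:
  for line in f:
    n, b = line.strip().split(',')
    lengths.append(int(n))
    nbytes.append(int(b))

plt.step(lengths, nbytes, where='post')
plt.xlabel('number of elements')
plt.ylabel('size in bytes')
plt.savefig('amortization_vector.png')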

The results of this experiment are quite striking. Python starts with an empty list of 72 bytes (the figure for a 64-bit Python 2 build), which jumps to 104 bytes as soon as an element is appended. Python has therefore added (104 - 72) = 32 = 4 x 8 bytes, meaning it over-allocates so that the list can hold four object references even though only one is needed. By the 5th insert the size has grown to 136 bytes, i.e. (136 - 72) = 64 = 8 x 8 bytes of references. This continues, with the allocated size jumping each time the list fills up, at lengths 4, 8, 16, 25, 35, 46, 58, 72 and 88. Interestingly, at each of these lengths i the reported size satisfies i = (size_in_bytes_at_i - 72) / 8; in other words, the list is exactly full at that point and the next append triggers another reallocation.
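
To make the relation concrete, the allocated capacity can be read straight off getsizeof. A minimal check, assuming a 64-bit CPython 2.7 build (empty list of 72 bytes, 8 bytes per slot), on which it should print exactly the growth points listed above:

from sys import getsizeof

base = getsizeof([])                         # 72 bytes on a 64-bit CPython 2.7 build
data = []
for i in range(1, 101):
  data.append(1)
  capacity = (getsizeof(data) - base) // 8   # number of slots currently allocated
  if capacity == i:                          # list is full: the next append triggers a resize
    print('%d,%d' % (i, getsizeof(data)))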

For R, I found this link, which does a great job of explaining the mechanisms behind the memory management of R vectors.

Finally, the reported sizes suggest that the push!() function in Julia does not perform the same kind of preemptive over-allocation. However, I observed that repeatedly calling push!() was fast (more on that later) but not constant-time. Further investigation led me to the C source for appending to arrays in Julia, which suggests that Julia does perform an occasional exponential reallocation of the buffer (although I am not sure, so please correct me if I am wrong!).

Next, I looked at the efficiency of each language when dealing with dynamic arrays. In particular, I was interested in how quickly each language could append values to an existing array.
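
For the Python side, a minimal sketch of how one might time the appends with timeit (not necessarily the exact benchmark behind the figure below):

from timeit import timeit

# Append n integers to a fresh list
def append_n(n):
  data = []
  for i in range(n):
    data.append(1)

# Average over 100 calls for a few array lengths
for n in (10**3, 10**4, 10**5):
  t = timeit(lambda: append_n(n), number=100)
  print('%d,%f' % (n, t / 100))   # average seconds per call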

append_vector_time

The results clearly show that Julia is far quicker than both Python and R. It should be noted that the first call to a newly defined Julia function actually compiles it and then runs it; subsequent calls are therefore generally faster than the first (an important fact to consider during performance benchmarking). Overall, it seems that the claims made by the Julia community hold in this context, namely that Julia is a lot faster than Python and R (I will add a C++ benchmark eventually 🙂 )

Using APIs in Python: a quick example

Python has an extremely intuitive and straightforward way of dealing with APIs, and makes it simple for people like you or me to access and retrieve information from databases. Before I quickly describe how to use APIs in Python, maybe we should begin with: What is an API?

API (Application Programming Interface): an API is a software intermediary that makes it possible for application programs to interact with each other and share data. It is often an implementation of REST that exposes a specific software functionality while protecting the rest of the application (definition taken straight from Google itself). Here, it is also worth spelling out the REST acronym (Representational State Transfer), which is a fancy way of describing a set of conventions for sending and receiving data (in JSON, XML or even plain-text format) between a client and a server.

A typical (and popular!) use of APIs revolves around the Twitter API, which many people have used to predict election outcomes, perform sentiment analysis, geomapping, etc. (for example: http://www.kazemjahanbakhsh.com/codes/election.html)

But without further ado, I will show how to use APIs in Python with the awesome Capitol Words API (heavily inspired by the highly recommended Codecademy tutorial):

# import required libraries
import requests

# set query parameters
query_params = { 'apikey': 'XXXXXXXXXXXXXX',   # replace with your own API key
  'per_page': 100,
  'phrase': 'debt',
  'sort': 'count desc',
  'start_date': '2012-01-01',
  'end_date': '2013-12-31'
}

# define the endpoint database we wish to search
endpoint = 'http://capitolwords.org/api/phrases/state.json'

# extract data
response = requests.get(endpoint, params=query_params)
data = response.json()

# write one 'count,state' line per result to file
with open('debt_reference.txt', 'w') as out_file:
  for result in data['results']:
    out_file.write('%s,%s\n' % (result['count'], result['state']))

Here, we used the ‘phrases/state.json’ endpoint to find how many times the word “debt” was uttered by the senators of each US state between January 1, 2012 and December 31, 2013.

In order for the code above to work, you will need to obtain your own API key and insert it into the code. You can obtain one by simply registering (for free) with the Capitol Words API: http://sunlightlabs.github.io/Capitol-Words/
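
Once you have a key and the script has run, debt_reference.txt is just a plain “count,state” file, so a quick sanity check of the results (here, the five states with the highest counts) is straightforward:

# Read the file written above and print the five states with the highest counts
rows = []
with open('debt_reference.txt') as f:
  for line in f:
    count, state = line.strip().split(',')
    rows.append((int(count), state))

for count, state in sorted(rows, reverse=True)[:5]:
  print('%s: %d' % (state, count))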