Jul 07, 2008

Divided nation

Professor Gelman generally believes the red state, blue state paradigm is too simplistic to describe the American electorate.  He has been sharing some of his work on his blog, and has just published a book on this topic.  Recently he produced the following chart, which looks gimmicky but is crystal clear in its message.

[Chart: economic vs. social ideology of each state's Republicans (red) and Democrats (blue)]

Here, economic and social ideology are plotted on a scatter chart, with positive values indicating conservatism and negative values liberalism.  Further, each state appears twice on the chart: a red point for its Republicans and a blue point for its Democrats.

This is a cluster analyst's dream data set.  The complete separation of the Republican cluster from the Democratic cluster is astounding: one could draw a single diagonal line that classifies every point correctly.
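"A diagonal line perfectly classifying all points" is what a cluster analyst calls linear separability.  One classic way to test for it is the perceptron, which is guaranteed to find such a line only when one exists.  The sketch below is my own illustration, not Gelman's method; the function name `find_separating_line` and the toy coordinates are hypothetical and were not read off his chart.

```python
from typing import List, Optional, Tuple

# A point is (economic score, social score); positive = conservative.
Point = Tuple[float, float]


def find_separating_line(
    points: List[Point], labels: List[int], max_epochs: int = 1000
) -> Optional[Tuple[List[float], float]]:
    """Perceptron search for a line w[0]*x + w[1]*y + b = 0 that puts every
    +1 point (e.g. Republicans) on one side and every -1 point (Democrats)
    on the other. Returns (w, b), or None if no such line is found within
    max_epochs passes. Convergence is guaranteed only when the two groups
    are linearly separable."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            # Misclassified if the point is on the wrong side (or on the line).
            if label * (w[0] * x + w[1] * y + b) <= 0:
                w[0] += label * x
                w[1] += label * y
                b += label
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None


# Hypothetical toy coordinates, NOT taken from Gelman's chart.
republicans = [(0.30, 0.40), (0.25, 0.60), (0.35, 0.50)]
democrats = [(-0.30, -0.20), (-0.20, 0.10), (-0.35, -0.40)]
result = find_separating_line(republicans + democrats, [1] * 3 + [-1] * 3)
print("separating line found:", result is not None)
```

If the two clusters overlapped even slightly, the loop would keep making mistakes and return `None` after `max_epochs` passes.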

We should not miss a host of details:

  • As Andrew pointed out, "the big thing we see from the graph ... is that Democrats are much more liberal than Republicans on the economic dimension: Democrats in the most conservative states are still much more liberal than Republicans in even the most liberal states."  This is clear from the wide gap on the horizontal axis.
  • There is a small degree of overlap on the social ideology axis, so the nation is less divided on that front.
  • But wait a minute: the two axes are not drawn to the same scale.  Reading the values off the chart, the extremes are farther apart on the social axis: the difference between MS and VT is roughly 0.8 on the social scale, while the largest difference on the economic scale is roughly 0.5.  (Here, I am assuming that the two scales are comparable to each other.)
  • There is a high correlation between social and economic ideology: the points line up well along the 45-degree line.
  • Especially on social issues, the Democrats are divided among themselves (note the elongated shape of the blue cluster).
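Two of the observations above are numerical: how strongly the two ideologies correlate, and how the spread on one axis compares with the spread on the other.  Here is a minimal sketch of both, assuming the state coordinates have been read off the chart as (economic, social) pairs; the helper names and the sample points are hypothetical.  As noted above, comparing the two ranges only makes sense if the scales are comparable.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def axis_summary(points: List[Tuple[float, float]]) -> Dict[str, float]:
    """Correlation between the two axes and the spread (max - min) of each."""
    econ = [p[0] for p in points]
    social = [p[1] for p in points]
    return {
        "correlation": pearson(econ, social),
        "economic_range": max(econ) - min(econ),
        "social_range": max(social) - min(social),
    }


# Hypothetical points lying exactly on a line; real values would have to be
# read off Gelman's chart.
summary = axis_summary([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)])
print(summary)
```

A correlation near 1 corresponds to points hugging a rising diagonal, and a social range larger than the economic range is the "extremes are more extreme" observation.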

Reference: Gelman, "Ranking states by conservatism/liberalism of their voters", June 30 2008.

Jul 05, 2008

It's raining colors here too

Via the Data Mining blog, I came across ChoiceRanker, which is some kind of straw polling site that uses the following visualization of the data.

[Chart: ChoiceRanker visualization of a straw poll on Obama's running mate]

This particular chart is related to the question of who should be Obama's VP.  I can't say I understand what's going on here.  If you can figure it out, let us know.


Jun 30, 2008

A splitting headache

[Chart: Ben Fry's chart of baseball team salaries vs. win-loss records]

Todd B didn't like this chart showing the correlation between baseball teams' salaries and their win-loss records.

A few problems are in plain sight:

  • Most importantly, putting a second set of logos next to the salaries column would really help.
  • It is unclear why the lines vary in width.
  • Winning percentage is more telling than the win-loss record, especially in the middle of a season, when teams have played slightly different numbers of games.
  • The spread of salaries is so wide (the highest is roughly 10 times the lowest) that converting dollar amounts to ranks throws away a lot of information.
  • Each column is sorted by its own metric, while the most important sorting variable should be the slope of the lines (i.e. the cost per win).
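Three of the fixes in this list come down to simple arithmetic: winning percentage instead of raw wins and losses, sorting by cost per win, and keeping salaries as numbers rather than ranks.  The sketch below illustrates all three under stated assumptions: cost per win is taken as payroll divided by wins, and the `Team` class, `to_ranks` helper, and every number are hypothetical, not taken from Ben Fry's data.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Team:
    name: str
    payroll_millions: float  # total payroll in millions of dollars
    wins: int
    losses: int

    @property
    def win_pct(self) -> float:
        # Comparable across teams even when they have played
        # different numbers of games.
        return self.wins / (self.wins + self.losses)

    @property
    def cost_per_win(self) -> float:
        # Payroll dollars per win; infinite if the team has no wins yet.
        return self.payroll_millions / self.wins if self.wins else math.inf


def to_ranks(values: List[float]) -> List[int]:
    """Rank values from 1 (smallest) upward; ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical teams and numbers, for illustration only.
teams = [
    Team("Team A", 200.0, 50, 35),
    Team("Team B", 40.0, 48, 36),
    Team("Team C", 80.0, 40, 45),
]
by_efficiency = sorted(teams, key=lambda t: t.cost_per_win)
for t in by_efficiency:
    print(f"{t.name}: {t.win_pct:.3f} win pct, ${t.cost_per_win:.2f}M per win")

# Ranks hide the spread: very different payrolls get identical ranks.
print(to_ranks([40.0, 80.0, 200.0]), to_ranks([40.0, 41.0, 42.0]))
```

The last line shows the information loss: a fivefold spread in payroll and a 5% spread both collapse to the ranks 1, 2, 3.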


The interactive feature, which shows a separate plot for each day of the baseball season (control bar at the top), is something of a gimmick.  Props, though, for recognizing that the first few days of the season don't tell us anything.  There is really little use in investigating this correlation day by day, particularly when the salaries are given in aggregate.

On the diagram, the blue lines represent teams such as the Devil Rays and Arizona that had better winning records than their salaries would suggest.  Red lines represent teams that spent more money than their records would justify.  The steeper the line, the better (blue) or worse (red) the team's cost efficiency.

With so many long, steep lines in both colors (directions), one might suspect a negative correlation between salary level and winning record.

The following scatter plot suggests otherwise:

[Chart: scatter plot of team salary vs. winning percentage, with fitted line]

The correlation between salary and winning is very weak.  If one were to fit a linear model, it would show that the higher-salaried teams were generally doing slightly better (black line).  The Yankees' payroll was so far outside the range of the other teams that I excluded them when estimating the line.  (However, as the chart shows, the line in fact predicted the Yankees' winning percentage really well.)

Teams above the line are performing better than their salaries would lead us to believe. 
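The fitted line and the "above the line" reading can be sketched as ordinary least squares on (payroll, winning percentage), leaving an outlier out of the fit but still computing its residual.  This is an assumption on my part: the post says only "a linear model", and the function names and team numbers below are hypothetical.  A positive residual means a team sits above the line; a residual near zero for the excluded team means the line predicts it well, as happened with the Yankees.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(
    teams: List[Tuple[str, float, float]], exclude: Iterable[str] = ()
) -> Tuple[Dict[str, float], float, float]:
    """teams holds (name, payroll, win_pct). Fits the line without the
    excluded teams, then returns every team's residual (actual minus
    predicted win_pct) along with the slope and intercept."""
    skip = set(exclude)
    fit_rows = [t for t in teams if t[0] not in skip]
    slope, intercept = fit_line([t[1] for t in fit_rows], [t[2] for t in fit_rows])
    res = {name: pct - (intercept + slope * pay) for name, pay, pct in teams}
    return res, slope, intercept


# Hypothetical data: "Big Spender" is the payroll outlier left out of the fit.
teams = [("A", 50.0, 0.45), ("B", 100.0, 0.50), ("C", 150.0, 0.55),
         ("Big Spender", 200.0, 0.65)]
res, slope, intercept = residuals(teams, exclude=["Big Spender"])
above_line = [name for name, r in res.items() if r > 1e-9]
print(f"slope={slope:.4f}, intercept={intercept:.3f}, above the line: {above_line}")
```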



Reference: Ben Fry's baseball salary page

Jun 26, 2008

Light entertainment: raining colors

Nick B sent in this example.

[Chart: color-coded Ottawa calendar]

Note especially the need to differentiate Good Friday from Easter Monday, and the broad-stroke beige background used to set off chunks of weeks.  Wouldn't a list work better?
