UPDATE: I’ve added the play-in teams, so the graphs should now reflect all 64 teams in the tournament. I also corrected Duke’s opponent averages, since the NCAA misreported them (see the Data and Source Code section).
UPDATE 2: I have written a post on how I came up with this visualization, which has all of the code. That post may be found here.
With the NCAA basketball tournament around the corner, I decided to take a look at how the teams in the tournament performed over the course of the 2012-2013 season, across several statistics. I haven’t had too much time to interpret the data yet, so I’ll hold off on offering any insights and instead just give you a wonderful potpourri of visuals to explore. A lengthier post will follow on how I scraped, calculated, and ultimately plotted a rather large set of data. As usual, the visualization follows the jump.
Update: I have taken this visualization offline to reduce the load on my server. You may access the 2014 version here.
Data and Source Code
I collected all of the data by scraping the NCAA website. The resulting data file(s) may be obtained here. The code for that scraper may be obtained here. All analysis was done in R, using Hadley Wickham’s excellent ggplot2 package.
Update: For the 2012-2013 data set, the NCAA website is misreporting the opponent averages (e.g., opp_team_rebavg in summary_team_data.tsv) for these 6 schools: Cal St. Northridge, Duke, Ill.-Chicago, Northern Ariz., Quinnipiac, and South Dakota. Be sure to correct those statistics if you do anything with the data. I have not checked the data for the other years, so be sure to check the min/max values for all variables before you do any analysis with them.
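If you want a quick way to run that sanity check, here is a minimal sketch in Python (not the R code I used for the analysis). It prints the min/max of every numeric column and picks out the six schools listed above. The team_name column and the sample rows are made up for illustration; only opp_team_rebavg is a real column name from summary_team_data.tsv, so adjust the names to match the actual file.

```python
import csv
import io

# Schools listed above as having misreported opponent averages (2012-2013).
AFFECTED_SCHOOLS = {
    "Cal St. Northridge", "Duke", "Ill.-Chicago",
    "Northern Ariz.", "Quinnipiac", "South Dakota",
}


def column_ranges(rows):
    """Return {column: (min, max)} for columns whose values all parse as numbers."""
    ranges = {}
    if not rows:
        return ranges
    for col in rows[0]:
        try:
            values = [float(r[col]) for r in rows]
        except (TypeError, ValueError):
            continue  # skip text columns or columns with missing values
        ranges[col] = (min(values), max(values))
    return ranges


def rows_needing_correction(rows, name_col="team_name"):
    """Return rows for the affected schools (name_col is an assumed column name)."""
    return [r for r in rows if r[name_col] in AFFECTED_SCHOOLS]


# Made-up sample standing in for summary_team_data.tsv; the numbers are not real.
SAMPLE_TSV = (
    "team_name\topp_team_rebavg\n"
    "Duke\t3.1\n"
    "Kansas\t33.5\n"
    "Gonzaga\t30.2\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
ranges = column_ranges(rows)
flagged = [r["team_name"] for r in rows_needing_correction(rows)]

print(ranges)   # an implausibly low minimum is a hint that something is off
print(flagged)  # schools whose opponent averages need to be fixed by hand
```

To run it against the real file, swap the io.StringIO(SAMPLE_TSV) call for an open file handle on summary_team_data.tsv. The check only reports ranges; deciding what counts as implausible is still up to you.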
Do note: The tournament team averages were calculated before the play-in games had concluded, so they do not include those teams. Only teams that were slotted into the bracket on Selection Sunday are included.
Update: I’ve added the play-in teams, so the data now include all 64 teams in the tournament.