Visualizing Season Performance by NCAA Tournament Teams


UPDATE: I’ve added in the play-in teams, so the graphs should now reflect all 64 teams in the tournament. Also, I corrected Duke’s opponent averages, since the NCAA misreported them (see Data section).
UPDATE 2: I have written a post on how I came up with this visualization, which has all of the code. That post may be found here.

With the NCAA basketball tournament around the corner, I decided to take a look at how the teams in the tournament performed over the course of the 2012-2013 season, across several statistics. I haven’t had too much time to interpret the data yet, so I’ll hold off on offering any insights and instead just give you a wonderful potpourri of visuals to explore. A lengthier post will follow on how I scraped, calculated, and ultimately plotted a rather large set of data. As usual, the visualization follows the jump.



Update: I have taken this visualization offline to reduce the load on my server. You may access the 2014 version here.



Data and Source Code

I collected all of the data by scraping the NCAA website. The resulting data file(s) may be obtained here. The code for that scraper may be obtained here. All analysis was done in R, using Hadley Wickham’s excellent ggplot2 package.

Readers may also be interested in data from 2011-2012 and 2010-2011.

Update: For the 2012-2013 data set, the NCAA website is misreporting the opponent averages (e.g., opp_team_rebavg in summary_team_data.tsv) for these 6 schools: Cal St. Northridge, Duke, Ill.-Chicago, Northern Ariz., Quinnipiac, and South Dakota. Be sure to correct those statistics if you do anything with the data. I have not checked the data for the other years, so be sure to check the min/max values for all variables before you do any analysis with them.
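One way to run that min/max check is sketched below in JavaScript. The analysis for this post was done in R, so this is only an illustration and not the code used here. It assumes a tab-separated file with a header row, as the name summary_team_data.tsv suggests; the function name `columnRanges`, the `team` column, and the sample rows are hypothetical.

```javascript
// Sketch: compute the min and max of every numeric column in TSV text.
// Assumes the first line is a header row; cells that do not parse as
// numbers (e.g., team names, blanks) are skipped.
function columnRanges(tsvText) {
  const lines = tsvText.split(/\r?\n/).filter((line) => line.trim() !== "");
  const headers = lines[0].split("\t");
  const ranges = {};

  for (const line of lines.slice(1)) {
    const cells = line.split("\t");
    headers.forEach((name, i) => {
      const raw = cells[i];
      if (raw === undefined || raw.trim() === "") return;
      const value = Number(raw);
      if (Number.isNaN(value)) return;
      if (!Object.prototype.hasOwnProperty.call(ranges, name)) {
        ranges[name] = { min: value, max: value };
      } else {
        ranges[name].min = Math.min(ranges[name].min, value);
        ranges[name].max = Math.max(ranges[name].max, value);
      }
    });
  }
  return ranges;
}

// Made-up rows for illustration only; these are not values from the data set.
const sample = [
  "team\topp_team_rebavg",
  "Team A\t31.2",
  "Team B\t0",
  "Team C\t35.8",
].join("\n");

console.log(columnRanges(sample));
```

An implausible extreme, such as the made-up 0 above, is the kind of value worth inspecting before running any analysis on that column.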



Do note: The tournament team averages were calculated before the play-in games had concluded, so they do not include those teams. Only teams that were slotted into the bracket on Selection Sunday are included.
Update: I’ve added in the play-in teams, so the data includes all 64 teams in the tournament.


19 Responses to Visualizing Season Performance by NCAA Tournament Teams

  1. Igor Sosa says:

    Very interesting! Could you post the technical details (R, JavaScript…) on how to create a post like that?

  2. Igor Sosa says:

    thanks! It could be very interesting!

  3. whitemacboy says:

    Hi, nice job. Did you build this in rApache or Shiny?

    • Rodrigo Zamith says:

      Hey whitemacboy,

      No, I’m afraid it’s just a few thousand static images (batch-generated with R), with JavaScript (jQuery) used to switch between them. I’m actually not too familiar with either Rapache or Shiny, although they both look very cool. The one issue I instantly foresee, however, is that I’m on shared hosting and thus lack the ability to install any server-side modules. But something to keep in mind and bring up in my local R User Group.

  4. ram says:

    hey rodrigo,

    also be sure to write about how you created the post itself, with the dropdowns. I am guessing you used D3. A post on that would be great

    • Rodrigo Zamith says:

      Hey ram,

      I’m hoping to find time to get a post up either tonight or tomorrow night. I was hoping to optimize the code first, but my next two weeks are absolutely packed. Nonetheless, I’ll try to go into a decent amount of detail with what I have. Also, I didn’t use D3 for this project. I’m just using jQuery to check for changes in the select boxes and use the information from each of the select boxes to put together a filename; once that’s done, it tells the browser to refresh the image using the new filename. Anyway, I’ll try to go into more detail with the post, so stay tuned!
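The select-box approach described in this reply can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the site: the function name `buildFilename`, the `img/` directory, the underscore-separated naming scheme, the `#chart` id, and the example values are all assumptions.

```javascript
// Sketch: turn the current select-box values into an image filename.
// The directory, separator, and extension are hypothetical defaults.
function buildFilename(selections, options = {}) {
  const { dir = "img/", sep = "_", ext = ".png" } = options;
  return dir + selections.join(sep) + ext;
}

// In a browser with jQuery loaded, the function could be wired to the
// select boxes like this (the #chart id is also hypothetical):
if (typeof window !== "undefined" && typeof window.jQuery !== "undefined") {
  const $ = window.jQuery;
  $("select").on("change", function () {
    // Collect the current value of every select box, in document order.
    const values = $("select").map(function () {
      return $(this).val();
    }).get();
    // Pointing the <img> at a new src makes the browser load that image.
    $("#chart").attr("src", buildFilename(values));
  });
}

console.log(buildFilename(["duke", "ptsavg"]));
```

Because every combination of selections maps to a pre-generated static image, nothing has to run on the server, which fits the shared-hosting constraint mentioned in the replies.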

  5. Josh Browning says:

    Awesome plots! How did you create the online GUI? Shiny?

    • Rodrigo Zamith says:

      Hey Josh,

      I’m afraid I haven’t had the opportunity to experiment with Shiny (nor do I have the permissions necessary to install the module on the server hosting this site). I basically just generated a few thousand images using R and ggplot2, and then used some JavaScript to switch between those images. I’ll try to put together a more detailed post about the process either tonight or tomorrow night.

  6. Pingback: Using data to analyze the NCAA Sweet 16 – The Storage Effect

  7. Pingback: Going Under the Hood of the NCAA Tournament Visualization - Rodrigo Zamith

  8. christian says:

    I used to run a tourney pool by hand and use the newspaper as my data source. How far we’ve come!

    Thanks for compiling and posting. Maybe I can use this to get ahead for the 2014 bracket.

    • Rodrigo Zamith says:

      Hey Christian,

      Ha! My pleasure, and I’ll try to replicate this next season since the groundwork is already there. That said, my first bracket relied on nothing more than a bunch of statistics found in my newspaper’s special section on March Madness. Funny enough, I’ve never done better than I did that year. :)

  9. I like the graphics, look very good and comprehensive. Why do you constrain the ylim to be from (1, max)? It seems as though some of these are truncated at the lower bound.

    Also, Glimmer may allow you to do this if you sign up for the beta server and host it (if the data is not too big or can be pulled over the internet).

    • Rodrigo Zamith says:

      Hey John,

      Thanks for the note. I just wanted consistent Y axes and it wasn’t until after I had generated the charts that I noticed the truncation on a few of the variables. I then meant to fix them, but simply forgot.

      Thanks for the tip on Glimmer. My biggest fear is that they would pull that access after they wrapped up their beta testing (unsubstantiated, to be clear), but it seems to be great for testing.

  10. Salil says:

    This may just be my lack of familiarity with Dropbox, but do the Dropbox data links send others to a Dropbox homepage instead of a data file? Anyone know what I’m doing wrong? Probably just going to stick to the scraper for now.

    • Rodrigo Zamith says:

      Hey Salil,

      Thanks for noticing that. Dropbox apparently changed the links around. I’ve updated them, but if you’re looking for the data for this season, you should check out the newest blog post.
