Rec’ing on…The Ranking Project (1)
All tournament long, I kept hearing all of this yammering about how the process of selecting and seeding teams for the NCAA tournament is screwed up, and this, and that, and yada yada.
And…? Where’s your solution, gripers? Oh, what’s that? You don’t have one? It’s up to other people to work it out?
Here at Ellipses, we try to provide at least suggestions for solutions whenever there’s a gripe (no guarantee that that’s what’s always going to happen, but the intent is there). With that in mind, and with some curiosity as to what this year’s brackets might have looked like, I figured: why not try working something out? (This isn’t all going to be done in this one post, but we have to start somewhere.)
OK. So, what are our parameters, and what’s our goal based on what we can observe from recent selection results?
- GOAL: We don’t simply choose the top 64 teams in the country. The bracket must accommodate automatic bids from winners of conference tournaments (or the conference champion if there is no tournament).
- GOAL: The four regional brackets must be seeded more-or-less equally and without bias.
- GOAL: Teams must be placed so that they cannot play on their home court except (possibly) for a Final Four.
- Quality of wins must be considered.
- Quality of non-conference opponents must be considered separately from conference opponents.
- Quality of conference play must be considered.
- Time of year games are played should be weighed.
- Codify the process into an algorithm that provides everyone with clear guidelines as to what will get them into the tournament.
- Make the algorithm applicable only for end-of-season rankings and tournament selection. While a variant of the process might be used for in-season rankings, that is not its purpose.
- Ask for patience when the evenhanded and unbiased results blow up in your face.
The surprise first step in all of this was establishing a database. You’d think that somewhere on the web there would be some sort of table or database (Excel, SQL, text, whatever) of all the games played in Women’s Division I for any season. If there is one, I couldn’t find it. The only thing I found was a listing of results for all schools at the NCAASports.com site. I’ve spent a couple of evenings trying to massage that file into something usable. You see, it comes as a PDF file, which isn’t known as the friendliest of database models.
The first step was to generate a text version of this file. Fortunately, Acrobat provides an export to text function. Easy. Well, easy is a relative term. Instead of replicating the table format in the PDF, the text file is a list, one item per line, of every datum in more-or-less column order. I needed to filter through all of this.
Text processing is just the sort of thing the programming language Perl is great for. After writing and testing a nasty 70 lines of code, I finally got the data into a comma-delimited format. Now it was time to load it into Excel and save it so that I could bring the whole kit and caboodle into Access without any hassle.
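For the curious, here is a rough sketch of the regrouping idea, written in Python rather than Perl. It is not the 70-line script mentioned above, just an illustration. It assumes the export really is one datum per line and that every game fills the same number of fields in the same order; the field names in FIELDS and the sample values are made up, since the actual column layout of the PDF isn't spelled out here.

```python
import csv
import io

# Hypothetical column layout; the real export's fields are an assumption.
FIELDS = ["team", "date", "opponent", "team_score", "opp_score"]


def lines_to_rows(lines, width=len(FIELDS)):
    """Group a one-datum-per-line export into rows of `width` fields."""
    data = [ln.strip() for ln in lines if ln.strip()]  # drop blank lines
    if len(data) % width != 0:
        raise ValueError(
            f"{len(data)} items do not divide evenly into rows of {width}"
        )
    return [data[i:i + width] for i in range(0, len(data), width)]


def rows_to_csv(rows):
    """Render rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample: the same game listed once from each team's side.
sample = """TeamA
2005-11-18
TeamB
70
65

TeamB
2005-11-18
TeamA
65
70
""".splitlines()

rows = lines_to_rows(sample)
print(rows_to_csv(rows), end="")
```

A real export is unlikely to be this tidy (page headers, wrapped names, missing scores), which is presumably where most of those 70 lines went. The length check at least fails loudly when the items stop lining up.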
Once in Access, I went through several queries in order to reduce the data by almost half. Why? In the raw data, there are generally two listings for each game, one for each way of expressing the team order (i.e., TeamA vs TeamB, and TeamB vs TeamA). I then had to filter out games with non-Division I opponents (as these only had one listing, not two) as well as games played on neutral sites. Lastly, I added conference tables for each of the schools.
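The same reduction can be sketched outside of Access. The Python below is only an illustration of the logic, not the queries actually used. It assumes each listing carries a date, both team names, and a neutral-site flag (all hypothetical field names), and that two teams never meet twice on the same date. Mirrored listings are matched on an order-independent key; anything listed only once is treated as a game against a non-Division I opponent and dropped, along with neutral-site games.

```python
from collections import defaultdict


def dedupe_games(listings):
    """Collapse mirrored listings into one record per game.

    Each listing is a dict with hypothetical keys:
    team, opponent, date, team_score, opp_score, neutral.
    """
    groups = defaultdict(list)
    for g in listings:
        # frozenset makes (TeamA, TeamB) and (TeamB, TeamA) the same key
        key = (g["date"], frozenset((g["team"], g["opponent"])))
        groups[key].append(g)

    games = []
    for pair in groups.values():
        if len(pair) != 2:
            continue  # single listing: opponent presumably not Division I
        if any(g["neutral"] for g in pair):
            continue  # neutral-site game
        # keep the listing whose team name sorts first, so output is stable
        games.append(min(pair, key=lambda g: g["team"]))
    return games


# Made-up listings for illustration only.
listings = [
    {"team": "TeamA", "opponent": "TeamB", "date": "2005-11-18",
     "team_score": 70, "opp_score": 65, "neutral": False},
    {"team": "TeamB", "opponent": "TeamA", "date": "2005-11-18",
     "team_score": 65, "opp_score": 70, "neutral": False},
    {"team": "TeamA", "opponent": "SmallCollege", "date": "2005-11-22",
     "team_score": 90, "opp_score": 40, "neutral": False},
    {"team": "TeamC", "opponent": "TeamD", "date": "2005-11-25",
     "team_score": 58, "opp_score": 60, "neutral": True},
    {"team": "TeamD", "opponent": "TeamC", "date": "2005-11-25",
     "team_score": 60, "opp_score": 58, "neutral": True},
]

games = dedupe_games(listings)
print(len(games), "game(s) kept")
```

Keeping the listing whose team name sorts first is an arbitrary choice; any consistent rule would do, as long as each game ends up in the table exactly once.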
What a freakin’ mess.
So, barring any goofs on my part, I should now have a database with (hopefully still accurate) game data for the 2005-2006 season with which I can test and compare the various formulae in this project. Next time, we’ll start playing with some numbers.