Tuesday, January 11, 2011

Project Chadwick #2–Top 5 SF Giants OBP (Ruby version)

Before I get started on this post: if you aren't familiar with Project Chadwick, here's a quick overview. The data for this problem can be downloaded from here.

The Problem

The On Base Percentage, or OBP, measures how often a player reaches base. It is calculated using the following formula:
OBP = (H + BB + HBP) / (AB + BB + HBP + SF)

H   = Hits
BB  = Walks
HBP = Hit By Pitch
AB  = At Bats
SF  = Sacrifice Flies
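
To make the formula concrete, here is a minimal Ruby sketch. The `obp` helper and the stat line passed to it are made up for illustration and are not taken from the script or the data set.

```ruby
# Hypothetical helper: computes OBP from raw counting stats.
def obp(h:, bb:, hbp:, ab:, sf:)
  denominator = ab + bb + hbp + sf
  return 0.0 if denominator.zero?      # avoid dividing by zero
  (h + bb + hbp).to_f / denominator    # to_f forces floating-point division
end

# Made-up stat line, for illustration only.
example = obp(h: 150, bb: 60, hbp: 5, ab: 500, sf: 5)
puts format("%.3f", example)           # prints 0.377 (215 / 570)
```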

The batter should have had at least 200 at bats to be eligible. The results should be printed out in the following format: player's name, season, and OBP.

The Solution

New Ruby Concepts

Ruby Class

The first ‘new’ thing you see in this script is a class. There are three items I'd like to discuss about this class: attr_accessor, and the initialize and to_s methods.

The attr_accessor method is used to indicate which instance variables, also known as attributes, are available outside of the class. All of the variables declared in the attr_accessor call can be read and updated from outside of the class. You can also set up read-only attributes by calling attr_reader, and write-only attributes by calling attr_writer. When attributes are used within a class they are prefixed with the @ sign.
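
Here is a small sketch of the three attribute helpers side by side. The `Player` class and its attribute names are hypothetical and only serve to show the difference; they are not the Batter class from the script.

```ruby
# Hypothetical class, for illustration only.
class Player
  attr_accessor :name     # readable and writable from outside the class
  attr_reader   :season   # read-only from outside the class
  attr_writer   :notes    # write-only from outside the class

  def initialize(name, season)
    @name   = name        # inside the class, attributes use the @ prefix
    @season = season
    @notes  = nil
  end
end

player = Player.new("Example Player", 2010)
player.name = "Another Name"        # attr_accessor defines both name and name=
puts player.name
puts player.season                  # attr_reader defines season only
player.notes = "made-up note"       # attr_writer defines notes= only
puts player.respond_to?(:season=)   # false: there is no public setter
puts player.respond_to?(:notes)     # false: there is no public getter
```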

Another feature of a Ruby class is the initialize method, which is the class’s constructor. In this class it doesn't do much other than take the stats and put them in variables that make the code easier to read.

Well, there is a little more going on than a simple assignment. In the stats fields the ||= says that if the value on the left is nil (or false) then assign 0.0 to it. The value is then converted to a float so the division doesn't get truncated to an integer.
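
A minimal sketch of that pattern follows. The `StatLine` class and its two fields are assumptions made for the example, not the script's real constructor.

```ruby
# Hypothetical class showing the ||= default followed by to_f.
class StatLine
  attr_accessor :hits, :at_bats

  def initialize(hits, at_bats)
    hits    ||= 0.0         # assigns 0.0 only when hits is nil (or false)
    at_bats ||= 0.0
    @hits    = hits.to_f    # "150" or 150 both become 150.0
    @at_bats = at_bats.to_f
  end
end

line = StatLine.new(nil, "500")
puts line.hits      # 0.0, because nil was replaced by the default
puts line.at_bats   # 500.0

# Why the float conversion matters:
puts 150 / 500      # 0   (integer division truncates)
puts 150.0 / 500    # 0.3
```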

The last method I want to discuss is the to_s method. This method overrides the base class’s to_s, which converts an object to a string. In this method I’m just formatting the output of the class so my loop is a little cleaner.
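
As a rough sketch of overriding to_s: the `BatterLine` class and its output format below are assumptions for illustration, not the script's actual formatting.

```ruby
# Hypothetical class with a custom to_s.
class BatterLine
  def initialize(name, season, obp)
    @name   = name
    @season = season
    @obp    = obp
  end

  # Overrides Object#to_s so the object formats itself.
  def to_s
    "#{@name}, #{@season}, #{format('%.3f', @obp)}"
  end
end

line = BatterLine.new("Example Player", 2010, 0.4)
puts line                 # puts calls to_s: Example Player, 2010, 0.400
puts "Top OBP: #{line}"   # string interpolation calls to_s as well
```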


One of my favorite things about Ruby is how arrays can be manipulated and used. In the first example I’m adding the newly created Batter object to the end of the all_obps array. In the second line I'm sorting the array by the obp value. What I'd like to point out here is the ! at the end of sort. By convention the ! indicates that the sorting will be done in place; in other words, it changes the array I’m sorting. If I left the ! off, the sort would create a new, sorted array, leaving me with the original array AND the sorted copy. Finally, the reverse! line has a [start, length] slice added to it, which returns that many elements beginning at the given index. Ruby arrays are zero-indexed, so the first through the fifth elements are [0,5]; [1,5] would skip the first element and return the second through the sixth. In our case that is how I grab the top five OBP for the SF Giants.
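
The array operations described above can be sketched as follows. The `Entry` struct and the values are made up for the example. Note that the start index of a slice is zero-based, so the leading elements come from a slice that starts at 0.

```ruby
# Hypothetical record type and values, for illustration only.
Entry = Struct.new(:name, :obp)

all_obps = []
all_obps << Entry.new("A", 0.350)   # << appends to the end of the array
all_obps << Entry.new("B", 0.420)
all_obps << Entry.new("C", 0.390)

# sort (no !) returns a new array and leaves the original untouched.
sorted_copy = all_obps.sort { |a, b| a.obp <=> b.obp }
puts all_obps.map(&:name).join      # ABC (still in insertion order)
puts sorted_copy.map(&:name).join   # ACB

# sort! and reverse! both modify the array in place.
all_obps.sort! { |a, b| a.obp <=> b.obp }
all_obps.reverse!                   # now highest OBP first: B, C, A

top_two  = all_obps[0, 2]           # start at index 0, take 2 elements
skip_one = all_obps[1, 2]           # start at index 1, so the first is skipped
puts top_two.map(&:name).join       # BC
puts skip_one.map(&:name).join      # CA
```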

An Overview of My Solution

In this script I’ve opted to write code that is easier to read rather than code with the fewest lines, which is why I created the Batter class. This way we can access the stats needed to calculate the OBP in a more straightforward way. Since I’ve already pointed out some of the nuances of Ruby classes, and the calc_obp method is straightforward, I’m not going to dissect the class any further here.

What I will talk about is how the script reads and processes the data. The first line in the snippet reads all lines of the file and then loops over them. For each line I create a Batter object and check whether the batter had at least 200 at bats. If not, I move on to the next line. If the batter had at least 200 at bats, I calculate the OBP and store the Batter object in the all_obps array. The rest of the script sorts, reverses and prints the top five OBP for the San Francisco Giants.
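
The flow described above can be sketched roughly like this. It is not the original script: the comma-separated column layout (name, season, team, AB, H, BB, HBP, SF), the sample rows, and the `parse_batter` helper are all assumptions, and the lines are held in an array so the example runs without the data file. With a real file, something like `File.readlines(path)` would supply the lines instead. The sketch also assumes the data has already been narrowed down to Giants batters.

```ruby
# Made-up rows in an assumed layout: name,season,team,AB,H,BB,HBP,SF
SAMPLE_LINES = [
  "Player A,2001,SFN,500,150,60,5,5",
  "Player B,2002,SFN,400,140,100,0,0",
  "Player C,2003,SFN,150,60,30,0,0",   # fewer than 200 at bats
  "Player D,2004,SFN,250,70,,,"        # missing stats become nil after split
]

Batter = Struct.new(:name, :season, :at_bats, :hits, :walks, :hbp, :sac_flies) do
  def obp
    (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)
  end
end

# Hypothetical parser: missing fields default to 0.0, everything becomes a float.
def parse_batter(line)
  name, season, _team, ab, h, bb, hbp, sf = line.chomp.split(",")
  stats = [ab, h, bb, hbp, sf].map { |value| (value || 0.0).to_f }
  Batter.new(name, season.to_i, *stats)
end

all_obps = []
SAMPLE_LINES.each do |line|
  batter = parse_batter(line)
  next if batter.at_bats < 200         # not eligible, move on to the next line
  all_obps << batter
end

all_obps.sort! { |a, b| a.obp <=> b.obp }
top_five = all_obps.reverse![0, 5]     # highest OBP first, at most five entries
top_five.each do |b|
  puts format("%s, %d, %.3f", b.name, b.season, b.obp)
end
```

With these sample rows the ineligible Player C is dropped and the remaining three batters print in descending OBP order.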

That’s it. There's not much to the Ruby script when you compare it to the F# script. This may be because I’ve worked with Ruby more than I have with F#, but I believe Ruby is just a little more concise than F#. I may change my mind after writing more F# code, but for now that’s the way it seems to me.

I really enjoy writing Ruby code. It allows me to concentrate on what I need to do rather than on 'how do I do this in Ruby'. I guess what I'm trying to say is that it feels more natural than other languages such as C#.

Up Next...

I'm still working on getting more problems up for those of you who are interested in continuing on the baseball stats path. I'll write the Erlang version next and then wrap up the second problem with the Objective-C code.

Thanks for stopping by.
