My Clojure Adventure: November 2011

Wednesday, November 30, 2011

Intro to Clojure-clr: Connecting to SQL Server and MySQL

Today’s post is a quick introduction to how connect and retrieve rows from a SQL Server and MySQL databases.

The Setup

If you haven’t already done so setup Clojure-clr. Obviously you’ll need to have access to a SQL Server and MySQL instances. If you do not have one or both of the databases you can download and install them from here: SQL Server Express, MySQL. After downloading and installing the databases grab the data and schema for this post from here: SQL Server version, MySQL version. Next load the data/schema files then you are ready to go!

Connecting to SQL Server

Since we are working with Clojure-clr I think It would only be proper to start off with connecting to SQL Server. In order to do that I need to load the System.Data assembly. It contains the necessary classes for interacting with the database.

Once I have System.Data loaded its time to start the connection process. The first thing I need to do is create a connection and then open it.

Now that I have an open connection I can create a SqlCommand object. The SqlCommand constructor version I’m using takes two parameters a SQL Statement and a database connection object. After creating the SqlCommand object I run SQL statement by calling the ExecuteReader method. Now the data is ready to be retrieved.

To keep the blog post simple I’m going to grab the results and print out the player id value using a while loop. When the while loop completes I close the reader and the database connection objects.

That is the quick and dirty way to connect to SQL Server from Clojure-clr. It isn’t as elegant as it is in the JVM version but I hope to get it that way some day. I am working on a project at work that, as time permits, I am attempting to port the java.jdbc code over to the CLR. If/When I get it working I will be sure to blog about it. Now it is time to connect to a MySQL database.

Connecting to MySQL

Connecting to MySQL follows the same process as connecting to SQL Server. In fact if you don’t look too closely you might think I’m using the same code to connect to MySQL. I wish that was the case but its not. In order to connect to MySQL from .NET you’ll need to download the assembly Mysql.Data from the MySQL developer site. I have used the Mysql.Data assembly for awhile so I added it to the GAC which allows me to load it into my Clojure code the same way I did for the SQL Server version. If you don’t want to add Mysql.Data to the GAC you can load it using the assembly-load-from function: (assembly-load-from “the path to the dll”).

I now have the MySQL libraries loaded and I’m ready to grab the playerId’s from the database. Here is the my-run-it function:

As you can see the method names are almost the same, everything is prefaced with My. You still do the same process, create the connection, the command, execute the command and read from the reader.

Summary

Connecting to SQL Server and MySQL is pretty straight forward. Just load the appropriate assembly and you are off. In the future I hope to have a cleaner method of interacting with databases through either a direct port of java.jdbc or something very similar.

I am relatively new to Clojure so if you see code that I’ve written that makes you cringe please feel free to leave a comment with your suggestion. I am all ears.

About the Data

The data I’m using for this blog post comes from the Baseball Databank project. The project has gathered all the baseball status from previous season and offers the data in many different formats.

Resources

Clojure-clr Setup: Clojure-clr Download Page and Getting Started with Clojure-clr.

Database Downloads: SQL Server Express, MySQL

Data: SQL Server version, MySQL version

MySQL Assembly: Mysql.Data

My Source: https://github.com/rippinrobr/clojure-clr-intro/zipball/master

Tuesday, November 22, 2011

Intro to Clojure-clr: Using the spit function

A couple of weeks back I opened an issue with the Clojure-clr ‘team’ because I thought the spit function had a bug. When I checked my email this morning I had a response from David Miller who runs the Clojure-clr port. In the email he politely showed me the error of my ways. I wanted to share this with anyone else who may have been looking to use spit to append to a file on the CLR.

Using spit the JVM way

When I was trying to see if I could use spit to append to a file I did a quick Google search and found this page on the ClojureDocs site for spit. When I saw their append example below I tried it in the Clojure-clr REPL.

I received no errors from the spit function but as you can see each call to the function was overwriting the previous call. My next step was to try the same thing in the JVM REPL to see if it worked as the web page stated it should. On the JVM spit worked as advertised. Up to this point I hadn’t come across a core function on the CLR that didn’t have the same type of parameters as the JVM function. This lead me to believe I had found a bug so I submitted a bug report.

Using spit the CLR way

Fast forward a couple of weeks to the point where I checked my email this morning. In his response David laid out what I had done wrong and gave more information related to spit:

Lack of documentation is definitely an issue here.

Clojure 1.4.0-master-SNAPSHOT
user=> (spit "hi.txt" "Test 1\n" :file-mode System.IO.FileMode/Append)
nil
user=> (spit "hi.txt" "Test 2\n" :file-mode System.IO.FileMode/Append)
nil
user=> (println (slurp "hi.txt"))
WARNING: (slurp f enc) is deprecated, use (slurp f :encoding enc).
Test 1
Test 2

Generally, the options available are ones that can be handled by the appropriate methods/ctors in System.IO.

In the source:

Common options include

 :buffer-size   Ths size of buffer to use (default: 1024).
 :file-share    A value from the System.IO.FileShare enumeration.
 :file-mode     A value from the System.IO.FileMode enumeration.
 :file-access   A value from the System.IO.FileAccess enumeration.
 :file-options  A value from the System.IO.FileOptions enumeration.
 :encoding  The encoding to use, either as a string, e.g. \"UTF-8\",
            a keyword, e.g. :utf-8, or a an System.Text.Encoding instance,
            e.g., (System.Text.UTF8Encoding.)
More documentation is most certainly needed.

I gave it a try on my 1.3 REPL with the System.IO.FileMode/Append and it worked like a champ!

Lessons Learned

So what did I learn from this? Next time I come across something that I think is a bug I will check the Clojure-clr source first. It is ok to write lazy code but the programmer himself cannot be lazy!

Summary

The CLR version of the spit function differs slightly from the JVM version. We can use the :file-mode key with the value System.IO.FileMode/Append when we want to append to a file using spit. It is also a good idea to check the Clojure-clr source prior to reporting any other issues I come across. Thank you David for your kind response!

Hugo-clr: Parsing Web Pages with Clojure-clr and HtmlAgilityPack

When I first became interested in learning Clojure I was in the middle of a science fiction reading kick and I was looking for new authors to read. So I decided I would try and pick up Clojure by writing code to parse the winners and nominees for the Best Novel category on the Hugo Awards web site. While I was writing the code writing I decided I would share my experience as a Clojure noob (still am) through a three part blog series that covered what I did with Clojure on the JVM (Parsing Web Pages with Clojure and Enlive, Creating a Hugo Award DB with Clojure and Sqlite, and Creating a Simple UI for the Hugo DB) .

About a month ago I decided to really give Clojure-clr a try so I thought I would go through the same process I did on the JVM version. Why? I thought it would give me a good way to compare and contrast the JVM and CLR versions of Clojure. Not that I’m a Clojure guru, I’m new to the world of parenthesis but doing the same project will allow me to point out the differences I came across between the CLR and JVM versions. With that said, let’s make sure you have your Clojure-clr environment set up.

Setup

Since the CLR world doesn’t have lein or a lein equivalent I have to do the configuration by hand. The first step is to install Clojure-clr if you haven’t already installed it. My post Getting Started with Clojure-clr will walk you through the steps. After setting up Clojure-clr download the HtmlAgilityPack. It is the .NET library I am using to parse the Hugo web pages. If you want the HtmlAgilityPack lib and source code you can grab it here: https://github.com/rippinrobr/hugo-clr/tree/hugoclr-parser and follow along that way. Just make sure you have the code from the hugoclr-parser branch. With the setup complete it is time to start looking at some code.

hugoclr.clj

The hugoclr.clj file is where the –main function lives. It calls hugoclr.parser/get-awards to retrieve the award pages, parse the nominees and winners data out and passes the results to the hugoclr.data.csv/write-to-file function to write out the data in a comma-delimited file.

There are only a couple of items I’d like to point out in the hugoclr.clj file. First is the way that the HtmlAgilityPack library is loaded.

(assembly-load-form "..\\libs\\HmtlAgilityPack.dll")

The function assembly-load-from is a new function to Clojure-clr. It was added in the 1.3 release. It is a wrapper around the System.Reflection.Assembly/LoadFile call. I find the assembly-load-form more clojure’esque and less typing so I’ve started using it.

The next line of interest is the :gen-class line. Using the :gen-class call is what triggers the generation of the hugoclr.exe file. if I didn’t add that line to my source I would only generate DLLs when I compile hugoclr. That’s it for hugoclr.clj. Its main purpose in life is to kick off the parsing and pass the results to the hugoclr.data.csv/write-to-file function. Next, I’ll discuss work horse of the project, the hugoclr/parser.clj file.

hugoclr/parser.clj

The hugoclr/parser.clj file is where most of the work is done. It handles the fetching of the web pages, parsing the award page links, and grabs the data from the awards pages, and converts the data into records that can will be used later. The entry point into the file is the get-awards function.

get-awards / get-html-elements / fetch-url

The get-awards function is the ‘main’ function of the hugoclr/parser.clj file. It is what drives the parsing process. The function starts by calls the get-html-elements function passing a URL to the history page and the XPATH that when applied will return a sequence of anchor tags starting with the 2011 awards page link.

Next get-html-elements passes the URL to fetch-url. fetch-url makes a request to the URL by creating a HtmlAgilityPack.HtmlWeb object and making a call to the HtmlWeb.Load method. The HtmlWeb.Load method ‘converts’ the retrieved web page into a HtmlDocument object.

The returned HtmlDocument’s SelectNodes method is called the XPATH that was passed to get-html-elements. SelectNodes applies the XPATH and returns a sequence of HtmlNode objects that represent the anchor tags on the Hugo History page. Since I only want the anchor tags that will lead me to the awards pages I us the map function to pass the HtmlNode objects through the validate-award-link function. The results of the map call is a sequence of links to award pages or nulls. The nulls are in the place of links that were not award page links. I remove them by calling filter passing a function that only keeps non-null entries. At the end of this process I have a sequence of valid award page links.

The last step of collecting the nominees and winners data is to parse each individual award page. I start by taking the first 12 links from the awards-link sequence and pass each one to the parse-awards-page function using the map function. Each link is then processed in the parse-awards-page function returning a sequence of Category records that represent each awards category for the given year. Now I have a sequence of Category sequences ready to be written out to a file. Before I go over that part of the code I would like to walk you through the parse-awards-page function.

You may be asking yourself why I’m only taking the 2000s. The answer is simple, I’m lazy. While writing the JVM and CLR versions I found that if I didn’t load the pages first in a browser I was unable to retrieve them programmatically. So if I wanted to process all of the pages I would have had to load them all. I’d be bored before I got of the 90s so I cut it off at 2000.

parse-awards-page

parse-awards-page uses the get-html-elements function to get a HtmlDocument object that represents the awards page to parse. The function then passes the object to the create-category-record function which as you might expect creates a Category record that represents each award category on the awards page. Since each page has more than one award category parse-awards-page returns a sequence of Category records.

create-category-record

As I said earlier, the Category record is the data structure that represents the nominees and winners of a particular Hugo Award category. The first step in creating a Category record is to find the paragraph tag that appears just before the category’s UL tag. The paragraph tag contains the year the award was given and the name of the award.

Once I have the paragraph node the next step is to find all of the list item tags in the award category’s unordered list. All but the first of the li tags contain the text that describe the nominees and winners for the award category currently being parsed. The nominee/winner li nodes are passed through a filter to make sure that only the li tags are kept.

Now that I have the paragraph and li tags I’m ready to create the Category record. The get-category-heading and get-year functions simply parse the text from the paragraph tag and return the award name and year. The li tags are passed to the create-works-seq function which creates a sequence of Work records that represents nominees and winners for the category. Once each category on the page has been parsed control is returned back to the parse-awards page so it can continue parsing the award pages until they have all been processed.

A Quick Side Note: Records vs. Structs

When I wrote the JVM version of this ‘application’ I used structs to model the categories and works. Using structs worked fine for what I was doing. However when I started writing the CLR project I was in the middle of reading the book The Joy of Clojure: Thinking the Clojure Way by Michael Fogus and Chris Houser. The authors mentioned that records have some advantages over structs and for that reason structs are falling out of favor. Some of the advantages of records are that the are created quicker than structs and take up less memory. They also look up keys quicker than array or hash maps. After reading that I went with records instead of structs in the CLR version. By the way I have really enjoyed reading The Joy of Clojure and I would highly recommend it.

And Now Back to the Code…

Now that we have parsed the all of 2000s award pages the only step left is to write the results out to a comma-delimited text file. In the –main function the results of the get-awards are passed to hugoclr.data.csv/write-to-file as its first parameter and the name of the output file as its second parameter. Lets walk through the last bit of code, the hugoclr/data/csv.clj file.

hugoclr.data.csv.clj

The write-to-file method does exactly what its name implies, writes something to a file. In our case it takes the awards, converts each record into a comma-delimited line and then writes them to the output file.

First I create a writable stream using .NET’s System.IO.StreamWriter class. I’ve told the stream to write the results to c:\temp\hugo.txt. I could have used the spit function but I decided to use a .NET library here. Once I have the stream I pass each category to the delimit function which simply cleans the title and publisher string and places a comma between all of the Work record’s fields. After each category has been converted the lines are then reduced into a single string. The string is written to the output file. Running the code produces an output file a file like this:

Running hugoclr

Now that I’ve walked you through the guts of the code it is time to show you what it looks like when it runs. First, I will show you how to run it in the REPL.

It is pretty straight forward. Fire up the REPL, load the hugoclr.clj file and then call the –main function. From there the code grabs the link page, parses it out and lets you know where it is in the process by telling you which page it is retrieving.

Remember, you must ‘prime’ the app before you run the code by loading each page in your favorite browser. I’m not sure why this is required. If anyone knows why this is happening and knows a way around please let me know.

Next, I will compile and run the code from the command line.

One thing to keep in mind when you compile your CLR code with Clojure 1.3.0 Debug on the 4.0 .NET CLR the executable and DLLs generated are placed in the compiler’s directory. Obviously the results are the same either way I run it.

Summary

Parsing the Hugo Awards list for the winners in the 2000s wasn’t all that different from the JVM version. I did find using the HtmlAgilityPack library a little easier to work with when parsing the web pages. This probably due to my familiarity with HtmlAgilityPack since I’ve used it in a few C# projects. Another reason I found it easier this time around is probably related to the fact that I’m a ‘little’ more comfortable writing Clojure code. I still have a long way to go though before I’m fluent in it.

Writing Clojure in the CLR environment wasn’t much different in this part of the project than the JVM version. In the CLR world we don’t have things like lein but so far I haven’t come across any issues that would prevent me from continuing to become familiar with Clojure CLR in hopes of using it at my day job. Which may come soon as in the next few days.

My next post in this serious will be on taking the data from the csv file, creating a SQL Server table, and loading the new table with the data from the file.

Since I am still pretty green in the Clojure world please feel free to leave a comment if you see something that is not idiomatic Clojure or if there is a better way to do something. I’m eager for any and all feedback.

Resources

Clojure-clr I’m using the 1.3 version with .Net 4.0 and HtmlAgilityPack
The Joy of Clojure: Thinking the Clojure Way

The Code

,You can download the code for this post from https://github.com/rippinrobr/hugo-clr/tree/hugoclr-parser . Just make sure you are on the hugoclr-parser branch.

Friday, November 4, 2011

Intro to Clojure-clr: How to Interact with .Net objects

Today’s blog post is another ‘quick hitter’ covering how to instantiate a .NET object and interact with it’s instance methods and properties.

The Setup

If you haven’t already installed Clojure-clr take a look at my previous post: Getting Started with Clojure-clr and get it installed. Next, open up a cmd.exe session and enter clojure.main.exe. to start a REPL session. Throughout all of my Clojure-clr blog posts I will assume you have added the clojure-clr directory to your PATH variable.

Instantiating a .NET Object

According to the CLR Interop page there are two ways to instantiate an object:

(Classname. args*) or (new Classname args*)

I will use the first function to create a System.IO.StreamWriter object that I will use to write to a file. At the REPL prompt enter:

(def file (System.IO.StreamWriter. “testing.txt”))

The line above created a StreamWriter object that will write to a file named testing.txt. Testing.txt will be in the directory where we started the REPL. The object is stored in the symbol named file. Now it's time to write a line to testing.txt.

Calling an Instance Method StreamWriter.WriteLine

To write our line enter the following lines into the REPL:

(.WriteLine file “This is a line sent from the REPL!”)

(.Close file)

Since WriteLine isn’t a static method we start off the list with the method call with .WriteLine. Next comes file, the symbol that represents our StreamWriter object. Everything after file is a parameter passed to WriteLine method, in this case it is the string I want to write to the output file. The .Close call flushes the buffer and closes the file. Ok, I’ve put the two lines into the REPL and ran them. How do I know if the line was actually written? Run the line below in the REPL:

(println (slurp “testing.txt”))

The slurp function reads and returns the contents of the testing.txt file. The println function prints it out to the REPL screen. Your REPL should look something like this:

I can see that my code did write out the line like I had hoped. However the slurp function generated a deprecated message. It is an easy issue to resolve by adding :encoding “ascii” to the call. The deprecation message only appears in the CLR REPL. If I run the same line in the JVM REPL I don’t see this message. So when I’m using slurp in CLR land I call slurp like this:

(slurp f :encoding “ascii”)

Now when I run slurp I see the line from testing.txt without the deprecation message. If you want to see a list of all the possible encoding values check out io.clj in the Clojure-clr source repo. The encoding values start on line 179.

Summary

You should now be able to instantiate a .NET object and call an instance method. In this case I created a StreamWriter object and used it to write a line out to a file. During the process we did see a difference between the JVM version of slurp and the CLR version of slurp which was easy to resolve the issue by passing in :encoding “ascii”.

Resources

Clojure-clr download page, Clojure-clr Interop page, Getting Started with Clojure-clr, An example that parses a web page and writes to a file using .NET objects