Thursday, December 22, 2011

Intro to Clojure-clr: Calling Clojure from C#

In today’s post I am going to show you how to call Clojure functions from a C# project. 

The Setup

The requirements for this post are to have ClojureCLR installed and to have access to a C# compiler.  If you haven’t installed Clojure-clr 1.3 you can do so by following the steps in my previous post Getting Started with Clojure-clr.  If you do not have a C# compiler already you can grab Visual C# 2010 Express or MonoDevelop.  I used ClojureCLR 1.3 and ClojureCLR 1.4-master-snapshot for this post.  I will explain the ClojureCLR 1.4 snapshot in a moment.  For the C# portion of the post I used Visual Studio 2010 Ultimate.  I haven’t tried using MonoDevelop but I’m sure it would work fine for the C# portion of this post.

The Clojure Code

I have written a couple of functions in Clojure that I would like to re-use in a C# project.  The functions are ba which will calculate a batting average and standings which will tell me where a particular team finished in Major League Baseball’s National League West division. 

There isn’t much to either one of the Clojure functions but there are a few things I’d like to point out in the ns statement, mainly the :gen-class call and the :methods option passed to it. The :gen-class call is what forces the generation of the executable; without it only DLLs would be created when the code is compiled.  The :methods option gives us a way to indicate which functions should be exposed as methods in the generated .NET class. :methods expects a vector of method signatures.

Each method signature is prefaced with metadata to indicate that I want the method to be static in the generated class. The exposed method's signature is described by a vector that follows the pattern below:

[method name [parameters] return value]

As you can see the ba method takes two int parameters and returns a double and the standings method takes a string and returns a string.


By default methods listed in :methods will be mapped to a Clojure function with the same name prefixed by a -. You can change the function’s prefix by passing the :prefix option to :gen-class with your desired prefix. As an example if I wanted to prefix the functions with csharp- I would pass :prefix "csharp-". Then ba would map to a function named csharp-ba and standings would map to csharp-standings.  In this example, I’m using the - functions to echo the parameters the function was called with and then I call the real functions.
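A minimal sketch of a namespace along these lines, written in the ClojureCLR 1.3 form with the extra dummy parameter that is discussed next. The function bodies, the echo format, the type of the dummy parameter and the standings data (the final 2011 order is used purely for illustration) are assumptions, not the contents of one.clj.

```clojure
(ns one
  (:gen-class
   ;; Each entry follows [method-name [parameter-types] return-type].
   ;; The metadata marks the generated method as static.
   ;; The trailing int is the 1.3 work-around parameter (assumed type).
   :methods [^{:static true} [ba [int int int] double]
             ^{:static true} [standings [String int] String]]))

(defn ba
  "Batting average: hits divided by at-bats."
  [hits at-bats]
  (double (/ hits at-bats)))

;; Illustrative data only: final NL West order, 2011 season assumed.
(def nl-west
  {"Arizona" "1st" "San Francisco" "2nd" "Los Angeles" "3rd"
   "Colorado" "4th" "San Diego" "5th"})

(defn standings
  "Returns where team finished, or \"unknown\" for an unlisted team."
  [team]
  (get nl-west team "unknown"))

;; The default prefix is -, so the static method ba maps to -ba.
;; With :prefix "csharp-" it would map to csharp-ba instead.
(defn -ba [hits at-bats dummy]
  (println "ba called with:" hits at-bats)
  (ba hits at-bats))

(defn -standings [team dummy]
  (println "standings called with:" team)
  (standings team))

(defn -main [& args]
  (println (-ba 25 100 0))
  (println (-standings "Arizona" 0)))
```

On the 1.4 snapshot the same sketch would drop the trailing int from each signature and the dummy argument from each - function.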

You may have noticed that the -ba and -standings functions have a parameter named dummy.  The dummy parameter is there as a work-around.  When I first started working on calling Clojure from C# I noticed that a function that did math was returning the incorrect value when called from C#.  If I ran the code with the same parameters from the REPL or after I compiled the code and ran it from the command line it worked fine.  I was only having the problem when I called the functions from C#.  To figure out what was going on I opened the clj file in Visual Studio and set a breakpoint in the -ba function. Here’s an example of stopping at a breakpoint in Visual Studio.  If you have VsClojure installed you can debug Clojure code in the same way you can C#.  You can see how to get Clojure support in Visual Studio in my blog post Getting VsClojure Up and Running in Visual Studio 2010.

image

After inspecting all of the parameters I noticed that, when the parameters were integers, the last one always held a huge number, as if it were a memory address.  When I added an extra parameter to the method declaration and to the - function in the clj file the calculation returned the correct value.

The work-around is only needed for ClojureCLR 1.3; the problem has already been fixed in the 1.4 snapshot.  Thank you to David Miller, who had it fixed less than 24 hours after I reported it!

In addition to the -ba and -standings functions I have a -main function which will be called when I run the executable. I compile it:

clojure.compile one

Then run it:

one.exe 

That gives me the results below:

image

In my github project I have a one14.clj file that works perfectly with the 1.4-snapshot that contains the fix for the issue. It is basically the same as one.clj except it does not need the dummy parameter.

That’s it for the Clojure side of things.  Now it’s time to talk about the C# code.

The C# Code

The C# code itself doesn’t have anything remarkable.  It is vanilla C# code, which is exactly the way I like it: no special hoops to jump through if I want to call Clojure functions.  In order to call the Clojure code I needed to add a few references.  First I added the assemblies generated when I compiled one.clj: the one.clj.dll and one.exe files.  Next I added the assembly Clojure.dll which can be found in the directory where you installed ClojureCLR. After getting the references in place I started writing code.  When I ran the code for the first time I received a System.TypeInitializationException with the message "The type initializer for 'one' threw an exception." which wasn’t all that helpful.  I dug into the inner exception’s inner exception and found this message: "Could not locate clojure.core.clj.dll or clojure/core.clj on load path." Now that is a message I can do something with.  I added the clojure.core.clj.dll reference and re-ran the app only to find out I was missing a few other references. When I got my code to run I had added the references that are listed on the left.  Once I had the one.clj.dll and one.exe references along with the Clojure-related ones on the left I was ready to go!  In the future adding the Clojure references will be much easier.

Now I can call the ba and standings methods from my C# app just like I call any other .NET static methods.  Here’s the code for calling the Clojure compiled with the 1.3 version compiler.


Notice that the extra parameter is the first parameter in the calls from C# even though in the clojure code it is the last parameter.  Here’s what the output looks like for the 1.3 code:

image

And the output of the 1.4 snapshot version:

image

Obviously they both produce the same outputs but 1.3 needs the placeholder parameter whereas the 1.4-snapshot version does not.

Summary

As you can see there isn’t much work to be done if you want to expose ClojureCLR functions to the outside .NET community.  You just need to tell gen-class which methods to expose and make sure you follow the naming convention for the functions on the clojure side.  On the C# side once you have all the references in place, which will become easier to do in the future, writing the code to call the clojure generated methods is no different than calling any other method in a .NET library. 

If you have any questions or blog post suggestions please feel free to leave a comment.

Resources

ClojureCLR: Getting Started with Clojure-clr and Getting VsClojure Up and Running in Visual Studio 2010.

C# IDEs: Visual C# 2010 Express or MonoDevelop

My Source (this blog’s code is the 3-calling-clojure-from-c-sharp): https://github.com/rippinrobr/clojure-clr-intro/zipball/master

Wednesday, November 30, 2011

Intro to Clojure-clr: Connecting to SQL Server and MySQL

Today’s post is a quick introduction to how to connect to SQL Server and MySQL databases and retrieve rows from them.

The Setup

If you haven’t already done so, set up Clojure-clr.  Obviously you’ll need to have access to SQL Server and MySQL instances.  If you do not have one or both of the databases you can download and install them from here:  SQL Server Express, MySQL.  After downloading and installing the databases grab the data and schema for this post from here: SQL Server version, MySQL version.  Next load the data/schema files and then you are ready to go!

Connecting to SQL Server

Since we are working with Clojure-clr I think it would only be proper to start off with connecting to SQL Server.  In order to do that I need to load the System.Data assembly.  It contains the necessary classes for interacting with the database.

Once I have System.Data loaded it’s time to start the connection process.  The first thing I need to do is create a connection and then open it.

Now that I have an open connection I can create a SqlCommand object.  The SqlCommand constructor version I’m using takes two parameters: a SQL statement and a database connection object.  After creating the SqlCommand object I run the SQL statement by calling the ExecuteReader method.  Now the data is ready to be retrieved.

To keep the blog post simple I’m going to grab the results and print out the player id value using a while loop. When the while loop completes I close the reader and the database connection objects.
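The steps above can be sketched as a single function. This is a sketch under assumptions: the way System.Data is loaded, the function name run-it, the connection string, the table name Master and the column name playerID are all placeholders chosen for illustration, so adjust them to your own database.

```clojure
;; Load System.Data so the SqlClient classes can be resolved.
(System.Reflection.Assembly/LoadWithPartialName "System.Data")
(import '(System.Data.SqlClient SqlConnection SqlCommand))

(defn run-it
  "Opens a connection, runs sql and prints the playerID column of every row.
   The column name is an assumption about the schema."
  [conn-str sql]
  (let [conn (SqlConnection. conn-str)]
    (.Open conn)
    (let [cmd    (SqlCommand. sql conn)      ; SQL statement + connection
          reader (.ExecuteReader cmd)]       ; runs the statement
      ;; Read returns false once there are no more rows.
      (while (.Read reader)
        (println (.get_Item reader "playerID")))
      (.Close reader))
    (.Close conn)))

(comment
  ;; Hypothetical connection string and query; not executed on load.
  (run-it "Data Source=.\\SQLEXPRESS;Initial Catalog=baseball;Integrated Security=True"
          "SELECT playerID FROM Master"))
```

The sketch closes the reader and connection only on the happy path, which matches the quick and dirty spirit of the walkthrough; wrapping the body in try/finally would make sure they are closed when a query fails.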

That is the quick and dirty way to connect to SQL Server from Clojure-clr.  It isn’t as elegant as it is in the JVM version but I hope to get it that way some day.  I am working on a project at work and, as time permits, I am attempting to port the java.jdbc code over to the CLR.  If and when I get it working I will be sure to blog about it.  Now it is time to connect to a MySQL database.

Connecting to MySQL

Connecting to MySQL follows the same process as connecting to SQL Server. In fact if you don’t look too closely you might think I’m using the same code to connect to MySQL.  I wish that were the case but it’s not. In order to connect to MySQL from .NET you’ll need to download the assembly Mysql.Data from the MySQL developer site.  I have used the Mysql.Data assembly for a while so I added it to the GAC, which allows me to load it into my Clojure code the same way I did for the SQL Server version. If you don’t want to add Mysql.Data to the GAC you can load it using the assembly-load-from function: (assembly-load-from "the path to the dll").

I now have the MySQL libraries loaded and I’m ready to grab the playerId’s from the database. Here is the my-run-it function:

As you can see the class names are almost the same; everything is prefaced with My.  You still follow the same process: create the connection, create the command, execute the command and read from the reader.
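A sketch of what a my-run-it function along these lines could look like, assuming the MySQL connector is installed in the GAC under the assembly name MySql.Data. As before, the connection string, table name and column name are placeholders rather than values from the post.

```clojure
;; Assumes the MySQL connector assembly is registered in the GAC.
(System.Reflection.Assembly/LoadWithPartialName "MySql.Data")
(import '(MySql.Data.MySqlClient MySqlConnection MySqlCommand))

(defn my-run-it
  "Same flow as the SQL Server version, using the MySql-prefixed classes."
  [conn-str sql]
  (let [conn (MySqlConnection. conn-str)]
    (.Open conn)
    (let [cmd    (MySqlCommand. sql conn)
          reader (.ExecuteReader cmd)]
      (while (.Read reader)
        ;; Column name is an assumption about the schema.
        (println (.get_Item reader "playerID")))
      (.Close reader))
    (.Close conn)))

(comment
  ;; Hypothetical connection string and query; not executed on load.
  (my-run-it "Server=localhost;Database=baseball;Uid=someuser;Pwd=somepassword;"
             "SELECT playerID FROM Master"))
```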

Summary

Connecting to SQL Server and MySQL is pretty straightforward. Just load the appropriate assembly and you are off.  In the future I hope to have a cleaner method of interacting with databases through either a direct port of java.jdbc or something very similar.

I am relatively new to Clojure so if you see code that I’ve written that makes you cringe please feel free to leave a comment with your suggestion.  I am all ears.

About the Data

The data I’m using for this blog post comes from the Baseball Databank project.  The project has gathered the baseball stats from previous seasons and offers the data in many different formats.

Resources

Clojure-clr Setup: Clojure-clr Download Page and Getting Started with Clojure-clr.

Database Downloads: SQL Server Express, MySQL

Data: SQL Server version, MySQL version

MySQL Assembly: Mysql.Data

My Source: https://github.com/rippinrobr/clojure-clr-intro/zipball/master

Tuesday, November 22, 2011

Intro to Clojure-clr: Using the spit function

A couple of weeks back I opened an issue with the Clojure-clr ‘team’ because I thought the spit function had a bug.   When I checked my email this morning I had a response from David Miller who runs the Clojure-clr port. In the email he politely showed me the error of my ways. I wanted to share this with anyone else who may have been looking to use spit to append to a file on the CLR.

Using spit the JVM way

When I was trying to see if I could use spit to append to a file I did a quick Google search and found this page on the ClojureDocs site for spit. When I saw their append example below I tried it in the Clojure-clr REPL.

image

I received no errors from the spit function but as you can see each call to the function was overwriting the previous call.  My next step was to try the same thing in the JVM REPL to see if it worked as the web page stated it should.  On the JVM spit worked as advertised.  Up to this point I hadn’t come across a core function on the CLR that didn’t have the same type of parameters as the JVM function. This led me to believe I had found a bug so I submitted a bug report.

Using spit the CLR way

Fast forward a couple of weeks to the point where I checked my email this morning. In his response David laid out what I had done wrong and gave more information related to spit:

Lack of documentation is definitely an issue here.

Clojure 1.4.0-master-SNAPSHOT
user=> (spit "hi.txt" "Test 1\n" :file-mode System.IO.FileMode/Append)
nil
user=> (spit "hi.txt" "Test 2\n" :file-mode System.IO.FileMode/Append)
nil
user=> (println (slurp "hi.txt"))
WARNING: (slurp f enc) is deprecated, use (slurp f :encoding enc).
Test 1
Test 2

Generally, the options available are ones that can be handled by the appropriate methods/ctors in System.IO.

In the source:

Common options include

:buffer-size Ths size of buffer to use (default: 1024).
:file-share A value from the System.IO.FileShare enumeration.
:file-mode A value from the System.IO.FileMode enumeration.
:file-access A value from the System.IO.FileAccess enumeration.
:file-options A value from the System.IO.FileOptions enumeration.
:encoding The encoding to use, either as a string, e.g. \"UTF-8\",
a keyword, e.g. :utf-8, or a an System.Text.Encoding instance,
e.g., (System.Text.UTF8Encoding.)
More documentation is most certainly needed.

I gave it a try on my 1.3 REPL with the System.IO.FileMode/Append and it worked like a champ! 
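If you append often, the option can be tucked into a small helper. This is only a sketch: spit-append is a name made up here, not a function that ships with Clojure-clr.

```clojure
(defn spit-append
  "Appends content to the file at path using the CLR-specific :file-mode option."
  [path content]
  (spit path content :file-mode System.IO.FileMode/Append))

(comment
  ;; Each call adds a line instead of overwriting the file.
  (spit-append "hi.txt" "Test 1\n")
  (spit-append "hi.txt" "Test 2\n"))
```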

Lessons Learned

So what did I learn from this?  Next time I come across something that I think is a bug I will check the Clojure-clr source first.  It is ok to write lazy code but the programmer himself cannot be lazy!

Summary

The CLR version of the spit function differs slightly from the JVM version.  We can use the :file-mode key with the value System.IO.FileMode/Append when we want to append to a file using spit.  It is also a good idea to check the Clojure-clr source prior to reporting any other issues I come across.  Thank you David for your kind response!

Hugo-clr: Parsing Web Pages with Clojure-clr and HtmlAgilityPack

When I first became interested in learning Clojure I was in the middle of a science fiction reading kick and I was looking for new authors to read. So I decided I would try to pick up Clojure by writing code to parse the winners and nominees for the Best Novel category on the Hugo Awards web site.  While I was writing the code I decided I would share my experience as a Clojure noob (still am) through a three part blog series that covered what I did with Clojure on the JVM (Parsing Web Pages with Clojure and Enlive, Creating a Hugo Award DB with Clojure and Sqlite, and Creating a Simple UI for the Hugo DB).

About a month ago I decided to really give Clojure-clr a try so I thought I would go through the same process I did on the JVM version. Why? I thought it would give me a good way to compare and contrast the JVM and CLR versions of Clojure. Not that I’m a Clojure guru; I’m new to the world of parentheses, but doing the same project will allow me to point out the differences I came across between the CLR and JVM versions.  With that said, let’s make sure you have your Clojure-clr environment set up.

Setup

Since the CLR world doesn’t have lein or a lein equivalent I have to do the configuration by hand. The first step is to install Clojure-clr if you haven’t already installed it.  My post Getting Started with Clojure-clr will walk you through the steps. After setting up Clojure-clr download the HtmlAgilityPack.  It is the .NET library I am using to parse the Hugo web pages.  If you want the HtmlAgilityPack lib and source code you can grab it here: https://github.com/rippinrobr/hugo-clr/tree/hugoclr-parser and follow along that way.  Just make sure you have the code from the hugoclr-parser branch.  With the setup complete it is time to start looking at some code.

hugoclr.clj

The hugoclr.clj file is where the -main function lives. It calls hugoclr.parser/get-awards to retrieve the award pages and parse out the nominees and winners data, and then passes the results to the hugoclr.data.csv/write-to-file function to write out the data in a comma-delimited file.

There are only a couple of items I’d like to point out in the hugoclr.clj file. First is the way that the HtmlAgilityPack library is loaded.

(assembly-load-from "..\\libs\\HtmlAgilityPack.dll")

The function assembly-load-from is new to Clojure-clr; it was added in the 1.3 release.  It is a wrapper around the System.Reflection.Assembly/LoadFrom call. I find assembly-load-from more Clojure-esque and less typing so I’ve started using it.

The next line of interest is the :gen-class line.  Using the :gen-class call is what triggers the generation of the hugoclr.exe file. If I didn’t add that line to my source I would only generate DLLs when I compile hugoclr.  That’s it for hugoclr.clj. Its main purpose in life is to kick off the parsing and pass the results to the hugoclr.data.csv/write-to-file function. Next, I’ll discuss the workhorse of the project, the hugoclr/parser.clj file.

hugoclr/parser.clj

The hugoclr/parser.clj file is where most of the work is done. It fetches the web pages, parses the award page links, grabs the data from the awards pages, and converts the data into records that will be used later. The entry point into the file is the get-awards function.

get-awards / get-html-elements / fetch-url

The get-awards function is the ‘main’ function of the hugoclr/parser.clj file. It is what drives the parsing process. The function starts by calling the get-html-elements function, passing a URL to the history page and the XPATH that, when applied, will return a sequence of anchor tags starting with the 2011 awards page link.

Next get-html-elements passes the URL to fetch-url.  fetch-url makes a request to the URL by creating a HtmlAgilityPack.HtmlWeb object and making a call to the HtmlWeb.Load method.  The HtmlWeb.Load  method ‘converts’ the retrieved web page into a HtmlDocument object.

SelectNodes is then called on the returned HtmlDocument’s DocumentNode with the XPATH that was passed to get-html-elements. SelectNodes applies the XPATH and returns a sequence of HtmlNode objects that represent the anchor tags on the Hugo History page. Since I only want the anchor tags that will lead me to the awards pages I use the map function to pass the HtmlNode objects through the validate-award-link function.  The result of the map call is a sequence of links to award pages or nulls.  The nulls are in the place of links that were not award page links. I remove them by calling filter, passing a function that only keeps non-null entries. At the end of this process I have a sequence of valid award page links.
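The fetch-and-filter flow can be sketched roughly as follows. The function names fetch-url, get-html-elements and validate-award-link come from the description above, but their bodies here are guesses: the link test in award-page-href (a helper name made up for this sketch), the URL pattern it looks for and the XPath in the usage note are assumptions, not the code from the repository.

```clojure
;; The DLL location follows the path used in hugoclr.clj.
(assembly-load-from "..\\libs\\HtmlAgilityPack.dll")

(defn fetch-url
  "Requests url and returns the page as an HtmlAgilityPack HtmlDocument."
  [url]
  (.Load (HtmlAgilityPack.HtmlWeb.) url))

(defn get-html-elements
  "Fetches url and returns the nodes matching xpath (nil when nothing matches)."
  [url xpath]
  (.SelectNodes (.DocumentNode (fetch-url url)) xpath))

(defn award-page-href
  "Hypothetical rule: keep hrefs that look like a yearly awards page."
  [href]
  (when (re-find #"/hugo-history/\d{4}-hugo-awards" href)
    href))

(defn validate-award-link
  "Returns the anchor's href when it points at an awards page, otherwise nil."
  [node]
  (award-page-href (.GetAttributeValue node "href" "")))

(defn award-links
  "Maps every anchor through validate-award-link and drops the nils."
  [history-url xpath]
  (filter #(not (nil? %))
          (map validate-award-link (get-html-elements history-url xpath))))
```

A call such as (award-links history-page-url "//a[@href]") would then yield only the award page links, where history-page-url stands for the address of the Hugo history page.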

The last step of collecting the nominees and winners data is to parse each individual award page.  I start by taking the first 12 links from the awards-link sequence and pass each one to the parse-awards-page function using the map function.  Each link is then processed in the parse-awards-page function returning a sequence of Category records that represent each awards category for the given year.  Now I have a sequence of Category sequences ready to be written out to a file.  Before I go over that part of the code I would like to walk you through the parse-awards-page function.

You may be asking yourself why I’m only taking the 2000s.  The answer is simple: I’m lazy.  While writing the JVM and CLR versions I found that if I didn’t load the pages first in a browser I was unable to retrieve them programmatically.  So if I wanted to process all of the pages I would have had to load them all.  I’d be bored before I got out of the 90s so I cut it off at 2000.

parse-awards-page

parse-awards-page uses the get-html-elements function to get a HtmlDocument object that represents the awards page to parse.  The function then passes the object to the create-category-record function which as you might expect creates a Category record that represents each award category on the awards page.  Since each page has more than one award category parse-awards-page returns a sequence of Category records. 

create-category-record

As I said earlier, the Category record is the data structure that represents the nominees and winners of a particular Hugo Award category.  The first step in creating a Category record is to find the paragraph tag that appears just before the category’s UL tag.  The paragraph tag contains the year the award was given and the name of the award.     

Once I have the paragraph node the next step is to find all of the list item tags in the award category’s unordered list.  All but the first of the li tags contain the text that describes the nominees and winners for the award category currently being parsed. The nominee/winner li nodes are passed through a filter to make sure that only the li tags are kept.

Now that I have the paragraph and li tags I’m ready to create the Category record.  The get-category-heading and get-year functions simply parse the text from the paragraph tag and return the award name and year.  The li tags are passed to the create-works-seq function which creates a sequence of Work records that represents nominees and winners for the category.   Once each category on the page has been parsed control is returned back to the parse-awards-page function so it can continue parsing the award pages until they have all been processed.

A Quick Side Note: Records vs. Structs

When I wrote the JVM version of this ‘application’ I used structs to model the categories and works.  Using structs worked fine for what I was doing.  However when I started writing the CLR project I was in the middle of reading the book The Joy of Clojure: Thinking the Clojure Way by Michael Fogus and Chris Houser.  The authors mentioned that records have some advantages over structs and for that reason structs are falling out of favor.  Some of the advantages of records are that they are created quicker than structs and take up less memory.  They also look up keys quicker than array or hash maps.  After reading that I went with records instead of structs in the CLR version.  By the way I have really enjoyed reading The Joy of Clojure and I would highly recommend it.
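To make the difference concrete, here is a small side-by-side sketch. The field names and the sample values are placeholders invented for illustration; the real Category and Work definitions live in the project source.

```clojure
;; Struct style: a struct map created from a predefined set of keys.
(defstruct work-struct :title :author :publisher)
(def struct-work (struct work-struct "A Title" "An Author" "A Publisher"))

;; Record style: defrecord generates a real type with fixed fields.
(defrecord Work [title author publisher])
(defrecord Category [award year works])

(def record-work (Work. "A Title" "An Author" "A Publisher"))
(def category (Category. "Best Novel" 2011 [record-work]))

;; Both support keyword lookup, so calling code reads the same.
(println (:title struct-work))
(println (:title record-work))
(println (:award category) (:year category) (count (:works category)))
```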

And Now Back to the Code…

Now that we have parsed all of the 2000s award pages the only step left is to write the results out to a comma-delimited text file.  In the -main function the results of get-awards are passed to hugoclr.data.csv/write-to-file as its first parameter and the name of the output file as its second parameter.  Let’s walk through the last bit of code, the hugoclr/data/csv.clj file.

hugoclr/data/csv.clj

The write-to-file method does exactly what its name implies, writes something to a file.  In our case it takes the awards, converts each record into a comma-delimited line and then writes them to the output file.

First I create a writable stream using .NET’s System.IO.StreamWriter class.  I’ve told the stream to write the results to c:\temp\hugo.txt.  I could have used the spit function but I decided to use a .NET library here.  Once I have the stream I pass each category to the delimit function which simply cleans the title and publisher strings and places a comma between all of the Work record’s fields. After each category has been converted the lines are then reduced into a single string.  The string is written to the output file.  Running the code produces a comma-delimited output file.
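A sketch of how write-to-file and delimit could fit together. It assumes a flat sequence of categories shaped like {:award … :year … :works […]}, and the clean helper (which just strips commas) is a guess at what the cleaning step does; neither is taken from the project source.

```clojure
(require '[clojure.string :as str])

(defn clean
  "Strips commas so a field cannot break the comma-delimited layout."
  [s]
  (str/replace (str s) "," ""))

(defn delimit
  "Returns one comma-delimited line per work in the category."
  [category]
  (for [work (:works category)]
    (str/join "," [(:year category)
                   (:award category)
                   (clean (:title work))
                   (:author work)
                   (clean (:publisher work))])))

(defn write-to-file
  "Converts every category to lines, reduces them to one string and writes it."
  [awards path]
  (let [lines  (mapcat delimit awards)
        text   (reduce str (map #(str % "\n") lines))
        writer (System.IO.StreamWriter. path)]
    (.Write writer text)
    (.Close writer)))

(comment
  (write-to-file [{:award "Best Novel" :year 2011
                   :works [{:title "A Title" :author "An Author"
                            :publisher "A Publisher"}]}]
                 "c:\\temp\\hugo.txt"))
```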

Running hugoclr

Now that I’ve walked you through the guts of the code it is time to show you what it looks like when it runs.  First, I will show you how to run it in the REPL.

image

It is pretty straightforward.  Fire up the REPL, load the hugoclr.clj file and then call the -main function.  From there the code grabs the link page, parses it out and lets you know where it is in the process by telling you which page it is retrieving.

Remember, you must ‘prime’ the app before you run the code by loading each page in your favorite browser. I’m not sure why this is required. If anyone knows why this is happening and knows a way around please let me know.

Next, I will compile and run the code from the command line.

image

One thing to keep in mind: when you compile your CLR code with Clojure 1.3.0 Debug on the 4.0 .NET CLR, the executable and DLLs generated are placed in the compiler’s directory.  Obviously the results are the same either way I run it.

Summary

Parsing the Hugo Awards list for the winners in the 2000s wasn’t all that different from the JVM version.  I did find the HtmlAgilityPack library a little easier to work with when parsing the web pages. This is probably due to my familiarity with HtmlAgilityPack since I’ve used it in a few C# projects.  Another reason I found it easier this time around is probably related to the fact that I’m a ‘little’ more comfortable writing Clojure code.  I still have a long way to go though before I’m fluent in it.

Writing Clojure in the CLR environment wasn’t much different in this part of the project than the JVM version.  In the CLR world we don’t have things like lein but so far I haven’t come across any issues that would prevent me from continuing to become familiar with Clojure CLR in hopes of using it at my day job.  Which may come soon as in the next few days.

My next post in this series will be on taking the data from the csv file, creating a SQL Server table, and loading the new table with the data from the file.

Since I am still pretty green in the Clojure world please feel free to leave a comment if you see something that is not idiomatic Clojure or if there is a better way to do something.  I’m eager for any and all feedback.

Resources

Clojure-clr I’m using the 1.3 version with .Net 4.0 and HtmlAgilityPack
The Joy of Clojure: Thinking the Clojure Way

The Code

You can download the code for this post from https://github.com/rippinrobr/hugo-clr/tree/hugoclr-parser .  Just make sure you are on the hugoclr-parser branch.

Friday, November 4, 2011

Intro to Clojure-clr: How to Interact with .Net objects

Today’s blog post is another ‘quick hitter’ covering how to instantiate a .NET object and interact with its instance methods and properties.

The Setup

If you haven’t already installed Clojure-clr take a look at my previous post, Getting Started with Clojure-clr, and get it installed.  Next, open up a cmd.exe session and enter clojure.main.exe to start a REPL session.  Throughout all of my Clojure-clr blog posts I will assume you have added the clojure-clr directory to your PATH variable.

Instantiating a .NET Object

According to the CLR Interop page there are two ways to instantiate an object:

(Classname. args*) or (new Classname args*)

I will use the first function to create a System.IO.StreamWriter object that I will use to write to a file.  At the REPL prompt enter:

(def file (System.IO.StreamWriter. "testing.txt"))

The line above created a StreamWriter object that will write to a file named testing.txt.  Testing.txt will be in the directory where we started the REPL.  The object is stored in the symbol named file. Now it's time to write a line to testing.txt.

Calling an Instance Method StreamWriter.WriteLine

To write our line enter the following lines into the REPL:

(.WriteLine file "This is a line sent from the REPL!")

(.Close file)

Since WriteLine isn’t a static method we start off the list with the method name prefixed by a dot: .WriteLine. Next comes file, the symbol that represents our StreamWriter object. Everything after file is a parameter passed to the WriteLine method; in this case it is the string I want to write to the output file.  The .Close call flushes the buffer and closes the file.  Ok, I’ve put the two lines into the REPL and run them.  How do I know if the line was actually written?  Run the line below in the REPL:

(println (slurp "testing.txt"))

The slurp function reads and returns the contents of the testing.txt file. The println function prints it out to the REPL screen.  Your REPL should look something like this:

image

I can see that my code did write out the line like I had hoped.  However the slurp function generated a deprecation message.  It is an easy issue to resolve by adding :encoding "ascii" to the call. The deprecation message only appears in the CLR REPL.  If I run the same line in the JVM REPL I don’t see this message.  So when I’m using slurp in CLR land I call slurp like this:

(slurp f :encoding "ascii")

Now when I run slurp I see the line from testing.txt without the deprecation message.  If you want to see a list of all the possible encoding values check out io.clj in the Clojure-clr source repo.  The encoding values start on line 179.

Summary

You should now be able to instantiate a .NET object and call an instance method.  In this case I created a StreamWriter object and used it to write a line out to a file.  During the process we did see a difference between the JVM version of slurp and the CLR version of slurp, which was easy to resolve by passing in :encoding "ascii".

Resources

Clojure-clr download page, Clojure-clr Interop page, Getting Started with Clojure-clr, An example that parses a web page and writes to a file using .NET objects

Thursday, October 27, 2011

Getting Started with Clojure-clr

Update: All github ClojureCLR links have been updated to use the new repository at github.com/clojure/clojure-clr

After taking the summer off from blogging to spend time with the family, I am ready to get back to my blog. My first post of the ‘fall blogging season’ is a brief intro to Clojure-clr. In this post I will walk you through getting clojure-clr installed, show you how to interact with the REPL and compile an application.

The Goal

By the end of this post I will be able to interact with .NET assemblies from the clojure-clr REPL and be able to compile a basic clojure-clr application from the command line.

The Setup

The first thing to do is to visit the Download Page of the Clojure-clr project on github. I chose to install the clojure-clr-1.3.0-Debug-4.0.zip version. Once you have downloaded a version follow the instructions on the Getting started binary distribution page. Make sure to ‘Unblock’ the zip file before you unzip it. Failure to do so will cause exceptions to be thrown every time you try to run any of the executables or use any of the DLLs in the zip file. For ease of use I added the directory I extracted the files to into my PATH environment variable.

The REPL

Next, I started up cmd.exe and typed: clojure.main.exe . This will start up the REPL. If everything goes right you should see something similar to this image below.

image

The version number you see will be determined by the version you downloaded. Now that we have the REPL up and running let’s try some Clojure code.  At the prompt enter:  (println "yep, it worked!")  and press the enter key.  That should give you:

image

Ok, the ‘normal’ Clojure seems to be working; now it’s time to try interacting with .NET.

Interacting with .NET

Now that we know we have the REPL working it is time to try out a few calls into the .NET world.  The first call we will make is to Console.WriteLine.  Since System.Console is loaded automatically by the REPL we can call it like this: 

(System.Console/WriteLine "I just called a .NET method!")

image

It works the same as the println call did.  What if you want to interact with a .NET lib that hasn’t already been loaded into our namespace by the REPL?  There are a couple of ways to do it: one takes a lot of typing and the other takes about half the amount.  To illustrate I will load the System.Windows.Forms assembly and import the MessageBox class. The verbose way to load a library is to call the Assembly.Load method:

(System.Reflection.Assembly/Load "System.Windows.Forms, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")

When you hit enter after typing the above code in you should see this:

image

The second way to load a .NET assembly takes far less typing.  It makes use of the Assembly/LoadWithPartialName method.  Here’s what it looks like:

(System.Reflection.Assembly/LoadWithPartialName "System.Windows.Forms")

The results of this call will give you the same output as the more verbose Load call did.  Either way you choose will work fine.  If loading a particular version of an assembly is crucial then I would go with the Load call.  Otherwise I would stick with the LoadWithPartialName method. 

Now that we have loaded System.Windows.Forms I’m going to import the MessageBox class into our namespace so that calls into it are a little cleaner.  I can do that by typing the following in the REPL:

(import (System.Windows.Forms MessageBox))

The call above tells the REPL we want to bring the System.Windows.Forms.MessageBox into our user namespace.  This will allow us to use MessageBox/Show instead of System.Windows.Forms.MessageBox/Show.   When I entered the line of code below I saw the dialog box that follows it. Pretty straightforward.

(MessageBox/Show "Hi from clojure-clr!" "Clojure-CLR Dialog")

image

Now that I’ve shown you how to make a call into .NET from the REPL let’s compile a clojure-clr ‘application’. 

Compiling a Clojure-clr Application

For our example I will create an ‘application’ that consists of all the code that we entered into the REPL above in a file called intro.clj .  After you have downloaded the file, change into the directory where it was saved.  Then run:

clojure.compile intro

The compile command will create the following files in the same directory where the clojure.compile.exe file lives.  image

I haven’t found out how to tell the compiler where to put the output yet so for now I just run them from that directory.  (If you know how to tell the compiler where to put the output please leave a comment below.  I would really appreciate it!)

Running the intro.exe file will produce the println, Console.WriteLine, and MessageBox.Show output just as it was in the REPL.
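For reference, a file like intro.clj could look roughly like the sketch below. It is not the downloadable file itself; it simply gathers the forms entered at the REPL above into one namespace, and it assumes that :gen-class together with a -main function is what gives the compiler an entry point for the executable.

```clojure
;; intro.clj: a sketch that collects the REPL calls from this post (ClojureCLR only).
(ns intro
  (:gen-class))

;; Load the assembly first so the import below can resolve MessageBox.
(System.Reflection.Assembly/LoadWithPartialName "System.Windows.Forms")
(import '(System.Windows.Forms MessageBox))

(defn -main
  "Entry point called when the compiled executable is run."
  [& args]
  (println "yep, it worked!")
  (System.Console/WriteLine "I just called a .NET method!")
  (MessageBox/Show "Hi from clojure-clr!" "Clojure-CLR Dialog"))
```

The order of the top-level forms matters here: the assembly has to be loaded before the import, just as it was at the REPL.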

Summary

That’s it for my very brief intro into the world of clojure on the CLR.  I walked you through using the REPL to call ‘normal’ clojure and how to interact with .NET.  I also showed you how to compile a clojure-clr application on the command line.

In future clojure-clr posts I will convert my hugo project posts (Parsing Web Pages with Clojure and Enlive, Creating Hugo Awards DB-with Clojure, and Creating a Simple UI for the Hugo Awards DB) to hugo-clr using the .NET equivalents.  Plus I plan on having a post or two on how to use the clojure-clr assemblies in a C# application.

Wednesday, June 15, 2011

Creating a Simple UI for the Hugo DB

In my past two posts, Parsing Web Pages with Clojure and enlive and Creating a Hugo Awards DB with Clojure and Sqlite, I parsed and stored the list of Hugo Best Novel award nominees and winners. In this, the last post on the Hugo data, I will build a very basic UI in Clojure using Swing and Java’s awt library.

The Goal

By the end of the post I will have code to display a listing of the nominees in a UI. The data will come from the sqlite database I created in my previous post.

The Setup

If you’ve been following the past few posts then you already have everything you need installed. The only thing you will need to do is grab the code from my hugoUI branch.

For those of you who haven’t been following along the first thing you will need to do is grab sqlite. Since I’m working on Windows I grabbed these two downloads: sqlite shell and sqlite-dll. After downloading the files make sure they are in your PATH.

Next, grab leiningen. Leiningen is a Clojure build tool that helps you manage your projects. I use it in all of my clojure projects. The install takes no time at all, just follow the instructions on the project’s page and you will be ready for business.

Updating the project.clj file

Since I am not going to be doing any HTML parsing I've removed the enlive dependency from my project.clj file. All other dependencies I had in the previous version still remain.

Just to be on the safe side I ran lein deps to ensure I have everything ready.

The Code

The UI app consists of four code files: cmdline.clj, main_controller.clj, sqlite.clj, and main_view.clj. The cmdline.clj file contains the main method and starts the app. The main_controller.clj file retrieves the data and calls the function that creates the view. The data is retrieved by calling the functions in the sqlite.clj file. The last file is the main_view.clj file whose job is to create the user interface and display the data.

The cmdline.clj file

The cmdline.clj file’s sole purpose is to start the application. This is done in part by using the :gen-class option in the ns declaration. It will create a java class file called cmdline.class in the project’s classes directory. Any methods in the java class will look in the source clj file for a function by the same name but preceded by a -. That is why the -main function exists in cmdline.clj. It is what is called when I execute the app outside of the REPL. Within the -main function I call hugo.controllers.main-controller/show-list to kick off the data retrieval and displaying of the UI.
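As a rough illustration of that naming convention, here is a minimal sketch. The namespace name and the show-list stub are hypothetical stand-ins so the snippet runs on its own; the real file calls into the controller namespace instead.

```clojure
(ns example.cmdline
  (:gen-class))   ; ask the compiler to generate a Java class for this namespace

;; Hypothetical stand-in for hugo.controllers.main-controller/show-list.
(defn show-list []
  (println "showing the nominee list"))

;; The generated class's static main method delegates to this function
;; because of the default "-" prefix.
(defn -main [& args]
  (show-list))
```

At the REPL the same function can be called directly with (example.cmdline/-main).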

The hugo.controllers.main-controller.clj file

The functions in this file are concerned with retrieving the data, formatting it and passing it along to the view.  Before I jump into what the functions are doing I want to point out my use of the defn- macro.  Using this macro allows me to ‘hide’ the function from anyone outside of the namespace that it was created in.  So in the hugo.controllers.main-controller namespace there is only one function that is visible outside of the namespace.  That is the show-list function.
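A small sketch of how defn- behaves, using made-up namespace and function names rather than the ones in the project:

```clojure
(ns example.controller)

;; defn- marks the var as private to example.controller.
(defn- format-line [year]
  (str year " Hugo Awards - Best Novel"))

;; The only public function; it may call the private helper freely.
(defn show-list [years]
  (map format-line years))

(ns example.caller)

;; Calling the public function from another namespace works.
(println (example.controller/show-list [2004]))

;; Referring to the private helper from here would not compile:
;; (example.controller/format-line 2004)
```

The privacy is only metadata on the var, so it is a convention enforced by the compiler rather than a hard security boundary.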

The show-list function is a simple one.  It starts off by calling the get-nominees-data-list function which is where the majority of the data retrieval and formatting takes place.  That function in turn starts by calling the hugo.db.sqlite/get-years function. As you might expect the get-years function retrieves all of the years that are stored in the database.  I use the map function to iterate over the returned sequence of the years.  For each year the format-data function is called.

The format-data function takes a year as its only parameter.  The year is then passed to the hugo.db.sqlite/get-nominees function which returns a sequence of maps for all nominees in the year specified.  The results of the get-nominees call are passed through their own map function that is used to format the data into a string that can be displayed in a list. Finally, the results of that map call are concat'ed with a title string and a spacing string to create a sequence that is ready to be displayed on the UI.  Now that I have all of the data retrieved and formatted it's time to display it.
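The flow described above can be sketched without a database. Everything below is an assumption made for illustration: the rows are placeholders, get-years and get-nominees are in-memory stand-ins for the hugo.db.sqlite functions, the exact string layout is a guess, and mapcat is used to flatten the per-year sequences into one list.

```clojure
;; Placeholder rows standing in for the nominees table.
(def sample-nominees
  [{:year 2004 :title "Title A" :author "Author A" :winner 1}
   {:year 2004 :title "Title B" :author "Author B" :winner 0}
   {:year 2003 :title "Title C" :author "Author C" :winner 1}])

;; Hypothetical stand-ins for hugo.db.sqlite/get-years and get-nominees.
(defn get-years []
  (distinct (map :year sample-nominees)))

(defn get-nominees [year]
  (filter #(= year (:year %)) sample-nominees))

(defn- format-nominee [{:keys [title author winner]}]
  (str "    " title " - " author (when (= 1 winner) " (Winner)")))

(defn- format-data
  "Returns a title line, one line per nominee, and a blank spacer line."
  [year]
  (concat [(str year " Hugo Awards - Best Novel")]
          (map format-nominee (get-nominees year))
          [""]))

(defn get-nominees-data-list []
  (mapcat format-data (get-years)))
```

Calling (get-nominees-data-list) yields one flat sequence of strings, which is the shape a list box needs.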

The hugo.views.main-view file

As I said earlier I am creating a UI using the Java UI libraries Swing and awt.  The :import clause of the ns declaration brings in the BorderLayout and ActionListener classes from awt.  The BorderLayout class is used for laying out the objects on the UI.  The ActionListener class is used for the event handler when the Close button is clicked.  The classes I’m bringing in from Swing are all UI objects except for the DefaultListModel class.  That class is used to manage the list of novels that will be displayed in the list box.

The hugo.controllers.main-controller/show-list function calls the create-home-view function after it has retrieved and formatted the data. Inside of create-home-view all of the UI objects are created. First I need to create the main window by creating an instance of the JFrame object passing in the title of the window. Next I create the panel that will contain all of my UI objects using JPanel. The title label and close button objects are pretty self-explanatory.

The last three lines of the let statement all relate to creating the list. The list-model is used to manage the data in the list. Once I have a list model object I can create the JList object. When the JList object is created I pass in the list-model object to associate the list with the list-model. The last object created is the JScrollPane object which is used to provide scrolling capabilities for the list box.

Now that I have the UI objects created it's time to call the ‘private’ function add-list-items. This function’s job is to add the formatted list data to the list-model object. The add-list-items function uses doseq to loop over the data sequence.  Each item is added to the list-model object using the addElement method.  Basically, add-list-items takes the items from the data sequence and adds them to the list box via the list-model.

Once the data has been handled it's time to dress up the title-label. I can change the look and feel of the label by calling the JLabel class's setBackground, setForeground and setFont methods. I changed the look of the close button by calling the equivalent methods on the JButton class. Since I want something to happen when a user clicks the close button I added the event handler function close-button-handler by calling the addActionListener method.
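Put together, the view code described in this section might look something like the sketch below, which also includes the final step of adding everything to the frame and showing it. The namespace name, window title, label text, colors, window size and the body of the close handler are all assumptions; only the Swing and awt classes and the function names come from the description.

```clojure
(ns example.main-view
  (:import (java.awt BorderLayout Color)
           (java.awt.event ActionListener)
           (javax.swing DefaultListModel JButton JFrame JLabel
                        JList JPanel JScrollPane)))

(defn- add-list-items
  "Adds each item in data to the list model backing the list box."
  [list-model data]
  (doseq [item data]
    (.addElement list-model item)))

;; Assumed behaviour: clicking Close exits the application.
(def close-button-handler
  (proxy [ActionListener] []
    (actionPerformed [event]
      (System/exit 0))))

(defn create-home-view
  "Builds the window and shows the formatted nominee strings in a list."
  [data]
  (let [frame        (JFrame. "Hugo Nominees")
        panel        (JPanel. (BorderLayout.))
        title-label  (JLabel. "Hugo Award Best Novel Nominees")
        close-button (JButton. "Close")
        list-model   (DefaultListModel.)
        nominee-list (JList. list-model)
        scroll-pane  (JScrollPane. nominee-list)]
    (add-list-items list-model data)
    ;; Dress up the title label.
    (doto title-label
      (.setOpaque true)
      (.setBackground Color/BLACK)
      (.setForeground Color/WHITE))
    (.addActionListener close-button close-button-handler)
    ;; Lay out the components and show the window.
    (doto panel
      (.add title-label BorderLayout/NORTH)
      (.add scroll-pane BorderLayout/CENTER)
      (.add close-button BorderLayout/SOUTH))
    (doto frame
      (.setContentPane panel)
      (.setSize 400 300)
      (.setVisible true))))
```

Calling (create-home-view ["2004 Hugo Awards - Best Novel" "    Title A - Author A (Winner)"]) from a REPL with a display available opens the window.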

After the cosmetic steps have been completed it is time to add the objects to the frame and size the main window. The last line of code makes the window visible. Now that I have everything ready it's time to run the app. To run the app I used:

lein run

When the app comes up it should look like the picture below. Not too bad for my first Clojure UI.

hugo-ui

Summary

Creating a user interface in Clojure is not difficult. If someone like me who makes his living in the .NET world can create a UI like this I would imagine it would be much easier for Java devs. As always I’m looking for any and all feedback on my code.  Do not hesitate to leave a comment.

Help Me Raise Money for The Leukemia and Lymphoma Society!

This November I will be running my first half marathon. I’m running as a member of the Team in Training organization, which raises money for the Leukemia and Lymphoma Society. I have pledged to raise $950 that will go to research to help find a cure for blood cancers. Please show your support by leaving a donation here. Any amount is greatly appreciated! Thank you for your support.

Monday, May 23, 2011

Creating a Hugo Awards DB with Clojure and Sqlite

In my previous post I wrote code to retrieve the Hugo Award – Best Novel winners and nominees for the 2000s and write the results into a text file. While I was working on that post I thought it would be nice to have a database with this information. In this post I will walk you through the code I wrote to create a sqlite database using the clojure.contrib.sql library.

The Goal

By the end of the post I will have code that will retrieve the nominees and winners since 2000, then create and load a sqlite database with the parsed data. The nominees table will have the following columns: id, year, title, author, winner, read_it, own_it, want_it.

The Setup

The first thing to do is get sqlite installed on your machine. Since I’m working on Windows I grabbed these two downloads: sqlite shell and sqlite-dll. After downloading the files make sure they are in your PATH.

Next, I created a hugoDB branch to my hugo project on github to keep the code from this post separate from the original hugo post.

You will also need leiningen installed. Leiningen is a Clojure build tool that helps you manage your projects. I use it in all of my clojure projects. The install takes no time at all, just follow the instructions on the project’s page and you will be ready for business.

Creating/Updating the HugoDB Project.clj File

Once the hugoDB branch was created I updated the project.clj file to include the sqlitejdbc library. The library allows me to connect to a sqlite database. Here is the updated project file:

If you want to start a fresh project, check out Create the Hugo Project in my previous post. After creating the project add the sqlitejdbc dependency. To ensure you have all necessary dependencies run the following command:

lein deps

The deps command will download and install any dependencies that are not already in the project’s lib directory. Now that the project.clj file has been updated and the dependencies are in place, let's move on to the cmdline.clj file.

The Code

The code consists of an updated cmdline.clj file to support the database generation process and a command line option. I have also updated hugo/parser.clj, removing unnecessary functions and adding a new one.  The database related code is in two files.  The first file is hugo/db/createsqlite.clj which handles the database creation and loading.  The second file, hugo/db/sqlite.clj, contains code that retrieves and inserts data. I removed the hugo/text-formatting.clj file from this branch since I am not writing out to a text file.

The cmdline.clj file

The -main function was updated to support command line options using the clojure.contrib.command-line/with-command-line macro. The macro makes it possible to map a command line option to a local variable. For this ‘app’ the only option is the --drop option. When I run (-main "--drop" "true") the local variable will contain the string "true". The macro also has the --help option built in. It prints the comment that is directly under the with-command-line line and a description of each option you’ve defined. Here’s what (-main "--help") prints out when I run it in the REPL:

image

After the command line support is in place I need to check whether --drop is true. The if form is used to decide whether I need to drop the nominees table first, using the hugo.db.createsqlite/drop-table function.
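A stripped-down sketch of that pattern follows. It assumes clojure-contrib 1.2 is on the classpath and follows the cmdspec shape documented for clojure.contrib.command-line, where each option is a vector of name, description and optional default. The namespace name, the description strings and the println calls are placeholders for the real work.

```clojure
(ns example.db-cmdline
  (:use [clojure.contrib.command-line :only (with-command-line)]))

(defn -main [& args]
  (with-command-line args
    "Creates and loads the Hugo nominees database."
    [[drop "Drop the nominees table before loading" "false"]
     remaining]
    ;; drop is bound to the string that followed --drop, or to the default.
    (if (= "true" drop)
      (println "dropping the nominees table first")
      (println "keeping the existing table"))))
```

Inside the macro body the local named drop shadows clojure.core/drop, which is harmless here because the core function is not used.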

Next, I grab the links from the Hugo Awards History page. Since I’m only interested in the winners from 2000 on I only pass the first 12 links to the create-and-load-db function. That is the function that kicks off the database creation and loading process.

The create-and-load-db function calls a wrapper function called get-data. I created the function so I can make use of the map function to retrieve the nominees data. The hugo.parser/parse-best-novel-nominees function is called to parse the data from the web pages. I will get into the details of parse-best-novel-nominees a little later. After all the data has been parsed I create the database by calling the hugo.db.createsqlite/create-db function. The create-db function creates the nominees table in the hugo database. The last line of the create-and-load-db function is where the data is inserted into the newly created nominees table by calling the hugo.db.createsqlite/process-awards function.

That’s it for the cmdline.clj file. This file exists simply to allow me to create the database from the command line. Now let's take a look at the database code by walking through the hugo.db.createsqlite.clj file.

The hugo.db.createsqlite.clj file

The first three lines are the typical namespace and dependency declarations. I have included dependencies on my hugo.db.sqlite.clj file and the clojure.contrib.sql library giving it a short name of sql. The short name allows me to call functions in the library by prefacing them with sql/. The sql library allows me to use JDBC to access the database.

hugo.db.sqlite/db contains the information needed by JDBC to locate and connect to the database. In order to create the database I needed to add the :create flag to the connection. I did this by using the merge function to combine the db variable with the :create flag, which gave me new-db-conn. Now I have a connection that allows me to create the database.
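In code, the merge step is a one-liner. The keys and values in db below are an assumption about what a sqlite connection map typically looks like for JDBC, not a copy of the project's actual map.

```clojure
;; Assumed shape of hugo.db.sqlite/db.
(def db
  {:classname   "org.sqlite.JDBC"
   :subprotocol "sqlite"
   :subname     "db/hugo.sqlite3"})

;; merge returns a new map; db itself is left unchanged.
(def new-db-conn (merge db {:create true}))
```

Because Clojure maps are immutable, the read-only db map can keep being used for queries while new-db-conn is used only for creation.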

Once I had the new-db-conn connection it was time to get down to business. The ‘gateway’ function into the database creation process is the create-db function.

There really isn’t much to this function but there are a few things I would like to point out. First the with-connection function will ‘wrap’ the code that makes up the body of the call which in our case is the line:

(create-tables)

Wrap? What I mean by wrap is any database related code within the ‘with-connection’ body will use the database connection created by the with-connection call. When the body has finished the connection is closed.

The create-db function contains the call to the create-tables function, which is where my nominees table is created. The first parameter of the create-table function is the name of the table to create. The vectors that follow the table name are the definition for each column. Each vector has the name of the column followed by its data type. Any special description of the column like primary keys, unique, etc. is listed after the data type.
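To make the column vectors concrete, here is a sketch of what the specs for the nominees table could look like as plain data. The column names come from the goal stated at the top of the post; the data types and the primary key clause are assumptions.

```clojure
;; One vector per column: name, data type, then any extra constraints.
(def nominee-columns
  [[:id      :integer "PRIMARY KEY"]
   [:year    :integer]
   [:title   :text]
   [:author  :text]
   [:winner  :integer]
   [:read_it :integer]
   [:own_it  :integer]
   [:want_it :integer]])

;; With clojure.contrib.sql required as sql, the table could be created
;; inside an open connection with something like:
;; (sql/with-connection new-db-conn
;;   (apply sql/create-table :nominees nominee-columns))

(println (map first nominee-columns))
```

Keeping the specs in a var like this is just one option; passing the vectors directly to create-table works the same way.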

Once the database table has been created it's time to parse and load the data into the nominees table. The last call in cmdline/create-and-load-db is to the process-awards function. The process-awards function takes the parsed award data and feeds each nominee to the add-new-nominees function using the map function. The add-new-nominees function doesn’t have much meat to it. I'm finding that with Clojure you can get a lot done with very little code. First the function grabs the year from the category struct which is the first parameter to the add-nominee function. Next, the map function is used to insert a record for each entry in the category struct’s book sequence. Each item is the second parameter for the add-nominee function. When the map call completes there will be a record for each of the year's nominees in the nominees table. When the process-awards function finishes I will have a nominees table loaded with all nominees/winners since 2000.

Running the HugoDB code

To create the database you can either run the code from within the REPL or the command line. Here's how to run it in the REPL. First run the following command from the project’s home directory:

lein repl

The first time you run the app you'll just need to call the -main function like so:

(-main)

After you run it one time you'll need to run it with the --drop true to drop the nominees table before you start creating the database. Here's how to call the main with --drop true

(-main "--drop" "true")

If you choose to run it from the command line run it like this the first time:

lein run

After running the app once you'll need to run it like this:

lein run --drop true

Now that we have a database of the nominees it's time to do some querying to ensure that we have loaded the data correctly. There are two ways for us to accomplish this. First I will use the sqlite3 command line tools to run SQL against the database. After that I’ll use Clojure in the REPL to show a few select functions. Now it’s time to fire up sqlite.

The Sqlite Shell

Jump back to the command prompt and cd into the project’s home directory. From there change into the db directory and get a listing of its contents. The directory should have a file named hugo.sqlite3.  That is the database file the app just created. To get to a querying interface run the command (assuming you have added sqlite3 to your PATH):

sqlite3 hugo.sqlite3

To view the tables in our database enter .tables from the sqlite prompt and you should see our single table nominees listed. Let's make sure that the 2004 records were loaded correctly.

image

The columns we care about here are id, year, title, author and whether or not the book was the winner plus three other columns that are there for my next post. The first record has 1 in the winner column which indicates it was the winner. All of the nominees for the year were also properly saved.

The hugo.db.sqlite.clj file

In addition to the sqlite shell I wrote a few functions that will retrieve the nominees from the database. The first one I will use is get-nominees. It does what you might expect: it returns all of the records in the nominees table.

The get-nominees function introduces function overloading (multiple arities) in Clojure. If get-nominees is called without parameters it will call get-sql passing in the sql statement defined in the get-all-nominees var which returns all of the records. However, if a year is passed in the function will return all winners/nominees for the given year. Before calling the get-sql function I add a predicate with a placeholder to the base sql statement. The new string is the first item in the vector. The next item is the value that will replace the placeholder when get-sql calls the with-query-results function. Really not much to get-nominees, most of the work is done in the get-sql function.
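The arity dispatch and the query vector can be shown without touching the database. In this sketch nominees-query is a hypothetical helper that only builds the vector that would be handed to get-sql, and the SQL text is an assumption.

```clojure
;; Assumed base statement; the real one lives in the get-all-nominees var.
(def get-all-nominees "select * from nominees")

(defn nominees-query
  "Builds the sql-and-params vector for all nominees or for a single year."
  ([]
   [get-all-nominees])
  ([year]
   ;; The ? placeholder is filled in by JDBC with the value that follows.
   [(str get-all-nominees " where year = ?") year]))

(println (nominees-query))
(println (nominees-query 2004))
```

Passing the year as a parameter instead of splicing it into the string lets the driver handle quoting.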

All of the ‘get’ type functions are built on the get-sql function in sqlite.clj. It wraps the call to the clojure.contrib.sql/with-query-results function. The first parameter is a sequence that will contain the results of the query. The next parameter is the sql statement and parameters to be run. The doall statement forces the lazy sequence that contains the results into a ‘real’ sequence that is returned to the caller.

Summary

Creating a sqlite database and running queries against it with Clojure is straightforward. Adding support for command line options is trivial. In my next 'Hugo' post I am going to create a UI that will allow me to view the data. Stay tuned!

As part of my Clojure learning process I appreciate any and all comments on my code. Following my last post I had great comments that helped improve my code and expand my Clojure knowledge. Please keep the comments coming.

Resources

clojure, clojure-contrib, enlive, leiningen, sqlite, my previous post.

Code

The code for this project can be found on the hugoDB branch of the Hugo project.  You can download the entire hugoDB branch of the project here.  The code and database files I discussed in the post can be viewed here:  cmdline.clj, createsqlite.clj, sqlite.clj and the database.

Tuesday, May 10, 2011

Parsing Web Pages with Clojure and enlive

As my infatuation with Clojure grows I thought I would write some code to retrieve all of the works that have either won or been nominated for the Hugo award's Best Novel category. I know it’s geeky but it is information that I can use so why not use it as a source to learn more Clojure?

Please keep in mind that I am writing this blog from my perspective as a Clojure noob. Any and all feedback on the post or the code is welcome.  Even if you think it is minor, please pass it along. With that said, let's get on with the post!

Goal

The goal is to have code, by the end of the post, that will retrieve the winners and nominees for the Best Novel category since 2000 and write them to a text file with the following layout:

Year Hugo Awards - Best Novel
         Title – Author (Winner) 
         Title – Author 
         Title – Author ...

The Setup

UPDATE As @Bendlas mentioned below leiningen will install the clojure jars. You can skip the first paragraph and go right to installing leiningen. I have tested it on a Ubuntu VM and when I ran lein repl leiningen downloaded the clojure jars. On Windows you will need to have curl.exe or wget.exe installed to get it to work. Thanks again @Bendlas.

If you don’t already have Clojure installed you can get everything you need from the download page. While you are there go ahead and grab the clojure-contrib.zip as well. Assuming you already have Java on your machine the next step is to add the Clojure and clojure-contrib directories to your CLASSPATH.

Once you have Clojure installed the next thing to install is lein. It is a Clojure build tool that helps you manage your projects. In my short period of time in the Clojure world lein has been a great tool and I have found it has many useful plugins. The install takes no time at all, just follow the instructions on the project’s page and you will be ready for business. Now that Clojure and lein are installed I'm ready to start the ‘Hugo’ project.

Creating the Hugo Project

To create the project, run lein with the parameters below:

lein new hugo

The command will create a directory structure for our project. For a little more information on the project directory structure lein creates take a look at my previous post Getting Started with Ring and Compojure - Clojure Web Programming.  I have a little more detail there.

I updated the project.clj file to include dependencies on enlive and clojure-contrib. A line was added to indicate which namespace my main function is located in. The new line allows me to run this 'app' from the command line.

The project setup is complete, now it's time to start parsing!

The Code

The code for this project is in three source files under the project’s src directory.  The cmdline.clj file houses the app’s main function which allows the app to be started from the command line. The hugo/parser.clj file contains the code that retrieves and parses the web pages.  The last file is hugo/text-formatting.clj which contains the code to format the output.  In this post I will walk through the cmdline.clj and hugo/parser.clj files.

The cmdline.clj file

As you can see, there isn’t much to this file.  The file exists to create a class that allows me to run hugo from the command line and provides a concise way to run the code within the REPL. The first five lines set up the namespace, include my code’s namespaces and load the clojure.contrib.duck-streams library which I will use to write the results out to the output file. 

Since I want to run this application from the command line I need to generate a java class file.  To do this I use the :gen-class option in the ns declaration.  The generated cmdline.class file will be placed in the project’s classes directory. Any methods in the java class will look in the source clj file for a function by the same name but preceded by a -.  That is why the -main function exists.  It is what is called when I execute the app outside of the REPL.

The -main function calls the hugo.parser/get-award-links function to retrieve the links to each year’s awards page from the Hugo awards history page.  The links are returned as a sequence. Since I only want the entries for 2000 to 2010 the code grabs the first 12 links, which are passed to the prep-for-file function. 

The prep-for-file function is where the real parsing is kicked off, I will discuss the parsing in more detail later. For now just know that the data retrieved from the URL is formatted by the hugo.text-formatting/format-output function and the map function. The results are converted to a string by using the apply and str functions.

When the parsing and formatting is complete the results are passed to the clojure.contrib.duck-streams/spit function. The spit function, I really like that name, writes the results to a file named hugo_awards_best_novel.txt. That's it. I've given you the 5 second tour of the cmdline.clj file, now it's time to take a look at the HTML parsing.

The hugo/parser.clj file

As you might expect this is where all the parsing code lives. The first function called is the get-award-links. The function is responsible for parsing out all of the links to the annual awards pages.

The get-award-links Function

The first task this function does is to retrieve the page’s HTML tags using the fetch-url function which wraps enlive’s html-resource function. The html-resource function retrieves a web page and returns its HTML tags in a sequence that is passed to other enlive functions as input.

Once the page has been parsed, I call enlive’s select function passing the tag sequence as the first parameter. Select's second parameter tells the function which tags I want out of the tags sequence using something similar to CSS selectors. In this case I’m telling select to grab all A tags inside of LI tags that are members of the page_item class and are within the DIV tag with the id of content. The second vector tells select that I want the text for each link so I can use it for the year value later. After the parsing of the tags is done the map function will grab the attrs for each tag that is returned which returns the following for each link:

image

When this function call is completed a map is returned with a title and href for each year that the Hugo awards were given.  The results of this call are passed to the prep-for-file function in the cmdline.clj file. 
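A cut-down sketch of this kind of selection is shown below. It assumes enlive is on the classpath and uses a tiny inline page so it does not need the network; the HTML, the href values and the namespace name are made up for the example, and the selector is a simplified, single-vector version of the one described above.

```clojure
(ns example.parser
  (:require [net.cgrand.enlive-html :as html]))

(defn fetch-url
  "Retrieves a page and returns its parsed tag sequence."
  [url]
  (html/html-resource (java.net.URL. url)))

;; A stand-in page parsed from a string instead of a live URL.
(def sample-page
  (html/html-resource
    (java.io.StringReader.
      (str "<div id=\"content\"><ul>"
           "<li class=\"page_item\">"
           "<a href=\"/2010-hugo-awards/\" title=\"2010 Hugo Awards\">2010</a>"
           "</li></ul></div>"
           "<div id=\"sidebar\"><a href=\"/about/\" title=\"About\">About</a></div>"))))

(defn get-award-links
  "Returns the attribute map of every link inside li.page_item under div#content."
  [page]
  (map :attrs (html/select page [:div#content :li.page_item :a])))

(println (get-award-links sample-page))
```

Only the link inside the content DIV is selected; the sidebar link is skipped because it does not match the selector.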

The get-awards-per-year Function

Now that I have the links to all the award pages it is time to gather all the data on each category. The function creates a sequence of category structs that contain the award category, the nominees/winners in the category and the year the award was given.

The year’s page is retrieved using the fetch-url function and the results are stored in the page-content variable. Next, the parse-award-page function is called passing in the page-content as its only parameter. It returns a sequence that contains lists for each award given that year that will look like this: ((“Best Novel”) (array maps for each nominee/winner)). I will refer to this sequence as the category sequence from here out. Right now the parse-award-page function looks like a black box, but I promise to get into the details in a bit.

The results of the parse-award-page call are passed to map to create a sequence that contains category structs.  The category struct is defined as:
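The original listing is not reproduced here. Going by the keys described in the rest of this section, a definition along these lines would fit; the exact key names and the sample values are assumptions.

```clojure
;; A struct with one key per piece of data kept for an award category.
(defstruct category :award :books :year)

;; Building an instance: the struct name followed by a value for each key, in order.
(def sample-category
  (struct category
          "Best Novel"
          [{:title "Title A" :author "Author A"}]
          "2004 Hugo Awards"))

(println (:award sample-category) "/" (:year sample-category))
```

A struct behaves like a map, so the values are read back with ordinary keyword lookups.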

Getting the award string

In the map function call I am using an anonymous function to create a category struct. When the map call completes it returns a sequence of category structs for the given year. Creating a new struct is easy, just pass in the name of the struct to create and a value for each of the keys in the struct. The category struct's first key is the :award key. The value is parsed with this code: (apply str (first %)). Since the first item in the category sequence is a string in a lazy sequence representing the award's title I need to use apply str instead of just str. If I called (str (first %)) what I would get back is something like this: clojure.lang.LazySeq@5784711f which is obviously not what we want. 
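The difference is easy to see at the REPL with a small lazy sequence standing in for the parsed title:

```clojure
;; A lazy sequence holding the pieces of an award title.
(def award (map identity ["Best " "Novel"]))

;; str on the sequence itself gives the object's printed form, not its contents.
(println (str award))

;; apply str passes each element to str as a separate argument and joins them.
(println (apply str award))
```

In other words, (apply str award) is equivalent to calling (str "Best " "Novel").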

Getting the books sequence

Grabbing the books that represent the nominees/winners is almost as easy as the award. Since I know that the nominee/winners are stored in the second part of the category sequence lists I use the second function to retrieve them.

(get-book-info (rest (second %)))

I’m using rest here because for some reason the first entry in the sequence is “\n”; I’m not sure why. In the future I will figure it out but for now I’m using the rest call to get to the ‘guts’ of the book sequence. The results of the rest call are passed to a helper function that returns a sequence of work structs that will be stored in the category struct’s books key.

Getting the year string

The last key in the category struct is the year key. It will store a string that begins with the year and ends with “Hugo Awards”.  The code to retrieve the ‘year’ makes use of the select statement, grabs the first element in the returned sequence and converts the value of :content to a string. Here's what the code looks like:

(apply str (:content (first (html/select page-content #{[:div#content :h2]}))))

Now I have a value for each of the category struct’s keys. The struct provides a much easier way to work with the data. At this point all of the parsing has been completed. All that is left is for the cmdline/prep-for-file function to format the data and write it out to the file. Since that is pretty straight forward I'm going to leave that code out of the post. Before I wrap up this post I’d like to dive into the hugo.parser/parse-award-page function, where the real parsing happens.

The parse-award-page Function

Once the year’s award page has been retrieved, its tag sequence is passed to parse-award-page. The function grabs the category title and the nominees/winners and creates a sequence of lists. Here’s how it is done. All of the nominees/winners are found in the map function call. The call to select returns all UL tags found in the content DIV tag. Each tag is passed to the anonymous function which just pulls out the :content key from the tag’s array map creating a sequence of book titles.

The category titles are parsed on the line that has the split-at function call. Again, the select function is called to find all P tags that are within the content DIV. The text for the first child of each P tag is returned, creating a sequence of category titles. The split-at function is called to ‘remove’ the first four P tag results since they contain information on where the awards banquet was held.
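
The split-at step can be sketched with plain strings. The p-texts values below are invented placeholders, and taking the second half of the result is just one way to ‘remove’ the leading items, not necessarily the exact code in parser.clj:

```clojure
;; Hypothetical text pulled from the P tags: four banquet-related entries
;; followed by the category titles we actually want.
(def p-texts ["Presented at" "Location" "Date" "Hosts"
              "Best Novel" "Best Novella"])

;; split-at returns a vector of two sequences: the first n items and the rest.
;; Keeping only the second one drops the four banquet entries.
(def titles (second (split-at 4 p-texts)))
```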

After both the titles and the nominees/winners sequences are created the interleave function is called. Interleave creates a single sequence by combining the two sequences one item at a time. The first item in the titles sequence is added to the new sequence followed by the first item in the nominees sequence, the second from titles is followed by the second nominees item, etc. When interleave returns I have one sequence that looks something like ( “award title” “nominees” “award title” “nominees”….).

Having the sequence provided by interleave is nice but it isn’t going to work for what I want. I need to pair the category title with the nominees/winners for the category. This is where the partition function comes in. According to the partition documentation the function “Returns a lazy sequence of lists of n items each”, where n in our case is 2. When the parse-award-page function completes it returns a sequence of lists that match each category up with its nominees/winners, which is exactly what I need in the get-awards-per-year function.
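
Here is a minimal sketch of the interleave and partition steps using made-up titles and nominees (the real sequences hold enlive content, and the names below are only for illustration):

```clojure
;; Hypothetical inputs standing in for the two parsed sequences.
(def titles   ["Best Novel" "Best Novella"])
(def nominees [["Book A" "Book B"] ["Story C" "Story D"]])

;; interleave alternates items: title, nominees, title, nominees ...
(def flat (interleave titles nominees))

;; partition 2 groups that flat sequence back into (title nominees) pairs.
(def paired (partition 2 flat))
```

Each item of paired is a two-element list, so first gives the category title and second gives the nominees/winners for that category.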

How do I run it from the command line?

If you are like me, most of the Clojure you write is either run through the REPL or as a web app. I had no idea how to run this ‘app’ from the command line. After checking out the leiningen project again I noticed that there is a command called uberjar. What uberjar does is create a jar file that bundles up everything that is needed to run your app from the command line. The jar file uses the naming convention of:

<project name>-<version info>-standalone.jar

Remember I’m a .NET guy by day so I don’t have a real in-depth knowledge of jar files yet. I just know that they allow me to run the app from the command line. Once the jar file has been created I can run the app from the command line using this command:

java -jar hugo-0.0.3-SNAPSHOT-standalone.jar 

Summary

Parsing HTML in Clojure is relatively easy with enlive. Using it I was able to parse the Hugo Awards information to create a text file with all of the Best Novel category nominees and winners ( hugo_awards_best_novels.txt ) since 2000.

One More Thing…

When you run the project you may encounter an IOException like this:

[image: screenshot of the IOException]

You can resolve the issue by visiting the URL through a web browser. I believe I can get around this issue by setting the user-agent for my enlive html-resource call but I couldn’t figure out how to do it. If anyone has a suggestion please leave me a comment.
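
A possible starting point, untested against the Hugo Awards site, is to open the connection through Java interop and set the header there. The open-with-user-agent helper, the example URL and the user-agent string are all made up for this sketch, and passing the resulting stream to html-resource is an assumption about enlive rather than something I have verified:

```clojure
(import '(java.net URL))

(defn open-with-user-agent
  "Opens (but does not yet connect) a URLConnection with a User-Agent header set."
  [url user-agent]
  (doto (.openConnection (URL. url))
    (.setRequestProperty "User-Agent" user-agent)))

;; Hypothetical URL and user-agent string.
(def conn (open-with-user-agent "http://www.example.com/" "Mozilla/5.0"))

;; Assumption: if html-resource accepts an InputStream, the page could then be
;; read with (html/html-resource (.getInputStream conn)).
```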

Resources

clojure, clojure-contrib, enlive, lein

Code

Download entire project. Code files: cmdline.clj, parser.clj and text_formatting.clj

The output file: hugo_awards_best_novels.txt

Monday, March 28, 2011

Getting Started with Ring and Compojure - Clojure Web Programming

I am currently testing out different technologies to run a small web service in a Windows environment at my day job. I have worked with Node.js on Linux but have not had much luck with it on Windows. At the same time I am also learning Clojure, so I did some checking to see what type of web development options are available in Clojure and I found Ring and Compojure. In this post I will follow the same process I did in my Getting Started with Node.js post, that is to start off with a bare bones ‘Hello World’ sample with Ring and then follow that up with a sample using Compojure to receive parameters by GET and POST.

The Setup

If you don’t already have Clojure installed you can get everything you need from the download page. While you are there go ahead and grab the clojure-contrib zip as well. Assuming you already have Java on your machine the next step is to add the Clojure and Clojure-contrib directories to your CLASSPATH.

Once you have Clojure installed the next thing to install is lein. It is a Clojure build tool that helps you manage your projects. In my short period of time in the Clojure world lein has been a great tool and I have found it has many useful plugins. The install takes no time at all, just follow the instructions on the project’s page and you will be ready for business. Now that I have Clojure and lein installed I'm ready to start the ‘Hello Clojure Web’ project.

Creating the Hello Clojure Web App

Step number one is to create the project using lein by running the command:

lein new hello-clojure-web

When lein has finished there will be a new directory named hello-clojure-web. When you cd into the directory and do dir or ls you should see a directory structure similar to this:

[image: directory listing of the new hello-clojure-web project]

Lein creates a directory structure for our project separating the tests from the source code, a .gitignore file and a project.clj file. The project.clj file handles project dependencies and sets up variables that are used to describe the project. We will just scratch the surface of the project file in this post.

The project.clj File

In the project.clj file the first line simply defines the name and version of the project. The dependencies section is used to list what libraries the project depends on. For our app we need to add dependencies on the ring-core and ring-jetty-adapter libraries. To get the exact information I needed to add in the dependencies section I turned to the clojars.org site. The clojars site is a place where you can find open source libraries. It provides you with the exact syntax required to add the library to your project. To see what a clojars page looks like visit the ring/ring-core page. The line below adds the two ring dependencies I need for my project. Dependencies are added by using the library's name followed by the version you wish to use.

[ring/ring-core "0.3.7"][ring/ring-jetty-adapter "0.3.7"]

After we've added the Ring dependencies our project.clj file should look like:
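
The embedded file did not survive here, so the following is only a sketch of the general shape. The version string, the description text and the Clojure and clojure-contrib versions are my assumptions; only the two ring entries come from the line above:

```clojure
;; Sketch of project.clj. Everything except the two ring entries is assumed.
(defproject hello-clojure-web "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [ring/ring-core "0.3.7"]
                 [ring/ring-jetty-adapter "0.3.7"]])
```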

After adding my dependencies to the project.clj file I need to 'install' them. Lein will do it for me when I run this command:

lein deps

Lein will retrieve the necessary files and place them in the project's lib directory. If all goes well you should see something like the image to the right. If you see error message text similar to this towards the bottom of the message:

1 required artifact is missing.

for artifact:
org.apache.maven:super-pom:jar:2.0

It usually means that you have misspelled the name of a dependency in the project.clj file. To see it for yourself remove a letter from the dependency we just added, save the file and then re-run the lein deps command.

src/hello_clojure_web/core.clj

When I created the project lein created a core.clj file in the src/hello_clojure_web directory (the directory name uses underscores in place of the hyphens in the namespace). The only line inside of the file is the project's namespace declaration. My first step was to add a reference to the ring.adapter.jetty library which I will use to run our HTTP service. Next, I added the handler function which handles the web requests. The last line of the file starts the web service, passing the handler function to the run-jetty function. Here is the completed hello_clojure_web/core.clj file.

The handler function takes one parameter, which is the web request, and returns the response. It handles all web requests that come in on port 8080, returning the same response for a request at ‘/’ and ‘/this/is/a/long/one’.
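
Because a Ring handler is just a function from a request map to a response map, it can be tried out without starting Jetty. This is a minimal sketch: the Content-Type header and the hand-built request maps are my assumptions, not the exact contents of the file, and the run-jetty call is left out so the snippet has no dependencies:

```clojure
(defn handler
  "Takes a Ring request map and returns a Ring response map."
  [request]
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "Hello Clojure Web!"})

;; Hand-built request maps standing in for real requests; the handler
;; ignores the :uri, so both calls produce the same response.
(def root-response (handler {:request-method :get :uri "/"}))
(def long-response (handler {:request-method :get :uri "/this/is/a/long/one"}))
```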

In order to test our project out we will use the REPL. Starting the REPL session with lein makes it easier to run our code within the REPL session. To start a REPL session I need to jump back to the command prompt in the project's home directory and run:

lein repl

From within the REPL session enter the following line:

(use 'hello-clojure-web.core)

This starts up my service, which is now listening for requests on port 8080. Next, I fired up Chrome and entered http://localhost:8080. If everything works correctly I should see Hello Clojure Web! If you do not see the text then an error has occurred. Usually the REPL gives an error message that will point you in the right direction.

I want to change the text to reflect the fact I'm using Ring. I'm going to keep the REPL running and go back to the source file. I changed the Hello Clojure Web string to read Hello Ring!, saved it and refreshed the browser but I didn't see the changes. Only when I stop and restart the REPL, re-enter the use statement and reload the web page do I see my changes.

In order for me to see my changes without restarting the REPL I need to add a reference to the ring-devel library. Since I do not want this library to be a part of my ‘production’ version I will only add the dependency to my dev environment by making use of the dev-dependencies section. My updated project.clj file now looks like this.

After the changes make sure to run lein deps again. The output of that command shows that it places files in the hello-clojure-web/lib/dev directory in addition to the hello-clojure-web/lib directory.

Now that I have the dependencies taken care of it is time to update the code in the core.clj file. I've wrapped the run-jetty call in a function called boot so I can make use of the wrap-reload function. What this call does is reload the namespace of our application before each request is handled. So now we can make changes to our code and refresh the browser to see them. Here's what the updated code looks like:

[image: the updated core.clj code]

As you can see there wasn't much change in the code needed to be able to handle our updates. To see it in action, start up the REPL and view the app in the browser. Change the response text and then reload the web page. You should see our updated message. Next I'll describe how to parse parameters from GET and POST requests.

Retrieving Parameters

In this section I'm going to introduce Compojure, a small, open source web framework. To keep things simple I am going to add the Compojure code to our already existing core.clj file. Before I make changes to core.clj I need to update the project.clj file with the Compojure dependencies.

The Updated project.clj file

I added the compojure lib in the 'normal' dependencies section and in the dev-dependencies I added the reference to the lein-ring plugin. The ring plugin allows me to start ring by running the command: lein ring server. For the ring server command to work I need to tell ring where the handler method is which is done on the last line of the file. Ring knows that app will handle the web requests. I ran lein deps again to update the project's dependencies to include lein-ring.

Now that the project.clj file is updated I need to adjust the ns statement in the core.clj file to include the Compojure libraries:

(ns hello-clojure-web.core
  (:use compojure.core)
  (:require [compojure.route :as route]
            [compojure.handler :as handler])
  (:use ring.middleware.reload)
  (:use ring.adapter.jetty))

Nothing special going on here, just a few more libraries to include. Next I set up my routes. Using the defroutes macro I create the routes I want my app to respond to. Compojure will take the route definitions and generate a Ring handler function. The routes are processed from the top down, and the first match wins. If a request comes in that doesn’t match the three routes I’ve defined, the route/not-found will be called.
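
The ‘first match wins’ behaviour can be modelled in plain Clojure. To be clear, this is not Compojure's implementation or its API, only a hand-rolled sketch of the idea. The routes table, the dispatch function and the response strings are all hypothetical:

```clojure
;; Hypothetical route table: [method path response-body], checked top down.
(def routes
  [[:get  "/"      "Hello Compojure!"]
   [:get  "/hello" "Hello from GET"]
   [:post "/hello" "Hello from POST"]])

(defn dispatch
  "Returns the response for the first route matching the request,
  or a 404 response when no route matches."
  [request]
  (or (some (fn [[method path body]]
              (when (and (= method (:request-method request))
                         (= path (:uri request)))
                {:status 200 :body body}))
            routes)
      {:status 404 :body "Page not found"}))
```

Calling dispatch with a request map such as {:request-method :post :uri "/hello"} walks the table in order and stops at the first entry whose method and path both match.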

My last step is to bind app to the handler/site my-routes function. This binding is used to start up the application. After I have updated the file I can start the app by running:

lein ring server

The command starts up the web app and brings up a browser opening it to http://localhost:3000/. To test my app I wrote a batch script that utilizes curl. It tries my three routes plus a non-existing route to ensure they are handled properly, and it worked!

[image: output of the curl test script]

The complete core.clj file:

Summary

The Clojure web world offers frameworks at different levels. You can stay at a relatively low level by using Ring, or if you want something at a little higher level you can use Compojure. I am a Clojure noob and I was able to get a web app up and running in no time. I will be expanding on my Compojure knowledge through a project I am working on with my son. We had originally planned on using Node.js but now that I have found Compojure we have decided to go with a Clojure-based application. As the project progresses I am sure I will have more Clojure web related posts.

Resources

Clojure, Lein, Ring, Compojure

Code

project.clj core.clj test_clojure_web.bat