Monday 11 May 2009

I like Apache HTTP Client over java.net.HttpURLConnection

My last project involved CAS and a lot of integration points, so we ended up with lots of HTTP traffic as part of our integration tests.

We started having problems with our build: spurious exceptions would break it, but we couldn't reproduce the problem. It was not until we did some rudimentary load testing that the cause surfaced: persistent HTTP connections.

Problem was, we had written our tests using java.net.HttpURLConnection like this:

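// Declared outside the try so the finally block can see it.
HttpURLConnection connection = null;
try {
    connection = (HttpURLConnection) new URL("http://localhost:8080").openConnection();
    connection.connect();
    assertEquals(HttpURLConnection.HTTP_OK, connection.getResponseCode());
} catch (...){
    ...
}
finally {
    // The response body is never read here. HttpURLConnection has no close(); disconnect() is the only teardown call.
    if (connection != null) connection.disconnect();
}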


Turns out, because we didn't care about the response body, we never bothered reading it, and that would tie up TCP connections. The more tests we ran, the more likely it became that we would see an exception like this:

Exception in thread "main" java.net.BindException: Address already in use: connect

So while we thought we were being diligent by writing a finally block to tear down the connection, it turns out that does nothing for this problem. What we would have to write was this:

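// Declared outside the try so the finally block can see it.
HttpURLConnection connection = null;
try {
    connection = (HttpURLConnection) new URL("http://localhost:8080").openConnection();
    connection.connect();
    assertEquals(HttpURLConnection.HTTP_OK, connection.getResponseCode());
} catch (...){
    ...
}
finally {
    // Read the response body to the end, even though we don't need it.
    while (connection.getInputStream().read() != -1) ;

    // HttpURLConnection has no close(); disconnect() is the only teardown call.
    connection.disconnect();
}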


(Actually, that just works for happy HTTP responses, as this library will throw an exception on 4xx and 5xx responses!? But let's not digress...)
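For the unhappy case mentioned in the aside above, here is a minimal sketch of a drain step that also copes with 4xx and 5xx responses. It is not the code from our project: the class name DrainingClient and the helpers drain and statusOf are made up for illustration, and it assumes Java 7 or later for try-with-resources. The idea is to ask for the status code first and then read from getErrorStream() instead of getInputStream() when the status is 400 or above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, named for illustration only.
class DrainingClient {

    /** Reads the stream to the end, closes it, and returns the number of bytes discarded. */
    static long drain(InputStream in) throws IOException {
        if (in == null) {
            return 0; // getErrorStream() returns null when there is no error body
        }
        byte[] buffer = new byte[4096];
        long total = 0;
        try (InputStream body = in) {
            int n;
            while ((n = body.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Returns the status code after consuming whichever body the server sent. */
    static int statusOf(URL url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            int status = connection.getResponseCode();
            // getInputStream() throws on 4xx/5xx; the body is on getErrorStream() instead.
            InputStream body = status >= 400
                    ? connection.getErrorStream()
                    : connection.getInputStream();
            drain(body);
            return status;
        } finally {
            // Kept to mirror the snippets in this post.
            connection.disconnect();
        }
    }
}
```

In a test this would be used as assertEquals(HttpURLConnection.HTTP_OK, DrainingClient.statusOf(new URL("http://localhost:8080"))). Reading in 4 KB chunks rather than one byte at a time is just a convenience; the point is only that the body gets consumed before the connection is let go.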

Now that's a lot of code for something so simple - look at Apache HTTP Client for the same task:

HttpClient httpClient = new HttpClient();
GetMethod method = new GetMethod("http://localhost:8080");
assertEquals(HttpStatus.SC_OK, httpClient.executeMethod(method));


Notice, no cleanup code! And even better, compare the peak use of TCP connections as a function of the number of connections opened (measured with netstat -a on my Windows box):



And look at the number of exceptions we get on a sample run - HTTP Client cleans up the input buffer automatically and gives zero exceptions:



So while the JDK's HttpURLConnection can be made to work, I don't think I will be using it in future. On top of being verbose, the way it handles "unhappy" response codes and the fact that it will insert a Content-Type behind your back make it a poor choice.

Tuesday 5 May 2009

Waldo: Excellent!

Being currently beached and briefly free of family life, I had some time to catch up on my reading. This one really filled a gap in my knowledge of distributed computing and its history.

I was in a conversation once that only now makes proper sense. The phrase "Waldo taught us that" was used in a discussion about REST, which puzzled me - I hadn't heard of Waldo except for the fellow in the striped sweater. Later I was reading some notes from QCon via Steve Vinoski's blog, and again the mysterious Wally was mentioned. And finally a quick googling turned up a reference from Uncle Bob and his friends on their list of reading recommendations. Now I can't ignore Wally any longer.

Waldo et al. described the impedance mismatch between language abstractions and distributed computing in this 1994 paper. I regret that it took 15 years before I read it, as it makes some very straightforward arguments - things I wish I had been more aware of in the past. In fact, thinking back, in 2002 I took a course which featured CORBA, and I was very impressed with it at the time.

Quoting from the abstract: "We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure".

So if you are doing any web/SOA/remoting work at all, you really should go find Wally and get some extra context.