Monday 11 May 2009

I like Apache HTTP Client over java.net.HttpURLConnection

My last project involved CAS and a lot of integration points, so we ended up with lots of HTTP traffic as part of our integration tests.

We started having problems with our build: spurious exceptions would break it, but we couldn't reproduce them. It was not until we did some rudimentary load testing that the cause surfaced: persistent HTTP connections.

Problem was, we had written our tests using java.net.HttpURLConnection like this:

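HttpURLConnection connection = null;
try {
    connection = (HttpURLConnection) new URL("http://localhost:8080").openConnection();
    connection.connect();
    assertEquals(HttpURLConnection.HTTP_OK, connection.getResponseCode());
} catch (IOException e) {
    fail("Request failed: " + e);
} finally {
    // HttpURLConnection has no close() method; disconnect() was our only cleanup
    if (connection != null) {
        connection.disconnect();
    }
}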


It turns out that because we didn't care about the response body, we never bothered reading it, and that tied up TCP connections. The more tests we ran, the more likely we were to see an exception like this:

Exception in thread "main" java.net.BindException: Address already in use: connect

So while we thought we were being diligent by writing a finally block to clean up the connection, it turns out that was completely superfluous. What we actually needed to write was this:

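HttpURLConnection connection = null;
try {
    connection = (HttpURLConnection) new URL("http://localhost:8080").openConnection();
    connection.connect();
    assertEquals(HttpURLConnection.HTTP_OK, connection.getResponseCode());
} catch (IOException e) {
    fail("Request failed: " + e);
} finally {
    if (connection != null) {
        try {
            // Read the response body to the end so the socket is not left tied up
            InputStream in = connection.getInputStream();
            while (in.read() != -1) ;
            in.close();
        } catch (IOException e) {
            // getInputStream() throws for 4xx/5xx responses (see below)
        }
        connection.disconnect();
    }
}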


(Actually, that only works for successful HTTP responses, since getInputStream() throws an IOException on 4xx and 5xx responses!? But let's not digress...)
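A more forgiving version of that finally block has to pick the right stream: for 4xx and 5xx responses the body comes back through getErrorStream() rather than getInputStream(). Here is a minimal sketch of such a helper. ResponseDrainer and its method names are hypothetical (they are not from our original tests or from the JDK). It reads whichever stream carries the body to EOF and closes it, which, as I understand the JDK's keep-alive documentation, is what has to happen before a socket can be reused.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper, not part of the original tests or the JDK.
class ResponseDrainer {

    // Reads the stream to EOF in 4 KB chunks, always closes it,
    // and returns the number of bytes discarded.
    static long drainAndClose(InputStream in) throws IOException {
        long total = 0;
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } finally {
            in.close();
        }
        return total;
    }

    // getInputStream() throws for 4xx/5xx responses, whose body (if any) is
    // exposed through getErrorStream() instead, so pick the stream by status code.
    static void releaseResponse(HttpURLConnection connection) {
        try {
            int status = connection.getResponseCode();
            InputStream body = status >= 400
                    ? connection.getErrorStream()
                    : connection.getInputStream();
            if (body != null) { // getErrorStream() returns null when no error body was sent
                drainAndClose(body);
            }
        } catch (IOException ignored) {
            // Nothing useful to do from a finally block; the socket probably won't be reused.
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demo of the draining half: 10,000 in-memory bytes are read and discarded.
        long discarded = drainAndClose(new ByteArrayInputStream(new byte[10000]));
        System.out.println("Discarded " + discarded + " bytes");
    }
}
```

In a test's finally block, a null-guarded call to ResponseDrainer.releaseResponse(connection) could take the place of the inline read loop, and it keeps working when the server answers with an error status.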

Now that's a lot of code for something so simple - look at Apache HTTP Client for the same task:

HttpClient httpClient = new HttpClient();
GetMethod method = new GetMethod("http://localhost:8080");
assertEquals(HttpStatus.SC_OK, httpClient.executeMethod(method));


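Notice: no cleanup code! Even better, here is the peak number of TCP connections in use as a function of the number of connections opened (measured with netstat -a on my Windows box):

[Chart: peak TCP connections in use vs. number of connections opened, HttpURLConnection vs. Apache HTTP Client]
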
And look at the number of exceptions on a sample run. HTTP Client drains the input buffer automatically and gives zero exceptions:

[Chart: exceptions per sample run, HttpURLConnection vs. Apache HTTP Client]

So while the JDK's HttpURLConnection can be made to work, I don't think I will be using it in future. On top of being verbose, the way it handles "unhappy" response codes and the fact that it inserts a Content-Type header behind your back make it a poor choice.
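If you do have to stay with HttpURLConnection, one way to avoid a surprise Content-Type is to set the header yourself. This is a minimal sketch assuming the header in question is the application/x-www-form-urlencoded default that the JDK's implementation adds to POST requests with a body; the class name, URL and JSON content type are illustrative, not from our tests.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: set Content-Type explicitly instead of relying on the
// implementation's default. The URL and the JSON content type are illustrative.
class ExplicitContentType {

    // Creates (but does not connect) a POST request carrying an explicit Content-Type.
    static HttpURLConnection preparePost(String url, String contentType) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type", contentType);
        return connection;
    }

    // Writes the body and returns the status code. A test built on this should
    // still drain the response afterwards, as discussed above.
    static int post(String url, String contentType, String body) throws IOException {
        HttpURLConnection connection = preparePost(url, contentType);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return connection.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = preparePost("http://localhost:8080/", "application/json");
        // No network I/O has happened yet; this just reads back the configured header.
        System.out.println(connection.getRequestProperty("Content-Type"));
    }
}
```

Still, that is one more thing to remember per request, which rather proves the point about verbosity.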
