
Joe Dog Software

Proudly serving the Internets since 1999

Siege, A History

I was using Lincoln Stein’s Perl script, torture.pl, to stress-test a J2EE architecture. We were experiencing initial stability issues, and the code would break under load. I ran the Perl script while monitoring the J2EE server, which allowed us to identify the point of failure: a database connection that was left open. We patched the code, and life was grand. With the fix in place, I stressed the server again with torture. It performed well under pressure, and most importantly, the code never broke and the server never failed.

But a funny thing happened on the way to user acceptance testing. The code broke again. How could that be? I had stressed the code with infinitely more simulated users than there were acceptance testers. Why did the code break then? Did I miss something? To get to the bottom of it, I went back to the lab with torture.pl and stressed the Java code again.

But in the lab, the server wouldn’t break. I steadily increased the number of simulated users in a futile attempt to break the server. The only thing I broke was the Linux workstation running torture.pl, not the HP-UX server hosting the J2EE content. Perl’s overhead was exhausting the Linux workstation’s resources. I didn’t have access to the computing resources necessary to run a high number of simulated users in Perl. Therefore, I needed something leaner.

I was working on a C project that made HTTP requests. Over the course of a weekend, I modified the code to execute multiple simultaneous HTTP requests. I changed the output to match Lincoln Stein’s reporting to make it familiar to my team. Back in the lab on Monday morning, I was able to hit the server with three times as many simulated users. But nothing happened. The code still refused to break.
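The modification amounted to running many simulated users at once instead of one after another. The original was C; here is a minimal sketch of the idea in Python, using one thread per simulated user. The target URL, user count, and `simulate_user` function are all invented for illustration, and the request itself is a placeholder:

```python
import threading

def simulate_user(user_id, url, results):
    # Placeholder for an HTTP request; the real C code opened a
    # socket and issued a GET here.
    results[user_id] = f"GET {url}"

URL = "http://server.example.com/"   # hypothetical target
NUM_USERS = 25                       # arbitrary number of simulated users

results = {}
threads = [threading.Thread(target=simulate_user, args=(i, URL, results))
           for i in range(NUM_USERS)]

# Launch all simulated users simultaneously, then wait for them to finish.
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(results)} simulated users completed")
```

The thread-per-user model keeps each simulated user independent, which is what lets the load generator scale to far more concurrent requests than a single sequential loop.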

My new code proved one thing. I was pursuing the problem from the wrong angle. The number of simulated users exceeded that of acceptance testers by many orders of magnitude. Yet, the acceptance testers were able to break the server, whereas the simulated users were not. Torture was pounding the server one URL at a time, while the acceptance testers were roaming through the content at will. That discrepancy, I concluded, explained how the code passed one test and failed in the other. I had to recreate the acceptance testing in a lab environment to pinpoint the cause of failure.

I cobbled together a Perl script that harvested the server’s links into a file. I rewrote the C code to read those URLs into memory and request them at random. Success! I was able to reproduce the failure on the server.
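The random-walk approach can be sketched like this; it is an illustration, not the actual tool. The URL list stands in for the file the harvester wrote, the helper names are invented, and the real code issued live HTTP requests where this sketch only prints:

```python
import random

# Stand-in for the file of harvested links: one URL per line,
# as the Perl harvester would have written it to disk.
URLS_FILE_CONTENTS = """\
http://server.example.com/
http://server.example.com/login
http://server.example.com/catalog
http://server.example.com/checkout
"""

def load_urls(text):
    """Read the harvested URLs into memory, one per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def pick_random_urls(urls, n):
    """Simulate testers roaming at will: choose n URLs at random."""
    return [random.choice(urls) for _ in range(n)]

urls = load_urls(URLS_FILE_CONTENTS)
session = pick_random_urls(urls, 10)
for url in session:
    print("GET", url)   # the real tool made an HTTP request here
```

Hitting the content in random order, rather than hammering a single URL, is what finally reproduced the access pattern of the human testers.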

Continued testing led me to conclude that our code was not failing; the issue was on the server itself. Using the evidence we collected, we demonstrated a problem to the vendor, which ultimately led to a vendor-issued patch. With that in place, the site was stable.

I added a cool name, GNU Autotools support, and published Siege under the GPL because I thought it would be helpful to others.