Global Web Site Performance Improvement

(published in Sys Admin 2006)

For more than a decade, programmers have been refining the art of web development while systems administrators have been honing configurations, adding hardware and fattening network pipes. Upgrades at peering points and transport backbones have dramatically improved network performance. Over the last five years, most companies and homeowners have improved their access speed. Despite all these enhancements, many users still register the same complaint: “Your site is too slow!”

One source of frustration shared by every administrator is varied perception. One user may deem performance slow while another finds it fantastic. To further complicate the situation, both users may be right. As we’ll see later, your website may be both slow and fast. Since few customers contact your company with thank-you notes after a wonderful Internet experience, your inboxes are more likely to fill with complaints than compliments. Those gripes may skew the perception of your website’s performance as they darken your boss’s view of your administration skills. A comprehensive picture is necessary to clarify the situation and provide benchmarks against which we can measure future improvements.

To gauge the validity of customer complaints and judge the extent to which users experience latency, you must measure your website’s performance. To ensure an accurate portrait, multiple benchmarking agents should be positioned in a manner that reflects your customer base. If you have customers scattered throughout the globe and a single Perl script takes measurements from one location, then your data hardly reflects a random customer’s experience. If your company has numerous access points, you may be able to leverage its existing infrastructure to build a comprehensive model. For administrators lacking multiple access points, there are other options. You can rent server accounts and collect your own data from various access points, or you can hire a web monitoring service that provides data points that reflect your user base. Either way, it’s important to get a comprehensive picture of your site’s performance.

Many administrators are lulled to complacency by either incomplete or inaccurate measurements. Your website may be very responsive inside corporate headquarters, but how does it play in Peoria? One of the more telling views of your data is performance by geographic area. [See Figure 1] The most striking thing about this view is the variance from city to city. Now we can see different perceptions for the same website. For a customer in San Jose, performance is exceptional. For the unlucky Frankfurt resident, performance is “too slow.” How can two people have such very different experiences on the same site? Certainly they may have different hardware, but this data was collected with identical configurations. The answer is distance. The customer in San Jose sits less than thirty miles from the website while the one in Frankfurt is half a world away. Despite our best efforts, geographic discrepancies continue to add latency. Thomas Friedman’s world may be flat, but its continents are separated by thousands of miles. As a result, “your site is still slow.”
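
A per-city view like the one described above can be produced from raw agent readings with very little code. The following is a minimal sketch in Python, assuming each agent reports a (city, seconds) pair; the function names and the sample readings are hypothetical and are not the measurements behind Figure 1.

```python
from collections import defaultdict
from statistics import mean


def mean_by_city(samples):
    """Group (city, seconds) samples and return {city: mean download seconds}."""
    grouped = defaultdict(list)
    for city, seconds in samples:
        grouped[city].append(seconds)
    return {city: mean(times) for city, times in grouped.items()}


def slowest_first(samples):
    """Return (city, mean seconds) pairs ordered from slowest to fastest."""
    return sorted(mean_by_city(samples).items(),
                  key=lambda item: item[1], reverse=True)


# Hypothetical readings for illustration only, not data from Figure 1.
SAMPLES = [
    ("San Jose", 0.9), ("San Jose", 1.1),
    ("Frankfurt", 5.0), ("Frankfurt", 5.5),
]

if __name__ == "__main__":
    for city, seconds in slowest_first(SAMPLES):
        print(f"{city:<12} {seconds:.2f} s")
```

Sorting slowest first puts the cities that need attention at the top of the report.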

Before we continue, it’s important to note what we mean by “latency.” For our purposes, it is the lapse between an HTTP request and the completed download of the content. It is commonly defined as the lag between request and response, but that doesn’t serve us well for a number of reasons. For one thing, humans generally don’t read HTTP response headers; they read web content. A website customer continues to wait after the headers come over the wire. As far as they’re concerned, the page didn’t load until it rendered inside their browser. Our goal is to improve the experience of customers, not Internet crawlers. The definition was chosen for another reason: it matches our measurement tool. The mean times in Figure 1 represent the time it took to download the web page and all its elements.
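
To make this definition concrete, here is a minimal Python sketch of an agent that times the download of a page plus all its elements. It is an illustration under assumptions, not the monitoring tool behind Figure 1: the function names are hypothetical, the list of element URLs is supplied by the caller rather than parsed from the page, and elements are fetched one at a time rather than by several threads.

```python
import time
import urllib.request


def fetch_bytes(url, timeout=10.0):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def page_latency(page_url, element_urls, fetch=fetch_bytes,
                 clock=time.perf_counter):
    """Time the download of a page and every element it references.

    Returns (elapsed_seconds, total_bytes). The clock starts before the
    first request and stops only after the last element has arrived,
    which matches the definition of latency used in this article.
    """
    start = clock()
    total_bytes = len(fetch(page_url))
    for url in element_urls:
        total_bytes += len(fetch(url))
    return clock() - start, total_bytes
```

The fetch and clock parameters exist so the timing logic can be exercised without a network. In normal use you would call page_latency with the page URL and the URLs of its images, style sheets and scripts.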

Latency is added as a result of performance degradation in each of three major segments of an HTTP transaction. They are listed in order from the web server to the customer:

· First mile – page generation to Internet access point.

· Middle mile – the Internet backbone.

· Last mile – from Internet backbone through the user’s ISP.

To understand how latency is accumulated in each of these areas, it’s important to examine them in greater detail.

Most developers and administrators concentrate their efforts on the first mile. It is, after all, the area in which they maintain the greatest control. Any reasonably competent team with a respectable budget can build a site that can render a page in under a second. Given slightly deeper pockets, our reasonably competent team should be able to launch that site inside a quality data center with fast Internet access. While there is always room to improve first mile performance, a time will come when you can no longer bleed the stone: the return on investment is not worth the cost.

The middle mile spans the Internet “backbone,” which is an archaic term for a series of very fast networks linked globally from city to city by high-speed lines. Internet service providers connect to the “backbone” through a series of TCP/IP routers. Throughout the middle mile, latency is added by several factors. The main contributing factor is distance. The farther a packet must travel, the longer its round trip time (RTT). Lengthy round trips usually pass through a series of routers, and the store-and-forward nature of routers contributes latency. After the entire packet is read into memory, the device must parse its header and then determine where next to send it. Another problem is TCP/IP itself. Delivery of each packet must be verified. If delivery verification fails, then the packet is resent after a mandatory timeout. If packet loss is especially high, the middle mile will be characterized by high latency.

The third major area in which we accrue latency is the final mile, from the ISP’s peering point to the end user’s computer. For a website whose clientele consists of thousands or millions of unique visitors per month, it isn’t practical or cost-effective to upgrade each user’s connection speed. While there are actions we can take to improve final mile performance, most latency accrued there is beyond our capacity to improve.

Performance Degradation

Now that we understand where latency is amassed, let’s revisit the data displayed in Figure 1. The website whose performance was measured resides in the San Francisco Bay area. The page we monitored was relatively light; it included just fifteen elements. Given what we know about performance degradation, it is not shocking that customers in California experience the best performance while those on other continents experience the worst. Our unfortunate customers in China were forced to wait six times longer for the same content. According to Jupiter Research, if a page requires more than eight seconds to load, customers will leave the site. As our Asian customers click their way through meatier portions of the site, they will have little trouble reaching that threshold. Since the business sponsor was committed to delivering content to a global audience, it was necessary to get international performance in line with North America.

While global performance improvement was deemed a priority, most latency was added in areas outside of our control. If we were granted the resources and opportunity to stock the middle mile with unlimited bandwidth and the most robust routers available, latency would still be a problem. As the crow flies, the distance between Frankfurt and San Francisco is 5693 miles. The speed of light is capped at 186,000 miles per second. In perfect conditions, a signal could follow our crow’s flight pattern in 32 milliseconds. The round trip is complete in 64 milliseconds. Under such conditions, the multithreaded agent in Frankfurt should complete its task in 1.15 seconds (101 TCP packets handled by four threads). In reality, the task took 5-1/4 seconds. Where did we amass the extra time?
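
The propagation arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the distance and the speed of light quoted above; the function names are hypothetical. It yields roughly 31 milliseconds one way and 61 for the round trip, slightly under the rounded figures in the text, and it models nothing beyond raw propagation delay.

```python
SPEED_OF_LIGHT_MILES_PER_SEC = 186_000  # figure quoted in the text
FRANKFURT_TO_SF_MILES = 5693            # crow's-flight distance from the text


def one_way_ms(miles, speed=SPEED_OF_LIGHT_MILES_PER_SEC):
    """Best-case one-way propagation delay in milliseconds."""
    return miles / speed * 1000.0


def round_trip_ms(miles, speed=SPEED_OF_LIGHT_MILES_PER_SEC):
    """Best-case round trip in milliseconds: out and back along the same path."""
    return 2 * one_way_ms(miles, speed)


if __name__ == "__main__":
    print(f"one way:    {one_way_ms(FRANKFURT_TO_SF_MILES):.1f} ms")
    print(f"round trip: {round_trip_ms(FRANKFURT_TO_SF_MILES):.1f} ms")
```

No amount of bandwidth shortens these numbers. They are a floor set by distance alone.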

As detailed earlier, packets traverse the middle mile through a series of cables and routers. Each router adds latency. In this scenario, each TCP packet must traverse 12 routers, 6 per leg. If each router adds 2 milliseconds of latency, then another 2.2 seconds are added to our total. The final discrepancy is explained by packet loss. TCP is designed to guarantee delivery in spite of packet loss. The sender must receive acknowledgement for every packet sent. If a sender does not receive verification before a set timeout (200 ms in most cases), then it resends the packet and, in the simplest case, every one after it. If twenty packets were sent and packet thirteen is lost, then packets 13 through 20 are retransmitted.
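
The retransmission rule and the per-router delay described above can be sketched in a few lines of Python. This is a simplified model with hypothetical function names, not a TCP implementation. It assumes the go-back style of recovery described in the text; stacks that negotiate selective acknowledgement can resend only the packets that were actually lost.

```python
def packets_resent(total_packets, lost_packet):
    """Packet numbers resent when `lost_packet` times out.

    Models the behavior described in the text: the lost packet and
    every packet sent after it are transmitted again.
    """
    if not 1 <= lost_packet <= total_packets:
        raise ValueError("lost_packet must be between 1 and total_packets")
    return list(range(lost_packet, total_packets + 1))


def router_delay_ms(routers_per_leg, per_router_ms, legs=2):
    """Delay added by routers to one round trip, in milliseconds."""
    return routers_per_leg * legs * per_router_ms


if __name__ == "__main__":
    print(packets_resent(20, 13))   # packets 13 through 20
    print(router_delay_ms(6, 2))    # 12 routers at 2 ms each
```

One lost packet out of twenty costs eight retransmissions in this model, on top of the timeout spent waiting for the acknowledgement that never arrived.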

Solutions

Products and services that correct Internet latency tend to address it in one of the three areas we described: the first, middle or final mile. While there are products that address the final mile, most are beyond our control to implement. We can’t ask customers to upgrade their Internet connections or to increase the size of their local caches. For the most part, we will focus on first and middle mile solutions. In some cases, first mile solutions will improve final mile performance.

First Mile Solutions

The products in this category all have one thing in common. They are implemented in the datacenter. The idea is to improve overall performance by greatly improving the first mile. The faster you can get it out the door and the lighter you make it, the quicker you can deliver it.

The competitors in this area provide similar offerings based on similar technologies. There is really only so much you can do to bleed a stone. Appliances from F5, NetScaler, Redline Networks and FineGround provide inline compression and TCP optimization. Some also offer HTTP protocol optimization and caching.

According to Gartner Research, you can expect a 2 to 1 performance improvement from inline compression and TCP optimization. With a product such as the one from FineGround, which adds HTTP optimization, you can expect performance to improve 3 to 6 times. On the surface, those numbers sound impressive, but most administrators won’t experience this type of performance enhancement. Why? If you care enough to read articles such as this one, then you are probably already doing many of the same things the appliances are doing.

Good web systems administrators are already using mod_gzip to compress everything that is compressible. They already use mod_expires or mod_headers to set explicit cache directives on heavily requested elements. Their content is quick to generate and light over the wire. If you’ve already implemented many of the features offered by optimization appliances, then your performance gain will fall short of the promises. This is not to say these solutions are without merit. With compression offloaded to an appliance, you could free CPU cycles and extend the life of your servers. But you could also add at least one high-end UNIX server for the price of one of these devices.
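
To get a feel for what compression buys, the following Python sketch gzips a block of markup and reports the savings. It uses Python's standard gzip module as a stand-in for the server-side compression discussed above; it is not mod_gzip, and the sample markup and function name are hypothetical.

```python
import gzip


def compression_report(body, level=6):
    """Gzip `body` (bytes) and report original size, compressed size and ratio."""
    compressed = gzip.compress(body, compresslevel=level)
    return {
        "original": len(body),
        "compressed": len(compressed),
        "ratio": len(body) / len(compressed),
    }


# Hypothetical, highly repetitive markup of the kind found in product tables.
SAMPLE_HTML = (
    b'<tr><td class="item">widget</td><td class="price">9.99</td></tr>\n' * 200
)

if __name__ == "__main__":
    report = compression_report(SAMPLE_HTML)
    print(f"{report['original']} bytes -> {report['compressed']} bytes "
          f"({report['ratio']:.1f}x smaller)")
```

Repetitive markup compresses far better than typical pages, so treat the ratio from this sample as an upper bound. Images that are already compressed, such as JPEG and GIF files, gain little or nothing.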

Middle Mile Solutions

As mentioned above, latency accrues with distance under perfect conditions. Middle mile solutions that address this matter take one of two forms. One is to move content closer to customers. Another is to correct inherent problems associated with TCP/IP and Internet traffic routing. In this section, we’ll consider several options that allow us to improve performance on the middle mile.

Distance delays can be corrected by simply distributing content closer to users. If our Frankfurt customers pull content from London rather than San Francisco, then 10,598 miles are shaved off their round trip. The theoretical download time is reduced from 1.15 seconds to seven tenths of a second.

Caching reverse proxies can be employed to move content closer to customers. To implement such a solution, you will need to position servers regionally. Geographic DNS will allow you to map host names by region. If a user in Melbourne requests your server in San Francisco, geographic DNS could send them instead to a caching server on the Pacific Rim. The proxy serves static content from its cache and fetches dynamic content directly from San Francisco. The middle mile is dramatically reduced along with its latency. Unfortunately, this solution is not cheap. If your company has a global infrastructure, you could leverage it to position redundant, inexpensive caching servers on all major continents, but commercial geographic DNS servers are costly. Indeed, their price tag may kill the project before it gets started. Fortunately, there is an open source player in this field, PowerDNS. While it lacks some features offered by its commercial counterparts, it may be more than adequate for your needs. You may also consider outsourcing CNAMEs to a geographic DNS provider.
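
The two moving parts described above, regional name mapping and a cache that passes dynamic requests through, can be sketched in Python as follows. This illustrates the logic only; it is not PowerDNS configuration or any product's API. All host names, tables and class names are hypothetical, and the sketch assumes the client's country is already known, which a real geographic DNS server would infer from the address of the querying resolver.

```python
ORIGIN_HOST = "www.example.com"  # hypothetical origin server in San Francisco

# Hypothetical regional caches; a real deployment would list its own hosts.
REGIONAL_CACHES = {
    "EU": "cache-eu.example.com",
    "APAC": "cache-apac.example.com",
}

# Hypothetical country-to-region table standing in for a geographic database.
COUNTRY_TO_REGION = {"DE": "EU", "GB": "EU", "AU": "APAC", "CN": "APAC"}


def resolve_host(country_code):
    """Return the regional cache for a country, or the origin if none is mapped."""
    region = COUNTRY_TO_REGION.get(country_code.upper())
    return REGIONAL_CACHES.get(region, ORIGIN_HOST)


class CachingProxy:
    """Serve static paths from a local cache; pass dynamic paths to the origin."""

    STATIC_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".png")

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.cache = {}

    def get(self, path):
        if not path.endswith(self.STATIC_SUFFIXES):
            return self.fetch_from_origin(path)  # dynamic: always ask the origin
        if path not in self.cache:
            # First request for a static element pays the long trip once.
            self.cache[path] = self.fetch_from_origin(path)
        return self.cache[path]
```

A production proxy would also honor expiry times and cache-control headers from the origin rather than caching static elements indefinitely.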

Rather than implementing a geographic caching system or distributing load over regional web servers, you could acquire the services of a specialist. Two of the biggest players in this field are Netli and Akamai. Both rely on geographic caching to move content closer to end users, but Netli offers a proprietary protocol that mitigates some of the inherent weaknesses of TCP/IP. Akamai offers its own networking refinements; it employs routing optimization to improve its network performance. In Figure 2, we see the same web server from Figure 1, but this time its content is delivered through Netli’s web accelerator. The curve appears similar, but the average download time from Beijing has dropped from over six seconds to little more than two.

Final Mile Solutions

For all practical purposes, final mile solutions will be implemented in the first mile. Customer performance can be improved in the final mile by compressing data in the first. We can reduce the number of server requests with explicit caching directives on the server. The lighter our pages, the faster they’ll move through a customer’s ISP. While you may have implemented many of these solutions, there may still be room for improvement. The savvy administrator will recognize the point of diminishing returns.
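
Explicit caching directives amount to a pair of HTTP response headers. The Python sketch below builds them for a given lifetime; it shows what the headers look like, not how mod_expires or mod_headers is configured, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, now=None):
    """Build Cache-Control and Expires headers for a cacheable element.

    `now` may be supplied for repeatable results; it defaults to the
    current time. The Expires value is always rendered in GMT.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        "Expires": format_datetime(expires, usegmt=True),
    }


if __name__ == "__main__":
    # One week is a common lifetime for logos and style sheets.
    for name, value in caching_headers(7 * 24 * 3600).items():
        print(f"{name}: {value}")
```

An element served with these headers can be reused from the browser's cache until it expires, which removes the request from the middle mile entirely.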

Conclusion

The first step toward global performance improvement is comprehensive monitoring. As indicated earlier, it is important that monitoring agents are dispersed so that they mirror your customer base. Each refinement should be evaluated by its effect on performance data. A comprehensive picture that reflects your customer base will help you make the business case for additional hardware or services. A competent bean counter will never cut a check based on an administrator’s hunches. In the real world, you need data.

With adequate monitoring in place, you are ready to make refinements to the area in which you have the most control. First mile tuning can go a long way toward improving final mile performance. Make sure your systems have adequate resources and that they are tuned specifically for delivering web content. O’Reilly’s Web Performance Tuning offers a great place to start. Compress all the content you can. The smaller it is, the faster it travels over the wire. You can decrease load and improve performance by setting explicit caching directives. (See Sys Admin March 2005 • Volume 14 • Number 3) It requires less time to pull an element from a local cache than it does to pull it off a server half a world away. As you hone your first mile configuration, check it against your monitoring data. If performance measurements meet expectations, then you may not need further refinement. Unfortunately, many administrators provide content to a global audience. Unless your content is very lean, chances are you will need to boost performance for customers half a world away.

In most cases, distance is a primary culprit in the conspiracy to slow down the web. Performance could be greatly enhanced if we simply moved customers closer to the web site. In the real world, they rarely agree to such things. It’s better to move content closer to them. Perhaps you could serve European content in Europe and Asian content in Asia. Such a move creates logistical problems; it requires multiple content replications and it necessitates redundant hardware and data center expenses. If your primary site was expensive, how much will it cost to bring up another two? For this reason, it is often better to rely on geographically distributed caching proxies or accelerator services such as Netli or Akamai.

The world is getting flatter, but it doesn’t have to be slow. The tips in this article will help you deliver content quickly to all your customers, not just the ones next to your data center.

Joe Dog Software