JunLi Yuan, Institute for Infocomm Research [HREF1], School of Computing (NUS) [HREF2], Singapore 119613. junli@i2r.a-star.edu.sg
Xiang Li, School of Information Security, Shanghai JiaoTong University [HREF3], Shanghai China 200030. xiangli@sjtu.edu.cn
Chi-Hung Chi, School of Software, Tsinghua University [HREF4], Beijing China 100084. chich@comp.nus.edu.sg
(This work was supported in part by grant 2004CB719400.)
Compression is an important technique for improving web retrieval latency, and many studies report its benefit to be high. However, those studies focus mainly on object retrieval latency. In web access, page retrieval latency is more meaningful because the basic unit of web browsing is the page rather than the object, so it is important to study the impact of compression on page retrieval latency. In this paper, we investigate the factors that affect page latency and systematically study the impact of two different compression mechanisms on them. Our results show that pre-compression outperforms real-time compression in all situations, and that the improvement compression achieves on page latency is much lower than its improvement on object latency, due to factors specific to page retrieval such as the dependency among the objects of a page and the parallelism used in web retrieval.
Web retrieval latency has always been a major concern for web users and content providers. Compression is an important technique for reducing user-perceived latency in web access, especially for users on narrow-band connections such as wireless and mobile systems. The term "compression" in the context of web content delivery refers to applying a lossless compression algorithm to textual web objects. Support for such compression is included in the web protocols [1], [2], and most current web browsers can decompress compressed data in real time without user interaction [3].
In web systems, the task of compression is typically performed by web servers or reverse proxy servers; examples include Microsoft IIS 5.0 [4], Apache [5], and products from NetScaler [6] and Packeteer [7]. There are two typical ways of doing compression in web retrieval: pre-compression and real-time compression. Pre-compression compresses objects in advance and uses the compressed copies to serve requests, while real-time compression compresses data on the fly during the actual transmission of the data.
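As a minimal illustration of the difference, the following Python sketch (using the standard zlib module; the chunk contents and sizes are hypothetical) compresses a whole object once in advance versus compressing each data chunk as it passes through. Note that per-chunk flushing keeps one output chunk per input chunk, while pre-compression produces a single compressed copy.

    import zlib

    # Hypothetical object already split into server-side data chunks.
    chunks = [b"<html>" + b"x" * 4000, b"y" * 4000, b"z" * 4000 + b"</html>"]

    # Pre-compression: compress the whole object once, in advance.
    # The compressed copy is stored and served for later requests.
    pre_compressed = zlib.compress(b"".join(chunks), level=9)

    # Real-time compression: compress each chunk on the fly as it is
    # forwarded, flushing after every chunk so the client can consume
    # the data immediately instead of waiting for the full object.
    compressor = zlib.compressobj(9)
    sent = []
    for chunk in chunks:
        out = compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sent.append(out)             # one (smaller) output chunk per input chunk
    sent.append(compressor.flush())  # finalize the stream

    print(len(pre_compressed), sum(len(c) for c in sent))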
Many studies have reported that compression has good potential for reducing web retrieval latency and network traffic [3], [7], [8], [9], [10], [11]. However, most of those studies are conducted at the object level, which is insufficient for revealing the impact of compression on page retrieval latency. In web access, the basic unit of browsing is the page rather than the object, so page retrieval latency is more meaningful to web users than individual object retrieval latency. When whole page latency is concerned, several open issues remain worth studying.
Typically, a web page is made up of multiple objects. Among them, there is one primary object called the Container Object (CO), which contains the definitions of the other objects of the page, called Embedded Objects (EOs). The CO is usually an HTML file, while EOs are mostly images, audio, and video files. Since images, audio, and video files are usually already compressed and therefore hard to compress further, the CO is often the only object in a page that is suitable for compression. Given that the CO occupies only part of the total page size, how effective is it to apply compression to the CO alone?
Figure 1 shows the distribution of pages with respect to the ratio of CO size to total page size (the results presented here are obtained from the same traces as described in Section 4.1). From the graph, we see that for more than 60% of web pages, the CO occupies less than 50% of the total page size; on average, it occupies only about 44%. Therefore, no matter how compressible COs are, the data left to transfer would still be more than half of the total page size. Considering that there are other latency components, such as connection setup time, which cannot be improved by compression, we would expect the performance improvement that compression brings to page latency to be much smaller than 50%. So, although a CO might be compressed to a small fraction of its original size, the improvement on page latency would not be that significant.
Figure 1. Distribution of pages w.r.t. the ratio of "CO size vs. whole page size"
Furthermore, when considering whole page latency, two critical factors must be taken into account: (1) the dependency between the CO and EOs of a page, and (2) the parallelism width for simultaneous object fetching. The retrieval of EOs depends heavily on the retrieval of the CO because EOs are defined in the CO. When compression is applied to the CO and changes its latency, the EOs' retrieval latencies are affected as well, which in turn affects whole page latency. Parallel fetching of the objects in a page is common in most current web browsers. This further complicates the relationship between object latency and whole page latency, as overlapping object latencies make it possible for the fetching of one or more objects to have virtually no effect on whole page latency. Taking this into consideration, it is questionable whether compression remains effective when parallel fetching of objects is employed.
Finally, for the two compression mechanisms used in web retrieval, most existing studies concentrate only on the complexity of file management, the real-time behavior, and the coverage of static and dynamic objects. Whether the two mechanisms have different impacts on whole page retrieval latency remains to be seen.
The above issues are not well studied in the existing literature. In this paper, we conduct comprehensive studies on them. The results could help practitioners gain a deeper understanding of web compression and make appropriate decisions when employing compression in their systems.
The web retrieval process typically consists of a number of operations and the transfer of a sequence of data chunks. We propose a Web Retrieval Dependency Model (WRDM) to study web retrieval latency. The basic idea of the model is to use a directed graph to capture the detailed operations and the relationships between them. Such a directed graph is referred to as a Web Retrieval Dependency Graph (WRDG). In a WRDG, each vertex represents an operation, and an arc (i.e., directed edge) connecting two vertices represents the precedence relationship between the two operations. Each arc carries a weight representing the time spent completing the operation represented by the target vertex. The model currently includes six operations, among them the initiation of a request (vertex r), the transfer of each data chunk (vertices d_i), and the completion of retrieval (vertex e).
Figure 2 (a) gives an example WRDG demonstrating the retrieval of an object delivered in three data chunks. The retrieval latency of the object is the distance of the path from vertex r to vertex e. Note that the distance of a path is the sum of the weights of the arcs along the path, not the number of arcs.
Figure 2. WRDG graphs for object retrieval and page retrieval
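Since no operation can start before its predecessors finish, the latency captured by a WRDG is the longest weighted path from r to e (for a single object's chain, the only path). A minimal Python sketch of this computation over a hypothetical three-chunk WRDG, with illustrative weights, follows:

    # Hypothetical WRDG for one object: r -> d1 -> d2 -> d3 -> e.
    # Arc weights are the times (in seconds) to complete the target operation.
    arcs = {
        "r":  [("d1", 0.20)],   # e.g. setup plus first chunk transfer
        "d1": [("d2", 0.05)],
        "d2": [("d3", 0.05)],
        "d3": [("e", 0.01)],
        "e":  [],
    }

    def latency(v):
        """Longest weighted path from vertex v to the end vertex e."""
        if not arcs[v]:
            return 0.0
        return max(w + latency(u) for u, w in arcs[v])

    print(latency("r"))   # 0.31: the object's retrieval latency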
Multiple individual object WRDGs can be connected together to capture the retrieval process of a page. As stated earlier, the retrieval of EOs depends on the retrieval of the CO. In the WRDG, we capture this dependency by using an arc to connect a data chunk vertex d_i of the CO to the request initiation vertex r of an EO, where the chunk represented by d_i contains the definition of that EO. Furthermore, current web systems use a certain parallelism width N to fetch multiple objects of a page concurrently. We capture this by limiting the width of the WRDG to be no larger than N.
Figure 2 (b) gives an example WRDG showing the retrieval of a page with five EOs. From it, we see that objects suffer two additional latency components when they are put together to form a page.
The first latency component is the Definition Time (DT) of EOs. Since the URLs of EOs are defined in the CO of the page, the retrieval of an EO cannot start until the CO's data chunk containing the definition of that EO has been transferred from server to client. Since user-perceived latency is counted from the moment the page request is submitted, the DT of an EO should be considered part of the EO's total latency, and this delay in learning of the EO's existence can significantly postpone the finish point of its retrieval.
On the other hand, most common web browsers use a fixed parallelism width for the parallel retrieval of objects; for example, the default parallelism width in Microsoft IE and Netscape is four. With this limited parallelism width, some EOs of a page may be held in a waiting state because no parallel connection is available. The time an object spends in the waiting state is referred to as its Waiting Time (WT). Again, this latency should also be considered part of the EO's total latency, and it too can significantly postpone the finish point of the EO's retrieval.
Figure 2 (b) shows the DT and WT times for the object drawn in grey.
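To make DT, WT, and their interaction with the parallelism width concrete, here is a simplified, self-contained Python sketch of page retrieval (all timings are hypothetical, and the CO is assumed to occupy its own connection, which real browsers may not do):

    import heapq

    def simulate_page(co_chunk_times, eos, parallelism):
        """Return (page latency, per-EO (DT, WT)) under a width-N fetch model.

        co_chunk_times: arrival time of each CO chunk at the client.
        eos: list of (defining_chunk_index, fetch_seconds) per embedded object.
        """
        # Each heap entry is the time at which one of the N slots becomes free.
        slots = [0.0] * parallelism
        heapq.heapify(slots)
        finish_times = [co_chunk_times[-1]]     # page ends no earlier than the CO
        stats = []
        # EOs start in definition order, i.e. sorted by defining chunk.
        for def_chunk, fetch in sorted(eos):
            dt = co_chunk_times[def_chunk]      # definition time
            free = heapq.heappop(slots)
            start = max(dt, free)
            wt = start - dt                     # waiting time
            heapq.heappush(slots, start + fetch)
            finish_times.append(start + fetch)
            stats.append((dt, wt))
        return max(finish_times), stats

    # Hypothetical page: 4 CO chunks, 5 EOs, parallelism width 4.
    co = [0.2, 0.3, 0.4, 0.5]                   # chunk arrival times (s)
    eos = [(0, 0.4), (0, 0.4), (1, 0.4), (2, 0.4), (3, 0.4)]
    print(simulate_page(co, eos, parallelism=4))

In this model, compressing the CO shifts the chunk arrival times, and hence the DTs, earlier; with a small parallelism width, part of that gain simply reappears as WT.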
The two different compression mechanisms, i.e. pre-compression and real-time compression, could have different impacts on object and page retrieval latency.
Since pre-compression compresses objects in advance, the objects may be delivered in fewer chunks than the originals. Real-time compression, by contrast, compresses each data chunk on the fly before sending it out, so it may not reduce the number of chunks in an object's transfer. However, according to our study, reducing the size of each chunk may not help much in reducing object latency. Figure 3 plots the transfer time of chunks with respect to their sizes. From this graph, we see that chunk size does not seem to have much influence on chunk latency: the latency of smaller chunks is often comparable to that of much bigger chunks. This could be due to the random nature of network and server workload. Based on this observation, we may deduce that reducing the number of chunks could be more effective than reducing the size of chunks in terms of reducing object retrieval latency.
Figure 3. Average latencies for delivering chunks with different sizes
We also note an extreme case in Figure 3: the latency of very large chunks (e.g., those larger than 30 KBytes) is indeed much bigger than that of smaller chunks. This suggests that reducing the size of such chunks may still help reduce their latency.
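These observations are consistent with a simple two-term cost model in which a fixed per-chunk overhead dominates up to moderate chunk sizes, while the bandwidth term matters only for very large chunks. The constants below are illustrative, not fitted to our measurements:

    def chunk_time(size_bytes, overhead=0.05, bandwidth=400_000):
        """Rough per-chunk transfer time: fixed overhead + serialization."""
        return overhead + size_bytes / bandwidth

    def object_time(chunk_sizes):
        return sum(chunk_time(s) for s in chunk_sizes)

    # Halving every chunk's size barely helps (0.70 s -> 0.60 s)...
    print(object_time([8000] * 10), object_time([4000] * 10))
    # ...while halving the number of chunks helps much more (0.70 s -> 0.45 s).
    print(object_time([8000] * 10), object_time([16000] * 5))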
Since compression is mainly applied to COs and there is a dependency between the CO and EOs, the EOs' latencies will also be affected by compression, which in turn affects whole page latency. Figure 4 illustrates the possible impact of the two compression mechanisms on page retrieval latency. From it, we infer that real-time compression may not be as effective as pre-compression in reducing whole page latency. In the next section, we examine this inference through experiments.
Figure 4. Different impact of two compression mechanisms on page retrieval latency
Our experiments rely on very detailed information about web retrieval, such as the number and timing of data chunks and the definition points of EOs, which is not available in most existing traces. To obtain this information, we performed real retrievals of a large number of web pages and recorded detailed operation- and chunk-level logs for all compression mechanisms, including the "No Compression" baseline used as a reference in our study.
The web pages used in our experiments are those whose URLs are recorded in the trace dated 5 March 2004 from the National Laboratory for Applied Network Research (NLANR) [12]. We replicated those pages, including their EOs, on our web server (Apache 2.0.39). For every page, we made a pre-compressed version using gzip. For real-time compression, we built a reverse proxy with real-time compression capability based on Squid [13] and zlib [8]. To emulate the real web environment, we directed all retrievals through a remote proxy in another country, so all retrievals traversed real Internet connections and the latencies we measured reflect real-world conditions. We instrumented the client, proxy, and web server systems to record detailed traces for our study.
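For illustration, pre-compressed copies of the container objects can be generated with a few lines of Python; the document root below is a hypothetical path, not our actual layout:

    import gzip
    import pathlib

    DOCROOT = pathlib.Path("/var/www/replicated-pages")   # hypothetical path

    # Write a .gz copy next to every replicated container object, so that
    # requests carrying "Accept-Encoding: gzip" can be served without any
    # on-the-fly compression work.
    for html in DOCROOT.rglob("*.html"):
        gz_copy = html.parent / (html.name + ".gz")
        gz_copy.write_bytes(gzip.compress(html.read_bytes(), compresslevel=9))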
First of all, we would like to study the effect of the two different compression mechanisms on object latency.
Figure 5 shows relative object latencies with respect to object size for the different compression mechanisms, with the "No Compression" case as the reference. From this graph, we see that both pre-compression and real-time compression improve object latency substantially. For pre-compression, the improvement ranges from 16.4% to 88.1%, with an average of 57.2%. For real-time compression, the gain ranges from 8.1% to 51.1%, with an average of 32.3%. Pre-compression always gives a higher improvement than real-time compression. This matches our earlier inference: pre-compression reduces the number of chunks in an object transfer, which is more effective than the chunk-size reduction attained by real-time compression. We look into this further later in this section.
Figure 5. Effect of different compression mechanisms on object latency
It is a little surprising that real-time compression also improves object latency considerably, since it tends to reduce the size of every chunk rather than the number of chunks. Further study shows that real-time compression also reduces the number of chunks in certain situations. This is due to a special phenomenon found in real-time compression, which we call the "delay-and-merge" effect and discuss later in this section.
As object size increases, the performance of pre-compression generally improves. There could be two reasons: first, the compression ratio is normally higher for bigger files; second, when an object is small, other latency components (such as connection time) and the TCP slow-start effect are relatively more significant, which makes the effect of compression marginal.
The situation for real-time compression is more complicated. As object size increases, its performance first improves and then declines, and for the last object size range, "Other" (i.e., >128 KBytes), it improves again. The reasons are related to the "delay-and-merge" effect, and the explanation is also given later in this section.
Now, we investigate the reason for compression's effect on object latency at the chunk level by examining how the different compression mechanisms affect chunk sizes and the number of chunks in an object transfer.
Figure 6 shows the distribution of chunk sizes under the different compression mechanisms. As expected, real-time compression shifts the curve to the left significantly: as many as 78% of chunks are compressed to sizes smaller than 1 KByte. However, this shifting is ineffective, since the latency of 1-KByte chunks is similar to or even bigger than that of bigger chunks according to Figure 3. On the other hand, real-time compression also shifts chunks larger than 10 KBytes to smaller sizes; the percentage of such chunks under real-time compression is significantly lower than under the other mechanisms. As we learned from Figure 3, the latency of very large chunks is much bigger than that of smaller chunks, so compressing such chunks does help reduce chunk latency.
Figure 6. Distribution of chunks w.r.t. chunk sizes sent out from server
For pre-compression, the curve is also shifted to the left a little. The reason could be that, after being pre-compressed, more objects become small enough to be delivered in a smaller number of chunks.
Figure 7 plots the number of chunks with respect to object size for the different compression mechanisms. The number of chunks under pre-compression is smaller than in the normal situation, and the difference grows as object size increases. This is intuitive: pre-compressed objects are smaller than the originals, so they can be delivered in fewer chunks, and since the compression ratio is usually higher for bigger objects, the difference between pre-compression and the normal situation widens. With far fewer chunks to transfer, it is easy to understand why pre-compression improves object latency so significantly (see Figure 5).
Figure 7. Number of chunks w.r.t. object size under different compression mechanisms
It is surprising that real-time compression also reduces the number of chunks in some situations, since real-time compression is supposed to reduce the size of every chunk rather than the number of chunks. Further study revealed the phenomenon behind this result.
In our experimental system, real-time compression is performed by a reverse proxy. The reverse proxy receives data chunks from the web server behind it and compresses each chunk before sending it out. While the reverse proxy is busy compressing the current chunk, subsequent chunks continue to arrive. Since the proxy is busy, those incoming chunks are buffered and merged into one, so the next outgoing chunk becomes bigger. We call this phenomenon the "delay-and-merge" effect.
The "delay-and-merge" effect has a "warming-up" stage and "mature" stage. During the "warming-up" stage, chunk size would become bigger and bigger as reverse proxy takes longer and longer time to compress each bigger chunks. However, since the buffer size in reverse proxy is fixed (64 KBytes in our experimental system), this effect will "mature" when the chunk size grows close to the buffer size. In the "mature" stage, chunk size would stop growing no matter how many more chunks are still in the transfer sequence.
Because chunks grow bigger under the "delay-and-merge" effect, the number of chunks for a given object becomes smaller than in the normal situation. This explains the real-time compression result in Figure 7.
For small objects, real-time compression does not seem to reduce the number of chunks, because the number of chunks is too small for the "delay-and-merge" effect to warm up. As object size increases, the effect starts to warm up and the number of chunks becomes smaller. However, this trend stops once objects are big enough, because the effect has matured.
With this knowledge, we can explain the performance of real-time compression shown in Figure 5. Since real-time compression also reduces the number of chunks due to the "delay-and-merge" effect, it is understandable that it also improves object latency. As object size grows from 1 KByte to 4 KBytes, the performance of real-time compression improves; this could be because the "delay-and-merge" effect is in its warming-up stage. For objects between 4 KBytes and 128 KBytes, the effect matures, so the performance of real-time compression stops improving. However, for very big objects (greater than 128 KBytes), the performance becomes better again. Further study reveals the reason: for objects in this group, the mature stage of the "delay-and-merge" effect lasts longer. Since the chunks in this stage are quite big, they can be effectively compressed into smaller chunks by real-time compression (see Figure 6), and, as Figure 3 shows, the latency of very large chunks is much bigger than that of smaller chunks. Reducing the size of very big chunks therefore reduces transfer time, so the performance of real-time compression for this group of objects improves again.
As we stated earlier, page retrieval latency is more meaningful to web users than object latency. In this section, we investigate the impact of compression on whole page latency.
As explained earlier, page latency is determined by more complicated factors, so the improvement that compression achieves on single object latency may not translate directly into improvement on page latency.
Figure 8 plots relative page latency with respect to page size. Comparing it with Figure 5, we see that the improvement compression achieves on whole page latency is much lower than that on object latency: the average gain on whole page latency is about 12.2% for pre-compression and 7.4% for real-time compression, compared with the 57.2% and 32.3% gains on object latency, respectively.
Figure 8. Compression's effect on whole page latency (Parallelism = 4)
Although the fact that the compressible objects in a page (mainly COs) occupy only part of the total page size is partly responsible, the big gap between compression's performance on object latency and on page latency also indicates that the two factors identified earlier, (1) the dependency among the objects of a page and (2) the parallelism width for simultaneous object fetching, play an important role in determining page latency. In the following subsections, we study these factors in detail to gain an in-depth understanding of compression's effect on whole page latency.
We know that EOs are defined in the CO and that the retrieval of EOs depends heavily on the retrieval of the CO. Here, we first look at the distribution of EOs in web pages.
Figure 9 shows the distribution of pages with respect to the number of EOs per page. While about 9% of pages contain no EOs, the majority of pages contain multiple EOs; on average, a page contains about 13.5 EOs. With so many EOs in a page, the dependency between the CO and EOs is very important in determining whole page latency.
Figure 9. Distribution of pages w.r.t. number of EOs per page
Figure 10 shows the definition points of EOs, in terms of chunk sequence number in the CO's transfer sequence, under the different compression mechanisms. From this graph, we see that many EOs are defined in chunks with large sequence numbers, but pre-compression shifts the curve to the left significantly. In other words, pre-compression makes more EOs known to the client in earlier chunks, which means smaller DT times for EOs. Real-time compression also shifts the curve to the left, but much less significantly.
Figure 10. Average number of EOs w.r.t. chunk sequence number in CO transfer under different compression mechanisms
Figure 11 shows the importance of DT by plotting the relative values of DT and actual retrieval time for EOs. DT times are considerable in all situations; in the "No Compression" case, they are often bigger than the actual retrieval times of the EOs. This indicates that DT is a very important latency component for EOs, and that reducing DT could be an effective way to reduce EO retrieval latency, which in turn could reduce whole page latency.
Figure 11. Relative DT times and actual retrieval latency of EOs under different compression mechanisms
Since compression shifts the definition points of EOs to earlier chunks, the DT times under compression become smaller. From Figure 11, we see that compression indeed reduces the DT times of EOs considerably: on average, DT is reduced by 43.6% under pre-compression and 10.7% under real-time compression. The reduction achieved by pre-compression is higher, for the reasons discussed earlier.
Based on the studies above, we see that compression reduces page latency in two ways. First, it reduces the size of the CO so that the CO is delivered faster (see Figure 5). Second, it reduces the DT times of EOs, since their retrieval depends on the CO's retrieval (see Figure 11).
Figure 12 shows relative page latency under the different compression mechanisms with respect to the number of EOs in a page. When the number of EOs is small, the improvement from compression is generally higher. This is because, for pages with few EOs, the retrieval latency of the CO dominates the whole page latency; since compression significantly improves the CO's retrieval latency (see Figure 5), it also improves page latency considerably for such pages.
Figure 12. Whole page latency w.r.t. number of EOs per page under different compression mechanisms (Parallelism = 4)
When the number of EOs in a page is big, the improvement achieved by compression becomes much smaller. This could be due to two reasons:
Firstly, when the number of EOs in a page is big, the page latency is dominated by the EOs' latency. Among the EOs' latency components, compression can only reduce the DT times, and the reduction in DT is not as big as the reduction in the CO's retrieval latency (see Figure 11 and Figure 5).
Secondly, page latency is also affected by parallelism. When a limited parallelism width is used, some EOs may be held in a waiting state because no parallel connection is available. Under this constraint, reducing DT times only results in more EOs being held in the waiting state. In other words, once DT times are reduced, the parallelism width becomes the performance bottleneck that prevents further improvement in page retrieval latency.
When the number of objects made known for retrieval exceeds the browser's parallelism width, some of the objects are held in a waiting state. Figure 13 shows the percentage of EOs held in the waiting state under the different compression mechanisms. The default parallelism width in most current web browsers is 4, so no object is held waiting when the number of EOs in a page is less than 4. Beyond this point, objects begin to be held in the waiting state, and the percentage increases significantly as the number of EOs per page rises. Pre-compression always yields a higher percentage than the other mechanisms, because it makes more EOs known for retrieval earlier than the other two mechanisms do (refer to Figure 10 and Figure 11), which leads to more objects waiting under a fixed parallelism width.
Figure 13. Percentage of EOs in waiting state under different compression mechanisms
The limited parallelism width adds WT to object latency. Figure 14 plots the relative values of WT against the actual retrieval latency of EOs under the different compression mechanisms. WT times increase very quickly as the number of EOs increases; for pages with 16 EOs or more, the WT times exceed the actual object retrieval latencies. Given this significant contribution of WT to object latency, we expect that increasing the parallelism width would effectively reduce object latency and page latency, especially when compression is used.
Figure 14. Relative values of WT and actual retrieval latency of EOs under different compression mechanisms
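A quick idealized calculation illustrates the trend (the numbers are hypothetical): if all k EOs of a page were known immediately and each took t seconds to fetch over N parallel connections, the i-th EO started (counting from zero) would wait roughly floor(i/N) × t. With k = 16, N = 4, and t = 1 s, the last batch of EOs waits about 3 s against a 1 s fetch, and doubling N to 8 roughly halves that wait, consistent with the behavior in Figure 14 and the parallelism results below.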
Figure 15 shows the relative performance improvement on page latency of the different compression mechanisms under different parallelism widths. Increasing the parallelism width indeed improves page latency considerably, and compression consistently performs better than "No Compression" in all situations, which indicates that variation in parallelism width does not affect the relative effectiveness of compression.
Figure 15. Relative performance of different compression mechanisms under different parallelism widths
Across compression mechanisms, we note that the relative improvement of pre-compression grows slightly as the parallelism width increases: when the width increases from 1 to 32 or more, the relative improvement of pre-compression rises from 9% to 15%. This suggests that compression is more efficient when the parallelism width is big, probably because of the higher demand that pre-compression places on parallelism. Referring back to Figure 10, pre-compression shifts a significantly large number of EO definitions to the earlier chunks of the CO transfer, so more EOs are made known to the client earlier (also see Figure 11) and the demand on parallelism width is higher. Increasing the parallelism width in this situation directly meets that demand, so pre-compression becomes more effective.
This paper presents our detailed chunk-level study of web compression. We see that compression is also effective in reducing page retrieval latency, although the improvement on page retrieval latency is lower than the improvement achieved on object retrieval latency. Compression reduces page latency in two ways: by reducing the size of the page's CO and by reducing the DT times of the EOs in the page. However, once the DT times of EOs are reduced, the limited parallelism width in web retrieval becomes a performance bottleneck that prevents further improvement in page retrieval latency. Our study also shows that reducing the number of chunks is more effective in improving web retrieval latency than reducing the size of every chunk. While pre-compression consistently reduces the number of chunks in web retrieval, real-time compression can also reduce the number of chunks in some situations due to the special "delay-and-merge" effect. In general, however, pre-compression outperforms real-time compression in almost all situations, so pre-compression is the better mechanism for reducing user-perceived latency in web retrieval.
[1] T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0," IETF RFC 1945, May 1996.
[2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1," IETF RFC 2616, June 1999.
[3] HTTP Compression Speeds up the Web, http://www.webreference.com/internet/software/servers/http/compression/
[4] Using HTTP Compression On Your IIS 5.0 Web Site: http://www.microsoft.com/technet/treeview/default.asp?url=/TechNet/prodtechnol/iis/maintain/featusability/httpcomp.asp
[5] Apache Gzip Module from Mozilla, http://www.mozilla.org/projects/apache/gzip/
[6] NetScaler, Inc., http://www.netscaler.com/
[7] Packeteer's PacketShaper Xpress, http://www.packeteer.com/prod-sol/products/xpress.cfm
[8] zlib home page: http://www.gzip.org/zlib/
[9] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley, "Network Performance Effects of HTTP/1.1, CSS1, and PNG," in Proc. SIGCOMM '97, Cannes, France, September 1997.
[10] J. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy, "Potential Benefits of Delta Encoding and Data Compression for HTTP," in Proc. ACM SIGCOMM, pp. 181-194, September 1997. An extended and corrected version appears as Research Report 97/4a, Digital Equipment Corporation Western Research Laboratory, December 1997.
[11] The Effect of HTML Compression on a LAN and a PPP Modem Line, http://www.w3.org/Protocols/HTTP/Performance/Compression/LAN.html , http://www.w3.org/Protocols/HTTP/Performance/Compression/PPP.html
[12] IRCACHE Proxy Traces, http://ircache.nlanr.net.
[13] Squid Web Proxy Cache, http://www.squid-cache.org/