TRANSCRIPT
Web Prefetching Between Low-Bandwidth Clients and Proxies: Potential and Performance
Li Fan, Pei Cao, Wei Lin, and Quinn Jacobson
(University of Wisconsin-Madison), SIGMETRICS 1999
20003611 Hoyoung Hwang, CNR Lab.
Among work on Web prefetching, this paper targets low-bandwidth modem clients, discussing prefetching between the client and the proxy.
The paper is weak on originality, so I do not strongly recommend it.
Communication Networks Research Lab.
Content
1. Introduction
2. Proxy-Initiated Prefetching
3. Traces and Simulator
4. Reducing Client Latency
5. Prediction Algorithm
6. Performance
7. Implementation Experience
8. Conclusion and Critique
One approach to reduce latency is prefetching between caching proxies and browsers.
The majority of the Internet population accesses the WWW via dial-up modem connections.
The low modem bandwidth is a primary contributor to client latency.
We investigate one technique to reduce latency for modem users.
1. Introduction
Proxy-Initiated Prefetching: the proxy can often predict what objects a user might access next.
The modem link to the user often has idle periods as the user is reading the current Web document.
If the objects are cached at the proxy, the proxy can utilize the idle periods to push them to the user, or to have the browser pull them.
Since the proxy only initiates prefetches for objects in its cache, there is no extra Internet traffic.
2. Proxy-Initiated Prefetching (1/3)
Assumptions
Users have idle times between requests, because users often read some parts of one document before jumping to the next one.
The proxy can predict which Web pages a user will access in the near future, based on reference patterns observed from many users.
The proxy has a cache that holds recently accessed Web pages.
The proxy maintains a history structure: every time the proxy services a request, it updates the history structure, establishing the connection between past accesses made by the same user and the current request.
The browser cache is assumed to use the LRU (Least-Recently-Used) replacement algorithm.
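The assumed browser cache can be sketched as a byte-bounded LRU cache. The class and its names are my own illustration; the paper only states that LRU replacement is assumed:

```python
from collections import OrderedDict

class LRUBrowserCache:
    """Minimal sketch of an LRU browser cache bounded by total bytes."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> size, least recent first

    def access(self, url):
        """Return True on a cache hit, refreshing the entry's recency."""
        if url in self.entries:
            self.entries.move_to_end(url)
            return True
        return False

    def insert(self, url, size):
        """Add an object, evicting least-recently-used entries to fit."""
        if url in self.entries:
            self.used -= self.entries.pop(url)
        while self.used + size > self.capacity and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        self.entries[url] = size
        self.used += size
```

Prefetched objects would be inserted the same way as demand-fetched ones, so a burst of prefetches can evict older entries.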
2. Proxy-Initiated Prefetching (2/3)
Performance Metrics
Request Savings: the number of times that a user request hits in the browser cache or the requested object is being prefetched, as a percentage of the total number of user requests.
(Three categories: Prefetched, Cached, Partially Prefetched)
Latency Reduction: the reduction in client latency, in percentages.
Wasted Bandwidth: the sum of bytes that are prefetched but are not read by the client.
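As a rough sketch, the first and third metrics could be computed from a trace like this (the outcome tags and function names are my own simplification, not the paper's):

```python
def request_savings(outcomes):
    """Request Savings: percentage of requests that hit the browser cache
    or whose object is fully or partially prefetched. Outcome tags
    ('cached', 'prefetched', 'partial', 'miss') are assumed labels."""
    saved = sum(o in ("cached", "prefetched", "partial") for o in outcomes)
    return 100.0 * saved / len(outcomes)

def wasted_bandwidth(prefetched_sizes, used_urls):
    """Wasted Bandwidth: bytes prefetched but never read by the client."""
    return sum(size for url, size in prefetched_sizes.items()
               if url not in used_urls)
```

Latency Reduction is omitted here because it needs the simulator's per-request timing model, not just counts.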
2. Proxy-Initiated Prefetching (3/3)
Traces
We use the HTTP traces gathered from the University of California at Berkeley home dial-up population, November 14-19, 1996.
Simulator
The simulator uses timing information in the traces to estimate the latency seen by each modem client.
The simulator assumes that each modem link has a bandwidth of 21 kb/s.
The simulator assumes the existence of a proxy between the modem clients and the Internet (proxy cache size: 16 GB).
3. Traces and the Simulator
Increase the size of the browser cache.
Use delta compression to transfer modified Web pages between the proxy and clients.
Apply application-level compression to HTML pages.
4. Reducing Client Latency (1/2)
Cumulative distribution of user idle time in the UCB traces: about 40% of the requests are preceded by 2 to 128 seconds of idle time, indicating plenty of prefetching opportunities.
4. Reducing Client Latency (2/2)
The realistic prediction algorithm is based on Prediction-by-Partial-Matching (PPM).
PPM Predictors
m: prefix depth (the number of past accesses that are used to predict future ones)
l: search depth (the number of steps that the algorithm tries to predict into the future)
t: threshold (only candidates whose probability of access is higher than t, 0 ≤ t ≤ 1, are considered for prefetching)
The past m references are matched against the collection of trees to produce the set of URLs for the next l steps. Only URLs whose frequencies of access are larger than t are included.
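A minimal sketch of such a PPM predictor, with the history kept as a trie of access sequences. The class and method names are my own; the paper describes the scheme only in prose:

```python
class Node:
    """Trie node; count tracks how often the sequence from the root
    down to this node has been observed."""
    def __init__(self):
        self.count = 0
        self.children = {}

class PPMPredictor:
    """Sketch of a PPM predictor with prefix depth m, search depth l,
    and probability threshold t."""
    def __init__(self, m=2, l=4, t=0.25):
        self.m, self.l, self.t = m, l, t
        self.root = Node()   # forest of depth-(m+l) trees under one root
        self.recent = []     # the user's last K = m + l accesses

    def update(self, url):
        """Record a new access: extend every sequence ending at it."""
        K = self.m + self.l
        self.recent = (self.recent + [url])[-K:]
        for start in range(len(self.recent)):
            node = self.root
            for u in self.recent[start:]:
                node = node.children.setdefault(u, Node())
                node.count += 1

    def predict(self):
        """Candidate URLs for the next l steps, preferring longer
        matched prefixes, then higher probability within a prefix."""
        candidates = {}  # url -> (matched prefix length, probability)
        for plen in range(min(self.m, len(self.recent)), 0, -1):
            node = self.root
            for u in self.recent[-plen:]:
                node = node.children.get(u)
                if node is None:
                    break
            if node is None:
                continue
            self._collect(node, node.count, self.l, plen, candidates)
        return [u for u, _ in sorted(candidates.items(),
                                     key=lambda kv: (-kv[1][0], -kv[1][1]))]

    def _collect(self, node, base, depth, plen, out):
        """Gather URLs up to `depth` steps below the matched context."""
        if depth == 0:
            return
        for url, child in node.children.items():
            prob = child.count / base
            if prob > self.t and out.get(url, (0, 0.0)) < (plen, prob):
                out[url] = (plen, prob)
            self._collect(child, base, depth - 1, plen, out)
```

For example, after the access stream A, B, A, B, A, a (m=1, l=1) predictor with t=0.25 suggests B next, while t=0.5 suppresses it because B follows A in only 2 of A's 5 occurrences.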
5. Prediction Algorithms (1/4)
PPM Predictors: finally, the URLs are sorted, first by giving preference to longer prefixes, and then by giving preference to URLs with higher probability within the same prefix.
Previously proposed prefetching algorithms:
Padmanabhan and Mogul -> m always equal to 1
Krishnan and Vitter -> l always equal to 1
m>1: more context might improve the accuracy of the prediction
l>1: a URL is not always requested as the immediate next request after another URL, but rather within the next few requests
Best performing: m=2, l=4
5. Prediction Algorithms (2/4)
History Structure
The history structure is a forest of trees of a fixed depth K, where K = m + l.
The history encodes all sequences of accesses up to a maximum length K.
The history structure is updated every time a user makes a request; for example, it is updated for the user access sequence A…B…C (K=3).
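Flattening the trees into a dictionary of sequences, the update for the access sequence A, B, C with K = 3 can be traced as follows (a simplified stand-in for the trie, with names of my own):

```python
# After each access, every subsequence ending at that access
# (up to length K) is recorded with a count.
K = 3
history = {}  # sequence tuple -> occurrence count
recent = []   # sliding window of the last K accesses

for url in ["A", "B", "C"]:
    recent = (recent + [url])[-K:]
    for start in range(len(recent)):
        seq = tuple(recent[start:])
        history[seq] = history.get(seq, 0) + 1

# Sequences recorded: A, B, C, AB, BC, ABC
```

This matches the slide's example: all sequences up to length K=3 that end at the latest request are present, each with count 1.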
5. Prediction Algorithms (3/4)
Every time the modem link to a user is idle, the proxy calls the predictor for the list of candidate URLs.
The proxy then initiates prefetching of the objects in the order specified in the list.
When the user makes a new request, the ongoing prefetching is stopped, and a new round of prediction and prefetching starts the next time the link is idle.
The size of the history structure can be controlled using an LRU algorithm.
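The idle-time loop above can be sketched as follows. All names are my own, and the byte budget stands in for "until the user's next request stops the ongoing prefetching":

```python
def prefetch_when_idle(predictor, proxy_cache, push, budget_bytes):
    """One idle-period round of proxy-initiated prefetching."""
    sent = 0
    for url in predictor.predict():
        obj = proxy_cache.get(url)
        if obj is None:
            continue            # only objects already in the proxy cache
        if sent + len(obj) > budget_bytes:
            break               # idle period over; a new request arrived
        push(url, obj)          # push to the client (or ask it to pull)
        sent += len(obj)
    return sent
```

Because the loop skips URLs that miss in the proxy cache, no extra Internet traffic is ever generated, matching the design constraint stated earlier.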
5. Prediction Algorithms (4/4)
Performance of Proxy-Initiated Prefetching
6. Performance (1/6)
Assumptions
prefetch threshold: 50 KB, 8 objects
browser cache: 16 MB (extended), LRU replacement algorithm
Performance of Proxy-Initiated Prefetching
Decreasing the threshold t increases the wasted bandwidth but helps to generate enough candidates; for l>1, t=0.25 is the best choice.
Increasing the search depth l increases both the latency reduction and the wasted bandwidth; l=4 appears to be the best choice, as larger l makes little difference.
Increasing the prefix depth m increases both the latency reduction and the wasted bandwidth.
6. Performance (2/6)
The accuracy of the prediction algorithm
6. Performance (3/6)
The accuracy of the prediction algorithm:
attempted: the total number of candidates suggested by the predictor
prefetched: the actual number of objects that are prefetched
used: the number of objects that are prefetched and actually accessed by the user
The ratio between used and prefetched is the accuracy of the prediction algorithm.
Accuracy range: 40% (2,4,0.125) to 73% (1,1,0.5).
Low-threshold configurations appear to sacrifice accuracy for more prefetches.
6. Performance (4/6)
Recommendations for the configuration of PPM:
If the highest latency reduction is the goal and some amount of wasted bandwidth can be tolerated, (2,4,0.125) is the best choice.
If both high latency reduction and low wasted bandwidth are desired, (2,4,0.5) is the best choice.
If limits on storage requirements make smaller m and l desirable, (2,1,0.25) and (1,1,0.125) are good choices.
6. Performance (5/6)
Effects of Implementation Variations
No proxy notification upon browser cache hits: no-notice
Prefetching without knowledge of the contents of browser caches: oblivious
Limiting the size of the history structure
6. Performance (6/6)
Proxy-Initiated Prefetching
We have implemented proxy-initiated prefetching in the CERN httpd proxy software.
CERN httpd uses a process-based structure and forks a new process each time a new request arrives.
A separate predictor process communicates with other processes via UDP messages.
The predictor runs in an infinite loop, waiting to receive updates and queries messages.
Each process checks a shared global array of flags to see whether the modem link is idle.
If it is, the process starts pushing the URL objects on the existing connection.
7. Implementation Experience (1/2)
At the client side, instead of modifying browsers, we set up a copy of the CERN httpd proxy.
The browser requests are first sent to the local proxy.
The local proxy manages its own cache, issues requests to the main proxy, and receives pushed objects.
Measurement
We emulate modem connections on the LAN, and generate workloads that reflect typical browser behavior and Internet latencies.
We have instrumented the Linux kernel to simulate modem connections on our Ethernet LAN.
7. Implementation Experience (2/2)
Conclusion
We have investigated the potential and performance of one technique, prefetching between low-bandwidth clients and caching proxies, and found that, combined with delta compression, it can reduce user-visible latency by over 23%.
Prediction algorithms based on the PPM compressor perform well.
The technique is easy to implement and can have a considerable effect on users' Web surfing experience.
8. Conclusion and Critique (1/2)
Weaknesses
We assumed fixed user request arrival times in the simulation.
Our calculation of client latency is merely an estimate based on the time stamps recorded in the traces and the modem bandwidth.
We also do not model the proxy in detail.
We have not investigated the implementation of delta compression.
CPU overhead and delay were not considered.
The PPM algorithm is only slightly different from previously proposed prefetching algorithms.
8. Conclusion and Critique (2/2)