evaluating the sitestory transactional web archive with the apachebench tool

32
Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool Justin F. Brunelle Michael L. Nelson Lyudmila Balakireva Robert Sanderson Herbert Van de Sompel

Upload: michael-nelson

Post on 12-Dec-2014

3.647 views

Category:

Technology


1 download

DESCRIPTION

Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool Justin F. Brunelle Michael L. Nelson Lyudmila Balakireva Robert Sanderson Herbert Van de Sompel TPDL 2013, September 24, 2013

TRANSCRIPT

Page 1: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool

Justin F. BrunelleMichael L. Nelson

Lyudmila BalakirevaRobert Sanderson

Herbert Van de SompelTPDL 2013, Sept 24 2013

Page 2: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Page 3: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

September 7, 2011

Page 4: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

September 12, 2011

Page 5: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

September 16, 2011

Page 6: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Problem

• People view ABC News all the time• No mementos for “all the time”

– Stories missing or incomplete• Possible solutions:

– archive.org: crawl more often (how often is often enough?)

– abcnews.com: install a Transactional Web Archive

Page 7: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Agenda

7

Page 8: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Traditional Web Archiving

• Active crawling• Heritrix

Page 9: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Issues with Traditional Web Archiving

• Request can be rejected (robots.txt, user-agent, IP)

• Can be deceived (geo-location, user-agent)

• Can be trapped (crawl my calendar!)• Resource-intense (bandwidth)• Recrawl vs. change-rate

Page 10: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Missed Updates

seen by humans: C1, C3, C4; archived by crawler: C1, C3

Page 11: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Agenda

11

Page 12: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

for each HTTP response,the Apache web serversends (i.e., HTTP PUT) the same entity to SiteStory web server

Page 13: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Now we have them all

seen by humans: C1, C3, C4; archived by transactional archive: C1, C3, C4

Page 14: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Agenda

14

Page 15: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with ab

• ApacheBench: ab– -n [Number of Connections]– -c [Concurrency]

• Benchmarked with SiteStory on & off

Page 16: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with wget

ws-dl-03.cs.odu.edu

x99

,…,,

megalodon.lanl.gov

TWA@AWS

Page 17: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Agenda

17

Page 18: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Testing LAN with ab

Page 19: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Testing LAN with ab

Page 20: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with wget (unburdened)

Page 21: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with wget (unburdened)

Page 22: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with wget (burdened)

Page 23: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Benchmark with wget (burdened)

Page 24: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Results

• Negligible difference SiteStory On vs Off• Limited to local LAN• Performance over WAN?

Page 25: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

WAN Testbed Performance

Page 26: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Agenda

26

Page 27: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Results

• Distributed: Higher variance• Increased delay due to network• On vs. Off Comparison still comparable

Page 28: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Conclusions

• Small performance difference

• No gaps in coverage -- archives every HTTP response sent (optimizations possible)

http://mementoweb.github.io/SiteStory/

Page 29: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

get started now by using this piece

Page 30: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

SiteStory Testbed

• Use our SiteStory web archive on your server!

1. Install and configure mod_sitestory on your Apache Server

2. Send an email containing:

1. Your contact info

2. Web server IP address

3. Web server domain name

3. Happy Sitestory’ing!

• mailto: [email protected]

Page 31: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Backups

Page 32: Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool

Sample ab output$ ab -n 10 -c 2 "http://www.cs.odu.edu/"This is ApacheBench, Version 2.3 <$Revision: 655654 $>…

Server Software: Apache/2.2.17Server Hostname: www.cs.odu.eduServer Port: 80

Document Path: /Document Length: 62289 bytes

Concurrency Level: 2Time taken for tests: 0.213 secondsComplete requests: 10Failed requests: 0Write errors: 0Total transferred: 624810 bytesHTML transferred: 622890 bytesRequests per second: 47.01 [#/sec] (mean)Time per request: 42.540 [ms] (mean)Time per request: 21.270 [ms] (mean, across all concurrent requests)Transfer rate: 2868.66 [Kbytes/sec] received

…Connection Times (ms) min mean[+/-sd] median maxConnect: 0 1 0.0 1 1Processing: 27 41 10.8 45 62Waiting: 3 3 0.4 4 4Total: 27 41 10.8 45 63

Percentage of the requests served within a certain time (ms) 50% 45 66% 46 75% 46 80% 46 90% 63 95% 63 98% 63 99% 63 100% 63 (longest request)