Client-side Reconstruction of Composite Mementos
Using ServiceWorker
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed @WebSciDL
Supported in part by NSF III 15267001
JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada
Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2017
● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
2008 Memento Seen in 2012
● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
XenLand @ Alpha Centauri
Zombies in Archive
<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
//=>> http://xenland.alpha/images/ruler.png
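To illustrate why script-built URLs escape replay-time rewriting, here is a minimal sketch of a naive attribute rewriter. The function name `rewriteAttributes` and the regular expression are assumptions made for illustration; they are not taken from any archive's actual rewriter. The archive prefix is the one used in the example above.

```javascript
// Archive prefix from the example above (hypothetical archive and datetime).
const ARCHIVE_PREFIX = 'http://archive.example.org/1998/';

// Naive rewriter (illustrative only): prefixes absolute http(s) URLs
// that appear literally inside src="..." or href="..." attributes.
function rewriteAttributes(html) {
  return html.replace(
    /\b(src|href)="(https?:\/\/[^"]+)"/g,
    (match, attr, url) => `${attr}="${ARCHIVE_PREFIX}${url}"`
  );
}

// A static reference is found and rewritten.
const staticMarkup = '<img src="http://xenland.alpha/images/map.png">';
const rewrittenStatic = rewriteAttributes(staticMarkup);

// A URL assembled at run time never appears as a literal attribute value,
// so the rewriter leaves the script untouched and the image loads live.
const scriptedMarkup =
  "<script>var base = 'http://xenland.alpha';" +
  " img.src = base + '/images/' + 'ruler.png';</script>";
const rewrittenScript = rewriteAttributes(scriptedMarkup);

console.log(rewrittenStatic);
console.log(rewrittenScript === scriptedMarkup);
```

The second result is unchanged, which is the situation that produces a zombie resource: the browser builds the live URL only after the rewriter has already run.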
Replay URL Resolution & Rewriting
Reference type | Example                                   | Resolution after relocation
Relative path  | images/logo.png                           | Potentially correct
Absolute path  | /public/images/logo.png                   | Potentially incorrect
Absolute URL   | http://example.com/public/images/logo.png | Potentially live leakage
http://example.com/public/index.html
...<img src="/public/images/logo.png">...
http://archive.example.org/<datetime>/http://example.com/public/index.html
...<img src="/<datetime>/http://example.com/public/images/logo.png">...
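The three outcomes in the table can be reproduced with the standard `URL` constructor, which resolves a reference against a base the same way a browser does. This is a sketch only: the 14-digit datetime below is a made-up value standing in for `<datetime>`, and `resolveAll` is a hypothetical helper name.

```javascript
// Hypothetical URI-M; the datetime is an assumed placeholder value.
const mementoPage =
  'http://archive.example.org/20170619000000/http://example.com/public/index.html';

// The three reference types from the table, left unrewritten.
const references = {
  relativePath: 'images/logo.png',
  absolutePath: '/public/images/logo.png',
  absoluteUrl: 'http://example.com/public/images/logo.png',
};

// Resolve every reference against a base URL, as a browser would.
function resolveAll(base) {
  const resolved = {};
  for (const [kind, ref] of Object.entries(references)) {
    resolved[kind] = new URL(ref, base).href;
  }
  return resolved;
}

const afterRelocation = resolveAll(mementoPage);
console.log(afterRelocation);
```

Resolved against the memento URL, the relative path stays inside the archive and keeps the datetime, the absolute path points at the archive host but loses both the datetime and the original host, and the absolute URL goes straight to the live site.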
Avoiding Zombies
● Ahead-of-time rendering and JS execution
  ○ http://archive.is/
● Archival replay proxy
  ○ https://github.com/ikreymer/pywb/wiki/Pywb-Proxy-Mode-Usage
● Browser extension
  ○ MementoFox (deprecated)
● JS override
  ○ wombat.js in PyWB
● ServiceWorker
ServiceWorker
● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independent of the window
● Acts as a proxy
● Installed by a web page under its domain at a specific path (called scope)
● Intercepts all requests in scope
  ○ Resources under the scope path (at any depth)
  ○ Secondary resource requests originated from any resource under scope
● Allows modification in request and response
● Primarily used in web applications for offline access and notification support
● Requires HTTPS
● Growing browser support (73.61% as of June 8, 2017)
● http://caniuse.com/#feat=serviceworkers
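The points above can be sketched in a few lines. This is a minimal illustration rather than code from the talk: the worker path `/serviceworker.js`, the scope `/`, and the helper names `registerWorker` and `isControlled` are assumptions. The page-side and worker-side parts are shown in one block for brevity, although in practice they live in separate files.

```javascript
// Page side: install the worker for a scope (path and scope are assumed).
// Returns null where ServiceWorker is unavailable (old browsers, non-HTTPS).
function registerWorker() {
  if (typeof navigator === 'undefined' || !('serviceWorker' in navigator)) {
    return Promise.resolve(null);
  }
  return navigator.serviceWorker.register('/serviceworker.js', { scope: '/' });
}

// A page is controlled by a worker when its URL starts with the scope URL,
// which is why resources at any depth under the scope path are covered.
function isControlled(scopeUrl, pageUrl) {
  return new URL(pageUrl).href.startsWith(new URL(scopeUrl).href);
}

// Worker side: only runs when this file is loaded as a ServiceWorker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    // Acting as a proxy: the request could be inspected or replaced here.
    // This sketch simply passes it through to the network.
    event.respondWith(fetch(event.request));
  });
}
```

A page would call `registerWorker()` once on load; after that, requests made by controlled pages, including their secondary resource requests, pass through the `fetch` listener.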
reconstructive.js
● https://github.com/oduwsdl/reconstructive
● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originated from a memento
● Reroutes requests to an archive (prevents live leakage & incorrect references)
● Optionally rewrites the content to add banner & to fix hyperlinks
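The rerouting idea can be sketched as a pure function plus a small `fetch` listener. This is not the actual reconstructive.js code or API: the function name `reroute`, the pattern `MEMENTO_PATTERN`, and the assumed URI-M layout (`<archive origin>/<14-digit datetime>/<original URL>`) are illustrative assumptions. The datetime and original page are recovered from the referrer, so unrewritten references made by a memento are steered back to the archive.

```javascript
// Assumed URI-M layout: <archive origin>/<14-digit datetime>/<original URL>
const MEMENTO_PATTERN = /^(https?:\/\/[^/]+)\/(\d{14})\/(https?:\/\/.+)$/;

// Map a request made by a memento back to the archive (illustrative only).
function reroute(requestUrl, referrerUrl) {
  // Already an archived URL: nothing to do.
  if (MEMENTO_PATTERN.test(requestUrl)) return requestUrl;

  // Only reroute requests that originate from a memento.
  const ref = MEMENTO_PATTERN.exec(referrerUrl);
  if (!ref) return requestUrl;
  const [, archive, datetime, originalPage] = ref;

  let target = requestUrl;
  if (target.startsWith(archive + '/')) {
    // An absolute-path reference was resolved against the archive host;
    // re-resolve it against the original page instead.
    const u = new URL(target);
    target = new URL(u.pathname + u.search, originalPage).href;
  }
  // Otherwise the request points at the live web (potential leakage).
  return `${archive}/${datetime}/${target}`;
}

// Worker side: only runs when this file is loaded as a ServiceWorker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    const url = reroute(event.request.url, event.request.referrer);
    if (url !== event.request.url) event.respondWith(fetch(url));
  });
}
```

Because the decision is made per request at fetch time, URLs assembled by JavaScript are handled the same way as static ones, and the archived payload itself does not need to be modified.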
Zombies, No More!
● https://github.com/oduwsdl/ipwb
Rewriting Mementos is Expensive
Original capture (without any rewriting)
In our experiment over 500 home pages we observed:
● One-fifth mean data overhead
● One-third mean time overhead
15% more data in twice the time
Archival Capture Replay Test Suite (ACRTS)
reconstructive.js
● https://ibnesayeed.github.io/acrts/
Reconstruction Winners: PyWB & reconstructive.js
A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js
Future Work
● Use “Prefer” header for original content (when archives support it)
● Add a customizable archival banner
● Add click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a 404-combat ServiceWorker script for webmasters
● http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
Conclusions
● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
  ○ one-fifth data
  ○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
  ○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
  ○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
  ○ https://ibnesayeed.github.io/acrts/
● In-depth recap: WADL 2017 Thursday, June 22, 3:45pm (https://fox.cs.vt.edu/wadl2017.html)