Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
DESCRIPTION
Internet Infrastructures for Big Data. Talk given at Verisign's Distinguished Speaker Series, 2014. Prof. Philippe Cudré-Mauroux, eXascale Infolab, http://exascale.info/
TRANSCRIPT
Internet Infrastructures for Big Data
Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland
VeriSign EMEA, June 26, 2014
eXascale Infolab
• New lab @ U. of Fribourg, Switzerland
• Financed by Swiss Federal State / companies / private foundations
• Big (non-relational) data management (Volume, Velocity, Variety) (… mostly)
On the Menu Today
• Big Data!
  – Big Data Buzz
  – 3 Big Data projects w/ XI & Verisign
Exascale Data Deluge
• Science
  – Biology
  – Astronomy
  – Remote Sensing
• Web companies
  – eBay
  – Yahoo
• Financial services, retail companies, governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional information infrastructures
Big Data “Central Theorem”
Data + Technology → Actionable Insight → $$
Reporting, Monitoring, Root Cause Analysis, (User) Modelization, Prediction
Big Data Buzz
Between now and 2015, the firm expects big data to create some
4.4 million IT jobs globally; of those, 1.9 million will be in the
U.S. Applying an economic multiplier to that estimate, Gartner
expects each new big-data-related IT job to create work for three
more people outside the tech industry, for a total of almost 6
million more U.S. jobs.
Growth in the Asia Pacific Big Data market is
expected to accelerate rapidly in two to three years
time, from a mere US$258.5 million last year to in
excess of $1.76 billion in 2016, with highest
growth in the storage segment.
Big Data Everywhere!
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”
The 3-Vs of Big Data
• Volume
  – amount of data
• Velocity
  – speed of data in and out
• Variety
  – range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
Coming up: 3 examples from XI
Volume: Fixing the Hadoop Distributed File System
• Hadoop (YARN): “cluster Operating System”
• Often synonymous with Big Data
• Used everywhere (… even in CH)
HDFS Blocks Placement Strategy
Rack 1 Rack 2
● 1st replica on local node or random node
● 2nd replica on a different node in a different rack
● 3rd replica on a different node in same rack as 2nd replica
➡ Not hardware-aware
➡ Block level rather than file level
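The three placement rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed on the slide, not HDFS code: the function name `place_replicas`, the `racks` dictionary, and the node names are hypothetical, and the sketch assumes at least two racks with at least two nodes in the remote rack.

```python
import random

def place_replicas(racks, writer=None, rng=random):
    """Pick three nodes for one block, following the slide's three rules.

    racks  -- dict mapping a rack name to the list of node names it holds
    writer -- node issuing the write, or None if the client is off-cluster
    rng    -- object with a choice() method (the random module by default)
    """
    node_rack = {node: rack for rack, nodes in racks.items() for node in nodes}
    all_nodes = list(node_rack)

    # 1st replica: the local node if the writer is a DataNode, else a random node.
    first = writer if writer in node_rack else rng.choice(all_nodes)

    # 2nd replica: a node in a different rack than the first.
    remote = [n for n in all_nodes if node_rack[n] != node_rack[first]]
    if not remote:
        raise ValueError("need at least two racks")
    second = rng.choice(remote)

    # 3rd replica: a different node in the same rack as the second.
    peers = [n for n in racks[node_rack[second]] if n != second]
    if not peers:
        raise ValueError("remote rack needs at least two nodes")
    third = rng.choice(peers)

    return [first, second, third]


racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"]}
print(place_replicas(racks, writer="n1"))
```

Nothing in the sketch looks at disk type, CPU generation, or which file a block belongs to, which is the limitation the two arrows point at.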
Solution: Hadaps File Placement
• Assigns weights to DataNodes
  – I/O-bound jobs finish earlier on new media
  – CPU-bound jobs finish earlier on new CPUs
• Uses lower utilization servers first
• Moves more blocks to newer generations
• Operates on file level
Up to 300% performance improvement by activating all nodes
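A minimal sketch of weight-proportional, file-level placement follows. It is not the Hadaps implementation: the function `distribute_blocks`, the largest-remainder rounding, and the example weights and utilization figures are all assumptions made for illustration. It only shows how per-node weights could translate into block counts for a single file, with ties going to the less utilized server.

```python
def distribute_blocks(num_blocks, weights, utilization=None):
    """Split one file's blocks across DataNodes in proportion to their weights.

    num_blocks  -- number of blocks in the file
    weights     -- dict node -> weight (higher for newer hardware generations)
    utilization -- optional dict node -> current utilization in [0, 1]
    """
    utilization = utilization or {}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Ideal (fractional) share per node, then the whole-block part of it.
    shares = {n: num_blocks * w / total for n, w in weights.items()}
    counts = {n: int(s) for n, s in shares.items()}
    leftover = num_blocks - sum(counts.values())

    # Remaining blocks go to the largest fractional shares;
    # ties are broken in favour of the less utilized node.
    order = sorted(
        weights,
        key=lambda n: (-(shares[n] - counts[n]), utilization.get(n, 0.0), n),
    )
    for node in order[:leftover]:
        counts[node] += 1
    return counts


# Hypothetical cluster: A, B, C are an older generation, D, E, F a newer one.
weights = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
print(distribute_blocks(9, weights))
```

With these example weights, a nine-block file places one block on each older node and two on each newer node, so every node takes part in reading the file.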
[Figure: example placement of a file's blocks across DataNodes A to F; labels: Blocks, Weight]
Velocity: Real-Time Data Management
• Smart(er) Cities!
  – Electricity provisioning
  – Water Networks
Example: Scalable Anomaly Detection
• Detecting leaks / pipe bursts / contamination in real-time for water distribution networks
Data at each Vertex!
• Spatial + temporal statistical processing (mini-LISAs)
• Stream processing (Storm) + Array processing (SciDB)
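To make the spatial part concrete, here is a small sketch of one local indicator of spatial association, the local Moran's I, computed per vertex of a sensor graph. Reading the slide's "mini-Lisas" as LISA statistics is an assumption, as are the function name, the toy pipe network, and the pressure values; the talk's actual pipeline (Storm plus SciDB) is not reproduced here.

```python
from statistics import fmean, pstdev

def local_morans_i(values, neighbors):
    """Local Moran's I for each vertex, with row-standardized neighbor weights.

    values    -- dict vertex -> latest sensor reading
    neighbors -- dict vertex -> list of adjacent vertices
    A clearly negative score marks a vertex that disagrees with its neighbors.
    """
    readings = list(values.values())
    mean = fmean(readings)
    std = pstdev(readings)
    if std == 0:
        return {v: 0.0 for v in values}

    # Standardize each reading against the whole network.
    z = {v: (x - mean) / std for v, x in values.items()}

    scores = {}
    for v in values:
        adjacent = neighbors.get(v, [])
        # Spatial lag: average standardized reading of the neighbors.
        lag = fmean([z[u] for u in adjacent]) if adjacent else 0.0
        scores[v] = z[v] * lag
    return scores


# Hypothetical chain of five pressure sensors; "c" reads far lower than the rest.
values = {"a": 5.0, "b": 5.1, "c": 2.0, "d": 5.0, "e": 4.9}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
scores = local_morans_i(values, neighbors)
print(min(scores, key=scores.get))
```

In this toy network the most negative score belongs to vertex "c", the sensor whose reading departs from its neighbors. A streaming deployment would recompute such scores per time window; that part is left out of the sketch.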
Variety: Sharing Data Locally & Globally
• 70+% of the world’s population has no or very limited access to the Web
[Ahmed Shams 2013]
Our Solution: ERS, the Entity Registry System
• Three-tier solution to deploy data-powered apps
  – Flexible
    • Seamlessly reconcile entities in local / ad-hoc / global modes
  – Collaborative
    • Transactional consistency, data versioning
  – Scalable
    • Bridges, scale-out servers, tunable consistency
  – Open-source
    • https://github.com/ers-devs
Ongoing Deployments
• Entity-powered apps for the Sugar Learning Platform
• Ambient Assisted Living of elderly persons in tropical environments
Special Thanks to…
• Vincenzo Russo, Benoit Perroud, Matt Thomas, Romain Cholat and the whole Verisign Fribourg office
• Burt Kaliski and his team
• Allison Mankin, Scott Hollenbeck, Debra Anderson & the Internet Infrastructures Grant team
… for their continued support
http://exascale.info
Big thanks to the whole XI crew!
Questions?