An Introduction to BIG DATA
CUSO Seminar on Big Data
Prof. Dr. Philippe Cudré-Mauroux
http://exascale.info
May 22, 2014
Fribourg–Switzerland
On the Menu Today
• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland
Instant Quiz
• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?
Exascale Data Deluge
• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– eBay
– Yahoo
• Financial services, retail companies, governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional information infrastructures
The Web as the Main Driver
© Qmee
Big Data Central Theorem
Data + Technology ➡ Actionable Insight ➡ $$
Big Data Buzz
Between now and 2015, the firm expects big data to create
some 4.4 million IT jobs globally; of those, 1.9 million will
be in the U.S. Applying an economic multiplier to that
estimate, Gartner expects each new big-data-related IT job
to create work for three more people outside the tech
industry, for a total of almost 6 million more U.S. jobs.
Growth in the Asia Pacific Big Data market is
expected to accelerate rapidly in two to three
years time, from a mere US$258.5 million last
year to in excess of $1.76 billion in 2016,
with highest growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes, Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at Google and now Facebook, are masters at harnessing the data of the Web — online searches, posts and messages — with Internet advertising. At the World Economic Forum last month in Davos, Switzerland, Big Data was a marquee topic. A report by the forum, “Big Data, Big Impact,” declared data a new class of economic asset, like currency or gold.”
The 3-Vs of Big Data
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization"
What can you do with the data?
• Reporting
– Post hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …
© Mike Franklin
10 ways big data changes everything
• Some concrete examples – http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
Typical Big Data Success Story
• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [Walmart]
– Content Generation [e.g., Netflix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget, provide a guarantee that the quality of inference will increase monotonically as data accrue (without bound)
• In other words:
=> Data should be a resource, not a load
© Mike Jordan
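A rough illustration of the "More Rows" and "More Columns" points follows. It is only a sketch under stated assumptions: the slide names no specific algorithm, so a naive all-pairs comparison stands in for "algorithmic complexity", and "hypotheses" are taken to be the non-empty subsets of columns. The function names pairwise_comparisons and column_subsets are hypothetical.

```python
def pairwise_comparisons(rows: int) -> int:
    """Row pairs a naive all-pairs algorithm must inspect (assumed example)."""
    return rows * (rows - 1) // 2


def column_subsets(columns: int) -> int:
    """Non-empty column subsets, assuming any subset may form a hypothesis."""
    return 2 ** columns - 1


if __name__ == "__main__":
    # Rows: the work grows quadratically with the number of rows.
    for rows in (1_000, 1_000_000):
        print(f"{rows:>9,} rows    -> {pairwise_comparisons(rows):,} pairs")
    # Columns: the number of candidate hypotheses doubles with each column.
    for columns in (10, 20, 40):
        print(f"{columns:>9,} columns -> {column_subsets(columns):,} subsets")
```

Under these assumptions, a thousandfold increase in rows costs roughly a millionfold more comparisons, and every added column doubles the hypothesis space, which is why more data can become a load rather than a resource.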
Big Data Infrastructures
A Concrete Example: Zynga
Leading the Pack of Wolves: Hadoop
• Google: Map/Reduce paper published in 2004
• Open source variant: Hadoop
• Map-reduce = high-level programming model and implementation for large-scale parallel data processing
• Right now, the most overhyped system in CS
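To make the map-reduce programming model concrete, here is a minimal single-process sketch of the classic word-count example. It is not Hadoop code and uses no Hadoop API: the functions map_phase, shuffle, reduce_phase and word_count are hypothetical names chosen for illustration, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is a resource"]))
```

The programmer writes only the map and reduce functions; grouping by key, distribution and fault tolerance are what a framework such as Hadoop is meant to provide.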
What about Swiss Big Data?
• Competitive Research Groups
• Swiss Big Data User Group
• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial companies
– POCs in most banks, insurance companies, retailers
• Big Data is not a new technology: it's a fact; deal with it
Tasty Bites of Big Data (1)
Thursday afternoon
• 13:30-15:00: Big Data Profiling
  Felix Naumann (Hasso Plattner Institute)
• 15:15-16:45: Realtime Analytics
  Christoph Koch (EPFL)
• 16:45-17:45: Current Trends and Challenges in Big Data Benchmarking
  Kais Sachs (SAP / Spec)
Tasty Bites of Big Data (2)
Friday
• 9:00-10:30: Structured Data in Web Search
  Alon Halevy (Google)
• 10:45-12:15: Human Computation for Big Data
  Gianluca Demartini (UNIFR)
• 13:30-15:00: Analysing and Querying Big Scientific Data
  Thomas Heinis (EPFL)
• 15:00-16:30: The Evolution of Big Data Frameworks
  Carlo Curino (Microsoft Research)
Social Event, Friday – Beer Tasting!
Basse-Ville Fribourg / 15 CHF per person
Everything You Always Wanted to Know About Beer* (*But Were Afraid to Ask!)
• 18:00 @ Café du Belvédère, Grand-Rue 36
• 19:00 @ Fri-Mousse, Rue de la Samaritaine 19
Limited places; registration is mandatory at:
http://xr.si