big data : risks and opportunities

62
Big Data Risks and Opportunities Kenny Huang, Ph.D. 黃勝雄博士 Executive Council Member, APNIC 亞太網路資訊中心董事 [email protected] 2015.06.09

Upload: kenny-huang

Post on 04-Aug-2015

1.083 views

Category:

Technology


4 download

TRANSCRIPT

Page 1: Big Data : Risks and Opportunities

Big Data

Risks and Opportunities

Kenny Huang, Ph.D. 黃勝雄博士

Executive Council Member, APNIC

亞太網路資訊中心董事

[email protected]

2015.06.09

Page 2: Big Data : Risks and Opportunities

Agenda

2

1

2

3

4

Big Data Technologies

Big Data Market

Opportunities, Risks,& Capital Trends

Algorithmic Accountability & Privacy

Page 3: Big Data : Risks and Opportunities

Internet of Things Definition

3

The Internet of Things (IoT) is the network of physical objects or "things"embedded with electronics, software, sensors and connectivity to enableit to achieve greater value and service by exchanging data with themanufacturer, operator and/or other connected devices. Each thing isuniquely identifiable through its embedded computing system but is ableto interoperate within the existing Internet infrastructure.

IoT = Devices (RFID tags, Sensors, ..) +Networks + Services + Data + Analytics

“In the world of IoT, even cows will be connected”

Source : The Economist 2010

Page 4: Big Data : Risks and Opportunities

IoT Size & Potential

4

Source : CISCO

Urban Planning Smart Cities Sustainable Environments Healthcare Emergency Response Waste Management Intelligent Shopping

Smart Product Management Smart Meters Smart Homes Smart Automobiles Smart Agriculture (cows) Smart Grid Intelligent Business Decisions

IoT Potential Applications

Page 5: Big Data : Risks and Opportunities

Examples of IoT

5

Google Glass Weight Scale

Sense Mother Jawbone Up

SkyBell

Light bulb

Nest Thermostat Belkin Wemo Firefox iKettle

Page 6: Big Data : Risks and Opportunities

IoT Architectural Design

6

Question : How to build systems that work well ?

Breaking them into tractable components. “Modularity based on abstraction is the way things get done.” – Liskov If you can’t manage, evolve, or understand a system, probably don’t have

the right abstraction.

Cloud Data Centre

Core Network

Access Network

Things Network

Web server hosting

IP/MPLS

Ethernet; Mobile; WiFi

RFID; NFC; Bluetooth

Page 7: Big Data : Risks and Opportunities

Energy Administration Architecture

7

Application servers; Database servers

IP/MPLS

WiFi 3G

Asset managementReliability improvement

Smart metering

Page 8: Big Data : Risks and Opportunities

Water Administration Architecture

8

Application servers; Database servers

IP/MPLS

WiFi 3G

Field devices

Smart metering

Quality monitoring andWater control

Page 9: Big Data : Risks and Opportunities

Smart City Architecture

9

Application servers; Database servers

IP/MPLS

WiFi 3G

Public lightingIntelligenttransportation

Public safetymanagement

Page 10: Big Data : Risks and Opportunities

IoT Technologies

10

detect anomalies analyze a mix of structured, semi-structured and unstructured data

Sensor Analytics

Stream Analytics

Big Data Analytics

Real-time Analytics

Machine Learning Analytics

Statistical Analyticsenables business users to get up-to-the-minute data by directly accessing OS

enables developers to combine streams of data with historic records to derive business insights

find patterns and identify trends

building of the predictive models by using machine learning techniques

Page 11: Big Data : Risks and Opportunities

Big Data

11

ASA2005 – Roger Magoulas uses the term “Big Data”

McKinsey Global InstituteBig Data: The next frontier for innovation, Competition, and productivity.

White HouseBig Data Initiative : $200 Million in New R&D Investment on Big Data for scientific discovery, environmental and biomedical research, education, and national security

The New OilAs far back as 2006, market researcher Cliver Humby declared data “thenew oil.” Just as oil once fired dreams a century or more ago, data is todaydriving a vision of economic and technical innovation. If “crude” data can be extracted, refined, and piped to where it can impact decisions in realtime, its value will soar.

International Year of Statistics - 2013

McKinsey Global Institute, May 2011

Press Release. White House of OSTP. March 29 , 2012

CISCO ISBG, June 2012

Page 12: Big Data : Risks and Opportunities

Big Data : Size

12

Page 13: Big Data : Risks and Opportunities

The 3Vs of Big Data

13

90% of the data in the worldtoday was created withinthe last two years

People to PeoplePeople to MachineMachine to Machine

2.9 emails sentevery second20 hours of video uploadedevery minute50 million tweets per day

Volume

Variety

Velocity

Page 14: Big Data : Risks and Opportunities

Big Data Ecosystem

14

Generation

Data Types Structured

(relational) Unstructured

(adhoc)

Data Classes Human Machine

Data Velocity Batch Streaming

Data Class Types

Data Mgmt. & Storage Store Secure Access Network

Engines Hadoop MapReduce Apache Tools Cloudera/IBM/EMC Visualization

Prepare Data ForAnalytics ETIL / Data Integration Workflow Scheduler System Tools

Data Analytics Algorithmics automation In Real Time

Business Analytics Visualization Interoperate with

SQL -RDBMs BI/EDW

Business Analysis Decision Support Just in Time

Business Model

Business User Market

PenetrationEnhancement

Cash Flow/ROI

Operational IT

Store Access Prepare

Analytics

Analyze Visualize Analyze Business

Usage

Source : Sybase

Page 15: Big Data : Risks and Opportunities

15

Analytics: Static Data vs. Streaming DataStatic Data Streaming Data

Multiple Passes Single Pass

Persistent Inherently Temporal

Offline Analytics Online as well as Offline Analytics

Analytics Based on All the Data Analytics Based on a Subset of Data

Only the current state is relevant Consideration of the order of the input

Relatively low update rate Potentially high update rate

Little or no time requirements Real-time requirements

Assumes exact data Assumes outdated/inaccurate data

Plannable query processing Variable data arrival and data characteristics

DBMS (Database Management System) DSMS (Data Stream Management System)

Persistent relational data Volatile transient data streams

Random access Sequential access

One-time queries Continuous queries

Unlimited secondary storage Limited main memory

Only the current state is relevant Consideration of the order of the input

Relatively low update rate Potentially high update rate

Little or no time requirements Real-time requirements

Assumes exact data Assume outdated / inaccurate data

Standing queries Ad-hoc queries

Page 16: Big Data : Risks and Opportunities

Big Data Challenges & Data Life of Cycle

16

Input Raw Data

Collection

Cleaning, Validationand Serialization

Transformation & Augmentation

Output

Interpretation &Presentation

Mining & Analytics

DB Storage &Management

Sensor data brings numerous challenges with it in the context of data collection, storage and processing. This is because sensor data processing often requires efficient in-network and real-time data stream processing from massive volumes of possibly uncertain data from various sources. The data generated from these sensors arrives in the form of streams.

At every phase of the big data life cycle, there are research issues along each steps To handle these streaming sensor data model-based techniques are employed, such as :

statistical, signal processing, regression-based, machine learning, probabilistic, time series.

Page 17: Big Data : Risks and Opportunities

17

Page 18: Big Data : Risks and Opportunities

Example of Model-based Technique : Kalman Filter

18

Probabilistic Models: In sensor data cleaning, inferring sensor values is perhaps the most import task, since systems can then detect and clean dirty sensor values by comparing raw sensor values with the corresponding inferred sensor values.The Kalman filter is perhaps on of the most common probabilistic models to compute inferred values corresponding to raw sensor values.

Page 19: Big Data : Risks and Opportunities

19

In the sliding window model, only the recent past is the objective concern of stream processing. The fundamental sliding windows are of fixed size, which are similar to first-in, first-out data structure. The input is still a stream of data values or elements. A data value arrives at each time instant; it later expires after a number of time stamps

equal to the window size n The current window at any time instant is the set of data elements that have not yet

expired.

The Sliding Window Model

Page 20: Big Data : Risks and Opportunities

Hadoop

20

Processing Platform for Big Data Processing Using the “MapReduce” processing technique

MapReduce is the processing part of Hadoop HDFS is the data part of Hadoop

Attributes Highly scalable Commodity HW-based Open source: low cost Batch processing centric

MapReduce

HDFS

Machine

Hive

HBase

Mahout

Pig

Oozie

Flume

Scoop

Projects

Set of open source projects

Page 21: Big Data : Risks and Opportunities

Map->Reduce and HDFS Architecture

21

TaskTracker

DataNode

Machine

JobTracker

NameNode

TaskTracker

DataNode

Machine

TaskTracker

DataNode

Machine

JobTracker keeps track of jobs being run

NameNode keeps information on data location

Master

Slave Slave Slave

Page 22: Big Data : Risks and Opportunities

22

1. The network is reliable2. Latency is zero3. Bandwidth is infinite4. The network is secure5. Topology doesn’t change6. There is one administrator7. Transport cost is zero8. The network is homogeneous

The Eight Fallacies of Distributed Computing

Source: Peter Deutsch

Page 23: Big Data : Risks and Opportunities

23

Source : Ray Kurzweil

10−51

1051010

1015

102010251030103510401045105010551060

1900 1920 1940 1960 1980 2000 2020 2040 2060 2080 2100

Year

Cal

cula

tio

ns

pe

r Se

con

d p

er

$1

,00

0Exponential Growth of Computing

Logarithmic Plot

By 2020s, computers have the same power as the human brain

Page 24: Big Data : Risks and Opportunities

Deep Learning

24

Iterative Algorithm Learning at different levels of abstraction Non-linear transforms Typically neural nets

Genetic programming Neural networks Quantum computers Wisdom of Crowds

Examples of Iterative Algorithm

What is Deep Learning

Page 25: Big Data : Risks and Opportunities

Google First Quantum Computer

25

“We actually think quantum machine learning may provide the most creative problem-solving process under the known laws of physics.” – Google Blog

Page 26: Big Data : Risks and Opportunities

26

Ginni Rometty, CEO of IBM

“In the future, every decision that mankind makes is going to be informed by a cognitive system like Watson.”

Deep Learning Application Areas

Page 27: Big Data : Risks and Opportunities

27

1

2

3

4

Big Data Technologies

Big Data Market

Opportunities, Risks,& Capital Trends

Algorithmic Accountability & Privacy

Page 28: Big Data : Risks and Opportunities

Rainmaker IProphet

28

Viktor Mayer-Schönberger

Eric Schmidt

“we now uncover as much data in 48 hours – 1.8 zettabytes – as humans gathered from the dawn of civilization to the year 2003”

Page 29: Big Data : Risks and Opportunities

Rainmaker IIKnowledge Marketer

29

Big Data Hype Circle Big Data Maturity Model

Page 30: Big Data : Risks and Opportunities

HypeCreate Perception with Correlation

30

source: Google

* correlation doesn’t prove causation

Page 31: Big Data : Risks and Opportunities

Formulate Illusion

31

source: Sybase

Page 32: Big Data : Risks and Opportunities

UmbrellaBusiness Marketer

32

Page 33: Big Data : Risks and Opportunities

ProblemWhose Problem ?

33

Avoid Fallacy of Irrelevancy

Questions :1 You want to solve IT giants' (Google/FB) problems ?2 You want to solve future problems with today’s technologies and price ?3 Forging illusive needs immediately to leverage technology trends ?

“Excel is very powerful. The fact is that programmers generally don't realize this.” (Jay, LinkedIn)

Page 34: Big Data : Risks and Opportunities

34

1

2

3

4

Big Data Technologies

Big Data Market

Opportunities, Risks,& Capital Trends

Algorithmic Accountability & Privacy

Page 35: Big Data : Risks and Opportunities

35

Create new service offerings

Satisfy customers

Provide contextual relevance

Information based differentiation

Sell raw information

Provide benchmarking

Deliver analysis and insights

Information based brokering

Foster marketplaces

Drive deal making

Enable advertising

Information based delivery networks

Big Data Business Model

(HBR, 2012)

What happened ?(Reporting)

How and why did it happen?(Modeling experimental design)

What is happening ?(Alert)

What’s the next best action?(recommendation)

What will happen ?(Extrapolation)

What’s the best/worst than can happen?(prediction, optimization)

Information

Insight

Past Present FutureQuestions Addressed by Data Analytics

(Harris & Morrison)

Page 36: Big Data : Risks and Opportunities

Target used data mining to predict buying habits of customer going through major life events

Target was able to identify 25 products that when analyzed together helped determine a “pregnancy prediction” score

Sent baby-related promotions to women based score

Case Studies

36

Outcome Sales of Target’s Mom and Baby products sharply increased soon after

advertising campaigns Privacy concerns: Target had to adjust how it communicated the new

promotions

General Electric using Big Data to optimize the service contracts & maintenance.

Netflix used Big Data to predict if a TV show will be successful – “Houseof Cards” series, Director & promotion.

LinkedIn used Big Data to develop “People You May Know” products -30% higher click-thru-rates

Page 37: Big Data : Risks and Opportunities

37

Buying opportunity

#2Vis

ibili

ty in

Me

dia

Time

TechnologyTigger

Peak ofInflated

Expectations

Trough ofDisillusionment

Slope ofEnlightenment

Plateau ofProductivity

Education

Buying opportunity

#1

DangerZone

Reality

CheckReal Value

2 years 5 years

Source : Gartner; Dr. Kenny Huang Revised

Gartner Hype Cycle

Page 38: Big Data : Risks and Opportunities

38

Buying opportunity

#2

Vis

ibili

tyin

Me

dia

Time

Buying opportunity

#1

DangerZone

2 years 5 years

Source : Dr. Kenny Huang Revised

Time

Innovators

2.5%Early Majority

34%Late Majority

34%Laggards

16%

EarlyAdapters

13.5%

Chasm

TechnologyTigger

Peak ofInflated

Expectations

Trough ofDisillusionment

Slope ofEnlightenment

Plateau ofProductivity

Hype Cycle and Technology Adoption Cycle Plotted Together

Page 39: Big Data : Risks and Opportunities

Big Data Visibility and Demand

39

“Big Data” Google Trends @2015.06.04

US TW

26% piloting11% may invest in 1 year7% may invest in 2 years

2015 Gartner research on adoption of Hadoop Technology

”Future demand for Hadoop looks fairly anemic over at least the next 24 months“. Merv Adrian, Gartner Research. (2015)

Page 40: Big Data : Risks and Opportunities

Big Data Buying Opportunity for Taiwan

40

Vis

ibili

ty in

Me

dia

Time

DangerZone

Source : Gartner; Dr. Kenny Huang Revised

Big Data Visibilityas of June 2015

NextBuying

Opportunity

2017 < Time *

* Ref revised Hype cycle diagram, Google trends 2015, Gartner research 2015

Page 41: Big Data : Risks and Opportunities

41

Risk (Standard Deviation)

Exp

ect

ed

Ret

urn

Government Bonds

Corporate Bonds

Common Stock

Real Estate

Futures

Big Data ?

New Innovation !!

Source : Dr. Kenny Huang Illustration

Risk Acceptance

Risk-Return Tradeoff

Page 42: Big Data : Risks and Opportunities

Investment Risking Model

42

Risk Acceptance

Risk Mitigation

Risk Avoidance

Startup; Series A

Due Diligence

Change Investment Objects

Risk Acceptance

Risk Mitigation

Risk Avoidance

Don’t Use Taxpayers’ Money

Pilot Projects; Research

Change Technology Policy

Business Entity

Government Institution

Page 43: Big Data : Risks and Opportunities

Big Data Adoption Strategy

43

Focus on your own business

Adopt and separate; or Adopt keep internal; or Attack back and disrupt

the disruption

Focus on your own business

Attack back and disrupt the disruption; or

Embrace the innovation and scale it up

Source : MIT Sloan Motivation To Response

Ab

ility

To

Re

spo

nse

Low High

Low

Hig

h

Page 44: Big Data : Risks and Opportunities

Financial Model Quizzes

44

Big Data Technology

Provider

Big Data Solution

Integration

Big Data As A Service

Fixed cost

BEP

Fixed cost

BEP

Fixed cost

BEPsales

cost

cost

cost

salessales

A B C

[ ] [ ] [ ]

*BEP : Breakeven Point

Page 45: Big Data : Risks and Opportunities

Global Capital Market Trends

45Source : Designed by Dr. Kenny Huang

MM USD

Page 46: Big Data : Risks and Opportunities

46

IPOs and Private Financing Deals in the Tech Sector since 2000 (United States)

Source: PwC

Source: Techcrunch

If there is a bubble, investors would recover their investment and perhaps walk away with positive return, the biggest losers for sure would be the employees and founders.

Game Rule : You Pick The Valuation, I Pick The Terms

Page 47: Big Data : Risks and Opportunities

47

Page 48: Big Data : Risks and Opportunities

48

1

2

3

4

Big Data Technologies

Big Data Market

Opportunities, Risks,& Capital Trends

Algorithmic Accountability & Privacy

Page 49: Big Data : Risks and Opportunities

Algorithms Rule The World

49

We should interrogate the architecture of cyberspace as we interrogate the code of Congress.

- Lawrence Lessig, Code is Law, 2000

Page 50: Big Data : Risks and Opportunities

Algorithmic Accountability

50

Algorithms Are Everywhere

Algorithmic Accountability How can we characterize the bias or power of an algorithm? When might algorithms be wronging us, or making

consequential decisions? What role should be involved in holding algorithmic power

to account ?

Algorithmic Confusing Algorithms are not transparent Technical complexity is a barrier

(Nick Diakopoulos)

Page 51: Big Data : Risks and Opportunities

Algorithmic Power : Decisions

51

3 2 1Prioritization

Classification

Association

Filtering

Page 52: Big Data : Risks and Opportunities

52

Input Output

Input / Output of An Algorithm

Algorithm

Input OutputAlgorithm

WSJ Price Discrimination

Do different people pay different prices depending on their geography or browser history ? Yes

Source: WSJ, Dec 2012

Staples.com

Page 53: Big Data : Risks and Opportunities

e

53

Transparency Voluntary incentives for self-disclosure about algorithms Trade secrets Gaming / manipulation

Goodhart’s Law: “ When a measure becomes a target, it ceases to be a good measure.”

Cognitive complexity Transparency information needs to be accessible and

understandable

Page 54: Big Data : Risks and Opportunities

sdfdsf

54

Other Stories from Algorithms Discriminatory / Unfair Mistake that denies a service Censorship Breaks law or social norm False Prediction

sdfdsf

Next Step Teaching algorithmic accountability

It will be messy and hard Legal issues

Computer Fraud and Abuse Act Ethical implications of publishing more information Transparency policy

What factors to expose, frequency, format of disclosure

Page 55: Big Data : Risks and Opportunities

Critical Considerations for Big Data Practices

55

Customers will want to know

that you are collecting data

why and what you are

collecting

that their confidentiality is

preserved

that their data is accessible

Privacy

Customers will want

an unique URL where they

can see what you’ve

collected

to know what sensors you

are using

that an API is interrogating

the data

Transparency

Customers will expect

to be the owner of the

data & be the copyright

holder.

To decide who they allow

access to (might not even

be you)

Ownership

Page 56: Big Data : Risks and Opportunities

Concern with Big Data Practices

56

Source : Whitehouse Big Data Review

Page 57: Big Data : Risks and Opportunities

PRISM Tasking Process

57

Page 58: Big Data : Risks and Opportunities

Massive Surveillance vs. Human Rights

58

Article 12:No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation.

Page 59: Big Data : Risks and Opportunities

59

Source : appledaily, 2015.04.25Source : The Truman Show

Page 60: Big Data : Risks and Opportunities

60

USA Today2015.06.03

The New York Times2015.06.03

Page 61: Big Data : Risks and Opportunities

61

Page 62: Big Data : Risks and Opportunities

62

".... The big question is this: how do we designsystems that make use of our data collectively tobenefit society as a whole, while at the same timeprotecting people individual? Or..... how do we finda "Nash equilibrium" for data collection..........."