the good, the bad, and the ugly

Post on 28-Jan-2015


Category:

Technology


DESCRIPTION

High Performance Computing and Big Data Conference

TRANSCRIPT

© Bull, 2014

Andrew Carr, CEO, Bull UK & Ireland


High Performance Computing and Big Data Conference

Data: the Good, the Bad, and the Ugly


The IT market is at an inflection point:

Its main driver transitioning from TECHNOLOGY to USAGE

Timeline (markers at 1970, 2010 and 2020): Centralised IT → Distributed IT → IT as-a-Service → Information as-a-Service


The IT market is at an inflection point:

TRANSPARENT PLATFORMS enabling VALUE FROM DATA

Centralised IT → Distributed IT → IT as-a-Service → Information as-a-Service

IT INFRASTRUCTURE, COMPLEX INTEGRATION, HIGH PERFORMANCE COMPUTING, SECURITY, M2M, CLOUD, BIG DATA


Time to results…

Speed has Value Greater than Size

Think Fast Data more than Big Data


A real Big Data problem…but Fast Results?

14 Jan 2014 - Illumina Announces the Thousand Dollar Genome
• $800 for reagents, $60 for sample preparation, $137 for ‘hardware’ over lifetime
• Assuming you can afford 10 HiSeq X machines at $1 Million each, you will be able to process 5 whole genomes/day – 18,000 a year for X10

So just 30 systems non-stop 24/7 to meet Genomics England 100K 2017 goal!
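The figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers quoted above. It assumes that "5 whole genomes/day" is a per-machine rate (which matches the quoted 18,000 a year for ten machines) and that "30 systems" means 30 individual sequencers running with no downtime; neither reading is stated explicitly on the slide.

```python
# Figures quoted on the slide (USD per genome).
REAGENTS = 800
SAMPLE_PREP = 60
HARDWARE = 137

GENOMES_PER_MACHINE_PER_DAY = 5   # slide figure, read here as per machine (assumption)
MACHINES_IN_X10 = 10
TARGET_GENOMES = 100_000          # Genomics England goal quoted on the slide


def cost_per_genome() -> int:
    """Sum of the three per-genome cost components."""
    return REAGENTS + SAMPLE_PREP + HARDWARE


def genomes_per_year(machines: int) -> int:
    """Yearly throughput if every machine runs every day of the year."""
    return machines * GENOMES_PER_MACHINE_PER_DAY * 365


def years_to_target(machines: int) -> float:
    """Years of non-stop running needed to reach the 100K target."""
    return TARGET_GENOMES / genomes_per_year(machines)


if __name__ == "__main__":
    print(cost_per_genome())                  # just under the $1,000 headline
    print(genomes_per_year(MACHINES_IN_X10))  # close to the quoted 18,000
    print(round(years_to_target(30), 2))
```

Under these assumptions, 30 machines would need a little under two years of uninterrupted sequencing to reach 100,000 genomes.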


A real Big Data problem…but Fast Results?

But when you’ve done that, how to process the results?
• You now have 30-50 Terabytes of raw data per machine per week
• HiSeq X10 cluster will require ~ 175,000 CPU core hours just to align results and even more to perform variant analysis to detect cancer anomalies

Delivering 250,000 core hours/week 24/7 and storing results is not trivial
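To see why this is not trivial, the weekly core-hour budget can be converted into a number of cores that must be busy around the clock. This is a rough sketch: it assumes perfect utilisation with no scheduling gaps or failures, and it treats the 175,000 core-hour alignment figure as a weekly figure, which the slide does not state.

```python
import math

HOURS_PER_WEEK = 24 * 7


def sustained_cores(core_hours_per_week: float) -> int:
    """Smallest whole number of cores that, running non-stop,
    deliver the given number of core hours in one week."""
    return math.ceil(core_hours_per_week / HOURS_PER_WEEK)


if __name__ == "__main__":
    print(sustained_cores(250_000))  # total weekly workload quoted on the slide
    print(sustained_cores(175_000))  # alignment only (weekly period assumed)
```

Any real cluster would need headroom above these figures, since no system sustains 100% utilisation.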


Why is data important?


Turning Fans into Customers…


Smart Stadiums…..

Smart Stadiums Value:
• 90% increase in RESPECT services & ‘Report an incident’
• 12% new revenue: £1 per bet ‘Man of the match’ / First Sub betting
• 85% increase in social media usage
• 35% increase in stadium sponsored betting
• 8%-15% increase in club merchandising
• Discounts on food & beverage to remove wastage
• Twitter wall for live interactions (advertorials)
• Real time non-contentious replays
• Access to secure club content (premium)

Become aware:
• Traffic management
• Security challenges
• Weather
• Crowd control
• Foot-fall management


Professor Stephen Jarvis, Director for Computing Research, University of Warwick


Smart Cities

Retail

Police

Telecoms

Government

Healthcare

Forensic science

Interpol

Opinion polls


Source camera identification: used by Interpol to classify and group explicit images

Fingerprint analysis: used in UNHCR camps

Biometric solutions: FBI certified

Performance tuning and debugging tools: used on the world’s largest supercomputers


1. Characteristics of the problem domain

Volume – terabytes to exabytes of existing data to process

Variety – structured, unstructured, text, multimedia

Veracity – uncertainty due to incompleteness or ambiguities

Velocity – streaming data, milliseconds to seconds response time

2. Characteristics of the solution

Storage – will this increase your data storage requirements?

Processing – should data processing be done sequentially or in parallel?

Speed – where should you minimise latency: memory, network, both?

Let’s investigate some case studies …


Case study 1: You like pink milk


• In 1993, Tesco’s CEO was looking to replace Green Shield trading stamps

• DunnHumby, a small London start-up, introduced the notion of a clubcard

“you know more about my customers after three months, than I know after 30 years”

Lord MacLaurin, Tesco Chairman

Case study 1: You like pink milk


• Single most significant factor in the success of the company

• 43M clubcard holders worldwide

• Allows Tesco to stock unpopular brands for big-spending customers

• 6M transactions per day presents significant volume

• Wide application: Calorie counting with Diabetes UK

Case study 1: You like pink milk


Case study 1: You like pink milk

BIG DATA

Characteristics: Terabytes to exabytes of existing data are processed

Processing: Batch and in parallel

Storage: Very large volumes of data stored

Speed: Access of data from disk; transfer of data to / from memory; delivery of results potentially slow
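The "batch and in parallel" pattern can be sketched as a small map-and-combine job: split the stored transactions into chunks, total each chunk independently, then merge the partial totals. Everything here is illustrative. The `(card_id, product, pence)` record layout and the function names are hypothetical, and threads stand in for what would in practice be separate processes or cluster nodes reading from disk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator

# Hypothetical record layout: (card_id, product, amount in pence).
Transaction = tuple[str, str, int]


def chunked(items: list[Transaction], size: int) -> Iterator[list[Transaction]]:
    """Split the stored batch into fixed-size chunks."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def spend_per_card(chunk: list[Transaction]) -> Counter:
    """Map step: total spend per card within one chunk."""
    totals: Counter = Counter()
    for card_id, _product, pence in chunk:
        totals[card_id] += pence
    return totals


def batch_totals(transactions: list[Transaction],
                 chunk_size: int = 2, workers: int = 4) -> Counter:
    """Process chunks in parallel, then combine the partial totals."""
    combined: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(spend_per_card, chunked(transactions, chunk_size)):
            combined.update(partial)  # Counter.update adds values per key
    return combined


if __name__ == "__main__":
    sample = [("A", "milk", 120), ("B", "bread", 80), ("A", "tea", 200),
              ("C", "milk", 120), ("B", "jam", 150)]
    print(batch_totals(sample))
```

Because each chunk is totalled independently, the work scales out with the number of workers, but results only arrive once the whole batch has been read, which is the "potentially slow" delivery noted above.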


Case study 2: Take heart


• Some problems are not so much volume as velocity, as you want to analyse data in motion

• Non-relational data, such as email, text, voice, video, data from instruments

Case study 2: Take heart


• Monitoring needs to be real-time and continuous

• Not so much a question of storage, as of spotting outliers

Case study 2: Take heart


Case study 2: Take heart

• Streaming analytic solutions being deployed into intensive care and mobile continuous health monitoring

• Text analysis of social media for flu


• Health analytics market estimated to be worth $21.3B by 2020

• Compound annual growth rate of 25%

Case study 2: Take heart


BIG DATA

Characteristics: Streaming data; could be from heterogeneous sources from multiple sites

Processing: Real-time and in parallel; may trigger further batch processing

Storage: Minimal storage requirements

Speed: Transfer ‘from the pipe’ to registers for processing; results often delivered as alerts

Case study 2: Take heart
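Spotting outliers in data in motion, with minimal storage, can be sketched with a fixed-size sliding window: each new reading is compared with the mean and spread of the last few readings and reported as an alert if it deviates too far. This is only one simple approach (a rolling z-score), chosen for illustration; the window size, threshold and heart-rate values are assumptions, not the method used in any deployed system mentioned in the talk.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator


def outliers(readings: Iterable[float], window: int = 5,
             threshold: float = 3.0) -> Iterator[tuple[int, float]]:
    """Yield (index, value) for readings far from the recent baseline.

    Only the last `window` readings are kept, so memory use is constant
    however long the stream runs.
    """
    recent: deque[float] = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
                continue  # keep the outlier out of the baseline
        recent.append(value)


if __name__ == "__main__":
    heart_rate = [70, 72, 71, 69, 70, 71, 140, 70, 72]  # made-up readings
    for index, value in outliers(heart_rate):
        print(f"alert: reading {index} = {value}")
```

The generator consumes readings one at a time and emits alerts as they occur, which mirrors the slide's point that results are delivered as alerts rather than stored for later analysis.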


Case study 3: We built this city


• Annual global market for Smart Cities solutions is £200B

• Over 1,000 cities in the world with populations >500,000

• Smart Cities research shows us the variety of data

Case study 3: We built this city

• Transport cards (Oyster)
• Sensors (traffic, pollution, weather)
• Camera data (security, traffic)
• GIS (people, vehicles)
• Buildings (temperature, occupation)


Case study 3: We built this city


Case study 3: We built this city

What 100 million calls to NYC 311 reveal


BIG DATA

Characteristics: Streaming and/or batch analytics; from heterogeneous sources from multiple sites

Processing: Real-time and in parallel; may trigger further batch processing

Storage: Minimal storage requirements

Speed: Transfer ‘from the pipe’ to registers for processing; results often delivered as alerts

Case study 3: We built this city
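Handling heterogeneous sources from multiple sites usually starts with mapping each feed onto a common record and then interleaving the feeds in time order. The sketch below is a hypothetical illustration: the `Event` type, the two input layouts and the sample values are invented for the example, and each feed is assumed to arrive already sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: int   # seconds on a clock shared by all sources (assumption)
    source: str
    payload: dict


def from_sensor(rows: Iterable[tuple[int, str, float]]) -> Iterator[Event]:
    """Normalise (timestamp, kind, value) sensor tuples."""
    for ts, kind, value in rows:
        yield Event(ts, "sensor", {"kind": kind, "value": value})


def from_travel_card(rows: Iterable[dict]) -> Iterator[Event]:
    """Normalise travel-card taps given as dictionaries."""
    for row in rows:
        yield Event(row["time"], "travel_card", {"station": row["station"]})


def merged(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily interleave already-sorted streams by timestamp."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


if __name__ == "__main__":
    sensors = [(1, "traffic", 12.0), (4, "pollution", 0.7)]
    taps = [{"time": 2, "station": "A"}, {"time": 3, "station": "B"}]
    for event in merged(from_sensor(sensors), from_travel_card(taps)):
        print(event)
```

`heapq.merge` holds only one pending item per stream, so the combined feed can be processed as it arrives without storing the inputs.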


Case study 4: The Blackberry Riots


Case study 4: The Blackberry Riots

• Between 6 and 10 August 2011, thousands of people took to the streets in London

• The disturbances began after a police shooting on 4 August in Tottenham

• The resulting chaos required mass police deployment

• The rioting soon spread to Birmingham, Bristol, Liverpool and Manchester

• “Everyone watching these horrific actions will be struck by how they were organised with social media”

David Cameron, Prime Minister


Case study 4: The Blackberry Riots

• Professor Rob Procter and a team from LSE and The Guardian set about investigating this claim

• One of the largest studies of social media analytics

• What can we learn from use of social media during times of crisis?

• What does this tell us about veracity of data?


Case study 4: The Blackberry Riots

9pm on 8th August: @Twiggy_Garcia circulates unconfirmed reports that rioters are releasing animals at London Zoo

Re-tweeted by influential users with many followers. Rumours spread in viral-like way over non-hierarchical network

Opposition seeds within 13 minutes. Pictures are identified as fake


BIG DATA

Characteristics: Uncertainty and incompleteness exist in all data; streaming has the advantage of ‘in-flight correction’

Processing: Real-time and in parallel; including background analysis

Storage: Minimal additional storage requirements

Speed: Checking veracity inevitably impacts speed

Case study 4: The Blackberry Riots
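The idea of ‘in-flight correction’ can be sketched as a running tally per claim that is revised as each new message arrives, so a rumour's status changes while the stream is still flowing. This is a toy model, not the method used in the LSE and Guardian study: the message format, the stance labels, the 50% dispute threshold and the status names are all assumptions, and deciding whether a real tweet supports or opposes a claim is a hard problem left outside the sketch.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical message format: (claim_id, stance), stance is "support" or "oppose".
Message = tuple[str, str]


def claim_status(messages: Iterable[Message],
                 dispute_ratio: float = 0.5) -> Iterator[tuple[str, str]]:
    """Yield the updated status of a claim after every message about it."""
    support: dict[str, int] = defaultdict(int)
    oppose: dict[str, int] = defaultdict(int)
    for claim_id, stance in messages:
        if stance == "support":
            support[claim_id] += 1
        elif stance == "oppose":
            oppose[claim_id] += 1
        else:
            continue  # ignore messages with an unknown stance
        total = support[claim_id] + oppose[claim_id]
        disputed = oppose[claim_id] / total >= dispute_ratio
        yield claim_id, "disputed" if disputed else "unverified"


if __name__ == "__main__":
    stream = [("zoo", "support"), ("zoo", "support"), ("zoo", "oppose"),
              ("zoo", "oppose"), ("zoo", "oppose")]
    for claim, status in claim_status(stream):
        print(claim, status)
```

The extra bookkeeping per message is small, but it is work that a pipeline ignoring veracity would not do, which is one way to read the slide's note about the impact on speed.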


• Identifying characteristics of problem domain

• Working with experts, formulate technology (hardware/software) needs

• ‘Big data’ solutions are commonplace; ‘Fast data’ solutions are not


Conclusion…….


Discussion

Andrew.Carr@bull.co.uk

Stephen.Jarvis@warwick.ac.uk

Robert.J.Maskell@intel.com


information@bull.co.uk
@Bull_UK
Bull-Information-Systems

0870 240 0040
www.bull.co.uk
Hemel Hempstead HP2 7DZ
