Welcome & AWS Big Data Solution Overview

Post on 21-Jan-2018


AWS Big Data Solution Overview

Ivan Cheng (鄭志帆)

AWS Solutions Architect

What is Big Data?

When your data sets become so large and complex you have to start innovating around how to collect, store, process, analyze, and share them.

GB → TB → PB → EB → ZB

Big Data: Unconstrained Growth

Unstructured data growth is explosive

95% of the 1.2 zettabytes of data in the digital universe is unstructured

Machine data and IoT will only steepen the curve

70% of this data is user-generated content

Source: IDC, The Internet of Things: Getting Ready to Embrace Its Impact on the Digital Economy, March 2016.

The Cloud Was Built for Big Data

Elastic and highly scalable
+ No upfront capital expense
+ Only pay for what you use
+ Available on-demand
= the Cloud removes constraints

[Diagram: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Answers & insights]

START HERE WITH A BUSINESS CASE

Time to answer (Latency)

Cost

Evolution of Analytics

Retrospective analysis and reporting

Here-and-now real-time processing and dashboards

Predictions to enable smart applications

AWS Big Data Benefits

Immediate Availability. Deploy instantly. No hardware to procure, no infrastructure to maintain & scale.

Broad & Deep Capabilities. Over 50 services and 100s of features to support virtually any big data application & workload.

Trusted & Secure. Designed to meet the strictest requirements. Continuously audited, including certifications such as ISO 27001, FedRAMP, DoD CSM, and PCI DSS.

Hundreds of Partners & Solutions. Get help from a consulting partner or choose from hundreds of tools and applications across the entire data management stack.

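[Diagram: AWS big data services arranged under Collect, Store, and Analyze. Services shown:]

• AWS Data Pipeline
• AWS Database Migration Service
• AWS Direct Connect
• AWS Snowball
• AWS IoT
• Amazon Kinesis
• Amazon S3
• Amazon Glacier
• Amazon DynamoDB
• Amazon EMR
• Amazon EC2
• Amazon Redshift
• Amazon Machine Learning
• Amazon Elasticsearch Service
• Amazon Athena
• Amazon QuickSight
• AWS Lambda
• AWS Glue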

Key AWS Certifications and Assurance Programs

AWS Big Data Customer Success

AWS Big Data Partners

AWS Big Data Service Overview

Data Movement

• AWS Database Migration Service
• AWS Direct Connect
• AWS Import/Export & Snowball
• AWS Storage Gateway

Storage and Databases

Amazon S3

• Store an unlimited number of objects
• Designed for 99.999999999% durability
• Serves as a data lake, integrating with other AWS services (Amazon Kinesis, Amazon Redshift, Amazon EMR, etc.)
• Low cost with tiered storage (Standard, IA, Amazon Glacier) via lifecycle policies
• Secure – SSL in transit, client-side or server-side encryption at rest
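The tiered-storage bullet can be made concrete with a small sketch. The dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` takes for its `LifecycleConfiguration` argument; the `raw/` prefix, the rule ID, and the 30/90-day thresholds are illustrative assumptions, not values from the slides. No AWS call is made here.

```python
import json


def build_lifecycle_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Return one lifecycle rule that tiers objects stored under `prefix`.

    The thresholds and the rule ID are illustrative assumptions.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Standard -> Infrequent Access
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # Infrequent Access -> Glacier
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


# Hypothetical configuration for a bucket whose raw data lives under "raw/".
lifecycle_configuration = {"Rules": [build_lifecycle_rule("raw/")]}
print(json.dumps(lifecycle_configuration, indent=2))
```

To apply it, this dictionary would be passed to an S3 client together with a bucket name; that step is left out so the sketch runs without credentials.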

Amazon DynamoDB

• Fully managed NoSQL database
• Fast, consistent performance (single-digit millisecond latency at any scale)
• Highly scalable - automatic scaling of throughput capacity
• Highly available and durable
• Store an unlimited amount of data

Amazon Aurora

• Fully managed relational database service
• MySQL- and PostgreSQL-compatible relational database with up to 5x better performance running on the same hardware
• Security, availability, and reliability of commercial databases at 1/10th the cost
• Designed to offer greater than 99.99% availability
• Automatically grows storage as needed, from 10 GB up to 64 TB
• Achieve up to 500,000 reads and 100,000 writes per second

Amazon Redshift

• Fully managed, petabyte-scale, relational, MPP data warehouse
• Built-in end-to-end security, including SSL connections and cluster encryption
• Fault-tolerant - automatically recovers from disk and node failures
• Data automatically backed up to Amazon S3
• $1,000/TB/year; start at $0.25/hour. Provision in minutes; scale from 160 GB to 2 PB of compressed data with just a few clicks
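The two price points on this slide are easier to compare once they share a unit. The sketch below only does the arithmetic; it assumes continuous use for a full 365-day year, and the slide does not say which node type or pricing plan each figure refers to, so the results are rough orientation rather than a quote.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def yearly_on_demand_cost(hourly_rate_usd, node_count=1):
    """Cost of running `node_count` nodes around the clock for one year."""
    return hourly_rate_usd * HOURS_PER_YEAR * node_count


def yearly_cost_at_rate_per_tb(terabytes, usd_per_tb_year=1000.0):
    """Cost of `terabytes` of data at a flat per-TB yearly rate."""
    return terabytes * usd_per_tb_year


# Entry price from the slide: $0.25/hour, one node, never switched off.
print(yearly_on_demand_cost(0.25))       # 2190.0

# Slide's headline rate: $1,000/TB/year, for a hypothetical 10 TB warehouse.
print(yearly_cost_at_rate_per_tb(10))    # 10000.0
```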

Analytic Frameworks

Amazon EMR

• Managed Hadoop framework
• Apache Hadoop, Hive, Spark, Zeppelin, Presto, HBase, Phoenix, Tez, Flink, etc.
• Auto Scaling clusters with support for On-Demand and Spot pricing
• Support for end-to-end encryption, IAM/VPC, and S3 client-side encryption with customer-managed keys and AWS KMS
• Integrates with Amazon S3, Amazon DynamoDB, Amazon Kinesis, and Amazon Redshift

Amazon EMR

[Diagram: Pig on Amazon EMR, with Amazon EMR accessing Amazon S3 through EMRFS]

Amazon Elasticsearch Service

• Fully managed, reliable, and scalable Elasticsearch service
• Support for the ELK stack (Elasticsearch, Logstash, Kibana)
• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)
• Use cases: log analytics, full-text search, application monitoring, and more

Amazon Athena

• Serverless query service for querying data in S3 using standard SQL, with no infrastructure to manage
• Support for multiple data formats, including text, CSV, TSV, JSON, Avro, ORC, and Parquet
• Pay per query, only when you’re running queries, based on data scanned. If you compress your data, you pay less and your queries run faster
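Because billing follows the bytes scanned, the effect of compression can be worked through with simple arithmetic. In the sketch below the per-terabyte price, the 2 TB data size, and the 4:1 compression ratio are all placeholder assumptions (the slide quotes none of them), and a terabyte is taken as 1024^4 bytes for convenience.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def query_cost_usd(bytes_scanned, usd_per_tb_scanned):
    """Cost of one query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * usd_per_tb_scanned


ASSUMED_USD_PER_TB = 5.0        # placeholder price, not taken from the slides
raw_bytes = 2 * BYTES_PER_TB    # hypothetical table: 2 TB uncompressed
compressed_bytes = raw_bytes / 4  # assumed 4:1 compression ratio

uncompressed_cost = query_cost_usd(raw_bytes, ASSUMED_USD_PER_TB)
compressed_cost = query_cost_usd(compressed_bytes, ASSUMED_USD_PER_TB)

print(uncompressed_cost)  # 10.0
print(compressed_cost)    # 2.5
```

Under these assumptions a full scan of the compressed copy costs a quarter of the uncompressed one, which is the point the last bullet makes.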

Familiar Technologies Under the Covers

Presto: used for SQL queries
• In-memory distributed query engine
• ANSI-SQL compatible with extensions

Apache Hive: used for DDL functionality
• Complex data types
• Multitude of formats
• Supports data partitioning
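Data partitioning is commonly expressed in S3 as Hive-style `key=value` path segments, so that a query filtering on those keys only needs the matching prefixes. The sketch below builds such object keys in plain Python; the `logs` prefix, the year/month/day scheme, and the file names are hypothetical.

```python
from datetime import date


def partitioned_key(table_prefix, event_date, filename):
    """Build a Hive-style partitioned object key for one file."""
    return (
        f"{table_prefix.rstrip('/')}/"
        f"year={event_date.year:04d}/"
        f"month={event_date.month:02d}/"
        f"day={event_date.day:02d}/"
        f"{filename}"
    )


def prefix_for_month(table_prefix, year, month):
    """Prefix that covers every key belonging to one month."""
    return f"{table_prefix.rstrip('/')}/year={year:04d}/month={month:02d}/"


keys = [
    partitioned_key("logs", date(2018, 1, 20), "part-0000.parquet"),
    partitioned_key("logs", date(2018, 1, 21), "part-0000.parquet"),
    partitioned_key("logs", date(2018, 2, 1), "part-0000.parquet"),
]

# Only the January keys share the January prefix; February is skipped.
january = [k for k in keys if k.startswith(prefix_for_month("logs", 2018, 1))]
print(january)
```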

Amazon QuickSight

• Fast, cloud-powered business analytics
• Easy to use, no infrastructure to manage
• Quick calculations with SPICE
• 1/10th the cost of legacy BI software
• Accessed from any browser or mobile device

AWS Glue

• Fully managed ETL (extract, transform, load) service
• Integrated data catalog, automatic schema discovery, ETL code generation, flexible job scheduler
• Integrated across a wide range of AWS services (Amazon RDS, databases running on Amazon EC2, Amazon Athena, etc.)

How AWS Glue Works

1. Build your data catalog
2. Generate and edit transformations
3. Schedule and run your jobs
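To give a feel for the first step, here is a toy stand-in for schema discovery: it reads a few JSON records and records a column-to-type mapping in a dictionary that plays the role of a catalog. This is not AWS Glue's API or its actual inference logic; the type names, the widening rule, and the `sales` table are assumptions made for illustration.

```python
import json

NUMERIC_TYPES = {"bigint", "double"}


def infer_type(value):
    """Map a Python value to a coarse column type."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def discover_schema(json_lines):
    """Infer {column: type} from an iterable of JSON-encoded records."""
    schema = {}
    for line in json_lines:
        for column, value in json.loads(line).items():
            seen = infer_type(value)
            previous = schema.setdefault(column, seen)
            if previous != seen:
                # Mixed numeric types widen to double; anything else to string.
                both_numeric = {previous, seen} <= NUMERIC_TYPES
                schema[column] = "double" if both_numeric else "string"
    return schema


sample = [
    '{"user_id": 1, "amount": 9.5, "country": "TW"}',
    '{"user_id": 2, "amount": 12, "country": "US", "vip": true}',
]
catalog = {"sales": discover_schema(sample)}
print(catalog)
```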

Real-time Analytics

Amazon Kinesis

• Fully managed streaming application
• Scalable – handle any amount of streaming data
• Ingest, buffer, and process data in real time
• React quickly – derive insight in seconds

Amazon Kinesis

Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data

Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service

Amazon Kinesis Analytics: Easily analyze data streams using standard SQL queries

Amazon Kinesis Streams

• Reliably ingest and durably store streaming data at low cost
• Build custom real-time applications to process streaming data

Amazon Kinesis Firehose

Reliably ingest and deliver batched, compressed, and encrypted data to S3, Amazon Redshift, and Amazon Elasticsearch Service
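The "batched, compressed" part of that description can be pictured with a local sketch: group incoming records into fixed-size batches and gzip each batch into one blob, as a delivery step might do before writing an object. The batch size of three and the record format are assumptions, encryption is left out, and nothing here calls the Firehose API.

```python
import gzip


def batch_records(records, max_records=3):
    """Yield lists of at most `max_records` records, preserving order."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == max_records:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def compress_batch(batch):
    """Join records with newlines and gzip them into one blob."""
    payload = "\n".join(batch).encode("utf-8") + b"\n"
    return gzip.compress(payload)


# Seven hypothetical JSON events become three compressed objects (3 + 3 + 1).
events = [f'{{"event_id": {i}}}' for i in range(7)]
objects = [compress_batch(batch) for batch in batch_records(events)]
print(len(objects))  # 3
```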

Amazon Kinesis Analytics

Interact with streaming data in real time using SQL
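A typical question to ask of a stream is "how many events of each kind arrived in each minute". The sketch below computes that with a tumbling window in plain Python, roughly what a windowed `GROUP BY` with `COUNT(*)` would return; it is not Kinesis Analytics SQL, and the event tuples and the 60-second window are made up for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key) over non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = Counter()
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "click"), (15, "click"), (59, "view"), (60, "click"), (125, "view")]
result = tumbling_window_counts(events)
print(result)
# {(0, 'click'): 2, (0, 'view'): 1, (60, 'click'): 1, (120, 'view'): 1}
```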

Modern Data Analytics Architecture on AWS

[Diagram, built up across several slides: Modern data architecture. Insights to enhance business applications, new digital services.]

Data sources
• Transactions
• Web logs / cookies
• ERP
• Connected devices
• Social media

Ingest
• AWS Database Migration Service
• AWS Direct Connect
• Internet interfaces
• Amazon Kinesis

Scale (Batch)
• Raw data: Amazon S3
• ETL: Amazon EMR, AWS Glue
• Staged data (data lake): Amazon S3
• Advanced analytics: MLlib, deep learning, Amazon ML

Serving
• Data warehouse: Amazon Redshift
• Legacy apps: Amazon RDS
• Schemaless: Amazon Elasticsearch
• Direct query: Amazon Athena
• Near-zero latency: Amazon DynamoDB
• Semi/unstructured: Amazon EMR
• Amazon QuickSight
• Amazon API Gateway

Speed (Real-time)
• Event capture: Amazon Kinesis
• Stream analysis: Amazon EMR
• Event scoring: Amazon AI
• Event handler: AWS Lambda
• Response handler: AWS Lambda

Consumers
• Data analysts
• Data scientists
• Business users
• Engagement platforms
• Automation / events

Across all layers
• AWS CloudTrail
• AWS IAM
• Amazon CloudWatch
• AWS KMS
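The split between the batch path and the speed path can be sketched in a few lines: every event is kept for later batch processing, while the same event is scored immediately and may trigger a response. Everything in the sketch is hypothetical (the in-memory lists, the scoring rule, the threshold, the event fields); it only mirrors the shape of the diagram, not any AWS service's behavior.

```python
raw_store = []  # stands in for the raw-data store on the batch path
alerts = []     # stands in for whatever the response handler triggers


def score_event(event):
    """Hypothetical scoring rule; a real system would call a model here."""
    return 1.0 if event["amount"] > 1000 else 0.0


def handle_event(event, threshold=0.5):
    """Send one event down both paths and return its score."""
    # Batch path: keep every event so ETL and analytics can run later.
    raw_store.append(event)
    # Speed path: score right away and respond if the score is high enough.
    score = score_event(event)
    if score >= threshold:
        alerts.append({"event_id": event["id"], "score": score})
    return score


for incoming in [{"id": 1, "amount": 20}, {"id": 2, "amount": 5000}]:
    handle_event(incoming)

print(len(raw_store), alerts)
```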

Reference Architecture

Sample Reference Architecture: Data Lake

[Diagram: data lake reference architecture. Components shown:]

• Users and data providers
• Auto Scaling EC2: analytics apps, data ingestion services
• Normalization ETL clusters (EMR) and optimization ETL clusters (EMR)
• Batch analytic clusters (EMR), ad hoc query cluster (EMR), query clusters (EMR)
• Data marts (Amazon Redshift), Athena, Glue
• Source of truth (S3) and query optimized (S3)
• Shared metastore (RDS) and reference data (RDS)
• Shared data services on Auto Scaling EC2: data catalog & lineage services, cluster management & workflow services

>5 PB, up to 75 billion events per day

Enterprise Data Warehouse

[Diagram components: Data sources, Amazon S3, Amazon EMR, Amazon S3, Amazon Redshift, Amazon QuickSight, and Amazon Athena (shown twice)]

[Diagram: Data → Ingest/Collect → Store → Process/Analyze → Consume/Visualize → Outcomes & insights]

NORDSTROM

• Personalized recommendations within seconds (from 15-20 min)
• Scale the expertise of stylists to all shoppers
• Reduce costs by two orders of magnitude

[Diagram components: Mobile users, Desktop users, Online stylist, Analytics tools, Amazon Kinesis, AWS Lambda (shown twice), Amazon DynamoDB, Amazon Redshift, Amazon S3 data storage]

Big Data on AWS:

https://aws.amazon.com/big-data/

Thank you!
