[Understanding Amazon Redshift] Amazon Redshift: Latest Updates and Customer Case Studies

AWS Product Series: Understanding Amazon Redshift, 2014/02/19, Amazon Data Services Japan K.K.

Upload: amazon-web-services-japan

Post on 26-Jan-2015


DESCRIPTION

Materials from the seminar held in Tokyo on 2014/02/19

TRANSCRIPT

Page 1: [Understanding Amazon Redshift] Amazon Redshift: Latest Updates and Customer Case Studies

AWS Product Series

Understanding Amazon Redshift

2014/02/19 Amazon Data Services Japan K.K.

Page 2:

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Rahul Pathak | Senior Product Manager

Page 3:

Petabyte scale

Massively parallel

Relational data warehouse

Fully managed; zero admin

Amazon Redshift

A lot faster, a lot cheaper, a whole lot simpler

Page 4:

Amazon Redshift Quick Overview

A recap of the Amazon Redshift overview

Page 5:

Amazon Redshift architecture

• Leader Node

  – SQL endpoint

  – Stores metadata

  – Coordinates query execution

• Compute Nodes

  – Local, columnar storage

  – Execute queries in parallel

  – Load, backup, restore via Amazon S3

  – Parallel load from Amazon DynamoDB

• Hardware optimized for data processing

• Two hardware platforms

  – DW1: HDD; scale from 2TB to 1.6PB

  – DW2: SSD; scale from 160GB to 256TB

[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]

Page 6:

Amazon Redshift has security built-in

• SSL to secure data in transit

• Encryption to secure data at rest

  – AES-256; hardware accelerated

  – All blocks on disks and in Amazon S3 encrypted

  – HSM Support

• No direct access to compute nodes

• Audit logging & AWS CloudTrail integration

• Amazon VPC support

[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; 10 GigE (HPC); Ingestion, Backup, Restore]

Page 7:

Amazon Redshift is easy to use

• Provision in minutes

• Monitor query performance

• Point and click resize

• Built-in security

• Automatic backups

Page 8:

Provision a data warehouse in minutes

Page 9:

Monitor query performance

Page 10:

Point and click resize

• Resize while remaining online via AWS Console or API

• Provision a new cluster in the background and copy data in parallel from node to node

• Only charged for source cluster until SQL endpoint has automatically been switched over via DNS

Page 11:

Amazon Redshift continuously backs up your data and recovers from failures

• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times

• Backups to Amazon S3 are continuous, automatic, and incremental

  – Designed for eleven nines of durability

• Continuous monitoring and automated recovery from failures of drives and nodes

• Able to restore snapshots to any Availability Zone within a region

• Easily enable backups to a second region for disaster recovery

Page 12:

Amazon Redshift integrates with multiple data sources

Amazon S3

Amazon EMR

Amazon Redshift

DynamoDB

Amazon RDS

Corporate Datacenter

Page 13:

New Features Introduced After re:Invent 2013

Major updates since re:Invent 2013

Page 14:

Feature Delivery in 2013

Service Launch (2/14)

PDX (4/2)

Temp Credentials (4/11)

Unload Encrypted Files

DUB (4/25)

NRT (6/5)

JDBC Fetch Size (6/27)

Unload logs (7/5)

4 byte UTF-8 (7/18)

Statement Timeout (7/22)

SHA1 Builtin (7/15)

Timezone, Epoch, Autoformat (7/25)

WLM Timeout/Wildcards (8/1)

CRC32 Builtin, CSV, Restore Progress (8/9)

UTF-8 Substitution (8/29)

JSON, Regex, Cursors (9/10)

Split_part, Audit tables (10/3)

SIN/SYD (10/8)

HSM Support (11/11)

Kinesis EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts (11/13)

SOC1/2/3 (5/8)

Sharing snapshots (7/18)

Resource Level IAM (8/9)

PCI (8/22)

Distributed Tables, Single Node Cursor Support, Maximum Connections to 500 (12/13)

EIP Support for VPC Clusters (12/28)

Page 15:

Summary of Updates after re:Invent

• Amazon Redshift - New Features Galore (2013/11/11)

  – Distributed Tables - You now have more control over the distribution of a table's rows across compute nodes.

  – Remote Loading - You can now load data into Redshift from remote hosts across an SSH connection.

  – Approximate Count Distinct - You can now use a variant of the COUNT function to approximate the number of distinct values.

  – Workload Queue Memory Management - You can now apportion available memory across work queues.

  – Key Rotation - You can now direct Redshift to rotate keys for an encrypted cluster.

  – HSM Support - You can now direct Redshift to use an on-premises Hardware Security Module (HSM) or AWS CloudHSM to manage the encryption master and cluster encryption keys.

  – Database Auditing and Logging - You can log connections and user activity to Amazon S3.

  – SNS Notification - Redshift can now issue notifications to an Amazon SNS topic when certain events occur.

• Automated Cross-Region Snapshot Copy for Amazon Redshift (2013/11/14)

• Faster & More Cost-Effective SSD-Based Nodes for Amazon Redshift (2014/01/24)

• AWS CloudFormation Adds Support for Redshift and More (2014/02/10)

Page 16:

Amazon Redshift Node Types

DW1 (HDD)

• Optimized for I/O intensive workloads

• High disk density

• On demand at $0.85/hour

• As low as $1,000/TB/Year

• Scale from 2TB to 1.6PB

DW1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed storage

DW1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed, 2 GB/sec scan rate

DW2 (SSD)

• High performance at smaller storage size

• High compute and memory density

• On demand at $0.25/hour

• As low as $5,500/TB/Year

• Scale from 160GB to 256TB

DW2.L *New*: 16 GB RAM, 2 Cores, 160 GB compressed SSD storage

DW2.8XL *New*: 256 GB RAM, 32 Cores, 2.56 TB of compressed SSD storage
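
As a rough check on the "scale from" ranges above, the following sketch computes how many nodes of a given type a target volume of compressed data would need. The per-node storage figures come from the slide. The helper name `nodes_needed`, the use of decimal units (1 TB = 1,000 GB), and the simplification that storage capacity is the only sizing factor are assumptions made for illustration.

```python
# Compressed storage per node in GB, taken from the slide.
# Decimal units (1 TB = 1,000 GB) are an assumption of this sketch.
NODE_STORAGE_GB = {
    "DW1.XL": 2_000,
    "DW1.8XL": 16_000,
    "DW2.L": 160,
    "DW2.8XL": 2_560,
}


def nodes_needed(node_type: str, data_gb: int) -> int:
    """Smallest node count whose combined storage holds data_gb (hypothetical helper)."""
    per_node = NODE_STORAGE_GB[node_type]
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-data_gb // per_node))


if __name__ == "__main__":
    # Upper ends of the quoted ranges: 1.6 PB on DW1.8XL and 256 TB on DW2.8XL.
    print(nodes_needed("DW1.8XL", 1_600_000))
    print(nodes_needed("DW2.8XL", 256_000))
```

Both calls yield 100, which would be consistent with the quoted maximums corresponding to 100 of the largest nodes on each platform.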

Page 17:

Amazon Redshift is priced to let you analyze all your data

DW1 (HDD)          | Price Per Hour for DW1.XL Single Node | Effective Annual Price per TB
-------------------+---------------------------------------+------------------------------
On-Demand          | $1.250                                | $5,475
1 Year Reservation | $0.750                                | $3,283
3 Year Reservation | $0.452                                | $1,981

DW2 (SSD)          | Price Per Hour for DW2.L Single Node  | Effective Annual Price per TB
-------------------+---------------------------------------+------------------------------
On-Demand          | $0.330                                | $18,068
1 Year Reservation | $0.211                                | $11,570
3 Year Reservation | $0.130                                | $7,127

• Number of nodes x cost per hour

• No charge for leader node

• No upfront costs

• Pay as you go
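
The pricing rule on this slide (number of nodes x cost per hour) and the "effective annual price per TB" column can be reproduced with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, a year is taken as 8,760 hours, and the per-node storage (2 TB for DW1.XL, 0.16 TB for DW2.L) is taken from the node type slide.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years are ignored in this sketch


def annual_price_per_tb(price_per_hour: float, node_storage_tb: float) -> float:
    """Yearly cost of one node divided by the compressed storage it provides."""
    return price_per_hour * HOURS_PER_YEAR / node_storage_tb


def cluster_cost_per_hour(compute_nodes: int, price_per_hour: float) -> float:
    """Number of compute nodes x cost per hour; the leader node is not charged."""
    return compute_nodes * price_per_hour


if __name__ == "__main__":
    # On-demand DW1.XL: $1.250/hour for 2 TB of compressed storage.
    print(annual_price_per_tb(1.25, 2.0))
    # On-demand DW2.L: $0.330/hour for 0.16 TB of compressed storage.
    print(annual_price_per_tb(0.33, 0.16))
    # A hypothetical 4-node DW1.XL cluster.
    print(cluster_cost_per_hour(4, 1.25))
```

The on-demand rows match the table ($5,475 and roughly $18,068). The reservation rows differ from this simple formula by a few dollars, presumably because the hourly prices shown are rounded effective rates.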

Page 18:

Security, visibility and control

• Audit logging

• SNS Alerts

Page 19:

Visibility and control

• Audit logging

• SNS Alerts

[Diagram: Amazon Redshift, Amazon S3, AWS CloudTrail]

Database Activity: Logins, Login failures, Queries, Loads

System Activity: Creates, Changes, Deletes, Resizes

Page 20:

Visibility and control

• Audit logging

• SNS Alerts

[Diagram: Amazon Redshift, SNS Topic; Monitoring, Security, Maintenance, Errors]

Page 21:

Batch operations

• Cluster Creation

• Faster Resize

[Diagram: Amazon Redshift, Amazon S3, Amazon EMR, Amazon EC2, Corporate Data Center]

Page 23:

Batch operations

• Cluster Creation

• Faster Resize

[Chart values: 15-20 min, 3 min]

Page 24:

Batch operations

• Cluster Creation

• Faster Resize

[Chart values: 29 hours, 7 hours]

Page 25:

Performance & Concurrency

Page 26:

Performance & Concurrency

[Chart values: 692.8s, 34.9s, < 2%]

Page 27:

Performance & Concurrency

[Chart values: 5,951.7s, 2,151.9s]

Page 28:

Performance & Concurrency

[Chart values: 15, 50]

Page 29:

How Customers Leverage Amazon Redshift

Amazon Redshift customer use cases

Page 30:

Common Customer Use Cases

Traditional Enterprise DW

• Reduce costs by extending DW rather than adding HW

• Migrate completely from existing DW systems

• Respond faster to business; provision in minutes

Companies with Big Data

• Improve performance by an order of magnitude

• Make more data available for analysis

• Access business data via standard reporting tools

SaaS Companies

• Add analytic functionality to applications

• Scale DW capacity as demand grows

• Reduce HW & SW costs by an order of magnitude

Page 31:

Amazon Redshift Customers

Page 32:

Japanese Redshift Customer – ALBERT

• Business Challenge

  – Given their data volumes, RDBMS tuning and archiving were causing a lot of operational pain and costing them money

• Why AWS?

  – Amazon Redshift’s performance and ability to handle large data sets allowed them to make it the core engine of their analytics, enabling them to provide a private DMP (Data Management Platform) for their customers on the Cloud

  – PostgreSQL is their primary RDBMS, so connectivity via PostgreSQL drivers was a major technical advantage in choosing Redshift

• Benefits for their business

  – Ability to start small and scale as needed

  – Scalability and flexibility dramatically lowered the cost of ownership

Page 33:

Japanese Redshift Customer – Sansan

• Business Challenge

  – Since “Eight” is a business card management solution for consumers, they needed infrastructure that could start small and scale as needed

• Why AWS?

  – When they first tried AWS, they were surprised by its ease of use. AWS functionality and elasticity were critical factors

• Benefits for their business

  – Substantially lower costs through reserved instances

  – Automation is key to reducing operational and administrative costs. They use services such as Amazon SES and Amazon SWF.

  – They use Redshift for KPI analytics of their services.

Page 34:

Growing ecosystem

Page 35:

Multiple Data Loading Options

• Parallel upload to Amazon S3

• AWS Direct Connect

• AWS Import/Export

• ETL Software

• Systems integrators

[Partner logos: Data Integration, Systems Integrators]

Page 36:

Customers on Performance

“Redshift is twenty times faster than Hive” (5x – 20x reduction in query times)

“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.” (20x – 40x reduction in query times)

“…[Redshift] performance has blown away everyone here (we generally see 50-100x speedup over Hive).”

“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”

“We saw…2x improvement in query times and a 50% reduction in costs”

“We regularly process multibillion row datasets and we do that in a matter of hours.”

Page 37:

Customers on Cost

“[Amazon Redshift] took an industry famous for its opaque pricing, high TCO and unreliable results and completely turned it on its head.”

“[Redshift] cost saving is even more impressive…Our analysts like [Redshift] so much they don’t want to go back.” (4x reduction in cost over Hive)

“We saw 50% reduction in costs”

“[Redshift] has reduced our storage and processing costs significantly, helping us to realize another 60-70 percent savings.”

“We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution”

“Not only did we avoid 3 months of development work [we] saved approximately $80,000 in labor…Competitive Advantage realized with just a few clicks.”

Page 38:

Customers on Ease of Use

“We can spin up an Amazon Redshift cluster, take a snapshot, and scale servers in minutes instead of days.”

“With Amazon Redshift and Tableau, anyone in the company can set up any queries they like…It’s very flexible.”

“Customers can get consistent, accurate, and useful data fast - in weeks not months or years.”

“Compared to Hadoop [Redshift] is much easier for analysts to use. What may have been a Hadoop project can become just a query in Redshift.”

“Amazon Redshift is simple to use and reliable. With one click, we can rapidly scale up or down in real time in alignment with business requirements.”

“…our team was able to provision Redshift in a matter of hours vs. weeks with on-premises servers.”

Page 39:

AWS Marketplace

• Find software to use with Amazon Redshift

• One-click deployments

• Flexible pricing options

http://aws.amazon.com/marketplace

Page 40:

Questions?

Page 41:

APPENDIX

Page 42:

Resources

• Detail Pages

  – http://aws.amazon.com/redshift

  – https://aws.amazon.com/marketplace/redshift/

• New Features

  – http://docs.aws.amazon.com/redshift/latest/dg/doc-history.html

  – http://docs.aws.amazon.com/redshift/latest/mgmt/document-history.html

• Best Practices

  – http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html

  – http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html

  – http://docs.aws.amazon.com/redshift/latest/dg/c-optimizing-query-performance.html

• Presentations & Webinars

  – http://www.youtube.com/watch?v=JxLpj_TnisM (2013 SF Summit Presentation)

  – http://www.youtube.com/watch?v=R1m-fwzXMow (Best Practices 1 of 2)

  – http://www.youtube.com/watch?v=7ySzRTOyK6o (Best Practices 2 of 2)

Page 43:

Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

• With row storage you do unnecessary I/O

• To get total amount, you have to read everything

ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375

Page 44:

Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

• With column storage, you only read the data you need

ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375
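
The difference between the two layouts can be made concrete with the four-row table from these slides. The sketch below is a simplified in-memory model, not Redshift's storage engine: it counts how many values each layout has to touch to total the Amount column. All function names are hypothetical.

```python
# The four example rows from the slide: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]
COLUMNS = ("id", "age", "state", "amount")


def total_amount_row_store(rows):
    """Row layout: every row is read in full just to reach its Amount field."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row comes off disk
        total += row[3]
    return total, values_read


def to_column_store(rows):
    """Pivot the rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_amount_column_store(columns):
    """Column layout: only the Amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(total_amount_row_store(ROWS))                      # (total, values read)
    print(total_amount_column_store(to_column_store(ROWS)))  # (total, values read)
```

In this toy model the row layout reads 16 values and the column layout reads 4 to produce the same total of 1250.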

Page 45:

analyze compression listing;

Table | Column | Encoding

---------+----------------+----------

listing | listid | delta

listing | sellerid | delta32k

listing | eventid | delta32k

listing | dateid | bytedict

listing | numtickets | bytedict

listing | priceperticket | delta32k

listing | totalprice | mostly32

listing | listtime | raw


Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

• COPY compresses automatically

• You can analyze and override

• More performance, less cost
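
To give a feel for why an encoding such as `delta` in the output above can save space, here is a conceptual sketch of delta encoding: store the first value, then only the difference between consecutive values. This illustrates the general idea and is not Redshift's on-disk format. The sample `listid` values and the function names are made up for the example.

```python
def delta_encode(values):
    """Keep the first value, then the difference from each previous value."""
    if not values:
        return []
    return [values[0]] + [curr - prev for prev, curr in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    decoded = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        decoded.append(running)
    return decoded


if __name__ == "__main__":
    listids = [1000, 1001, 1003, 1004, 1010]  # hypothetical, nearly sequential IDs
    encoded = delta_encode(listids)
    print(encoded)  # small differences need fewer bytes than the full values
    print(delta_decode(encoded) == listids)
```

Nearly sequential columns such as IDs or timestamps produce small differences, which is why this family of encodings suits them.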

Page 46:

Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)

Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)

Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)

• Track the minimum and maximum value for each block

• Skip over blocks that don’t contain relevant data
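
The block-skipping idea can be sketched using the minimum and maximum values shown on this slide (10 to 324, 375 to 623, 637 to 959). The code below is a minimal illustration of the technique with a hypothetical `blocks_to_scan` helper. It only decides which blocks a range predicate could match and says nothing about how Redshift implements this internally.

```python
# (min, max) per block, as shown on the slide.
ZONE_MAP = [(10, 324), (375, 623), (637, 959)]


def blocks_to_scan(zone_map, low, high):
    """Indexes of blocks whose [min, max] range overlaps the query range [low, high]."""
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]


if __name__ == "__main__":
    print(blocks_to_scan(ZONE_MAP, 400, 500))  # only the second block can match
    print(blocks_to_scan(ZONE_MAP, 300, 400))  # straddles the first two blocks
    print(blocks_to_scan(ZONE_MAP, 325, 374))  # falls in a gap: nothing to read
```

A predicate such as "value between 400 and 500" therefore needs to read one block out of three in this example.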

Page 47:

Amazon Redshift dramatically reduces I/O

• Column storage

• Data compression

• Zone maps

• Direct-attached storage

• Use local storage for performance

• Maximize scan rates

• Automatic replication and continuous backup

• HDD & SSD platforms

Page 48:

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup/Restore

• Resize

Page 49:

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup/Restore

• Resize

• Load in parallel from Amazon S3, Amazon DynamoDB, or any SSH connection

• Data automatically distributed and sorted according to DDL

• Scales linearly with number of nodes
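
One way to picture "distributed and sorted according to DDL" during a load is the toy model below: each row is assigned to a node by hashing its distribution key, and each node keeps its rows ordered by a sort key. This is only a sketch under stated assumptions. The slides do not describe Redshift's actual hash function, so CRC32 is used as a stand-in, and the function names, column positions, and sample rows are all hypothetical.

```python
import zlib


def node_for_key(key, num_nodes):
    """Map a distribution key to a node index (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, dist_index, sort_index, num_nodes):
    """Place each row on a node by its distribution key, then sort per node."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for_key(row[dist_index], num_nodes)].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda row: row[sort_index])
    return nodes


if __name__ == "__main__":
    # Hypothetical rows: (customer_id, order_date, amount).
    rows = [
        (7, "2014-02-03", 120),
        (3, "2014-01-15", 80),
        (7, "2014-01-02", 45),
        (9, "2014-02-10", 300),
        (3, "2014-02-01", 60),
    ]
    for index, node_rows in enumerate(distribute(rows, 0, 1, num_nodes=2)):
        print(index, node_rows)
```

Rows that share a distribution key always land on the same node, so each node can load and later scan its share of the data independently of the others.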

Page 50:

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup/Restore

• Resize

• Backups to Amazon S3 are automatic, continuous and incremental

• Configurable system snapshot retention period. Take user snapshots on-demand

• Cross region backups for disaster recovery

• Streaming restores enable you to resume querying faster

Page 51:

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup/Restore

• Resize

• Resize while remaining online

• Provision a new cluster in the background

• Copy data in parallel from node to node

• Only charged for source cluster

Page 52:

Amazon Redshift parallelizes and distributes everything

• Query

• Load

• Backup/Restore

• Resize

• Automatic SQL endpoint switchover via DNS

• Decommission the source cluster

• Simple operation via Console or API