AWS for Big Data Experts @LynnLangit Nov 2013

Upload: lynn-langit

Post on 14-May-2015


DESCRIPTION

presentation for BigDataCampLA

TRANSCRIPT

Page 1: AWS for Big Data Experts

AWS for Big Data Experts

@LynnLangit

Nov 2013

Page 2: AWS for Big Data Experts

Data Expertise / Lynn Langit

Practicing Architect
• Cloud Deployments (Azure, AWS, Google)

Technical author / trainer
• Google Cloud Developer Series
• SQL Server 2012 Developer Series
• Cloudera Certified Developer
• 2 books on SQL Server BI

Industry awards
• Microsoft – MVP for SQL Server
• Google – GDE for Cloud Platform
• 10Gen – Master for MongoDB

Former MSFT FTE
• 4 years

Page 3: AWS for Big Data Experts

What and Why AWS?

AWS – Amazon’s cloud

Large set of services
• Compute
• Data
• More

Market leader
• In market longest
• Usually cheapest
• Most often used in production

Page 4: AWS for Big Data Experts

Amazon Web Services

Page 5: AWS for Big Data Experts

How to Work with AWS

• Web Console

• Command Line Tools

• AWS SDK and IDE Tools

Page 6: AWS for Big Data Experts

EC2 – Virtual Machines (AMIs)

Page 7: AWS for Big Data Experts

EC2 – VMs (AMIs) from AWS Marketplace

Page 8: AWS for Big Data Experts

Demo – EC2

Virtual Machines

Page 9: AWS for Big Data Experts

Understanding EC2 storage options

Page 10: AWS for Big Data Experts

S3 -- Storage

Page 11: AWS for Big Data Experts

S3 – bucket properties

Page 12: AWS for Big Data Experts

Demo – S3 Storage

Page 13: AWS for Big Data Experts

Glacier -- storage & archiving

Page 14: AWS for Big Data Experts

Demo – Glacier

Archival Storage

Page 15: AWS for Big Data Experts

RDS – partially managed SQL Server and more…

Page 16: AWS for Big Data Experts

Demo – RDS

Partially managed MySQL, Oracle or SQL Server

Page 17: AWS for Big Data Experts

RDS vs. EC2 for SQL Server

• Provisioned IO – performance guarantees
• Scheduled backups
• Point in time restores
• Scheduled maintenance windows
• Full use of all SQL tools, SSMS, Profiler, DTA, etc…
• Supports Availability Groups (requires 2012 Enterprise)
• Cross-regional snapshots

Why RDS costs more

Page 18: AWS for Big Data Experts

Redshift – Warehouse as a Service

Page 19: AWS for Big Data Experts

Demo – Redshift

Data Warehousing with PostgreSQL

Page 20: AWS for Big Data Experts

DynamoDB for fast NoSQL with SSDs

Page 21: AWS for Big Data Experts

Demo – DynamoDB

NoSQL (wide-column store) on SSD
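
As a rough illustration of the access pattern behind a fast NoSQL store, the sketch below models a table addressed by a hash key plus a range key using plain Python dictionaries. It does not call DynamoDB or any AWS SDK; the class `TinyTable`, its method names and the sensor data are hypothetical and exist only for this example.

```python
from bisect import insort
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a hash key and a range key."""

    def __init__(self):
        # hash key -> sorted list of range keys
        self._ranges = defaultdict(list)
        # (hash key, range key) -> item attributes
        self._items = {}

    def put_item(self, hash_key, range_key, attributes):
        key = (hash_key, range_key)
        if key not in self._items:
            insort(self._ranges[hash_key], range_key)
        self._items[key] = dict(attributes)

    def get_item(self, hash_key, range_key):
        # Direct lookup by the full key; returns None when absent.
        return self._items.get((hash_key, range_key))

    def query(self, hash_key, range_from=None, range_to=None):
        """Return items for one hash key, ordered by range key (inclusive bounds)."""
        result = []
        for rk in self._ranges.get(hash_key, []):
            if range_from is not None and rk < range_from:
                continue
            if range_to is not None and rk > range_to:
                continue
            result.append(self._items[(hash_key, rk)])
        return result


# Hypothetical sample data.
table = TinyTable()
table.put_item("sensor-1", "2013-11-01", {"temp": 20})
table.put_item("sensor-1", "2013-11-03", {"temp": 22})
table.put_item("sensor-1", "2013-11-02", {"temp": 21})
table.put_item("sensor-2", "2013-11-01", {"temp": 18})
```

Lookups by full key are a single dictionary access, and a query touches only the items that share one hash key, which is the property that keeps this style of store fast as data grows.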

Page 22: AWS for Big Data Experts

Elastic MapReduce for easy Hadoop

Page 23: AWS for Big Data Experts

Demo – MapReduce

Hadoop on AWS
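
For readers new to the programming model that Hadoop runs, here is a minimal single-process sketch of the map, shuffle and reduce steps as a word count. It is plain Python, not Hadoop or Elastic MapReduce code; the function names are hypothetical and the input lines are made up.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data on AWS", "Big Data experts"])
```

On a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data between them; the logic per key is the same.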

Page 24: AWS for Big Data Experts

New Services – AWS re:Invent

Kinesis – real-time processing of streaming Big Data (into

AppStream – deliver streaming applications to clients from AWS

CloudTrail – capture AWS API calls

RDS addition – now supports PostgreSQL

WorkSpaces – Virtual Desktops for PC or Mac

Page 25: AWS for Big Data Experts

Data Pipelines – automated data transfer

Page 26: AWS for Big Data Experts

Demo – Data Pipeline

Build data flows on AWS

Page 27: AWS for Big Data Experts

Elastic Beanstalk for application scalability

Page 28: AWS for Big Data Experts

Demo – Beanstalk

PaaS on AWS

Page 29: AWS for Big Data Experts

AWS SDK for Visual Studio

Page 30: AWS for Big Data Experts

Demo – AWS SDK

Add-in for Visual Studio and .NET

Page 31: AWS for Big Data Experts

Cloud Database Services by Vendor

Virtual Machines
• AWS: EC2
• Google: GCE – Linux only
• Microsoft: Azure VM

Cloud RDBMS
• AWS: RDS - SQL Server, MySQL, Oracle; Redshift - Postgres
• Google: mySQL > MariaDB
• Microsoft: SQL Azure

NoSQL buckets, Key-Value stores
• AWS: EBS, S3, Glacier, DynamoDB
• Google: Cloud Storage, HR Datastore on GAE
• Microsoft: Azure Blobs, Azure Tables

Pipelines
• AWS: Data Pipelines
• Google: Via APIs only
• Microsoft: SSIS (on-premises)

Document
• AWS: MongoDB on EC2
• Google: None
• Microsoft: MongoDB on Windows Azure

Hadoop MapReduce or Dremel
• AWS: MapReduce on EC2 using S3
• Google: Big Query
• Microsoft: HDInsight (HDFS)

Other (Datasets, Streaming, Machine Learning)
• AWS: Kinesis, EBS volumes w/datasets
• Google: Freebase, Translation API, Full-text search, Prediction API
• Microsoft: StreamInsight, Azure Marketplace

Page 32: AWS for Big Data Experts

How much does it cost?

Page 33: AWS for Big Data Experts

Getting Started – Free Tier

Page 34: AWS for Big Data Experts

Creative Financing

Regular Pricing
• Use what you need and no more, i.e. instance size, storage size…
• Watch for price drops – RDS price decrease this week

Smart EC2 Instance Usage
• Pause EC2 instances to reduce compute charges
• Delete EC2 instances to reduce storage charges

Vanity Pricing
• Set pricing alerts
• Use spot pricing
• Re-selling compute / storage

Page 35: AWS for Big Data Experts

Example: EC2 Spot Pricing
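
To make the idea of spot pricing concrete, the sketch below estimates cost under a deliberately simplified model. The assumptions are mine, not from the slides: the instance runs in every hour whose market price is at or below the maximum bid, and it is billed that hour's market price rather than the bid. All prices are hypothetical, and `spot_run` is an illustrative name.

```python
def spot_run(prices_per_hour, max_bid):
    """Return (hours_run, total_cost) under a simplified spot model.

    Assumption: the instance runs in every hour whose market price is at or
    below max_bid and is billed that hour's market price, not the bid.
    """
    hours = [p for p in prices_per_hour if p <= max_bid]
    return len(hours), round(sum(hours), 4)


# Hypothetical hourly spot prices in USD, not real AWS figures.
history = [0.03, 0.04, 0.12, 0.05, 0.03]
hours_run, spot_cost = spot_run(history, max_bid=0.06)

# Hypothetical on-demand rate for the same number of hours, for comparison.
on_demand_cost = round(hours_run * 0.10, 4)
```

The hour priced above the bid is skipped entirely, which is why spot capacity suits work that can tolerate interruption.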

Page 36: AWS for Big Data Experts

Example: EC2 Reserved Pricing
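
Whether a reserved instance pays off depends on how many hours it runs. A minimal break-even sketch follows, assuming a pricing shape of one upfront fee plus a lower hourly rate; the function name and every price are hypothetical, and amounts are in whole US cents to keep the arithmetic exact.

```python
def break_even_hours(upfront_cents, on_demand_cents_per_hour, reserved_cents_per_hour):
    """Hours of use at which the upfront fee has been recovered."""
    saving_per_hour = on_demand_cents_per_hour - reserved_cents_per_hour
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    # Ceiling division: the first whole hour at which the fee is recovered.
    return -(-upfront_cents // saving_per_hour)


# Hypothetical prices in US cents, not real AWS figures.
hours = break_even_hours(upfront_cents=30000, on_demand_cents_per_hour=10,
                         reserved_cents_per_hour=4)

# Fraction of a 365-day year the instance must run to reach break-even.
share_of_year = hours / (365 * 24)
```

With these made-up numbers the instance has to run a little over half the year before reserving is the cheaper choice, so lightly used machines are better left on demand.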

Page 37: AWS for Big Data Experts

Tip: Use AWS ‘Trusted Advisor’

Page 38: AWS for Big Data Experts

Tip: Use Pricing Calculators

Example – from RightScale ‘PlanForCloud’

Page 39: AWS for Big Data Experts

Conclusions

EC2 for testing, training and production (IaaS)

S3 for archiving R/W

Glacier for archiving W fast & cheap, R slow & expensive

RDS for HA SQL Server

Redshift for Data Warehousing on demand

DynamoDB for fast NoSQL – on SSDs

Elastic MapReduce for easy Hadoop MapReduce

Page 40: AWS for Big Data Experts

www.TeachingKidsProgramming.org
• Free Courseware (Java, SmallBasic or C# / Pluralsight)
• Do a Recipe Teach a Kid (Ages 10 ++)
• Dec 2013 – Code.org – ‘Hour of Code’ education partner
• recipes)

Page 41: AWS for Big Data Experts

Keep Learning

Twitter: @LynnLangit

YouTube: http://www.youtube.com/user/SoCalDevGal

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions