presentation for BigDataCampLA

AWS for Big Data Experts

@LynnLangit

Nov 2013

Data Expertise / Lynn Langit

Practicing Architect
• Cloud Deployments (Azure, AWS, Google)

Technical author / trainer
• Google Cloud Developer Series
• SQL Server 2012 Developer Series
• Cloudera Certified Developer
• 2 books on SQL Server BI

Industry awards
• Microsoft – MVP for SQL Server
• Google – GDE for Cloud Platform
• 10Gen – Master for MongoDB

Former MSFT FTE
• 4 years

What and Why AWS?

AWS – Amazon’s cloud

Large set of services
• Compute
• Data
• More

Market leader
• In market longest
• Usually cheapest
• Most often used in production

Amazon Web Services

How to Work with AWS

• Web Console

• Command Line Tools

• AWS SDK and IDE Tools

EC2 – Virtual Machines (AMIs)

EC2 – VMs (AMIs) from AWS Marketplace

Demo – EC2

Virtual Machines

Understanding EC2 storage options

S3 – Storage

S3 – bucket properties

Demo – S3 Storage

Glacier – storage & archiving

Demo – Glacier

Archival Storage

RDS – partially managed SQL Server and more…

Demo – RDS

Partially managed MySQL, Oracle or SQL Server

RDS vs. EC2 for SQL Server

• Provisioned IO – performance guarantees
• Scheduled backups
• Point in time restores
• Scheduled maintenance windows
• Full use of all SQL tools (SSMS, Profiler, DTA, etc.)
• Supports Availability Groups (requires 2012 Enterprise)
• Cross-regional snapshots

Why RDS costs more

Redshift – Warehouse as a Service

Demo – Redshift

Data Warehousing with PostgreSQL

DynamoDB for fast NoSQL with SSDs

Demo – DynamoDB

NoSQL (wide-column store) on SSD

Elastic MapReduce for easy Hadoop

Demo – MapReduce

Hadoop on AWS

New Services – AWS re:Invent

• Kinesis – real-time processing of streaming Big Data (into
• AppStream – deliver streaming applications to clients from AWS
• CloudTrail – capture AWS API calls
• RDS addition – now supports PostgreSQL
• Workspaces – Virtual Desktops for PC or Mac

Data Pipelines – automated data transfer

Demo – Data Pipeline

Build data flows on AWS

Elastic Beanstalk for application scalability

Demo – Beanstalk

PaaS on AWS

AWS SDK for Visual Studio

Demo – AWS SDK

Add-in for Visual Studio and .NET

Cloud Database Services by Vendor

Virtual Machines
• AWS: EC2
• Google: GCE – Linux only
• Microsoft: Azure VM

Cloud RDBMS
• AWS: RDS - SQL Server, MySQL, Oracle; Redshift - Postgres
• Google: mySQL > MariaDB
• Microsoft: SQL Azure

NoSQL buckets / Key-Value stores
• AWS: EBS, S3, Glacier, DynamoDB
• Google: Cloud Storage, HR Datastore on GAE
• Microsoft: Azure Blobs, Azure Tables

Pipelines
• AWS: Data Pipelines
• Google: Via APIs only
• Microsoft: SSIS (on-premises)

Document
• AWS: MongoDB on EC2
• Google: None
• Microsoft: MongoDB on Windows Azure

Hadoop MapReduce or Dremel
• AWS: MapReduce on EC2 using S3
• Google: Big Query
• Microsoft: HDInsight (HDFS)

Other (Datasets, Streaming, Machine Learning)
• AWS: Kinesis, EBS volumes w/datasets
• Google: Freebase, Translation API, Full-text search, Prediction API
• Microsoft: StreamInsight, Azure Marketplace

How much does it cost?

Getting Started – Free Tier

Creative Financing

Regular Pricing
• Use what you need and no more, e.g. instance size, storage size…
• Watch for price drops – RDS price decrease this week

Smart EC2 Instance Usage
• Pause EC2 instances to reduce compute charges
• Delete EC2 instances to reduce storage charges

Vanity Pricing
• Set pricing alerts
• Use spot pricing
• Re-selling compute / storage
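The pause-versus-delete advice on this slide comes down to which charges keep accruing: a paused instance stops paying for compute but still pays for its attached storage, while a deleted instance pays for neither. A minimal sketch of that arithmetic follows. The `Instance` class, the `monthly_cost` function and every rate in it are hypothetical placeholders chosen for illustration, not AWS prices, and the model assumes that deleting an instance also removes its volumes.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    hourly_rate: float    # hypothetical compute price, USD per running hour
    storage_gb: float     # size of the attached volumes, in GB
    gb_month_rate: float  # hypothetical storage price, USD per GB-month


def monthly_cost(inst: Instance, running_hours: float, deleted: bool = False) -> float:
    """Estimate one month's bill for a single instance under the simple model above."""
    if deleted:
        # Assumption: deleting the instance also removes its volumes,
        # so nothing is left to bill.
        return 0.0
    compute = inst.hourly_rate * running_hours      # billed only while running
    storage = inst.storage_gb * inst.gb_month_rate  # billed even while paused
    return compute + storage


if __name__ == "__main__":
    # Placeholder rates, not real AWS pricing.
    demo = Instance(hourly_rate=0.10, storage_gb=100, gb_month_rate=0.10)
    scenarios = [
        ("always on", 720, False),
        ("paused outside work hours", 160, False),
        ("paused all month", 0, False),
        ("deleted", 0, True),
    ]
    for label, hours, deleted in scenarios:
        print(f"{label}: ${monthly_cost(demo, hours, deleted):.2f}")
```

Plugging in rates from a pricing calculator shows how much of a bill is compute (which pausing removes) and how much is storage (which only deleting removes).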

Example: EC2 Spot Pricing

Example: EC2 Reserved Pricing

Tip: Use AWS ‘Trusted Advisor’

Tip: Use Pricing Calculators

Example – from RightScale ‘PlanForCloud’

Conclusions

EC2 for testing, training and production (IaaS)

S3 for archiving R/W

Glacier for archiving: W fast & cheap, R slow & expensive

RDS for HA SQL Server

Redshift for Data Warehousing on demand

DynamoDB for fast NoSQL – on SSDs

Elastic MapReduce for easy Hadoop MapReduce

www.TeachingKidsProgramming.org
• Free Courseware (Java, SmallBasic or C# / Pluralsight)
• Do a Recipe Teach a Kid (Ages 10 ++)
• Dec 2013 – Code.org – ‘Hour of Code’ education partner

• recipes)

Keep Learning

Twitter: @LynnLangit
YouTube: http://www.youtube.com/user/SoCalDevGal

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions
