Analyzing Big Data Easily and Quickly :: Ilho Kim, Solutions Architect :: gaming...
TRANSCRIPT
Analyzing Big Data Easily and Quickly
Demo Day.
Combinational Services for Data analytics
[Architecture diagram: a Log Generator streams events into Amazon Kinesis (Streaming); data lands in Amazon Simple Storage Service (Data Lake) and Amazon Glacier (Archive); downstream it feeds Amazon Redshift (Data Warehouse), Amazon Elastic MapReduce (Semi-structured), Amazon DynamoDB (NoSQL), Amazon Machine Learning (Predictive Models), and Other Apps.]
Create EC2 instance to generate logs
• AMI -> Public Images -> AMI Name : da-hands-on
• Select the AMI and Click Launch
• Instance Type: t2.medium
• Tag: Name - myname-dev
• Security group with SSH ingress opened
$ aws ec2 create-security-group --group-name andy-ssh-sg --description "open SSH only" --vpc-id vpc-33d27056
{
    "GroupId": "sg-7f3dd918"
}
$ aws ec2 authorize-security-group-ingress --group-id sg-7f3dd918 --protocol tcp --port 22 --cidr 0.0.0.0/0
$ aws ec2 run-instances --image-id ami-5c2beb3d --count 1 --instance-type t2.medium --key-name ilho_tokyo --security-group-ids sg-7f3dd918 --subnet-id subnet-1a7bad43 --associate-public-ip-address
{
    "OwnerId": "806506827877",
    "ReservationId": "r-a58c5e2a",
    "Groups": [],
    "Instances": [
        {
            "Monitoring": {
            ...
Create S3 bucket
• Bucket Name: myname-game-log
• Region: Tokyo
$ aws s3 mb s3://andy-game-log --region ap-northeast-1
make_bucket: andy-game-log
Generating Logs to stream them to Kinesis
Create Kinesis Stream
• Stream Name: myname-game-stream
• Number of Shards: 1
$ aws kinesis create-stream --stream-name andy-game-stream --shard-count 1
$ aws kinesis list-streams
Launch Redshift
• Cluster Identifier: myname-game-dw
• Database Name: mynamegame
• Database Port: 5439 (default)
• Node Type: dc1.large
• Cluster Type: Single Node
• Number of Compute Nodes: 1 (this setting applies only to multi-node clusters)
$ aws redshift create-cluster --cluster-identifier andy-game-dw --db-name mydb --node-type dc1.large --cluster-type single-node --publicly-accessible --master-username admin --master-user-password GamingonAWS2016
{
    "Cluster": {
        "IamRoles": [],
        "ClusterVersion": "1.0",
        "NodeType": "dc1.large",
        "PubliclyAccessible": true,
        "Tags": [],
        "MasterUsername": "admin",
        "ClusterParameterGroups": [
            {
                "ParameterGroupName": "default.redshift-1.0",
                "ParameterApplyStatus": "in-sync"
            }
        ],
        "Encrypted": false,
        ...
Let’s connect to EC2 instances
$ ssh -i [$mykey].pem ec2-user@[EC2-public-IP]
Prepared python demo scripts
[ec2-user@ip-10-10-0-13 data_analytics_demo]$ ls -1
amazon_kclpy
amazon_kclpy_helper.py
config.json
config.py
config.pyc
consumer.properties
consumer.py
demo_util.py
demo_util.pyc
inserter.py
kcl
kinesis_helper.py
kinesis_helper.pyc
LICENSE
logs
reader.py
run_consumer.sh
simulator.py
summarizer.py
Generating Logs to Kinesis stream
$ python simulator.py
https://github.com/awslabs/kinesis-poster-worker
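The simulator script itself is not shown in the deck; as a hedged sketch only, a log generator like this typically builds JSON game events and puts them onto the stream. The event fields, stream name, and partition-key choice below are assumptions, not the actual script.

```python
import json
import random
import time

# Assumed event types for illustration.
EVENTS = ["login", "logout", "purchase", "level_up"]

def make_event(user_id, now=None):
    """Build one synthetic game-log event."""
    return {
        "user_id": user_id,
        "event": random.choice(EVENTS),
        "timestamp": int(now if now is not None else time.time()),
    }

def to_record(event):
    """Shape an event as a Kinesis PutRecord payload; using user_id as the
    partition key keeps one user's events ordered within a shard."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event["user_id"]),
    }

# To actually send events (requires boto3 and AWS credentials):
#   import boto3
#   kinesis = boto3.client("kinesis", region_name="ap-northeast-1")
#   kinesis.put_record(StreamName="andy-game-stream",
#                      **to_record(make_event(random.randint(1, 50))))
```

With one shard, the stream accepts up to 1,000 records or 1 MB per second on the write side, which is plenty for a demo generator like this.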
Consuming Logs from Kinesis stream
$ python amazon_kclpy_helper.py --print_command --java $(which java) --properties ./consumer.properties
https://github.com/awslabs/amazon-kinesis-client-python
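The KCL consumer above reads records from the stream and lands them in S3. The actual consumer.py is not shown; this sketch illustrates only the batching side a consumer needs: joining record payloads into one object and deriving a date-partitioned S3 key (the key layout is an assumption).

```python
import time

def s3_key_for_batch(stream_name, epoch_seconds):
    """Derive a date-partitioned key like 'andy-game-stream/2016/09/01/1472688000.log'
    from the batch's flush time (UTC), so downstream COPY/EMR jobs can
    address logs by date."""
    t = time.gmtime(epoch_seconds)
    return "%s/%04d/%02d/%02d/%d.log" % (
        stream_name, t.tm_year, t.tm_mon, t.tm_mday, int(epoch_seconds))

def flush_batch(records, stream_name, epoch_seconds):
    """Join raw record payloads (bytes) into one newline-delimited S3
    object body; returns (key, body) ready for an S3 put_object call."""
    return s3_key_for_batch(stream_name, epoch_seconds), b"\n".join(records)
```

Batching many small Kinesis records into fewer, larger S3 objects also matters for the Redshift COPY step later: loading a handful of multi-megabyte files is far faster than loading thousands of tiny ones.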
Checking log files in the S3 bucket
$ aws s3 ls myname-game-log
What we’ve done so far.
Copying log data from S3 to Redshift
$ python inserter.py
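inserter.py itself is not shown; a script like it would issue a Redshift COPY statement against the S3 log prefix. The sketch below builds such a statement; the table name, prefix, IAM role ARN, and JSON format are assumptions for illustration.

```python
def build_copy_sql(table, bucket, prefix, iam_role):
    """Redshift COPY loads matching S3 objects in parallel across node
    slices; FORMAT AS JSON 'auto' maps JSON keys to column names, and
    TIMEFORMAT 'epochsecs' parses Unix-timestamp columns."""
    return ("COPY {t} FROM 's3://{b}/{p}' "
            "IAM_ROLE '{r}' "
            "FORMAT AS JSON 'auto' TIMEFORMAT 'epochsecs';"
            ).format(t=table, b=bucket, p=prefix, r=iam_role)
```

COPY from S3 is the recommended bulk-load path for Redshift; row-by-row INSERTs are dramatically slower on a columnar store.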
Checking log tables in Redshift
$ psql -h hostname -p 5439 -U username -d dbname
dbname=# select * from log limit 10;
...
Creating a new table in Redshift
Creating a summary table from log table
$ python summarizer.py
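summarizer.py is not shown either; a hedged sketch of the kind of summary it might produce is below. The SQL materializes daily counts per event type with CREATE TABLE AS, and the Python function mirrors the same aggregation (table and column names are assumptions).

```python
from collections import Counter

# The kind of summary SQL such a script might run in Redshift
# (table/column names assumed for illustration):
SUMMARY_SQL = """
CREATE TABLE log_summary AS
SELECT TRUNC(event_time) AS log_date, event, COUNT(*) AS event_count
FROM log
GROUP BY TRUNC(event_time), event;
""".strip()

def summarize(rows):
    """The same aggregation in Python: count rows per (date, event) pair."""
    return Counter((r["log_date"], r["event"]) for r in rows)
```

A small summary table like this is what the BI tools in the next step would typically query, rather than scanning the raw log table.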
Run Business Intelligence Tools
Adding Elasticsearch
Loading Streaming Data into Amazon Elasticsearch Service
Launch Elasticsearch
• Go to AWS management console
• Launch Elasticsearch domain
• Set the access policy to open public access (for this demo only; never in production)
Creating and configuring the Lambda function
• https://github.com/awslabs/amazon-elasticsearch-lambda-samples
• Download the sample JS file
• Install the required Node.js packages
• Modify the Elasticsearch endpoint
• Zip all files, including node_modules
• Upload the zip file to the Lambda function
• Set the Lambda role to allow access to Elasticsearch
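The linked sample is Node.js; purely as an illustration of what that function does, here is the core transformation in Python: decode the base64 Kinesis payloads from a Lambda event and build an Elasticsearch _bulk request body. The index and type names are assumptions.

```python
import base64
import json

def kinesis_event_to_bulk(event, index="game-logs", doc_type="log"):
    """Decode each Kinesis record in a Lambda event (payloads arrive
    base64-encoded) and build a newline-delimited _bulk body: one action
    line followed by the document itself, per record."""
    lines = []
    for rec in event["Records"]:
        doc = base64.b64decode(rec["kinesis"]["data"]).decode("utf-8")
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(doc)
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

A real handler would then POST this body to the Elasticsearch domain's `/_bulk` endpoint using the role credentials configured above.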
Checking the results in Kibana
• Query
• Result
Querying Amazon Kinesis Streams Directly with SQL and Spark Streaming?
EMR and Spark Streaming
Creating an EMR cluster with Spark is simple
$ aws emr create-cluster --release-label emr-4.2.0 --applications Name=Spark Name=Hive --ec2-attributes KeyName=myKey --use-default-roles --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge --bootstrap-actions Path=s3://aws-bigdata-blog/artifacts/Querying_Amazon_Kinesis/DownloadKCLtoEMR400.sh,Name=InstallKCLLibs
Managing resources is easy, but building the logic is complicated.
• A fully managed service for continuously querying streaming data using standard SQL
• Use cases: preprocessing streams, finding the most frequently occurring values, counting distinct values, simple alerts, detecting anomalies on a stream, post-processing of application streams
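To make the "standard SQL" point concrete, an illustrative continuous query of the kind Kinesis Analytics runs is sketched below, counting events per type over one-minute tumbling windows. The in-application stream and column names are assumptions, not taken from the demo.

```sql
-- Illustrative only: stream and column names are assumptions.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    event_type  VARCHAR(16),
    event_count INTEGER
);

-- A pump continuously reads from the source stream and inserts
-- windowed aggregates into the destination stream.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM "event", COUNT(*)
FROM "SOURCE_SQL_STREAM_001"
GROUP BY "event", FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);
```

This is the managed alternative to writing the equivalent windowed count by hand in Spark Streaming on the EMR cluster above.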
Real-time Log Analytics
Amazon Kinesis Analytics
Demo.
Adding Amazon Machine Learning
Adding Amazon Machine Learning is your homework.
Using Amazon Machine Learning in Games :: Ilho Kim :: AWS Summit Seoul 2016
https://www.youtube.com/watch?v=Bs1QZMlwmLM&feature=youtu.be
A hint
Thank you!