Collecting Big Data with S3/CloudFront Logging

COLLECTING BIG DATA WITH S3/CLOUDFRONT LOGGING Moty Michaely, VP R&D Xplenty Data Integration-as-a-Service

Upload: xplenty

Post on 11-Aug-2014


Category:

Data & Analytics



DESCRIPTION

There are several ways of collecting big data; one of the most promising is S3/CloudFront logging. It’s low cost and quick to implement. Let's dive in and see how to set up S3/CloudFront logging with your application.

TRANSCRIPT

Page 1: Collecting Big Data with S3/CloudFront Logging

COLLECTING BIG DATA WITH S3/CLOUDFRONT LOGGING
Moty Michaely, VP R&D
Xplenty Data Integration-as-a-Service

In our recent article, “Scale Your Data Collection on the Cloud Like a Champ”, we reviewed several ways of collecting big data, the most promising of which was S3/CloudFront logging. It’s low cost and quick to implement. Now we’d like to dig deeper and show how to set up S3/CloudFront logging with your application.

DEFINE APP DATA

Sit back and think - which data would you like to collect? Which app events should be logged? These could be page visits, mouse clicks, logins, errors, etc. Some of them may include parameters such as the page visit URL. Write them all down. Be as thorough as possible so you don’t lose any precious data.
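One way to write the events down is as a small catalog in code that later steps can reuse. The sketch below is only an illustration: the category names, action names, and `params` lists are hypothetical examples, not a required schema.

```javascript
// Hypothetical event catalog: every name below is an example, not a required schema.
const EVENT_CATALOG = [
  { category: 'page',  action: 'visit', params: ['url'] },
  { category: 'mouse', action: 'click', params: ['id', 'url'] },
  { category: 'user',  action: 'login', params: ['id'] },
  { category: 'app',   action: 'error', params: ['message', 'url'] },
];

// Returns the catalog entry for an event, or undefined if it was never defined.
function findEvent(category, action) {
  return EVENT_CATALOG.find(
    (e) => e.category === category && e.action === action
  );
}

console.log(findEvent('mouse', 'click').params.join(', ')); // id, url
```

Keeping the list in one place makes it easier to spot events that are called in the app but were never defined.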

CREATE AN S3 BUCKET

Go to the S3 dashboard and create a bucket for saving the logs. Note that the bucket name must be unique across all of Amazon S3 and adhere to DNS naming rules: 3-63 characters; only lowercase letters, numbers, periods, and hyphens; no underscores; and it shouldn't look like an IP address. Don’t turn on logging - we will do so via CloudFront. (See the screenshot on the next slide for a visual explanation.)
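The naming rules can be checked before you open the dashboard. The function below is a minimal sketch, assuming the general-purpose S3 naming rules (lowercase letters, digits, periods, and hyphens; starts and ends with a letter or digit). The function name and the sample bucket names are hypothetical, and global uniqueness can only be verified by S3 itself.

```javascript
// Sketch of the bucket naming rules; assumes hyphens are allowed,
// as in the general-purpose S3 bucket naming rules.
function isValidBucketName(name) {
  if (typeof name !== 'string') return false;
  if (name.length < 3 || name.length > 63) return false;
  // Lowercase letters, digits, periods and hyphens only;
  // must start and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]*[a-z0-9]$/.test(name)) return false;
  // No two periods in a row.
  if (name.includes('..')) return false;
  // Must not look like an IPv4 address such as 192.168.5.4.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(name)) return false;
  return true;
}

console.log(isValidBucketName('myapp-event-logs')); // true
console.log(isValidBucketName('my_app_logs'));      // false (underscore)
console.log(isValidBucketName('192.168.5.4'));      // false (looks like an IP address)
```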

CREATE AN S3 BUCKET (SCREENSHOT)

CREATE EVENT IMAGES

Set up directories in the image bucket, for example /mouse, to organize events by categories, and create 1x1 pixel images (see previous post) for all the events that you defined in the first step, e.g. click.png, login.png, error.png. Don’t worry about event parameters at the moment, we will deal with them shortly.

All files uploaded to S3 are private by default, so make sure to change the file permissions to public. You may use tools such as CloudBerry Explorer or S3 Browser to do so and much more.
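To keep the directory layout consistent, the image paths can be derived from the event list rather than typed by hand. This is a small sketch under stated assumptions: only /mouse and the file names click.png, login.png, and error.png come from the text above, while the `user` and `app` categories and the `imageKey` helper are hypothetical.

```javascript
// Hypothetical events; the "user" and "app" categories are made up for illustration.
const events = [
  { category: 'mouse', action: 'click' },
  { category: 'user',  action: 'login' },
  { category: 'app',   action: 'error' },
];

// S3 has no real directories: a "directory" is just a key prefix ending in "/".
function imageKey(event) {
  return `${event.category}/${event.action}.png`;
}

const keys = events.map(imageKey);
console.log(keys.join(', ')); // mouse/click.png, user/login.png, app/error.png
```

Each key is the path at which the same 1x1 image would be uploaded and made public.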

CREATE EVENT IMAGES CONT.

Set HTTP headers for all the images so that they will be cached by CloudFront, thus saving GET requests from CloudFront edge locations to S3. Go to the relevant bucket, check the image files on the left, click Actions at the top, choose Properties, and open the Metadata section.

Add the following metadata line and click save:

▪ Cache-Control: max-age=31536000
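For reference, `max-age` is given in seconds, and 31536000 is one 365-day year. The short sketch below shows the arithmetic; the constant name is just an illustration.

```javascript
// max-age is expressed in seconds; 31536000 is one non-leap year.
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60;
const cacheControl = `max-age=${SECONDS_PER_YEAR}`;

console.log(cacheControl); // max-age=31536000
```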

CREATE EVENT IMAGES (SCREENSHOT)

CREATE A CLOUDFRONT DISTRIBUTION

Creating a CloudFront distribution costs extra, but it’s mandatory - it logs the query string, adds extra log info such as edge locations, and helps to deliver files via Amazon’s CDN to shorten load times. Access the CloudFront dashboard and create a web distribution for the image S3 bucket. Make sure that Use Origin Cache Headers is set under Object Caching (it’s the default setting).

CREATE A CLOUDFRONT DISTRIBUTION CONT.

Note that the distribution gets a random domain name. It could take a while before it starts working because the DNS servers need to be updated to support it. You can also set a friendlier domain using the Alternate Domain Names (CNAMEs) option under Distribution Settings, though this requires configuring your DNS settings so that your domain points to CloudFront’s domain name. See Amazon’s documentation for more info.

CREATE A CLOUDFRONT DISTRIBUTION (SCREENSHOT 1)

CREATE A CLOUDFRONT DISTRIBUTION (SCREENSHOT 2)

TURN LOGGING ON

Still in the CloudFront dashboard, check the distribution on the left, click Distribution Settings at the top, click Edit under the General tab, enable logging, and enter the bucket where you want to store the logs.

TURN LOGGING ON (SCREENSHOT 1)

TURN LOGGING ON (SCREENSHOT 2)

CODE A FUNCTION TO CALL EVENTS

Time to get your hands dirty and write a method that registers events, or call one of your app’s developers to do it for you. The code could be on the client side, server side, or both depending on the architecture. The method should simply send an asynchronous HTTP GET request to the relevant image URL, e.g. to http://logs.xplenty.com/mouse/click.png (links in this format for demo purposes only, not operational).

If you need to send additional event parameters, use the query string (don’t forget URL encoding), e.g. http://logs.xplenty.com/mouse/click.png?id=login&url=http%3A%2F%2Fwww.example.com%2Flogin
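The URL construction described above can be sketched as a small helper. `buildEventUrl` is a hypothetical name, and the host is the same non-operational demo domain used in this deck. It relies only on the standard `encodeURIComponent` function to handle the URL encoding.

```javascript
// Builds an event URL; baseUrl is the demo (non-operational) host used in this deck.
function buildEventUrl(baseUrl, category, action, params = {}) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  const path = `${baseUrl}/${category}/${action}.png`;
  // Only append "?" when there is at least one parameter.
  return query ? `${path}?${query}` : path;
}

const eventUrl = buildEventUrl('http://logs.xplenty.com', 'mouse', 'click', {
  id: 'login',
  url: 'http://www.example.com/login',
});
console.log(eventUrl);
// http://logs.xplenty.com/mouse/click.png?id=login&url=http%3A%2F%2Fwww.example.com%2Flogin
```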

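EXAMPLE CODE TO CALL EVENTS

// Requests the event image so that CloudFront logs the call.
// attr: { category, action, id, url }
$.CloudFrontLog = function (attr) {
  var url = 'http://logs.xplenty.com/' + attr.category + '/' + attr.action + '.png',
      data = {
        id: attr.id,
        url: attr.url
      };
  // jQuery URL-encodes `data` and appends it to the URL as the query string.
  return $.get(url, data);
};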
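If jQuery is not available, for instance on the server side, the same idea can be sketched with an injectable transport. This is an assumption-laden variant, not the deck's method: `createEventLogger` and `send` are hypothetical names, and `send` stands for any function that issues a GET request for the URL (in a browser, `function (url) { new Image().src = url; }` would do).

```javascript
// Sketch of a jQuery-free logger. `send` is an assumption: any function that
// issues a GET request for the given URL.
function createEventLogger(baseUrl, send) {
  return function logEvent({ category, action, ...params }) {
    const query = new URLSearchParams();
    for (const [key, value] of Object.entries(params)) {
      // Skip parameters that were not supplied instead of sending empty values.
      if (value !== undefined && value !== null) query.set(key, String(value));
    }
    const qs = query.toString();
    const url = `${baseUrl}/${category}/${action}.png${qs ? `?${qs}` : ''}`;
    send(url);
    return url;
  };
}

// Usage with a stand-in transport that only records the URL.
const sent = [];
const logEvent = createEventLogger('http://logs.xplenty.com', (url) => sent.push(url));
logEvent({ category: 'mouse', action: 'click', id: 'login' });
console.log(sent[0]); // http://logs.xplenty.com/mouse/click.png?id=login
```

Passing the transport in keeps the URL logic testable without a browser or a network connection.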

CALL THE EVENTS

Dig through your app’s code and add event calls using the method that you’ve just written. This will collect the data that you defined in step 1. Here’s a jQuery code sample for logging client-side button clicks:

$('.btn').click(function (e) {
  var id = $(this).attr('id');
  $.CloudFrontLog({ action: 'click', category: 'mouse', id: id, url: location.href });
});

TEST

Use your staging environment to call events via the application and check that the logs are generated accordingly. Patience, young padawan - it may take an hour or so until Amazon writes them.
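Once log files show up in the bucket, a quick script can confirm that your events are in them. The sketch below assumes the file has already been downloaded and decompressed (CloudFront delivers logs gzip-compressed), and that it follows the standard layout of a `#Fields:` header followed by tab-separated values. The sample text is hand-written and shortened for illustration; it is not real CloudFront output, and real logs contain more columns. `parseAccessLog` is a hypothetical helper name.

```javascript
// Parses CloudFront-style access log text: "#Fields:" names the columns,
// and each data line holds tab-separated values in the same order.
function parseAccessLog(text) {
  let fields = [];
  const records = [];
  for (const line of text.split('\n')) {
    if (line.startsWith('#Fields:')) {
      fields = line.slice('#Fields:'.length).trim().split(/\s+/);
    } else if (line.trim() !== '' && !line.startsWith('#')) {
      const values = line.split('\t');
      records.push(Object.fromEntries(fields.map((f, i) => [f, values[i]])));
    }
  }
  return records;
}

// Hand-written, shortened sample for illustration only; real logs have more columns.
const sample = [
  '#Version: 1.0',
  '#Fields: date time cs-method cs-uri-stem sc-status cs-uri-query',
  ['2014-08-11', '12:00:00', 'GET', '/mouse/click.png', '200', 'id=login'].join('\t'),
].join('\n');

const [record] = parseAccessLog(sample);
console.log(record['cs-uri-stem'], record['cs-uri-query']); // /mouse/click.png id=login
```

Because the parser reads the column names from the header, it does not depend on a fixed column order.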

GO LIVE!

Everything should be ready for you to collect big data like a champ - update the production environment and let the logging begin. Don't know what to do with the data? See how to analyze AWS logs in 15 minutes.