Big Data - Hadoop and MapReduce for QA and Testing by Aditya Garg


Post on 12-Jan-2017


Page 1: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Big Data - Hadoop and MapReduce -

new age tools for aid to testing and QA

by Aditya Garg

Confidential | Copyright © QAAgility Technologies

Page 2: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Aditya Garg (@Adigindia)

• Co-Founder and Director, QAAgility.com
• Co-founder and Steering Committee Member of Agile Testing Alliance; runs meetup groups across multiple cities
• Co-creator and licensed trainer of Agile Testing Alliance's certifications CP-BAT, CP-MAT, CP-AAT, CP-SAT
• Co-author of a book on Selenium
• Loves cooking Indian dishes from Rajasthan
• Tasting (testing) world food
• Travelling and meeting testers (get inspired and maybe inspire a few)

@adigindia
https://www.linkedin.com/in/adigarg

Page 3: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Topic for the presentation

Big Data - Hadoop and MapReduce - new age tools for aid to testing and QA

Page 4: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

What is this


Page 5: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

What are we going to discuss?

1. How to test Big Data applications?
2. How can QA and Testing teams use Big Data tools for their testing needs?


Page 7: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

What is Big Data?

Is it just too much hype, or reality?

Page 8: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Let us start with what exactly Big Data is

Page 9: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Which search engine do you use?

How much data does Google store?

https://www.cirrusinsight.com/blog/how-much-data-does-google-store

http://searchstorage.techtarget.com/definition/Kilo-mega-giga-tera-peta-and-all-that

Page 10: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

On Search Engines – Anyone using DuckDuckGo?

Page 11: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg
Page 12: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Data Explosion

Page 13: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Key Points in Big Data

1. Volume – Data Explosion
2. Velocity
3. Variety
4. Veracity

Page 14: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Key Points in Big Data

Ref: IBM.com

Page 15: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Definition

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Ref: goo.gl/iWZhjJ

http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#379879e621a9

Page 16: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Big Data Application

1. Finance
2. Insurance
3. Health Care
4. Agriculture
5. Defense
6. Manufacturing
7. Aerospace
8. Oil and Gas
9. Advertisement and Marketing
10. Election Campaigns
11. The list goes on: applicability across industries

Page 17: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Big Data Application

http://www.forbes.com/sites/bernardmarr/2016/02/03/how-the-super-bowl-uses-big-data-to-change-the-game/?

Page 18: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Big Data Application

http://andrewshamlet.com/2015/12/03/who-will-win-the-2016-us-presidential-nominations/

Page 19: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Ref: http://www.

Big Data Application

http://www.forbes.com/sites/bernardmarr/2016/02/02/this-is-why-dictators-love-big-data/2/#4d413e005844

Page 20: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Let's go back to the definition

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

Page 21: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Tools solving Big Data Challenge


Page 22: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Tool solving the Big Data Challenge

Page 23: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Hadoop – Key components HDFS and MR

*Source Udacity

Page 24: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Hadoop Ecosystem

1. Sqoop takes data from a regular RDBMS and puts it into HDFS
2. Flume ingests data into HDFS as it is generated by external systems
3. HBase is a real-time database on top of HDFS
4. Hue is a graphical front end to the cluster
5. Oozie is a workflow management tool
6. Mahout is a machine learning library

*Source Udacity

Page 25: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

HDFS

• HDFS stands for Hadoop Distributed File System, which is the storage system used by Hadoop. The following is a high-level architecture that explains how HDFS works.
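As a rough illustration of the storage idea, the sketch below splits a file into fixed-size blocks and assigns each block to several nodes. It is a simplified model, not Hadoop code: the 128 MB block size and replication factor of 3 are the usual configurable defaults in Hadoop 2 and later, the DataNode names are hypothetical, and the round-robin placement stands in for the rack-aware policy that real HDFS applies.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (configurable in HDFS)
REPLICATION = 3                 # assumed default replication factor (configurable)


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, [nodes holding a replica])."""
    replication = min(replication, len(datanodes))
    n_blocks = math.ceil(file_size / block_size)
    plan = []
    for i in range(n_blocks):
        # The last block may be shorter than the block size.
        length = min(block_size, file_size - i * block_size)
        # Simplified round-robin placement; real HDFS is rack-aware.
        holders = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, length, holders))
    return plan


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
    for block in plan_blocks(300 * 1024 * 1024, nodes):
        print(block)
```

A 300 MB file in this model becomes two full blocks and one partial block, each stored on three different nodes, which is why the loss of a single node does not lose data.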

Page 26: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Map Reduce

Ref: Emanuele Della Valle @manudellavalle

Page 27: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Understanding MapReduce

Demo – Word Count

Given an input file, count how many times each unique word occurs

Page 28: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

WordCount – Map Reduce

Reference : http://wearecloud.cz/media/files/prezentace-biz/Big%20Data%20v%20Cloudu.ppt
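The word count flow can be imitated in a few lines of plain Python. This is a minimal single-process sketch of the map, shuffle and reduce phases, not the code shown in the demo; the function names and the sample input are made up for illustration, and on a real cluster the shuffle step is performed by the Hadoop framework between mappers and reducers.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big tools", "data for testing"]  # made-up input
    print(word_count(sample))
```

Because each mapper only looks at its own line and each reducer only at its own word, both phases can be spread across many machines without coordination beyond the shuffle.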

Page 29: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

How can QA and Testing teams use Big Data tools for their testing needs?

Page 30: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Problem Statement and Solution using Hadoop and MapReduce


Page 32: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – Multicast Tick by Tick Adaptor

Input was the exchange feed – output was given to the HFT engine

Exchange TAP – co-location servers listen to it at high speed

MTBT - Adaptor

• Legacy adaptor (3rd party) connects to the TAP and converts the feed to a format which can be used by HFT platforms (algorithmic trading platforms)
• New adaptor – being built in-house – to increase the speed by 10 times

HFT Engine

Page 33: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT - Adaptor

Input Output Output over time

Page 34: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – Testing Objective

MTBT - Adaptor

Input Output Output over time

GOAL
--------------------------------------------------
1. Test the fast and dynamic nature of multicast TBT: it operates in microseconds, averaging around 20,000 data points/sec, and on expiry/volatility days it goes up to 40,000 data points/sec.
2. Check whether there is any packet drop.
3. Test that the generated order book is fresh and accurate up to level 20 (configurable).
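One common way to check for packet drops is to look for gaps in message sequence numbers. The sketch below assumes, purely for illustration, that every tick carries an increasing integer sequence number; the slides do not describe the feed format, so the field and the function name `find_gaps` are hypothetical.

```python
def find_gaps(sequence_numbers):
    """Return (first_missing, last_missing) ranges found between received numbers.

    Assumes the numbers arrive in increasing order; repeated or older numbers
    are ignored rather than reported.
    """
    gaps = []
    previous = None
    for seq in sequence_numbers:
        if previous is not None and seq > previous + 1:
            gaps.append((previous + 1, seq - 1))
        if previous is None or seq > previous:
            previous = seq
    return gaps


if __name__ == "__main__":
    received = [1, 2, 3, 5, 6, 9]  # made-up sample with two gaps
    print(find_gaps(received))
```

If the feed can deliver messages out of order, the numbers would need to be sorted or windowed first, otherwise late arrivals would be misreported as drops.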

Page 35: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – Testing Strategy - Sampling

MTBT - Adaptor

Input Output Output over time

Sample

Do a reverse comparison

Page 36: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – Challenges

MTBT - Adaptor

Input Output Output over time

Challenges
--------------------------------------------------
1. Manually next to impossible
2. Even samples of a few seconds ran into large megabyte (MB) files
3. Manually impossible to compare the legacy records with the records processed by the new code
4. Daily processed data ran into files of 150 gigabytes (GB) and more

Page 37: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – It was a BIG DATA Testing problem

MTBT - Adaptor

Input Output Output over time

BIG DATA Problem
--------------------------------------------------
1. LARGE 150 GB files (legacy and new applications) – VOLUME
2. Testing to compare the output and measure the functional effectiveness in a real-time data environment – VELOCITY
3. Packet drops may happen – VERACITY
4. Variety was not there, except that the two output files were generated in different formats, although the content/information was there

Page 38: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

MTBT – SOLUTION

1 Reduce LEGACY MTBT - Output file into a standard format

2 Reduce NEW INHOUSE MTBT output file into a standard format

3 Compare the two files

4 Generate Report
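The four steps above can be sketched in plain Python. This is a small in-memory illustration under stated assumptions, not the solution shown in the demo: the two record layouts (`seq|symbol|price|qty` for the legacy output and `symbol,seq,qty,price` for the new one), the use of a sequence number as the comparison key, and all function names are hypothetical.

```python
def normalize_legacy(line):
    """Step 1: map a legacy record 'seq|symbol|price|qty' to (key, record)."""
    seq, symbol, price, qty = line.strip().split("|")
    return int(seq), (symbol, float(price), int(qty))


def normalize_new(line):
    """Step 2: map a new record 'symbol,seq,qty,price' to (key, record)."""
    symbol, seq, qty, price = line.strip().split(",")
    return int(seq), (symbol, float(price), int(qty))


def compare(legacy_lines, new_lines):
    """Steps 3 and 4: compare both sides key by key and build a report."""
    legacy = dict(normalize_legacy(line) for line in legacy_lines)
    new = dict(normalize_new(line) for line in new_lines)
    report = {"matched": 0, "missing_in_new": [], "extra_in_new": [], "mismatched": []}
    for key in sorted(legacy.keys() | new.keys()):
        if key not in new:
            report["missing_in_new"].append(key)
        elif key not in legacy:
            report["extra_in_new"].append(key)
        elif legacy[key] != new[key]:
            report["mismatched"].append(key)
        else:
            report["matched"] += 1
    return report


if __name__ == "__main__":
    legacy = ["1|ABC|101.50|10", "2|ABC|101.55|5", "3|XYZ|20.00|7"]  # made-up data
    new = ["ABC,1,10,101.5", "ABC,2,6,101.55", "XYZ,4,1,20.0"]       # made-up data
    print(compare(legacy, new))
```

On a cluster, the two normalize functions would play the role of mappers emitting a common key, and the comparison would run in a reducer that receives the legacy and new record for each key, so that files far larger than memory can be compared.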

Page 39: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

DEMO


Page 40: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Outcome

1. Record-by-record comparison done on a normal 8 GB Linux server in less than 2 hours
2. Automated report generation
3. Automated results shared with stakeholders
4. Used again for regression testing and for NFT testing
5. Huge benefits to the client (both time and money)

Page 41: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Other scenarios – Big Data Tool implementation

QA team can use the tools in multiple scenarios
1. Beta testing
2. Repeated execution effectiveness – applying analytics (R)
3. Capturing customer feedback and channeling it into smarter test execution
4. Extracting relevant information from repeated regression cycles from QC
5. Adding intelligence on the data generated by the testing team
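As one possible reading of the regression-cycle scenario, the sketch below aggregates test results across cycles and ranks test cases by failure rate, so that the most failure-prone tests can be run first. The record layout `(cycle, test_id, status)`, the `"FAIL"` status value and the function name are assumptions made for illustration; a real export from QC would need its own parsing.

```python
from collections import defaultdict


def failure_rates(results):
    """Rank test cases by failure rate.

    results: iterable of (cycle, test_id, status) tuples.
    Returns [(test_id, rate)] sorted by highest rate first, then by test id.
    """
    runs = defaultdict(int)
    failures = defaultdict(int)
    for _cycle, test_id, status in results:
        runs[test_id] += 1
        if status == "FAIL":
            failures[test_id] += 1
    rates = [(test_id, failures[test_id] / runs[test_id]) for test_id in runs]
    return sorted(rates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    history = [  # made-up regression history
        ("c1", "T1", "PASS"), ("c1", "T2", "FAIL"),
        ("c2", "T1", "FAIL"), ("c2", "T2", "FAIL"),
        ("c3", "T1", "PASS"), ("c3", "T2", "PASS"),
    ]
    for test_id, rate in failure_rates(history):
        print(test_id, round(rate, 2))
```

The counting step is the same per-key aggregation as word count, which is why it maps naturally onto MapReduce when the history spans many cycles and projects.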

Page 42: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Other Way to use Big Data (BETA TESTING)

Challenge – Tweet on @qaagility @adigindia

Page 43: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Other Way to use Big Data

- Effective Regression Testing
- Effective Sanity Testing

Page 44: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Thank you and Jai Hind

Questions?

@adigIndia @AgileTA #GTR2016

Page 45: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

If interested, please attend a one-day workshop on Big Data (Saturday 27 Feb, 9 AM to 6 PM)
• Hadoop and MapReduce
• VM setup
• JDK, Eclipse and Hadoop installation
• MapReduce examples

Page 46: Big Data - Hadoop and MapReduce for QA and testing by Aditya Garg

Contact

Please contact us at [email protected]

MUMBAI
711, Rupa Solitaire
MBP, Mahape
Navi Mumbai-400701

DENMARK
1 Lindebo 7 Lej - 42,
2630 Tasstrup, [email protected]

USA
200 E Campus View Blvd.
Suite 200, Columbus, OH