© 2015 Blue Camphor Technologies (P) Ltd. www.skillspeed.com Slide 1
Python for Big Data Analytics
Session Objectives
This session will help you to understand:
ᗍ Introduction to Python
ᗍ Web Scraping Use Case
ᗍ Introduction to Big data
ᗍ Getting your doubts cleared
What is Python?
ᗍ Python is a general-purpose, High-level Programming Language designed to be easy to read and simple to implement
ᗍ Its high-level built-in Data Structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development
ᗍ Python supports Modules and Packages, which encourage Program Modularity (subdividing a program into separate sub-programs) and Code Reuse
ᗍ It is similar to Perl and Ruby, though it differs from them in certain respects, such as its Object-oriented features
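A minimal sketch of the points above, using only the standard library: the same name is rebound to objects of different types (dynamic typing), lists and dicts are built in, and an existing function is reused by importing it from a module. The helper name `summarize` is hypothetical, chosen for illustration.

```python
# Code reuse: import a function from a standard-library module
from statistics import mean


def summarize(values):
    """Hypothetical helper: return the count and mean of a list of numbers."""
    return {"count": len(values), "mean": mean(values)}


# Dynamic typing: the same name is rebound to objects of different types
item = 42
item = "forty-two"
item = [3, 5, 7]

# Built-in high-level data structures: a list passed in, a dict returned
stats = summarize(item)
print(type(item).__name__, stats)
```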
What is Python? (Cont’d)
Python has an object-oriented structure. It supports:
Polymorphism: Static Polymorphism and Runtime Polymorphism
Multiple Inheritance (figure: Class A, Class B, Class C)
Operator Overloading: the same operator ‘+’ gives 5+5=10 for numbers and Skill+Speed=SkillSpeed for strings
Get Started with Python
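A short sketch of the three features on this slide. The class names A, B and C follow the slide's figure; their methods and the `Money` class are hypothetical, added for illustration. Python resolves method calls at run time, so the sketch shows runtime polymorphism; the standard library's `functools.singledispatch` is the closest built-in analogue to overloading by argument type.

```python
class A:
    def greet(self):
        return "Hello from A"


class B:
    def describe(self):
        return "Described by B"


# Multiple inheritance: C inherits from both A and B
class C(A, B):
    # Runtime polymorphism: C overrides A.greet
    def greet(self):
        return "Hello from C"


class Money:
    """Hypothetical class: overloads '+' by defining __add__."""

    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)

    def __repr__(self):
        return f"Money({self.amount})"


# The built-in '+' already behaves differently per type
print(5 + 5)              # 10
print("Skill" + "Speed")  # SkillSpeed

# The same call dispatches to a different method depending on the object
for obj in (A(), C()):
    print(obj.greet())

print(C().describe())       # inherited from B
print(Money(5) + Money(7))  # Money(12)
```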
Why Python?
Good for Text Processing
Generates HTML Content
Extended in C and C++ (figure: Your C++ Program, Script.py, CPython Interpreter)
Clear Syntax
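To illustrate the first two points, here is a minimal sketch that processes text and generates HTML with only the standard library (`re`, `collections.Counter` and `html.escape`). The function name `top_words_html` and the table layout are hypothetical choices for this example.

```python
import html
import re
from collections import Counter


def top_words_html(text, n=3):
    """Count words in text and render the n most common as an HTML table."""
    words = re.findall(r"[a-z']+", text.lower())  # simple word tokenizer
    rows = "".join(
        # html.escape protects characters such as ' and < in the output
        f"<tr><td>{html.escape(word)}</td><td>{count}</td></tr>"
        for word, count in Counter(words).most_common(n)
    )
    return f"<table><tr><th>Word</th><th>Count</th></tr>{rows}</table>"


print(top_words_html("Python is easy. Python is readable. Python scales."))
```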
Why Python? (Cont’d)
Interpreted Environment (figure: Source Code → Interpreter → Output)
Automatic Memory Management
Good for Code Steering and for Merging Multiple Programs
Supports Library Utilities and Third-Party Utilities (Example: Numeric, NumPy, SciPy)
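The first two points can be observed from inside Python itself. This sketch, assuming the standard CPython interpreter, compiles a source string to a code object, executes it and captures its output (Source Code → Interpreter → Output), then shows an object being reclaimed without any explicit free call. The names `source_code`, `Record` and `watcher` are illustrative.

```python
import contextlib
import gc
import io
import weakref

# Interpreted environment: source code -> interpreter -> output
source_code = "x = 5\ny = 5\nprint(x + y)\n"
code_object = compile(source_code, "<demo>", "exec")  # source text to bytecode

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):  # capture what the program prints
    exec(code_object, {})
output = buffer.getvalue()
print(repr(output))  # '10\n'


# Automatic memory management: no explicit free() call is needed
class Record:
    pass


record = Record()
watcher = weakref.ref(record)  # a weak reference does not keep the object alive
del record                     # drop the only strong reference
gc.collect()                   # on CPython, reference counting has usually freed it already
print(watcher() is None)       # True
```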
Job Trends
(Chart: Percentage Growth)
Users of Python
Google App Engine is a prominent example: it allows building web applications with the Python programming language, using its rich collection of libraries, tools and frameworks
YouTube is a big user of Python; the entire site uses Python for different purposes: viewing video, controlling templates for the website, administering video, accessing canonical data, and many more. Python is everywhere at YouTube
Amazon Web Services uses Python
Some More Users of Python
Major Uses of Python
System Utilities
GUIs (Tkinter)
Internet Scripting
Embedded Scripting
Database Programming
Artificial Intelligence
Image Processing
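As one concrete instance of Database Programming, here is a minimal sketch using the standard-library `sqlite3` module with an in-memory database. The `users` table and its rows are made-up sample data.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Asha", "Python"), ("Ravi", "Java"), ("Meera", "Python")],  # sample rows
)
conn.commit()

# Parameterized query: the ? placeholder keeps values out of the SQL text
rows = conn.execute(
    "SELECT name FROM users WHERE language = ? ORDER BY name", ("Python",)
).fetchall()
print(rows)  # [('Asha',), ('Meera',)]
conn.close()
```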
Demo: Web Scraping – Flipkart.com
ᗍ This example demonstrates how to extract data from Flipkart for a particular product, such as “Watch”
ᗍ We shall use requests (a Python package) to fetch the web page, and then BeautifulSoup to parse the HTML from the page and retrieve the data
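The fetch-then-parse flow above can be sketched without third-party installs. This version swaps in the standard library for both steps: `urllib.request` in place of requests, and `html.parser.HTMLParser` in place of BeautifulSoup. The markup in `SAMPLE_HTML` and its `product`, `title` and `price` class names are hypothetical; Flipkart's real page structure is not described here and changes over time. `fetch_page` is not called, so the example runs offline on the sample string.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_page(url):
    """Download a page (standard-library stand-in for requests.get(url).text)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class ProductParser(HTMLParser):
    """Collects text of 'title' and 'price' elements inside each product div."""

    def __init__(self):
        super().__init__()
        self.products = []  # one dict per product block
        self._field = None  # "title" or "price" while inside such an element

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "product":
            self.products.append({})
        elif css_class in ("title", "price"):
            self._field = css_class

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()


def parse_products(page_html):
    parser = ProductParser()
    parser.feed(page_html)
    parser.close()
    return parser.products


# Hypothetical markup standing in for a downloaded search-results page
SAMPLE_HTML = """
<div class="product"><a class="title">Analog Watch</a><span class="price">Rs. 1,299</span></div>
<div class="product"><a class="title">Digital Watch</a><span class="price">Rs. 899</span></div>
"""

for product in parse_products(SAMPLE_HTML):
    print(product["title"], "-", product["price"])
```

With requests and BeautifulSoup installed, the same two steps are typically written as `BeautifulSoup(requests.get(url).text, "html.parser")` followed by `soup.find_all("div", class_="product")`.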
What is Big Data?
ᗍ Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications
ᗍ Huge Amount of Data (Terabytes or Petabytes)
ᗍ The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization
ᗍ These huge volumes of data are generated by many systems or collections of systems, for example Space Exploration, Deep Sea Navigation and Social Media
Why Big Data?
ᗍ The volume of data generated today is so large that traditional systems can neither store it nor process it
ᗍ To build better Decision Support Systems (DSS)
ᗍ Google alone receives 4 million search queries per minute
ᗍ Data is generated everywhere: Sensors for Climate Information, Social Media, Music, Audio and Video, and the Global Positioning System (GPS)
ᗍ Only 10% of the world's data today resides in RDBMSs and 90% elsewhere. How do we deal with this enormous volume of data?
Big Data Statistics
Every minute:
ᗍ Facebook users share nearly 2.5 million pieces of content
ᗍ Twitter users tweet nearly 300,000 times
ᗍ Instagram users post nearly 220,000 new photos
ᗍ YouTube users upload 72 hours of new video content
ᗍ Apple users download nearly 50,000 apps
ᗍ Email users send over 200 million messages
ᗍ Amazon generates over $80,000 in online sales
Refer: http://aci.info/2014/07/12/the-data-explosion-in-2014-minute-by-minute-infographic/
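The figures above are per minute. Scaling them to a day (24 × 60 = 1,440 minutes) gives a sense of the volume involved; since the source figures are approximate ("nearly", "over"), the daily totals are approximate too. The dictionary keys are labels chosen for this sketch.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Approximate per-minute figures from the slide
per_minute = {
    "Facebook pieces of content shared": 2_500_000,
    "Tweets": 300_000,
    "Instagram photos posted": 220_000,
    "Hours of YouTube video uploaded": 72,
    "Apple app downloads": 50_000,
    "Emails sent": 200_000_000,
    "Amazon online sales (USD)": 80_000,
}

# Scale each per-minute figure to a full day
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}
for name, total in per_day.items():
    print(f"{name}: {total:,}")
```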
Characteristics of Big Data
Case Study 1: Big Data from Space
Satellite Imaging
Case Study 2: Social Media Analytics
ᗍ Structure of Data: Data in social media is unstructured or semi-structured. Data from Twitter/Facebook is in JSON. Where do we store it? How do we process it?
ᗍ Quantity of Data: There are tons of unstructured, structured and semi-structured data. How do we derive a pattern from it?
ᗍ Processing of Data: How do we process this complex data structure, and what technologies do we use?
ᗍ Prediction Algorithm: After all the good work of cleansing and slicing/dicing the data, which algorithm do we use? Is it decision tree, SVM, k-means, kNN, and the list goes on
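A minimal sketch of the first and third questions: parsing semi-structured JSON records, where fields vary from record to record, and processing them with the map/reduce pattern that frameworks such as Hadoop apply across many machines. This is plain single-process Python, not Hadoop's API, and the records are made-up samples rather than real Twitter or Facebook output.

```python
import json
from collections import Counter
from functools import reduce

# Made-up semi-structured posts: note the missing "text" and optional "likes"
raw_posts = [
    '{"user": "a", "text": "Big data needs Python"}',
    '{"user": "b", "text": "Python loves data", "likes": 4}',
    '{"user": "c", "likes": 1}',
]


def map_post(line):
    """Map step: parse one JSON record and count the words in its text."""
    record = json.loads(line)
    words = record.get("text", "").lower().split()  # tolerate a missing field
    return Counter(words)


def reduce_counts(total, partial):
    """Reduce step: merge two partial word counts."""
    return total + partial


word_counts = reduce(reduce_counts, map(map_post, raw_posts), Counter())
print(word_counts.most_common(2))
```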
Why SkillSpeed?
Course Curriculum from Industry Experts
Instructor Led Live Virtual Sessions
Lifetime Access to Course Content via LMS
100% Placement Assistance
24x7 Support
Course Topics
Module 1: Introduction to Python
Module 2: Built-In Data Types, Strings, Sequence and Files
Module 3: Functions, Sorting, Exceptions, Standard Libraries
Module 4: Regular Expression and Object-oriented Programming
Module 5: Debugging Python, Project Skeleton in Python and SQLite Database
Module 6: Introduction to Big Data and Hadoop
Module 7: Python and Big Data
Module 8: Implementation of Machine Learning in Python
Module 9: Working Examples of Machine Learning in Python
Module 10: Project Implementation
Corporate Partners
Contact Us
Lines open 24/7
To know more about the course, please contact:
IND: +91-90660-20904
USA: 1-866-607-6547 (Toll Free)
Or reach us at
References
https://harshbhimjyani.wordpress.com/2014/10/21/scraping-flipkart/
https://www.vlab.org/sandbox/events/satellite-imaging-big-data-from-space-shared/
http://www.datasciencecentral.com/profiles/blogs/data-veracity