Apache Mahout
-
5/24/2018 Apache Mahout
Guided By
Ms. Shikha Pachouly, Assistant Professor
Computer Engineering Department
6/21/2014
-
Machine Learning
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
Machine Learning Strategies
1) Supervised
2) Unsupervised
-
Common Use Cases
Recommend friends/dates/products
Classify content into predefined groups
Find similar content based on object properties
Find associations/patterns in actions/behaviors
Identify key topics in large collections of text
Detect anomalies in output
Rank search results
-
Apache Mahout Introduction
Machine learning library for scalable applications.
Includes core algorithms for recommendation, clustering, and classification, implemented on top of the Hadoop MapReduce model.
The core libraries are also highly optimized, so non-distributed algorithms perform well too.
-
Mahout is distributed under the commercially friendly Apache Software License.
The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.
Currently Mahout mainly supports three use cases:
1) Recommendation mining
2) Clustering
3) Classification
-
Why Mahout
Many open source ML libraries (PyBrain, Shark, etc.) lack at least one of:
1) community
2) scalability
3) documentation and examples
Most Mahout implementations are MapReduce-enabled.
-
The main goal of Apache Mahout is to be useful to practitioners.
- Implementations should be easy to use from within Java applications.
- Deploying trained models should be close to trivial.
- Scaling to include more, and more diverse, data should be simple.
-
Recommendations
Extensive framework for collaborative filtering
Recommenders
1) User-based
2) Item-based
Many different similarity measures, e.g. cosine, LLR (log-likelihood ratio), Tanimoto, Pearson
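
To make three of the similarity measures named above concrete, here is a minimal plain-Java sketch. It does not use Mahout's own classes: the class name SimilaritySketch and its method names are hypothetical, and the formulas are the textbook definitions rather than Mahout's exact implementations.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: textbook similarity measures, not Mahout classes.
final class SimilaritySketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    // Pearson correlation: cosine similarity of the mean-centered vectors.
    static double pearson(double[] a, double[] b) {
        double meanA = Arrays.stream(a).average().orElse(0);
        double meanB = Arrays.stream(b).average().orElse(0);
        double[] centeredA = new double[a.length];
        double[] centeredB = new double[b.length];
        for (int i = 0; i < a.length; i++) {
            centeredA[i] = a[i] - meanA;
            centeredB[i] = b[i] - meanB;
        }
        return cosine(centeredA, centeredB);
    }

    // Tanimoto (Jaccard) coefficient on item sets: |A and B| / |A or B|.
    static double tanimoto(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        double[] u = {5, 3, 4};
        double[] v = {4, 2, 5};
        System.out.println("cosine  = " + cosine(u, v));
        System.out.println("pearson = " + pearson(u, v));
    }
}
```

Cosine and Pearson work on rating values, while Tanimoto only looks at which items two users have in common, so it suits data without explicit ratings.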
-
Algorithms for Recommendation
User-Based Collaborative Filtering - single machine
Item-Based Collaborative Filtering - single machine / MapReduce
Matrix Factorization with Alternating Least Squares - single machine / MapReduce
Matrix Factorization with Alternating Least Squares on Implicit Feedback - single machine / MapReduce
Weighted Matrix Factorization, SVD++, Parallel SGD - single machine
-
User-Based Recommender
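
The idea behind a user-based recommender can be sketched in a few lines of plain Java: find users whose ratings resemble the target user's, then predict ratings for unseen items as a similarity-weighted average of those users' ratings. This is an illustration under simplifying assumptions (cosine similarity over co-rated items, every positively similar user counts as a neighbor). The class UserBasedSketch and its methods are hypothetical names, not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny in-memory user-based collaborative filter.
final class UserBasedSketch {

    // Cosine similarity computed over the items both users have rated.
    static double similarity(Map<Integer, Double> u, Map<Integer, Double> v) {
        double dot = 0, normU = 0, normV = 0;
        for (Map.Entry<Integer, Double> e : u.entrySet()) {
            Double other = v.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normU += e.getValue() * e.getValue();
                normV += other * other;
            }
        }
        return (normU == 0 || normV == 0) ? 0.0 : dot / Math.sqrt(normU * normV);
    }

    // Predicts a rating for every item the user has not rated yet, as the
    // similarity-weighted average of the other users' ratings for that item.
    static Map<Integer, Double> predict(Map<Integer, Map<Integer, Double>> ratings, int user) {
        Map<Integer, Double> mine = ratings.get(user);
        Map<Integer, Double> weightedSums = new HashMap<>();
        Map<Integer, Double> weightTotals = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> other : ratings.entrySet()) {
            if (other.getKey() == user) {
                continue; // skip the target user
            }
            double sim = similarity(mine, other.getValue());
            if (sim <= 0) {
                continue; // ignore users with nothing in common
            }
            for (Map.Entry<Integer, Double> rating : other.getValue().entrySet()) {
                if (mine.containsKey(rating.getKey())) {
                    continue; // already rated by the target user
                }
                weightedSums.merge(rating.getKey(), sim * rating.getValue(), Double::sum);
                weightTotals.merge(rating.getKey(), sim, Double::sum);
            }
        }
        Map<Integer, Double> predictions = new HashMap<>();
        for (Map.Entry<Integer, Double> e : weightedSums.entrySet()) {
            predictions.put(e.getKey(), e.getValue() / weightTotals.get(e.getKey()));
        }
        return predictions;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> ratings = new HashMap<>();
        ratings.put(1, new HashMap<>());
        ratings.get(1).put(10, 5.0);
        ratings.put(2, new HashMap<>());
        ratings.get(2).put(10, 4.0);
        ratings.get(2).put(11, 3.0);
        System.out.println(predict(ratings, 1));
    }
}
```

A production recommender would restrict the average to the nearest N neighbors and handle much larger, sparser data, which is where a library such as Mahout comes in.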
-
Clustering
-
Algorithms for Clustering
K-Means Clustering
Fuzzy K-Means
Mean Shift Clustering
Dirichlet Process Clustering (for topic modelling)
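
As a rough illustration of the first algorithm in this list, the sketch below runs k-means (Lloyd's algorithm) on a single machine in plain Java. It is not Mahout's MapReduce implementation. The class name KMeansSketch is hypothetical, and the caller supplies the initial centroids and a fixed iteration count to keep the example deterministic.

```java
import java.util.Arrays;

// Illustrative only: single-machine k-means, not Mahout's implementation.
final class KMeansSketch {

    // Repeats the assignment and update steps and returns the final centroids.
    static double[][] kMeans(double[][] points, double[][] initial, int iterations) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: attach each point to its nearest centroid.
            for (double[] p : points) {
                int best = nearest(p, centroids);
                counts[best]++;
                for (int d = 0; d < dim; d++) {
                    sums[best][d] += p[d];
                }
            }
            // Update step: move each centroid to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return centroids;
    }

    // Index of the centroid with the smallest squared Euclidean distance to p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double distance = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] initial = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(kMeans(points, initial, 5)));
    }
}
```

Fuzzy k-means differs mainly in the assignment step: each point belongs to every cluster with a degree of membership instead of to exactly one.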
-
The clustering algorithms can be run on Hadoop infrastructure through commands rather than custom code, e.g.:
Canopy Clustering
bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
k-Means Clustering
bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
Fuzzy k-Means Clustering
bin/mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job
-
Classification
Algorithms implemented in Mahout for classification:
Logistic Regression (trained via SGD) - single machine
Naive Bayes / Complementary Naive Bayes - MapReduce
Random Forest - MapReduce
Hidden Markov Models - single machine
Multilayer Perceptron - single machine
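
To show what "logistic regression trained via SGD" means in practice, here is a minimal plain-Java sketch for binary labels. It is not Mahout's classifier. The class LogisticSgdSketch, the fixed learning rate, and the absence of regularization are all simplifying assumptions made for illustration.

```java
// Illustrative only: binary logistic regression trained with plain SGD.
final class LogisticSgdSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability of class 1; the last weight is the bias term.
    static double predict(double[] weights, double[] features) {
        double z = weights[weights.length - 1];
        for (int d = 0; d < features.length; d++) {
            z += weights[d] * features[d];
        }
        return sigmoid(z);
    }

    // One weight update per training example (stochastic gradient descent on
    // log loss): w += learningRate * (label - prediction) * feature.
    static double[] train(double[][] x, int[] y, double learningRate, int epochs) {
        int dim = x[0].length;
        double[] weights = new double[dim + 1];
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double error = y[i] - predict(weights, x[i]);
                for (int d = 0; d < dim; d++) {
                    weights[d] += learningRate * error * x[i][d];
                }
                weights[dim] += learningRate * error; // bias update
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][] x = {{0}, {1}, {2}, {3}};
        int[] y = {0, 0, 1, 1};
        double[] w = train(x, y, 0.5, 2000);
        System.out.println("P(class 1 | x = 3) = " + predict(w, new double[]{3}));
    }
}
```

Because each update touches only one example, the model can be trained in a single pass over streaming data, which fits the "single machine" label in the list above.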
-
Running Naive Bayes from the Command Line
Three commands:
1) mahout seq2sparse
performs the TF-IDF transformation
2) mahout trainnb
trains the Naive Bayes model
3) mahout testnb
performs classification and testing
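
The TF-IDF transformation in step 1 turns each document into a vector of term weights, so that terms frequent in one document but rare across the collection score highest. The plain-Java sketch below uses the textbook weight tf * ln(N / df), which is an assumption made for illustration: the weighting seq2sparse actually applies may differ in detail. The class name TfIdfSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook TF-IDF over already tokenized documents.
final class TfIdfSketch {

    // Returns one term -> weight map per document, using tf * ln(N / df),
    // where tf is the raw term count, N the number of documents, and df the
    // number of documents containing the term.
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                documentFrequency.merge(term, 1, Integer::sum);
            }
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> weights = new HashMap<>();
            for (String term : doc) {
                weights.merge(term, 1.0, Double::sum); // raw term counts
            }
            for (Map.Entry<String, Double> e : weights.entrySet()) {
                double idf = Math.log((double) docs.size() / documentFrequency.get(e.getKey()));
                e.setValue(e.getValue() * idf);
            }
            vectors.add(weights);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("spam", "offer", "spam"),
                Arrays.asList("meeting", "offer"));
        System.out.println(tfIdf(docs));
    }
}
```

A term that appears in every document gets weight zero under this formula, so it contributes nothing to the classifier trained in step 2.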
-
Installation of Mahout
Download the tar files of both the apache-mahout and apache-maven projects
Extract the tar files into a directory
Set the PATH variables for Maven
Change the working directory to Mahout's core folder
Compile the project with 'mvn compile'
Build the project with 'mvn install'
-
Mahout vs. WEKA
Base\Technologies   Mahout   WEKA
Scalability         More     Less
Algorithms          Less     More
GUI                 No       Yes
License             Apache   GPL
-
MAHOUT COMMERCIAL USERS
Adobe: Uses clustering algorithms to increase video consumption through better user targeting.
Amazon: For its personalization platform.
AOL: For shopping recommendations.
Twitter: Uses Mahout's LDA implementation for user interest modeling.
Yahoo! Mail: Uses Mahout's Frequent Pattern Set Mining.
Drupal: Uses Mahout to provide open source content recommendation solutions.
Evolv: Uses Mahout for its Workforce Predictive Analytics platform.
Foursquare: Uses Mahout for its recommendation engine.
Idealo: Uses Mahout's recommendation engine.
-
References
Bingwei Liu, Erik Blasch, Yu Chen, Dan Shen and Genshe Chen, "Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier", 2013 IEEE International Conference on Big Data.
Rui Máximo Esteves, Chunming Rong, "Using Mahout for Clustering Wikipedia's Latest Articles", 2011 Third IEEE International Conference on Cloud Computing Technology and Science.
Kathleen Ericson and Shrideep Pallickara, "On the Performance of Distributed Data Clustering Algorithms in File and Streaming Processing Systems", 2011 Fourth IEEE International Conference on Utility and Cloud Computing.
https://mahout.apache.org/
Sean Owen, Robin Anil, "Mahout in Action", Manning Publications.
-
THANK YOU