big data mining - komputasi.files.wordpress.com · 1 17-2-2020 pengantar perkuliahan, mengenal big...
TRANSCRIPT
Komputasi Big Data
HusniJurusan Teknik Informatika
Universitas Trunojoyo Madura
Husni.trunojoyo.ac.id
Pengantar Perkuliahan
Silabus Mata KuliahUniversitas Trunojoyo Madura
Tahun Akademik 2019/2020, Semester Genap
• Mata Kuliah: Komputasi Big Data
• Pengampu: Husni, S.Kom., MT.
• Kode Kelas: ABCDE123 (Edmodo)– Program S1 Informatika, kelas A
• Rincian– Pilihan
– 3 SKS
• Jadwal: Selasa, jam 12.45, ruang 203 (gedung F)
2
Pengantar Perkuliahan• Kuliah ini mengantarkan konsep fundamental dan isi
peneletian di bidang Big Data, baik dari sisi Infrastruktur (terdistribusi) maupun Analitika (Mining).
• Topik yang dibahas termasuk:– ABC: AI, Big Data, Cloud Computing, – Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and
Presenting Data, – Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark Ecosystem, – Foundations of Big Data Mining in Python, – Supervised Learning: Classification and Prediction, – Unsupervised Learning: Cluster Analysis, – Unsupervised Learning: Association Analysis, – Machine Learning with Scikit-Learn in Python, – Deep Learning for Big Data with TensorFlow,– Convolutional Neural Networks (CNN)– Recurrent Neural Networks (RNN)– Reinforcement Learning (RL)– Social Network Analysis (SNA)
3
Tujuan Pengajaran
1. Memahami dan menerapkan konsep fundamental dan isu-isu riset bidang big data mining.
2. Melaksanakan riset informatika (komputasi atau infrastruktur) dalam konteks big data mining.
4
Metode Pengajaran
Kuliah
Diskusi
Simulasi
Proyek
Problem Solving
5
Tugas
• Proyek
• Laporan
• Partisipasi
6
Rencana Bahasan (1/2)
7
No. Tanggal Topik
1 17-2-2020 Pengantar Perkuliahan, Mengenal Big Data
2 24-2-2020 ABC: AI, Big Data, Cloud Computing
3 2-3-2020 Data Science and Big Data Analytics: Discovering,Analyzing, Visualizing and Presenting Data
4 9-3-2020 Fundamental Big Data: MapReduce Paradigm,Hadoop and Spark Ecosystem
5 16-3-2020 Foundations of Big Data Mining in Python
6 23-3-2020 Supervised Learning: Classification and Prediction
7 30-3-2020 Unsupervised Learning: Cluster Analysis
8 6-4-2020 Ujian Tengah Semester (UTS)
Rencana Bahasan (2/2)
8
No. Tanggal Topik
9. 20-4-2020 Unsupervised Learning: Association Analysis
10 27-4-2020 Machine Learning with Scikit-Learn in Python
11. 4-5-2020 Deep Learning for Big Data with TensorFlow
12. 11-5-2020 Convolutional Neural Networks (CNN)
13. 18-5-2020 Recurrent Neural Networks (RNN)
14. Reinforcement Learning (RL)
15. Social Network Analysis (SNA)
16. Ujian Akhir Semester (UAS)
Penilaian
• UTS dan UAS: 40%
• Proyek: 60%
– Laporan tengah semester: Proposal
– Laporan akhir semester
9
Textbook dan Referensi• Buku Teks: Tidak tersedia, panduan utama handout:
– http://husni.trunojoyo.ac.id
• Referensi, ada banyak, misal:– Learning Data Mining with Python - Second Edition, Robert Layton, Packt Publishing, 2017.
– Hands-On Data Science and Python Machine Learning: Perform data mining and machine learning efficiently using Python and Spark, Frank Kane, Packt Publishing, 2017.
– Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Aurélien Géron, O’Reilly Media, 2017
– Practical Machine Learning with Python: A Problem-Solver’s Guide to Building Real-World Intelligent Systems, Dipanjan Sarkar, Raghav Bali, Tushar Sharma, Apress, 2017.
– Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow, 2nd Edition, Sebastian Raschka and Vahid Mirjalili, Packt Publishing, 2017.
– Deep Learning with Python, Francois Chollet, Manning Publications, 2017.
– Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners, Jared Dean, Wiley, 2014.
– Data Mining: Concepts and Techniques, Third Edition, Jiawei Han, Micheline Kamber and Jian Pei, Morgan Kaufmann, 2011.
10
Ervin Varga (2019) Practical Data Science with Python 3: Synthesizing Actionable Insights from Data, Apress
11
12
Peter Ghavami (2020) Big Data Analytics Methods, 2nd edition, Walter de Gruyter
Jovan Pehcevski (2019) Big Data Analytics - Methods and Applications, Arcler Press
13
14
15
C. S. R. Prabhu, etc (2019) Big Data Analytics: Systems, Algorithms,Applications, Springer
16
Rohan Chopr. Etc. (2019) Data Science with Python, Packt Publishing
17
18
19
Learning Data Mining with Python - Second Edition, Robert Layton,
Packt Publishing, 2017
20
Hands-On Data Science and Python Machine Learning: Perform data mining and machine learning efficiently using Python and Spark,
Frank Kane, Packt Publishing, 2017
21
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems,
Aurélien Géron, O’Reilly Media, 2017
22
Practical Machine Learning with Python: A Problem-Solver’s Guide to Building Real-World Intelligent Systems,
Dipanjan Sarkar, Raghav Bali, Tushar Sharma, Apress, 2017.
23
Deep Learning with Python, Francois Chollet, Manning Publications, 2017.
24
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners,
Jared Dean, Wiley, 2014.
25
Data Mining: Concepts and Techniques, Third Edition, Jiawei Han, MichelineKamber and Jian Pei,
Morgan Kaufmann, 2011
26
Social Network Based Big Data Analysis and Applications, Lecture Notes in Social Networks,
Mehmet Kaya, Jalal Kawash, Suheil Khoury, Min-Yuh Day, Springer International Publishing, 2018.
Google Colab
27https://colab.research.google.com/notebooks/welcome.ipynb
Big Data Analytics
danData Mining
28
Big Data 4 V
29Source: https://www-01.ibm.com/software/data/bigdata/
30
31
10 V dari Big Data
Artificial IntelligenceMachine Learning & Deep Learning
32Source: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Stephan Kudyba (2014),
Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
33
Arsitektur Analitika Big Data
34Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
Data Mining
OLAP
Reports
QueriesHadoop
MapReducePig
HiveJaql
ZookeeperHbase
CassandraOozieAvro
MahoutOthers
Middleware
Extract Transform
Load
Data Warehouse
Traditional Format
CSV, Tables
* Internal
* External
* Multiple formats
* Multiple locations
* Multiple applications
Big Data Sources
Big Data Transformation
Big Data Platforms & Tools
Big Data Analytics
Applications
Big Data Analytics
Transformed Data
Raw Data
Arsitektur Analitika Big Data
35Source: Stephan Kudyba (2014), Big Data, Mining, and Analytics: Components of Strategic Decision Making, Auerbach Publications
Data Mining
OLAP
Reports
QueriesHadoop
MapReducePig
HiveJaql
ZookeeperHbase
CassandraOozieAvro
MahoutOthers
Middleware
Extract Transform
Load
Data Warehouse
Traditional Format
CSV, Tables
* Internal
* External
* Multiple formats
* Multiple locations
* Multiple applications
Big Data Sources
Big Data Transformation
Big Data Platforms & Tools
Big Data Analytics
Applications
Big Data Analytics
Transformed Data
Raw Data
Data MiningAplikasi Analitika Big Data
Milestones Analitika Big Data
36
37
Social Big Data Mining(Hiroshi Ishikawa, 2015)
38
Arsitektur Penambangan Data Besar Sosial (Social Big Data Mining)
(Hiroshi Ishikawa, 2015)
39
Hardware
Software Social Data
Physical Layer
Logical Layer
Integrated analysis
Multivariate analysis
Application specific task
Data Mining
Conceptual Layer
Enabling Technologies Analysts• Model Construction• Explanation by Model
• Construction and confirmation of individual hypothesis
• Description and execution of application-specific task
• Integrated analysis model
• Natural Language Processing• Information Extraction• Anomaly Detection• Discovery of relationships
among heterogeneous data• Large-scale visualization
• Parallel distrusted processing
Source: Hiroshi Ishikawa (2015), Social Big Data Mining, CRC Press
Infrastruktur Business Intelligence (BI)
40Source: Kenneth C. Laudon & Jane P. Laudon (2014), Management Information Systems: Managing the Digital Firm, Thirteenth Edition, Pearson.
Data WarehouseData Mining dan Business Intelligence
Increasing potential
to support
business decisions End User
Business
Analyst
Data
Analyst
DBA
DecisionMaking
Data Presentation
Visualization Techniques
Data MiningInformation Discovery
Data Exploration
Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources
Paper, Files, Web documents, Scientific experiments, Database Systems
41Source: Jiawei Han and Micheline Kamber (2006), Data Mining: Concepts and Techniques, Second Edition, Elsevier
Evolusi Kemampuan BI
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 42
Data Mining diantara Irisan Banyak Disiplin
Sta
tistic
s
Management Science &
Information Systems
Artificia
l Inte
lligence
Databases
Pattern
Recognition
Machine
Learning
Mathematical
Modeling
DATA
MINING
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 43
Data Mining:
Core Analytics Process
The KDD Process for Extracting Useful Knowledge
from Volumes of Data
44Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD Process for Extracting Useful Knowledge from Volumes of Data.
Communications of the ACM, 39(11), 27-34.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD Process for Extracting Useful Knowledge from
Volumes of Data. Communications of the ACM, 39(11), 27-34.
45
Data Mining Proses Knowledge Discovery in Databases
(KDD)(Fayyad et al., 1996)
46Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD Process for Extracting Useful Knowledge from Volumes of Data.
Communications of the ACM, 39(11), 27-34.
Proses Knowledge Discovery (KDD)
Data Cleaning
Data Integration
Databases
Data Warehouse
Task-relevant Data
Selection
Data Mining
Pattern Evaluation
47Source: Han & Kamber (2006)
Data mining:
inti dari proses penemuan pengetahuan
Data Mining Processing Pipeline(Charu Aggarwal, 2015)
48Source: Charu Aggarwal (2015), Data Mining: The Textbook Hardcover, Springer
Koleksi Data
Data PreprocessingAnalytical Processing
Ekstraksi Fitur
Pembersihan&
Integrasi
Building Block 1
Building Block 2
OutputuntukAnalis
Feedback (Optional)
Feedback (Optional)
Taksonomi Pekerjaan Data MiningData Mining
Prediction
Classification
Regression
Clustering
Association
Link analysis
Sequence analysis
Learning Method Popular Algorithms
Supervised
Supervised
Supervised
Unsupervised
Unsupervised
Unsupervised
Unsupervised
Decision trees, ANN/MLP, SVM, Rough
sets, Genetic Algorithms
Linear/Nonlinear Regression, Regression
trees, ANN/MLP, SVM
Expectation Maximization, Apriory
Algorithm, Graph-based Matching
Apriory Algorithm, FP-Growth technique
K-means, ANN/SOM
Outlier analysis Unsupervised K-means, Expectation Maximization (EM)
Apriory, OneR, ZeroR, Eclat
Classification and Regression Trees,
ANN, SVM, Genetic Algorithms
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems 49
Wawasan Bisnisdengan
Analisis Sosial
50
Menganalisis Web Sosial: Analisis Jaringan Sosial
51
52Source: http://www.amazon.com/Analyzing-Social-Web-Jennifer-Golbeck/dp/0124055311
Jennifer Golbeck (2013), Analyzing the Social Web, Morgan Kaufmann
The 14th NTCIR (2018 - 2019)
53http://research.nii.ac.jp/ntcir/ntcir-14/index.html
NTCIR-14 Short Text Conversation Task (STC-3)
54
55
NTCIR-14 STC-3Short Text Conversation Task (STC-3)
Chinese Emotional Conversation Generation (CECG) Subtask
http://www.aihuang.org/p/challenge.html
Short Text Conversation (NTCIR-13 STC2)Retrieval-based
56Source: http://ntcirstc.noahlab.com.hk/STC2/stc-cn.htm
Short Text Conversation (NTCIR-13 STC2)
Generation-based
57Source: http://ntcirstc.noahlab.com.hk/STC2/stc-cn.htm
Istilah Penting
58
Rangkuman
59
• Kuliah ini memperkanalkan konsep mendasar dan isu-isu penelitian bidang Big Data Mining.
• Topik yang akan dibahas:– ABC: AI, Big Data, Cloud Computing, – Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing
and Presenting Data, – Fundamental Big Data: MapReduce Paradigm, Hadoop and Spark
Ecosystem, – Foundations of Big Data Mining in Python, – Supervised Learning: Classification and Prediction, – Unsupervised Learning: Cluster Analysis, – Unsupervised Learning: Association Analysis, – Machine Learning with Scikit-Learn in Python, – Deep Learning for Big Data with TensorFlow,– Convolutional Neural Networks (CNN)– Recurrent Neural Networks (RNN)– Reinforcement Learning (RL)– Social Network Analysis (SNA)
Contact
Husni, S.Kom., MT.
Lab. Sistem Terdistribusi
Jurusan Teknik Informatika
Universitas Trunojoyo Madura
Email: [email protected]
Web: http://husni.trunojoyo.ac.id/60
Big Data Mining