Page 1: TUGAS AKHIR - eprints.umm.ac.ideprints.umm.ac.id/34176/1/jiptummpp-gdl-annisa2012-43858-1-1.pen… · Metode Hierarchical Clustering, K-Means dan Gabungan Keduanya dalam Cluster Data

TWEET SUMMARIZATION
BASED ON TWITTER TRENDING TOPICS
USING THE TF-IDF ALGORITHM
AND SINGLE LINKAGE
AGGLOMERATIVE HIERARCHICAL CLUSTERING

UNDERGRADUATE THESIS (TUGAS AKHIR)

Submitted in Partial Fulfillment of the Requirements
for the Bachelor's Degree (Sarjana Strata 1)
in Informatics Engineering, Universitas Muhammadiyah Malang

By:
Annisa
201210370311145

DEPARTMENT OF INFORMATICS
FACULTY OF ENGINEERING
UNIVERSITAS MUHAMMADIYAH MALANG
April 2016

PREFACE

Alhamdulillah, all praise be to Allah SWT, who has bestowed His mercy and guidance and eased the way so that the author was able to complete this research, entitled “Tweet Summarization Based on Twitter Trending Topics Using the TF-IDF Algorithm and Single Linkage Agglomerative Hierarchical Clustering”.

This research builds a web-based tweet summarization system for Indonesian-language Twitter trending topics. The system is designed to generate an automatic text summary from multiple tweets on a trending topic. The author hopes that the system will help Twitter users follow trending-topic information through the summaries it generates automatically.

The author is aware that this research is still far from perfect. Constructive criticism and suggestions for its future development are therefore welcome.

Finally, the author thanks everyone who helped bring this thesis to completion.

Malang, April 2016

The Author

TABLE OF CONTENTS

ABSTRAK (abstract in Indonesian)
ABSTRACT (abstract in English)
PREFACE
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER I INTRODUCTION
1.1. Background
1.2. Problem Statement
1.3. Research Objectives
1.4. Scope and Limitations
1.5. Methodology
1.6. Structure of the Thesis

CHAPTER II THEORETICAL BACKGROUND
2.1. Twitter
2.2. Cluster
2.3. Text Summarization
2.3.1. Definition of Text Summarization
2.3.2. Approaches to Text Summarization
2.3.3. Stages of Producing a Summary
2.3.4. Purpose of Text Summarization
2.4. The Text Summarization Process
2.5. Text Summarization Algorithm
2.5.1. Data Preprocessing
2.5.1.1. Tweet Splitting
2.5.1.2. Case Folding
2.5.1.3. Tokenizing
2.5.1.4. Editing
2.5.1.5. Stopword Removal
2.5.1.6. Normalization
2.5.1.7. Stemming
2.6. TF-IDF
2.7. Single Linkage Hierarchical Clustering
2.8. Testing Techniques
2.8.1. Cluster Testing
2.8.1.1. Precision
2.8.1.2. Recall
2.8.1.3. F-Measure
2.8.2. Summary Testing

CHAPTER III SYSTEM ANALYSIS AND DESIGN
3.1. Problem Analysis
3.2. System Analysis
3.2.1. Input Data Analysis
3.2.2. Preprocessing Analysis
3.2.3. TF-IDF Method Analysis
3.2.4. Clustering Method Analysis
3.2.5. Summarization Method Analysis
3.3. System Design
3.3.1. Database Design
3.3.2. Architecture Design
3.3.3. Interface Design
3.4. Software Requirements Specification

CHAPTER IV SYSTEM IMPLEMENTATION AND TESTING
4.1. System Implementation
4.1.1. Hardware Implementation
4.1.2. Software Implementation
4.1.3. Database Implementation
4.1.4. Interface Implementation
4.2. Testing
4.2.1. Cluster Testing
4.2.1.1. Cluster Test Results
4.2.1.2. Cluster Test Scenarios
4.2.1.3. Cluster Test Evaluation
4.2.2. Summary Testing
4.2.2.1. Summary Test Results
4.2.2.2. Summary Test Scenarios
4.2.2.3. Summary Test Evaluation

CHAPTER V CLOSING
5.1. Conclusions
5.2. Suggestions

REFERENCES

LIST OF FIGURES

Figure 3.1 System Overview
Figure 3.2 System Overview for tes.txt
Figure 3.3 Flowchart of the First Input Data
Figure 3.4 Test.txt
Figure 3.5 Flowchart of the Second Input Data
Figure 3.6 Dissimilarity Values
Figure 3.7 Relatedness Values
Figure 3.8 Architecture Design of the Summarization System
Figure 3.9 Main Page Interface
Figure 3.10 Document Selection Page Interface
Figure 3.11 Summary Results Page Interface
Figure 4.1 Main Page Interface
Figure 4.2 Document Selection Interface
Figure 4.3 Summary Results Page Interface
Figure 4.4 Summary of “BSM Tabungan Berencana”

LIST OF TABLES

Table 2.1 Tweet Splitting
Table 2.2 Case Folding
Table 2.3 Tokenizing
Table 2.4 Editing
Table 2.5 Stopword Removal
Table 2.6 Normalization
Table 2.7 Stemming
Table 2.8 Confusion Matrix
Table 3.1 Trending Topic PHK
Table 3.2 Tweet Splitting
Table 3.3 Case Folding
Table 3.4 Tokenizing
Table 3.5 Stopword Removal
Table 3.6 TF-IDF Calculation
Table 3.7 Euclidean Matrix
Table 3.8 Matrix Level 1
Table 3.9 Matrix Level 2
Table 3.10 Matrix Level 3
Table 3.11 Sample Tweet Data “tes”
Table 3.12 Structure of Table data_tweet
Table 3.13 Software Requirements Specification (SKPL)
Table 4.1 Table data_tweet
Table 4.2 Selecting the Optimal Cluster for the Topic “BSM Tabungan Berencana” 1
Table 4.3 Optimal Cluster of 30 Test Data with UPGMA
Table 4.4 Precision
Table 4.5 Recall
Table 4.6 F-Measure (Percent)
Table 4.7 Retweet
Table 4.8 Cluster “BSM Tabungan Berencana”
Table 4.9 Relatedness Values Between Tweets “BSM Tabungan Berencana”
Table 4.10 Summary of “BSM Tabungan Berencana”
Table 4.11 ROUGE-1
Table 4.12 Summary Differences

LIST OF APPENDICES

Appendix 1. Tweet Data for Trending Topic “BSM Tabungan Berencana”
Appendix 2. TF-IDF for Trending Topic “BSM Tabungan Berencana”, Part 1
Appendix 3. TF-IDF for Trending Topic “BSM Tabungan Berencana”, Part 2
Appendix 4. Euclidean Distance Matrix “BSM Tabungan Berencana” 1
Appendix 5. Euclidean Distance Matrix “BSM Tabungan Berencana” 2
Appendix 6. Cluster Hierarchy “BSM Tabungan Berencana”, Part 1
Appendix 7. Cluster Hierarchy “BSM Tabungan Berencana”, Part 2
Appendix 8. Cluster Hierarchy “BSM Tabungan Berencana”, Part 3
Appendix 9. Cluster Hierarchy “BSM Tabungan Berencana”, Part 4
Appendix 10. Cluster Hierarchy “BSM Tabungan Berencana”, Part 5
Appendix 11. Cluster Hierarchy “BSM Tabungan Berencana”, Part 6
Appendix 12. Cluster Hierarchy “BSM Tabungan Berencana”, Part 7
Appendix 13. Cluster Hierarchy “BSM Tabungan Berencana”, Part 8
Appendix 14. Cluster Hierarchy “BSM Tabungan Berencana”, Part 9
Appendix 15. Cluster Hierarchy “BSM Tabungan Berencana”, Part 10
Appendix 16. Cluster Hierarchy “BSM Tabungan Berencana”, Part 11
Appendix 17. Cluster Hierarchy “BSM Tabungan Berencana”, Part 12
Appendix 18. Clustering Results “BSM Tabungan Berencana”, Part 1
Appendix 19. Clustering Results “BSM Tabungan Berencana”, Part 2
