PERFORMANCE COMPARISON OF DATA MINING
ALGORITHMS BASED ON FEATURE SELECTION
TECHNIQUES IN DETECTING
CHRONIC KIDNEY DISEASE
UNDERGRADUATE THESIS
Submitted in Fulfillment of the Requirements for the
Degree of Sarjana Komputer (S.Kom)
Michelle Prizcillya
00000023738
INFORMATION SYSTEMS STUDY PROGRAM
FACULTY OF ENGINEERING AND INFORMATICS
UNIVERSITAS MULTIMEDIA NUSANTARA
TANGERANG
2021
DECLARATION
APPROVAL PAGE
Thesis Entitled
“Performance Comparison of Data Mining Algorithms Based on Feature Selection
Techniques in Detecting Chronic Kidney Disease”
By
Michelle Prizcillya
Approved for submission to the
Thesis Examination of Universitas Multimedia Nusantara
Tangerang, 4 January 2021
Approved by,
VALIDATION PAGE
Thesis Entitled
“Performance Comparison of Data Mining Algorithms Based on Feature Selection
Techniques in Detecting Chronic Kidney Disease”
By
Michelle Prizcillya
Was examined on Monday, 11 January 2021,
from 11:00 to 12:30, and declared to have passed,
with the following board of examiners
Chair of the Examination
Ririn Ikana Desanti, S.Kom., M.Kom.
Examiner
Monika Evelin Johan, S.Kom., M.M.S.I
Supervisor
Ir. Raymond Sunardi Oetama, M.C.I.S.
Validated by
Head of the Information Systems Study Program, UMN
Ririn Ikana Desanti, S.Kom., M.Kom.
25 January 2021
PERFORMANCE COMPARISON OF DATA MINING ALGORITHMS
BASED ON FEATURE SELECTION TECHNIQUES IN
DETECTING CHRONIC KIDNEY DISEASE
ABSTRACT (INDONESIAN VERSION)
By: Michelle Prizcillya
Chronic kidney disease (CKD) is the gradual loss of kidney function over time.
CKD is called a "silent killer" because affected patients do not realize they
have it, as CKD shows no symptoms in its early stages. Testing for CKD also
requires a fairly large number of attributes, which makes the test rather
expensive. CKD can be prevented and managed, and the chance of receiving
effective treatment is greater, if it is detected early, which also reduces
treatment costs. A model is therefore built to make CKD detection easier.
This research uses the data science tool RapidMiner and a data mining process
based on the CRISP-DM framework to compare K-Nearest Neighbour (K-NN), decision
tree, and logistic regression, each combined with feature selection as a
technique for choosing the relevant attributes in a dataset. To select
attributes and reduce the data, two feature selection techniques are used and
their results compared: forward selection and backward elimination.
The results show that feature selection improves the accuracy of K-NN,
decision tree, and logistic regression. The best algorithm with forward
selection is decision tree, and the best with backward elimination is logistic
regression.
Keywords: backward elimination, decision tree, forward selection, k-nearest
neighbour, logistic regression.
PERFORMANCE COMPARISON OF DATA MINING ALGORITHMS
BASED ON FEATURE SELECTION TECHNIQUES IN DETECTING
CHRONIC KIDNEY DISEASE
ABSTRACT
By: Michelle Prizcillya
Chronic kidney disease (CKD) is a gradual loss of renal function over time. CKD
is referred to as a "silent killer" because patients affected by CKD do not
realize that they have it, since CKD has no symptoms in its early stages. A
health test for CKD also needs many attributes, which makes it fairly costly.
CKD can be prevented and managed, and effective treatment is more likely, if
the disease is detected early, which also saves on medical costs. A model is
therefore created to facilitate CKD detection.
This research uses the RapidMiner data science tool and a data mining process
following the CRISP-DM framework to compare K-Nearest Neighbour (K-NN),
decision tree, and logistic regression, each based on feature selection as a
technique for selecting the relevant attributes in a dataset. For attribute
selection and data reduction, the feature selection techniques used are
forward selection and backward elimination, whose results are compared.
The results of this study show that feature selection techniques improve the
accuracy of the K-NN, decision tree, and logistic regression algorithms. The
best algorithm for forward selection is decision tree, and the best for
backward elimination is logistic regression.
Keywords: backward elimination, decision tree, forward selection, k-nearest
neighbour, logistic regression.
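The comparison summarized in the abstract can be sketched in code. The thesis itself uses RapidMiner operators, so the following is only a rough Python analogue built on assumptions: it swaps in scikit-learn's SequentialFeatureSelector with a fixed number of kept attributes, runs on synthetic data rather than the CKD dataset, and uses hypothetical settings (k = 5, 3 folds, 3 kept attributes) that are not taken from the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the CKD dataset used in the thesis.
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=1, random_state=0)

# Hypothetical model settings, chosen only for illustration.
models = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}


def compare(X, y, n_keep=3, folds=3):
    """Return {(model name, strategy): mean cross-validated accuracy}."""
    results = {}
    for name, model in models.items():
        # Baseline: all attributes, no feature selection.
        results[(name, "none")] = cross_val_score(model, X, y, cv=folds).mean()
        for direction in ("forward", "backward"):
            selector = SequentialFeatureSelector(
                model, n_features_to_select=n_keep,
                direction=direction, cv=folds)
            # The selector is part of the pipeline, so every outer fold
            # selects attributes using its own training portion only.
            pipeline = make_pipeline(selector, model)
            results[(name, direction)] = cross_val_score(
                pipeline, X, y, cv=folds).mean()
    return results


results = compare(X, y)
for (name, strategy), accuracy in sorted(results.items()):
    print(f"{name:20s} {strategy:9s} accuracy={accuracy:.3f}")
```

Each model is scored three ways (no selection, forward, backward), which mirrors the structure of the comparison in the thesis. The numbers printed here come from synthetic data and say nothing about the thesis results.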
PREFACE
Praise be to God Almighty for the timely completion of this thesis report,
entitled “Performance Comparison of Data Mining Algorithms Based on Feature
Selection Techniques in Detecting Chronic Kidney Disease”. This thesis report
was prepared to fulfill one of the graduation requirements at Universitas
Multimedia Nusantara and also serves as proof of completion of the
undergraduate (Strata-1) program in order to obtain the degree of Sarjana
Komputer.
In preparing this thesis report, many obstacles and challenges were faced, but
in the end they were overcome thanks to the direction and guidance of various
parties, both moral and spiritual. Respect and gratitude are therefore
extended to everyone who has helped. Those connected with this thesis include
the following:
1. Mr. Ir. Raymond Sunardi Oetama, M.C.I.S., as thesis supervisor, for the
guidance and suggestions given throughout the work on this thesis report.
2. Mr. Iwan Prasetiawan, S.Kom., M.M., for his willingness to discuss and give
suggestions on the writing of this thesis report.
3. The author's parents and family, for their prayers and support during the
preparation of this thesis report.
4. Campus friends, namely Lambe Negara as well as Cristi, Okta, Hansen,
Sandro, and Jenny, who always shared joys and sorrows throughout the studies
and gave suggestions and support during the work on this thesis report.
5. Friends in Sungai Pinyuh, namely Ampit Girls, who always kept the author
company and supported the writing of this thesis report during the COVID-19
pandemic.
6. The author's idols, namely Hyungwon and MONSTA X, and many more, who always
gave encouragement and support during the work on this thesis report.
7. All other parties who cannot be named one by one, who helped and gave
support and encouragement in the preparation of this thesis report.
It is acknowledged that this thesis report is still far from perfect, although
every effort has been made in writing it. All suggestions, input, and
constructive criticism from any party are therefore welcome. Once again, many
thanks; may this report be useful to us all, and especially to its author.
Tangerang, 5 December 2020
Michelle Prizcillya
TABLE OF CONTENTS
DECLARATION
APPROVAL PAGE
VALIDATION PAGE
ABSTRACT (Indonesian version)
ABSTRACT (English version)
PREFACE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF FORMULAS
CHAPTER I INTRODUCTION
1.1. Background
1.2. Problem Statement
1.3. Scope of the Problem
1.4. Research Objectives
1.5. Research Benefits
CHAPTER II LITERATURE REVIEW
2.1. Chronic Kidney Disease
2.2. CRISP-DM
2.3. Feature Selection
2.3.1. Forward Selection
2.3.2. Backward Elimination
2.4. K-Nearest Neighbour Algorithm
2.5. Decision Tree Algorithm
2.6. Logistic Regression Algorithm
2.7. Software Tools
2.8. Previous Research
CHAPTER III RESEARCH METHODOLOGY
3.1. Overview of the Research Object
3.2. Research Method
3.2.1. Data Collection
3.2.2. Independent Variables
3.2.3. Dependent Variable
3.3. Research Flow
3.3.1. Business Understanding
3.3.2. Data Understanding
3.3.3. Data Preparation
3.3.4. Modeling
3.3.5. Evaluation
3.3.6. Deployment
3.4. Validation of Results
CHAPTER IV ANALYSIS AND RESEARCH RESULTS
4.1. Business Understanding Phase
4.2. Data Understanding Phase
4.3. Data Preparation Phase
4.4. Modelling Phase
4.4.1. Reading the CKD dataset, data preparation, and data cleansing
4.4.2. Set role to choose the attributes to be used, and splitting the data
4.4.3. Model Building
4.5. Evaluation Phase
CHAPTER V CONCLUSIONS AND SUGGESTIONS
5.1. Conclusions
5.2. Suggestions
REFERENCES
APPENDICES
LIST OF TABLES
Table 1.1. Comparison of Previous Research
Table 2.1. Stages of Chronic Kidney Disease
Table 2.2. Previous Research
Table 3.1. CKD Dataset
Table 4.1. Comparison of Algorithms Using Feature Selection Techniques
Table 4.2. CKD Accuracy
LIST OF FIGURES
Figure 2.1. CRISP-DM Process
Figure 2.2. Forward Selection Flowchart
Figure 2.3. Backward Elimination Flowchart
Figure 2.4. K-NN Flowchart
Figure 2.5. Decision Tree Rule
Figure 3.1. CRISP-DM Flowchart
Figure 3.2. Modelling Phase Flowchart
Figure 4.1. Flowchart Calculation Syntax
Figure 4.2. Screenshot of the Calculation Result
Figure 4.3. Syntax for Counting Missing Values
Figure 4.4. Screenshot of the Missing Values Result
Figure 4.5. Syntax for Finding K in K-NN
Figure 4.6. Screenshot of the Result of Finding K in K-NN
Figure 4.7. Research Flowchart
Figure 4.8. Screenshot of the RapidMiner Retrieve Operator
Figure 4.9. Screenshot of the RapidMiner Operator Contents
Figure 4.10. Filter Examples Operator
Figure 4.11. Set Role and Split Data Operators
Figure 4.12. Set Role Operator
Figure 4.13. Split Data Operator
Figure 4.14. Forward Selection Operator for K-NN
Figure 4.15. Cross-Validation for K-NN
Figure 4.16. Contents of Cross-Validation for K-NN
Figure 4.17. Value of K in K-NN
Figure 4.18. Table of Resulting Attributes for K-NN
Figure 4.19. Accuracy of K-NN with Forward Selection
Figure 4.20. K-NN Model with Forward Selection
Figure 4.21. Forward Selection for Decision Tree
Figure 4.22. Cross-Validation for Decision Tree
Figure 4.23. Contents of Cross-Validation for Decision Tree
Figure 4.24. Forward Selection Result for Decision Tree
Figure 4.25. Accuracy Result of Forward Selection for Decision Tree
Figure 4.26. Decision Tree Model with Forward Selection
Figure 4.27. Forward Selection Operator for Logistic Regression
Figure 4.28. Cross-Validation for Logistic Regression
Figure 4.29. Contents of Cross-Validation for Logistic Regression
Figure 4.30. Resulting Attributes of Forward Selection for Logistic Regression
Figure 4.31. Accuracy Result of Forward Selection for Logistic Regression
Figure 4.32. Logistic Regression Model with Forward Selection
Figure 4.33. Backward Elimination Operator for K-NN
Figure 4.34. Cross-Validation for K-NN
Figure 4.35. Contents of Cross-Validation for K-NN
Figure 4.36. K Value Parameter
Figure 4.37. Resulting Attributes of Backward Elimination for K-NN
Figure 4.38. Accuracy Result of Backward Elimination for K-NN
Figure 4.39. K-NN Model with Backward Elimination
Figure 4.40. Backward Elimination for Decision Tree
Figure 4.41. Cross-Validation for Decision Tree
Figure 4.42. Contents of Cross-Validation for Decision Tree
Figure 4.43. Resulting Attributes of Backward Elimination for Decision Tree
Figure 4.44. Accuracy Result of Backward Elimination for Decision Tree
Figure 4.45. Decision Tree Model with Backward Elimination
Figure 4.46. Backward Elimination for Logistic Regression
Figure 4.47. Cross-Validation for Logistic Regression
Figure 4.48. Contents of Cross-Validation for Logistic Regression
Figure 4.49. Resulting Attributes of Backward Elimination for Logistic Regression
Figure 4.50. Accuracy Result of Backward Elimination for Logistic Regression
Figure 4.51. Logistic Regression Model with Backward Elimination
Figure 4.52. Cross-Validation for K-NN
Figure 4.53. Contents of Cross-Validation for K-NN
Figure 4.54. Accuracy Result for K-NN
Figure 4.55. K-NN Model
Figure 4.56. Cross-Validation for Decision Tree
Figure 4.57. Contents of Cross-Validation for Decision Tree
Figure 4.58. Accuracy Result for Decision Tree
Figure 4.59. Decision Tree Model
Figure 4.60. Cross-Validation for Logistic Regression
Figure 4.61. Contents of Logistic Regression
Figure 4.62. Accuracy Result for Logistic Regression
Figure 4.63. Logistic Regression Model
LIST OF FORMULAS
Formula 2.1. Calculating e-GFR
Formula 2.2. Calculating Euclidean Distance
Formula 2.3. Calculating Gain Ratio
Formula 2.4. Model Tree Rule
Formula 2.5. Classifier Function in Logistic Regression
Formula 2.6. Logistic Regression Model Equation