Contraceptive Method Choice
Advisor: Dr. 黃三益
Team members: B924020007 王俐文, B924020009 謝孟凌, B924020014 陳怡珺
Background and Motivation
As the world's population grows rapidly, people today pay more attention to contraceptive methods.
Step one: Translate the Business Problem into a Data Mining Problem
Topic: Contraceptive Method Choice
Predict a woman's current contraceptive method choice (no use, long-term methods, or short-term methods) from her demographic and socio-economic characteristics.
In particular, identify what kind of couples would choose a long-term method.
Step two: Select Appropriate Data
Title: Contraceptive Method Choice
Sources:
Origin: Subset of the 1987 National Indonesia Contraceptive Prevalence Survey
Creator: Tjen-Sien Lim
Date: June 7, 1997
Number of Instances: 1473
There are no missing values in this dataset.
Number of attributes: 10 (including the class attribute)
Wife's age
Wife's education
Husband's education
Number of children ever born
Wife's religion
Wife's now working?
Husband's occupation
Standard-of-living index
Media exposure
Contraceptive method used (class attribute)
Step three: Get to Know the Data
Attribute Information
Attribute Name: Contraceptive method used
Attribute Type: class attribute
Description of Attribute Value: 1=No-use, 2=Long-term, 3=Short-term
Attribute Name: Wife's age
Attribute Type: Numerical
[Histogram of wife_age: Mean = 32.538, Std. Dev. = 8.2272, N = 1,473]
Attribute Name: Wife's education
Attribute Type: Categorical
Description of Attribute Value: 1=low, 2, 3, 4=high
[Histogram of wife_education: Mean = 2.96, Std. Dev. = 1.015, N = 1,473]
Attribute Name: Husband's education
Attribute Type: Categorical
Description of Attribute Value: 1=low, 2, 3, 4=high
[Histogram of husband_education: Mean = 3.43, Std. Dev. = 0.816, N = 1,473]
Attribute Name: Number of children ever born
Attribute Type: Numerical
[Histogram of number_of_children: Mean = 3.261, Std. Dev. = 2.3585, N = 1,473]
Attribute Name: Wife's religion
Attribute Type: Binary
Description of Attribute Value: 0=Non-Islam, 1=Islam
[Histogram of wife_religion: Mean = 0.85, Std. Dev. = 0.357, N = 1,473]
Attribute Name: Wife's now working?
Attribute Type: Binary
Description of Attribute Value: 0=Yes, 1=No
[Histogram of wife_now_work: Mean = 0.75, Std. Dev. = 0.433, N = 1,473]
Attribute Name: Husband's occupation
Attribute Type: Categorical
Description of Attribute Value: 1, 2, 3, 4
[Histogram of husband_occupation: Mean = 2.14, Std. Dev. = 0.865, N = 1,473]
Attribute Name: Standard-of-living index
Attribute Type: Categorical
Description of Attribute Value: 1=low, 2, 3, 4=high
[Histogram of standard_of_living_index: Mean = 3.13, Std. Dev. = 0.976, N = 1,473]
Attribute Name: Media exposure
Attribute Type: Binary
Description of Attribute Value: 0=Good, 1=Not good
[Histogram of media_exposure: Mean = 0.07, Std. Dev. = 0.262, N = 1,473]
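The Mean, Std. Dev., and N figures shown with each histogram can be recomputed with a few lines of Python. The following is a minimal sketch, assuming the data is comma-separated with the ten attributes in the order listed under Step two. The column names and the three sample rows are illustrative placeholders, not records from the survey.

```python
import csv
import io
import statistics
from collections import Counter

# Column names are assumptions that follow the attribute order in the slides.
COLUMNS = [
    "wife_age", "wife_education", "husband_education", "number_of_children",
    "wife_religion", "wife_now_work", "husband_occupation",
    "standard_of_living_index", "media_exposure", "method_used",
]

# Made-up rows for illustration only; replace with the real data file.
SAMPLE = """24,2,3,3,1,1,2,3,0,1
45,1,3,10,1,1,3,4,0,1
36,4,4,2,0,0,1,4,0,2
"""

def load_rows(text):
    """Parse comma-separated text into a list of dicts of ints."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, map(int, row))) for row in reader if row]

def describe(rows, column):
    """Return (mean, sample std. dev., N) for a numeric column."""
    values = [r[column] for r in rows]
    return statistics.mean(values), statistics.stdev(values), len(values)

def frequencies(rows, column):
    """Count how often each coded value occurs in a categorical column."""
    return Counter(r[column] for r in rows)

rows = load_rows(SAMPLE)
mean, std, n = describe(rows, "wife_age")
print(f"wife_age: mean={mean:.3f} std={std:.4f} N={n}")
print("method_used:", dict(frequencies(rows, "method_used")))
```

For the coded categorical and binary attributes, the frequency counts are usually more informative than the mean, since the codes are labels rather than quantities.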
Step Four : Create a Model Set
Raw Data
Total: 1473 samples
75% of the data as the training set
The rest of the data (25%) as the testing set
→ Split by random sampling in RapidMiner
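The slides perform this split in RapidMiner. As a rough equivalent, the sketch below shows a 75/25 random split in plain Python. The function name, the fixed seed, and the use of row indices in place of real records are assumptions made for illustration; it is not RapidMiner's own operator.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Row indices stand in for the 1473 survey records.
records = list(range(1473))
train, test = train_test_split(records)
print(len(train), len(test))  # 1104 369
```

Fixing the seed keeps the training and testing sets identical across runs, which makes the models built later comparable.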
Step Five: Fix Problems with the Data
No missing values
Skewed distributions
Step Six: Transform Data to Bring Information to the Surface
Most of the values of the attribute named Media Exposure are “Good”.
Use the numeric variables for statistical analysis to find outliers.
[Histogram of media_exposure: Mean = 0.07, Std. Dev. = 0.262, N = 1,473]
[Histogram of number_of_children: Mean = 3.261, Std. Dev. = 2.3585, N = 1,473]
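The slides do not say which outlier test was applied to the numeric variables. One common choice is the 1.5 × IQR rule, sketched below as an assumption rather than as the method actually used. The child counts are hypothetical values, not survey data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical child counts, not taken from the survey.
children = [0, 1, 1, 2, 2, 3, 3, 3, 4, 5, 16]
print(iqr_outliers(children))  # [16]
```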
Step 7 Build Model
Using RapidMiner, build the model with a Decision Tree
Step 7 Build Model (con’t)
Ripper Rule
if wife_age > 30 and Num_children_born <= 1 then 1 (53 / 1 / 3)
if Num_children_born <= 0 then 1 (36 / 0 / 0)
if Wife_education = 4 and wife_age <= 42 and Wife_religion = 0 and Num_children_born > 3 then 2 (0 / 14 / 0)
if Wife_education = 1 and Husband_occupation = 2 then 1 (17 / 0 / 1)
if Wife_education = 4 and wife_age > 33 and Num_children_born > 2 and Husband_occupation = 1 and Num_children_born <= 3 then 2 (1 / 10 / 2)
if Num_children_born > 2 and wife_age <= 33 and Wife_now_working = 1 and Num_children_born <= 4 and wife_age > 28 then 3 (1 / 0 / 13)
if wife_age <= 35 and Num_children_born > 4 and Media_exposure = 0 then 3 (1 / 2 / 12)
if Husband_education = 4 and wife_age <= 44 and living_level = 3 and wife_age > 37 then 2 (0 / 5 / 0)
else 1 (305 / 168 / 281)
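A Ripper rule set is an ordered list: the first rule whose condition holds decides the class, and the final "else" is the default. The sketch below encodes the rules above as such a list in Python. The attribute names are taken from the rules; the `classify` helper and the sample record are hypothetical.

```python
# Each rule: (condition on a record dict, predicted class).
# Classes: 1 = No-use, 2 = Long-term, 3 = Short-term.
RULES = [
    (lambda r: r["wife_age"] > 30 and r["Num_children_born"] <= 1, 1),
    (lambda r: r["Num_children_born"] <= 0, 1),
    (lambda r: r["Wife_education"] == 4 and r["wife_age"] <= 42
               and r["Wife_religion"] == 0 and r["Num_children_born"] > 3, 2),
    (lambda r: r["Wife_education"] == 1 and r["Husband_occupation"] == 2, 1),
    (lambda r: r["Wife_education"] == 4 and r["wife_age"] > 33
               and 2 < r["Num_children_born"] <= 3
               and r["Husband_occupation"] == 1, 2),
    (lambda r: 28 < r["wife_age"] <= 33 and 2 < r["Num_children_born"] <= 4
               and r["Wife_now_working"] == 1, 3),
    (lambda r: r["wife_age"] <= 35 and r["Num_children_born"] > 4
               and r["Media_exposure"] == 0, 3),
    (lambda r: r["Husband_education"] == 4 and 37 < r["wife_age"] <= 44
               and r["living_level"] == 3, 2),
]
DEFAULT_CLASS = 1  # the final "else 1" rule

def classify(record):
    """Return the class of the first rule that fires, else the default."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT_CLASS

# Hypothetical record for illustration.
woman = {
    "wife_age": 40, "Wife_education": 4, "Husband_education": 4,
    "Num_children_born": 5, "Wife_religion": 0, "Wife_now_working": 1,
    "Husband_occupation": 1, "living_level": 3, "Media_exposure": 0,
}
print(classify(woman))  # 2
```

Because rule order matters, reordering `RULES` can change predictions even though each individual rule stays the same.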
Step 7 Build Model (con’t)
Weka-JRip
(Wife_education = 4) and (Num_children_born >= 3) and (wife_age >= 35) => method_used=2 (178.0/76.0)
(wife_age <= 33) and (Num_children_born >= 3) => method_used=3 (271.0/120.0)
(wife_age <= 33) and (Num_children_born >= 1) and (wife_age <= 22) => method_used=3 (106.0/51.0)
=> method_used=1 (771.0/342.0)
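In Weka's JRip output, the pair in parentheses after each rule is the number of instances the rule covers followed by the number of those it misclassifies. Under that reading, the sketch below works out how reliable each rule is on the data it was built from. The helper name is hypothetical, and these figures are not test-set accuracy.

```python
# (predicted class, instances covered, instances misclassified),
# copied from the JRip output in the slides.
JRIP_RULES = [
    (2, 178.0, 76.0),
    (3, 271.0, 120.0),
    (3, 106.0, 51.0),
    (1, 771.0, 342.0),  # default rule
]

def rule_precision(covered, errors):
    """Fraction of covered instances that the rule labels correctly."""
    return (covered - errors) / covered

for label, covered, errors in JRIP_RULES:
    print(f"class {label}: {rule_precision(covered, errors):.3f}")

total = sum(c for _, c, _ in JRIP_RULES)
wrong = sum(e for _, _, e in JRIP_RULES)
overall = (total - wrong) / total
print(f"overall: {overall:.3f}")
```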
Step 8 Assess Model
Decision Tree
Ripper Rule
JRip Rule
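Each of the three models can be assessed on the testing set by comparing predicted and actual classes. The sketch below computes a confusion matrix and overall accuracy in plain Python. The label lists are hypothetical and do not reproduce any result from the slides.

```python
from collections import Counter

CLASSES = [1, 2, 3]  # 1 = No-use, 2 = Long-term, 3 = Short-term

def confusion_matrix(actual, predicted):
    """Return counts[(a, p)] of records with true class a predicted as p."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the true class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical labels for illustration; not results from the slides.
actual    = [1, 1, 2, 3, 3, 2, 1, 3]
predicted = [1, 3, 2, 3, 1, 2, 1, 1]

matrix = confusion_matrix(actual, predicted)
for a in CLASSES:
    print(a, [matrix[(a, p)] for p in CLASSES])
print(f"accuracy = {accuracy(actual, predicted):.3f}")
```

Reading the matrix row by row shows which classes a model confuses, which matters here because the stated goal is to identify couples who choose a long-term method.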
Conclusion
Result
Problems we should address:
More data is needed
Some attributes could be ignored
The details of the attributes are not clear enough
The time period and environment have changed
Thanks for listening…