8/18/2019 DataMining Shahjad
Overview
Introduction
Explanation of Data Mining Techniques
Advantages
Applications
Privacy
Data Mining
What is Data Mining?
"The process of semi-automatically analyzing large databases to find useful patterns."
"Attempts to discover rules and patterns from data."
Areas of Use
Internet: Discover needs of customers
Economics: Predict stock prices
Science: Predict environmental change
Medicine: Match patients with similar problems to find a cure
Example of Data Mining
A credit card company wants to discover information about clients from its databases. It wants to find:
Clients who respond to promotions in "junk mail"
Clients that are likely to switch to a competitor
Clients that are likely not to pay
Services that clients use, in order to promote services affiliated with the credit card company
Anything else that may help the company provide or promote services to help its clients and ultimately make more money.
Data Mining & Data Warehousing
Data Warehouse: "is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site."
Collect data
Store in a single repository
Allows for easier query development, as a single repository can be queried.
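A minimal sketch of the warehousing idea, using Python's standard sqlite3 module: two hypothetical sources with different field layouts (web_orders and store_orders, invented for illustration) are mapped onto one unified schema in a single table, so a single query covers both. The table name sales and its columns are assumptions, not part of the original slides.

```python
import sqlite3

# Hypothetical source 1: online orders, with its own field names.
web_orders = [{"cust": 1, "total": 120.0}, {"cust": 2, "total": 35.5}]
# Hypothetical source 2: in-store purchases stored as (customer, amount) tuples.
store_orders = [(1, 80.0), (3, 42.0)]

def build_warehouse():
    """Load both sources into one table that uses a single, unified schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL, channel TEXT)")
    # Map each source's fields onto the unified schema.
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, 'web')",
        [(o["cust"], o["total"]) for o in web_orders],
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, 'store')", store_orders)
    conn.commit()
    return conn

def total_spend_per_customer(conn):
    """One query against the single repository covers every source at once."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return dict(rows)

if __name__ == "__main__":
    conn = build_warehouse()
    print(total_spend_per_customer(conn))  # {1: 200.0, 2: 35.5, 3: 42.0}
```

Once the data shares one schema, the query does not need to know which source a row came from.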
Data Mining Techniques
Classification
Clustering
Regression
Association Rules
Classification
Classification: Given a set of items that have several classes, and given past instances (training instances) with their associated class, classification is the process of predicting the class of a new item.
The goal is therefore to classify the new item, that is, to identify which class it belongs to.
Example: A bank wants to classify its Home Loan customers into groups according to their response to bank advertisements. The bank might use the classifications "Responds Rarely", "Responds Sometimes" and "Responds Frequently".
The bank will then attempt to find rules about the customers that respond Frequently and Sometimes.
The rules could be used to predict the needs of potential customers.
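The slide does not name an algorithm for finding such rules. One simple option is the OneR (1R) rule learner, swapped in here and sketched in Python under stated assumptions: the training instances and the attributes age and savings are hypothetical, and only the three class labels come from the bank example. For each attribute, OneR maps every value to its most common class, then keeps the attribute whose rules make the fewest errors on the training instances.

```python
from collections import Counter, defaultdict

# Hypothetical training instances: (attributes, class). Only the class
# labels come from the bank example; the attributes are invented.
TRAINING = [
    ({"age": "young", "savings": "low"}, "Responds Rarely"),
    ({"age": "young", "savings": "high"}, "Responds Sometimes"),
    ({"age": "middle", "savings": "high"}, "Responds Frequently"),
    ({"age": "middle", "savings": "low"}, "Responds Sometimes"),
    ({"age": "senior", "savings": "high"}, "Responds Frequently"),
    ({"age": "senior", "savings": "low"}, "Responds Rarely"),
]

def one_rule(training):
    """OneR: for each attribute, map every value to its most common class,
    count the training errors, and keep the attribute with the fewest."""
    best = None
    for attr in training[0][0]:
        counts = defaultdict(Counter)
        for features, label in training:
            counts[features[attr]][label] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rules[features[attr]] != label for features, label in training)
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    attr, rules, _ = best
    return attr, rules

def classify(model, features, default="Responds Rarely"):
    """Apply the learned rule; an unseen value falls back to `default` (an assumption)."""
    attr, rules = model
    return rules.get(features[attr], default)

if __name__ == "__main__":
    model = one_rule(TRAINING)
    print(model)  # ('savings', {'low': 'Responds Rarely', 'high': 'Responds Frequently'})
    print(classify(model, {"age": "young", "savings": "high"}))  # Responds Frequently
```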
Technique for Classification: Decision-Tree Classifiers
[Figure: decision tree. Root node: Job, with branches Carpenter, Engineer and Doctor; each branch splits on Income into Bad and Good leaves.]
Predicting the credit risk of a person from their job
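A decision tree classifies an item by starting at the root, applying each node's test and following the matching branch until it reaches a leaf. The Python sketch below hard-codes a tree shaped like the figure (Job at the root, then Income, then a Good or Bad leaf). The income thresholds are hypothetical, since the figure's values are not given, and the field names job and income are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

@dataclass
class Leaf:
    label: str  # credit risk: "Good" or "Bad"

@dataclass
class Split:
    test: Callable[[dict], str]  # maps a person to a branch key
    branches: Dict[str, Union[Leaf, "Split"]]

def income_split(threshold):
    """Income node: 'Good' at or above the (hypothetical) threshold, else 'Bad'."""
    return Split(
        test=lambda person: "high" if person["income"] >= threshold else "low",
        branches={"low": Leaf("Bad"), "high": Leaf("Good")},
    )

# Root tests the job; each job branch has its own hypothetical income threshold.
CREDIT_TREE = Split(
    test=lambda person: person["job"],
    branches={
        "Carpenter": income_split(30_000),
        "Engineer": income_split(50_000),
        "Doctor": income_split(70_000),
    },
)

def classify(node, person):
    """Follow the branch chosen by each node's test until a leaf is reached."""
    while isinstance(node, Split):
        node = node.branches[node.test(person)]
    return node.label

if __name__ == "__main__":
    print(classify(CREDIT_TREE, {"job": "Engineer", "income": 55_000}))  # Good
    print(classify(CREDIT_TREE, {"job": "Doctor", "income": 55_000}))    # Bad
```

In practice such a tree is learned from training instances rather than written by hand; the hand-built version only shows how a finished tree makes a prediction.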
Clustering
"Clustering algorithms find groups of items that are similar. It divides a data set so that records with similar content are in the same group, and groups are as different as possible from each other."
Example: An insurance company could use clustering to group clients by their age, location and types of insurance purchased.
The categories are unspecified in advance, and this is referred to as unsupervised learning.
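The slide does not name a clustering algorithm; k-means is one widely used choice, swapped in here for illustration. This Python sketch clusters hypothetical insurance clients described by two numeric features, age and number of policies held (both invented). Location and insurance type from the example above would first need a numeric encoding. Using the first k points as starting centroids is a simplification that keeps the run deterministic.

```python
import math

# Hypothetical clients: (age, number of insurance policies held).
CLIENTS = [(22, 1), (25, 1), (24, 2), (58, 3), (61, 4), (63, 3)]

def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means. The first k points serve as initial centroids,
    a simplification that makes the result deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]  # an empty cluster keeps its centroid
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

if __name__ == "__main__":
    centroids, clusters = kmeans(CLIENTS, 2)
    print(clusters)  # [[(22, 1), (25, 1), (24, 2)], [(58, 3), (61, 4), (63, 3)]]
```

No class labels are supplied; the groups emerge from the data alone, which is what makes this unsupervised. Features are usually rescaled first, because here age dominates the distance.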
Clustering
Group Data into Clusters
Similar data is grouped in the same cluster
Dissimilar data is grouped in different clusters
How is this achieved?
K-Nearest Neighbor: A classification method that classifies a point by calculating the distances between the point and the points in the training data set. It then assigns the point to the class that is most common among its k nearest neighbors (where k is an integer).
Hierarchical: Group data into trees
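Both methods listed above can be sketched in Python, assuming points are numeric coordinate tuples and Euclidean distance (math.dist). knn_classify follows the slide's description directly. For the hierarchical method the slide only says data is grouped into trees; single-linkage agglomerative clustering is assumed here as one standard way to build such a tree, recording each merge and the distance at which it happened. The function names and sample data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample data.
TRAINING = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
POINTS = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]

def knn_classify(training, point, k=3):
    """k-nearest neighbor: return the most common label among the k training
    points closest to `point`. `training` is a list of (coords, label) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def single_linkage(points):
    """Agglomerative hierarchical clustering with single linkage.
    Starts with one cluster per point and repeatedly merges the two closest
    clusters. Returns the merges as (cluster_a, cluster_b, distance) in order;
    together they describe the tree (dendrogram)."""
    clusters = [(p,) for p in points]  # each cluster is a tuple of points
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # delete the higher index first so i stays valid
        del clusters[i]
        clusters.append(merged)
    return merges

if __name__ == "__main__":
    print(knn_classify(TRAINING, (2, 2)))  # A
    for a, b, d in single_linkage(POINTS):
        print(a, "+", b, "at distance", round(d, 2))
```

Note that k-nearest neighbor needs labeled training points, while the hierarchical method works from the unlabeled points alone.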
Advantages of Data Mining
Provides new knowledge from existing data
Public databases
Government sources
Company databases
Old data can be used to develop new knowledge
New knowledge can be used to improve services or products
Improvements lead to:
Bigger profits
More efficient service
Uses of Data Mining
Sales/Marketing
Diversify target market
Identify clients' needs to increase response rates
Risk Assessment
Identify customers that pose high credit risk
Fraud Detection
Identify people misusing the system, e.g., people who have two Social Security Numbers
Customer Care
Identify customers likely to change providers
Identify customer needs
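As an illustration of the fraud-detection use, the Python sketch below flags identities that appear with more than one Social Security Number. Treating (name, birth_date) as one person's identity is a simplifying assumption, and the record layout and placeholder SSN values are hypothetical; real identity matching is considerably harder.

```python
from collections import defaultdict

# Hypothetical records; SSN values are placeholders, not real numbers.
RECORDS = [
    {"name": "A. Smith", "birth_date": "1980-01-02", "ssn": "SSN-1"},
    {"name": "A. Smith", "birth_date": "1980-01-02", "ssn": "SSN-2"},
    {"name": "B. Jones", "birth_date": "1975-05-06", "ssn": "SSN-3"},
    {"name": "B. Jones", "birth_date": "1975-05-06", "ssn": "SSN-3"},
]

def flag_multiple_ssns(records):
    """Return {identity: sorted SSNs} for identities linked to more than one SSN.
    Identity is approximated as (name, birth_date), an assumption."""
    ssns_by_person = defaultdict(set)
    for rec in records:
        ssns_by_person[(rec["name"], rec["birth_date"])].add(rec["ssn"])
    return {
        person: sorted(ssns)
        for person, ssns in ssns_by_person.items()
        if len(ssns) > 1
    }

if __name__ == "__main__":
    print(flag_multiple_ssns(RECORDS))
    # {('A. Smith', '1980-01-02'): ['SSN-1', 'SSN-2']}
```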
Privacy Concerns
Effective data mining requires large sources of data.
To achieve a wide spectrum of data, multiple data sources are linked.
Linking sources can be problematic for privacy. For example, suppose the following histories of a customer were linked:
Shopping History
Credit History
Bank History
Employment History
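The privacy risk comes from how easy the linking step is once sources share an identifier. In this minimal Python sketch, four hypothetical tables keyed by the same customer ID (all names and values invented) are combined into one profile with a simple lookup per source.

```python
# Hypothetical sources, each keyed by the same customer ID.
SOURCES = [
    ("Shopping", {"C42": ["pharmacy", "baby products"]}),
    ("Credit", {"C42": "two late payments"}),
    ("Bank", {"C42": "balance below 100"}),
    ("Employment", {"C42": "changed jobs twice this year"}),
]

def link_histories(customer_id, sources):
    """Combine every source's record for one customer into a single profile.
    Each source is a (history_name, {customer_id: record}) pair."""
    return {name: table[customer_id] for name, table in sources if customer_id in table}

if __name__ == "__main__":
    print(link_histories("C42", SOURCES))
```

Each history on its own reveals little, but the linked profile combines purchases, finances and employment for one identifiable person.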
Thank you