2016-10-17 R Language Data Analysis in Practice (2)
TRANSCRIPT
-
R (2)
Data Science
-
2016/10/17
. CC --
3.0 http://shouzo.github.io/
-
Agenda: Prepare, Basic, Theme, Reference
-
Prepare
-
Prepare
RStudio
-
Basic
-
Basic 1.
R rvest (web scraping with R)
https://blog.gtwang.org/r/rvest-web-scraping-with-r/
-
Basic 1.
XPath
-
Basic 1.
Google (Chrome)
-
Basic 1.
Mozilla Firefox - Firebug
-
Basic 1.
XPath
https://github.com/aweimeowaweimeow
-
Basic 1.
XPath
-
Basic 1.
TAG
-
Basic 2.
-
Basic 2.
(1) CSV
(2) XML
(3) JSON
(4) DB
(5) RData
(6) SPSS, Stata, SAS, Octave ...
CSV
-
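Basic 2. CSV
STEP 1: read.table()
Reads CSV and other delimited text files (for example, tab-delimited).
Related functions: read.csv2(), read.delim2()
[Syntax]
read.table(file = , header = TRUE or FALSE, sep = "")
[Arguments]
file: path or URL of the file to read
header: whether the first row holds the column names
sep: the character that separates the fields
# theUrl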
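The read.table() syntax above can be tried without a file on disk. The following is a minimal sketch, assuming a small inline sample: the rows reuse a few values from the head(tomato) output in this deck, while `csv_text` and `tomato_sample` are hypothetical names introduced only for illustration and are not part of the original slides.

```r
# Inline stand-in for a CSV file (assumption: three rows and five columns
# taken from the head(tomato) output; the real file is not read here).
csv_text <- "Round,Tomato,Price,Source,Sweet
1,Simpson SM,3.99,Whole Foods,2.8
1,Tuttorosso (blue),2.99,Pioneer,3.3
1,Tuttorosso (green),0.99,Pioneer,2.8"

# header = TRUE: the first row holds the column names.
# sep = ",": fields are separated by commas.
tomato_sample <- read.table(text = csv_text, header = TRUE, sep = ",",
                            stringsAsFactors = FALSE)

print(tomato_sample)
str(tomato_sample)
```

To read a real file, the `text =` argument would be swapped for `file =` with a path or URL.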
-
Basic 2. CSV
STEP 2: head()
[Syntax] head()
STEP 3: data.frame()
[Syntax] data.frame(name1 = vector1, name2 = vector2, name3 = vector3, ..., stringsAsFactors = TRUE or FALSE)
[Arguments] stringsAsFactors: TRUE converts character columns to factor; FALSE keeps them as character
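STEP 2 and STEP 3 can be sketched together. This is a minimal example, not the slide's exact code: the sport names come from the theDF$Sport output in this deck, but the numeric vectors `x` and `y` and the column names `First` and `Second` are assumptions made for illustration.

```r
# Assumed numeric columns (the original values are not recoverable here).
x <- 10:1
y <- -4:5
# Sport names as printed by theDF$Sport in the slides.
q <- c("Hockey", "Football", "Baseball", "Curling", "Rugby",
       "Lacrosse", "Basketball", "Tennis", "Cricket", "Soccer")

# stringsAsFactors = TRUE stores the character column as a factor.
theDF <- data.frame(First = x, Second = y, Sport = q,
                    stringsAsFactors = TRUE)
# stringsAsFactors = FALSE keeps it as plain character.
theDF_chr <- data.frame(First = x, Second = y, Sport = q,
                        stringsAsFactors = FALSE)

print(class(theDF$Sport))
print(class(theDF_chr$Sport))

# head() shows the first rows; n sets how many.
print(head(theDF, n = 3))
```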
-
Basic 2. STEP 2 and STEP 3
> head(tomato)
  Round             Tomato Price      Source Sweet Acid Color Texture Overall
1     1         Simpson SM  3.99 Whole Foods   2.8  2.8   3.7     3.4     3.4
2     1  Tuttorosso (blue)  2.99     Pioneer   3.3  2.8   3.4     3.0     2.9
3     1 Tuttorosso (green)  0.99     Pioneer   2.8  2.6   3.3     2.8     2.9
4     1     La Fede SM DOP  3.99   Shop Rite   2.6  2.8   3.0     2.3     2.8
5     2       Cento SM DOP  5.49  D Agostino   3.3  3.1   2.9     2.8     3.1
6     2      Cento Organic  4.99  D Agostino   3.2  2.9   2.9     3.1     2.9
  Avg.of.Totals Total.of.Avg
1          16.1         16.1
2          15.3         15.3
3          14.3         14.3
4          13.4         13.4
5          14.4         15.2
6          15.5         15.1
> x
> y
> # "q" character
> q
> theDF
> theDF$Sport
 [1] "Hockey"     "Football"   "Baseball"   "Curling"    "Rugby"      "Lacrosse"
 [7] "Basketball" "Tennis"     "Cricket"    "Soccer"
-
Theme
-
Theme 1.
-
Theme 1.
STEP 1, STEP 2, STEP 3, STEP 4, STEP 5
Text mining and word cloud fundamentals in R: 5 simple steps you should know
https://goo.gl/snM2nZ
-
Theme 1. STEP 1
Big Data and Analytics: Creating New Value
http://www.technewsworld.com/story/83998.html
-
Theme 1. STEP 2
RStudio
# Install the packages
install.packages("rvest")
install.packages("tm")
install.packages("SnowballC")     # Text stemming
install.packages("wordcloud")
install.packages("RColorBrewer")  # Color palettes
# Load the packages
library("rvest")
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
-
Theme 1. STEP 3
Chrome: open the developer tools (press "F12")
-
Theme 1. STEP 3
"Copy XPath"
-
Theme 1. STEP 3
XPath:
//*[@id="storybody"]
-
Theme 1. STEP 3
# "source.page"
source.page
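How an XPath copied from the browser selects part of a page can be sketched offline. This is a minimal example, assuming the rvest package is installed. The HTML string is a hypothetical stand-in for the article page (nothing is downloaded), and `demo_html`, `story.node`, and `story` are illustrative names that do not appear in the slides; only `source.page` and the XPath itself come from the deck.

```r
library(rvest)

# Hypothetical miniature page; the real article is not fetched here.
demo_html <- paste0(
  '<html><body>',
  '<div id="nav">Menu</div>',
  '<div id="storybody">',
  '<p>Big data creates new value.</p>',
  '<p>Analytics turns data into decisions.</p>',
  '</div>',
  '</body></html>'
)

# read_html() accepts a URL, a file path, or literal HTML text.
source.page <- read_html(demo_html)

# Select the node whose id attribute is "storybody", then pull out its text.
story.node <- html_nodes(source.page, xpath = '//*[@id="storybody"]')
story <- html_text(story.node)

print(story)
```

For a live page, `demo_html` would be replaced by the article URL, with the same XPath passed to `html_nodes()`.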
-
Theme 1. STEP 3
docs
-
Theme 1. STEP 3
toSpace
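The slide's own `docs` and `toSpace` definitions did not survive, so the following is only a sketch of a common cleaning pattern with the tm package, assuming tm is installed. The sample sentence is invented, and the definition of `toSpace` shown here (a `content_transformer` wrapper around `gsub`) is an assumption about what the name refers to, not the author's confirmed code.

```r
library(tm)

# Invented sample text standing in for the scraped article.
sample.text <- "Big data/analytics: creating new value in 2016!"

# Build a one-document corpus.
docs <- VCorpus(VectorSource(sample.text))

# Assumed helper: replace a pattern with a space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

docs <- tm_map(docs, toSpace, "/")                  # "/" becomes a space
docs <- tm_map(docs, content_transformer(tolower))  # lower-case everything
docs <- tm_map(docs, removeNumbers)                 # drop digits
docs <- tm_map(docs, removePunctuation)             # drop punctuation
docs <- tm_map(docs, stripWhitespace)               # collapse repeated spaces

cleaned <- content(docs[[1]])
print(cleaned)
```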
-
Theme 1. STEP 4
dtm
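Only the name `dtm` survives on this slide. As a hedged sketch, assuming tm is installed, a term-document matrix can be turned into the `d` data frame with `word` and `freq` columns that the later wordcloud() call expects. The two sample documents are invented, and the intermediate names `m` and `v` are illustrative.

```r
library(tm)

# Invented sample documents.
docs <- VCorpus(VectorSource(c("data value data", "value analytics data")))

# Rows are terms, columns are documents.
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)

# Total count per term, most frequent first.
v <- sort(rowSums(m), decreasing = TRUE)

# Data frame with one row per word.
d <- data.frame(word = names(v), freq = v, stringsAsFactors = FALSE)
print(d)
```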
-
Theme 1. STEP 5
set.seed(1000)
wordcloud(words = d$word, freq = d$freq, min.freq = 2, max.words = 30, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
-
Theme 2.
-
Theme 2.
STEP 1, STEP 2, STEP 3, STEP 4, STEP 5
http://andrew.ga/works/TextMining/
-
Theme 2. STEP 1
http://www.appledaily.com.tw/realtimenews/article/life/20161016/968938/
-
Theme 2. STEP 2
RStudio
# Install the packages
install.packages("rvest")
install.packages("jiebaR")
install.packages("tm")
install.packages("wordcloud2")
# Load the packages
library("rvest")
library("jiebaR")
library("tm")
library("wordcloud2")
-
Theme 2. STEP 3
Chrome: open the developer tools (press "F12")
-
Theme 2. STEP 3
"Copy XPath"
-
Theme 2. STEP 3
XPath:
//*[@id="summary"]
-
Theme 2. STEP 3
# "source.page"
source.page
-
Theme 2. STEP 3
space_tokenizer = function(x) {
  unlist(strsplit(as.character(x[[1]]), '[[:space:]]+'))
}
jieba_tokenizer = function(d) {
  unlist(segment(d[[1]], mixseg))
}
# CNCorpus
#### CNCorpus Function Start ####
CNCorpus = function(d.vec){doc
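The whitespace tokenizer above uses only base R, so its behavior can be checked directly. The sketch below repeats its definition and runs it on an invented English string. `jieba_tokenizer` is left out because it depends on a `mixseg` object that is not defined in the surviving text; in jiebaR such an object is typically created with `worker()`, which is an assumption about the missing setup rather than something the slides confirm.

```r
# Same definition as on the slide: split the first element on runs of whitespace.
space_tokenizer <- function(x) {
  unlist(strsplit(as.character(x[[1]]), "[[:space:]]+"))
}

# Invented input with a double space and a tab to show that runs are collapsed.
tokens <- space_tokenizer(list("big data  creates\tnew value"))
print(tokens)
```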
-
Theme 2.
content.corpus = CNCorpus(list(content.vec))
# CNCorpus
content.corpus
-
Theme 2.
frequency
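Counting how often each token occurs can be sketched in base R. This is not the slide's own code, which did not survive: the token vector is invented, and `freq.table` and `freq.df` are hypothetical names. The resulting two-column layout (word, then count) is the shape that wordcloud2() takes as its data argument.

```r
# Invented tokens standing in for the segmented article text.
tokens <- c("data", "value", "data", "analytics", "value", "data")

# Count each distinct token, most frequent first.
freq.table <- sort(table(tokens), decreasing = TRUE)

# Two-column data frame: the word and its count.
freq.df <- data.frame(word = names(freq.table),
                      freq = as.integer(freq.table),
                      stringsAsFactors = FALSE)
print(freq.df)
```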
-
Reference
-
Reference
1. R - Wush Wu, Chih Cheng Liang, Johnson Hsieh: http://datascienceandr.org/
2. R - http://goo.gl/18mwug
3. R - https://goo.gl/NPdzzP
1. DataCamp: https://www.datacamp.com/
2. R for Data Science: http://r4ds.had.co.nz/
-
Reference
R
Jared P. Lander
-
Reference
Taiwan R User Group: https://www.facebook.com/Tw.R.User/
https://www.facebook.com/twdsconf/
Data Visualization: https://www.facebook.com/data.visualize/
-
Q & A