20161017 R Language Data Analysis in Practice (2)

R Language Data Analysis in Practice (2), Data Science - 2016/10/17 (Mon). CC 3.0 license. http://shouzo.github.io/

Post on 26-Jan-2017

TRANSCRIPT

  • R Language Data Analysis in Practice (2)

    Data Science

    2016/10/17 (Mon)

    CC 3.0 license, http://shouzo.github.io/

  • Agenda: Prepare, Basic, Theme, Reference

  • Prepare

  • Prepare

    "RStudio"

  • Basic

  • Basic 1.

    R rvest package: web scraping with R

    https://blog.gtwang.org/r/rvest-web-scraping-with-r/

  • Basic 1.

    XPath
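As a rough illustration of how XPath expressions select nodes, the sketch below parses a small inline HTML string with rvest instead of a live page. The HTML, the id `main`, and all variable names are made up for this example; only `read_html()`, `html_nodes()`, and `html_text()` come from rvest.

```r
library(rvest)

# Hypothetical inline HTML so the example runs without network access
demo.page <- read_html('<html><body>
  <div id="main"><h1>Title</h1><p class="lead">First</p><p>Second</p></div>
</body></html>')

# "//p" selects every <p> element anywhere in the document
all.p <- html_text(html_nodes(demo.page, xpath = "//p"))

# A predicate in [] filters on an attribute value
lead.p <- html_text(html_nodes(demo.page, xpath = '//p[@class="lead"]'))

# "//*[@id=...]" selects any element by its id, then "/h1" steps to a child
main.h1 <- html_text(html_nodes(demo.page, xpath = '//*[@id="main"]/h1'))

print(all.p)
print(lead.p)
print(main.h1)
```

For a real page, the same calls should work with a URL passed to `read_html()` in place of the inline string.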

  • Basic 1.

    Google Chrome: developer tools

  • Basic 1.

    Mozilla Firefox: Firebug

  • Basic 1.

    XPath

    https://github.com/aweimeowaweimeow

  • Basic 1.

    XPath

  • Basic 1.

    TAG

  • Basic 2.

  • Basic 2.

    (1) CSV
    (2) XML
    (3) JSON
    (4) DB (database)
    (5) RData
    (6) SPSS, Stata, SAS, Octave ...

    CSV

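  • Basic 2. CSV: STEP 1, read.table()

    Reads CSV and other delimited text files (for example tab-separated).

    Related functions: read.csv2(), read.delim2()

    [Syntax]

    read.table(file = , header = TRUE or FALSE, sep = "")

    [Arguments]

    file: path or URL of the file to read
    header: whether the first line holds the column names
    sep: the field separator character

    theUrl <- …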
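The `read.table()` arguments above can be tried without a network connection. This is a minimal sketch, assuming a small temporary file instead of the slide's `theUrl` (whose value is not recoverable here); `csv.path`, `tomato.demo`, and `tab.demo` are hypothetical names, and the two rows reuse values shown in the slide's `head(tomato)` output.

```r
# Write a tiny comma-separated file to a temporary location
csv.path <- tempfile(fileext = ".csv")
writeLines(c("Tomato,Price,Sweet",
             "Simpson SM,3.99,2.8",
             "Cento Organic,4.99,3.2"),
           csv.path)

# header = TRUE: the first line holds column names; sep = ",": comma-separated
tomato.demo <- read.table(file = csv.path, header = TRUE, sep = ",",
                          stringsAsFactors = FALSE)
print(head(tomato.demo))

# The same call reads tab-separated text when sep = "\t"
tab.demo <- read.table(text = "a\tb\n1\t2", header = TRUE, sep = "\t")
print(tab.demo)
```

Passing a URL string as `file` should work the same way, since `read.table()` accepts URLs as well as local paths.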

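  • Basic 2. CSV

    STEP 2: head()

    [Syntax] head()

    STEP 3: data.frame()

    [Syntax] data.frame(name1 = vector1, name2 = vector2, name3 = vector3, ......, stringsAsFactors = TRUE or FALSE)

    [Note] stringsAsFactors controls whether character vectors are converted to factor (TRUE) or kept as character (FALSE).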

  • Basic 2. CSV: STEP 2, STEP 3

    > head(tomato)
      Round Tomato             Price Source      Sweet Acid Color Texture Overall
    1     1 Simpson SM          3.99 Whole Foods   2.8  2.8   3.7     3.4     3.4
    2     1 Tuttorosso (blue)   2.99 Pioneer       3.3  2.8   3.4     3.0     2.9
    3     1 Tuttorosso (green)  0.99 Pioneer       2.8  2.6   3.3     2.8     2.9
    4     1 La Fede SM DOP      3.99 Shop Rite     2.6  2.8   3.0     2.3     2.8
    5     2 Cento SM DOP        5.49 D Agostino    3.3  3.1   2.9     2.8     3.1
    6     2 Cento Organic       4.99 D Agostino    3.2  2.9   2.9     3.1     2.9
      Avg.of.Totals Total.of.Avg
    1          16.1         16.1
    2          15.3         15.3
    3          14.3         14.3
    4          13.4         13.4
    5          14.4         15.2
    6          15.5         15.1

    > x <- …
    > y <- …
    > # "q" is a character vector
    > q <- …
    > theDF <- …
    > theDF$Sport
     [1] "Hockey"     "Football"   "Baseball"   "Curling"    "Rugby"      "Lacrosse"
     [7] "Basketball" "Tennis"     "Cricket"    "Soccer"
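To make the effect of `stringsAsFactors` concrete, here is a minimal sketch. The `Sport` values are the ones printed on the slide; the values of `x` and `y` and the column names `First` and `Second` are assumptions, since the original assignments did not survive.

```r
# Assumed numeric vectors (the slide's actual values are not recoverable)
x <- 10:1
y <- -4:5

# Character vector with the ten values shown in the slide's output
q <- c("Hockey", "Football", "Baseball", "Curling", "Rugby",
       "Lacrosse", "Basketball", "Tennis", "Cricket", "Soccer")

# stringsAsFactors = FALSE keeps q as a character column
theDF <- data.frame(First = x, Second = y, Sport = q,
                    stringsAsFactors = FALSE)

# stringsAsFactors = TRUE converts q to a factor column
theDF.factor <- data.frame(First = x, Second = y, Sport = q,
                           stringsAsFactors = TRUE)

print(class(theDF$Sport))
print(class(theDF.factor$Sport))
print(head(theDF, 3))
```

Setting the argument explicitly avoids relying on the default, which changed from TRUE to FALSE in R 4.0.0.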

  • Theme

  • Theme

    (1)

    (2)

    (3)

  • Theme 1.

  • Theme 1.

    STEP 1, STEP 2, STEP 3, STEP 4, STEP 5

    Text mining and word cloud fundamentals in R: 5 simple steps you should know

    https://goo.gl/snM2nZ

  • Theme 1. STEP 1

    Big Data and Analytics: Creating New Value

    http://www.technewsworld.com/story/83998.html

  • Theme 1. STEP 2

    RStudio

    # Install the packages
    install.packages("rvest")         # web scraping
    install.packages("tm")            # text mining
    install.packages("SnowballC")     # text stemming
    install.packages("wordcloud")     # word cloud generator
    install.packages("RColorBrewer")  # color palettes

    # Load the packages
    library("rvest")
    library("tm")
    library("SnowballC")
    library("wordcloud")
    library("RColorBrewer")

  • Theme 1. STEP 3

    In Chrome, open the developer tools (press "F12").

  • Theme 1. STEP 3

    Select the element in the developer tools, then choose "Copy XPath".

  • Theme 1. STEP 3

    XPath copied from the page:

    //*[@id="storybody"]

  • Theme 1. STEP 3

    # "source.page"
    source.page <- …
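The scraping step can be sketched as follows. Because the original `source.page` assignment is cut off, this is only an assumed reconstruction: it parses an inline HTML string (hypothetical content) in place of the article URL and pulls the text under the `//*[@id="storybody"]` XPath from the previous slide. `demo.html`, `story.node`, `paragraphs`, and `story.text` are illustrative names.

```r
library(rvest)

# Hypothetical stand-in for the downloaded article page
demo.html <- '<html><body>
  <div id="nav">Home</div>
  <div id="storybody">
    <p>Big data creates value.</p>
    <p>Analytics turns data into decisions.</p>
  </div>
</body></html>'

# Parse the page (with a real article, pass its URL instead of demo.html)
source.page <- read_html(demo.html)

# Select the article container with the XPath copied from the browser
story.node <- html_nodes(source.page, xpath = '//*[@id="storybody"]')

# Take the text of each paragraph inside it and join into one string
paragraphs <- html_text(html_nodes(story.node, xpath = ".//p"))
story.text <- paste(paragraphs, collapse = " ")

print(story.text)
```

Selecting by id first keeps navigation text such as the `nav` block out of the corpus.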

  • Theme 1. STEP 3

    docs <- …

  • Theme 1. STEP 3

    toSpace <- …
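The `docs` and `toSpace` assignments are truncated in the transcript. The sketch below shows one common way to build and clean a corpus with tm, modeled on the text-mining tutorial referenced under Theme 1 rather than on the slide's exact code. The sample sentence is made up, and `VCorpus()` is used here as an assumption; the slides may have used `Corpus()`.

```r
library(tm)

# Hypothetical sample text standing in for the scraped article
sample.text <- "Visit http://example.com/news or mail me@example.com | Big DATA in 2016!"

# Build a corpus with one document per element of the character vector
docs <- VCorpus(VectorSource(sample.text))

# toSpace replaces every match of a pattern with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")

# Standard clean-up: lower case, then drop numbers, stop words, punctuation
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, stripWhitespace)

cleaned <- as.character(docs[[1]])
print(cleaned)
```

Replacing symbols with spaces before removing punctuation keeps pieces such as the two halves of an e-mail address from being glued into one token.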

  • Theme 1. STEP 4

    dtm <- …
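The `dtm` assignment is cut off, so the following is an assumed reconstruction of the usual pattern: build a term-document matrix, sum each term's counts, and sort into the data frame `d` (columns `word` and `freq`) that the later `wordcloud()` call reads. The two sample documents are invented.

```r
library(tm)

# Hypothetical cleaned documents
docs <- VCorpus(VectorSource(c("data data value", "data value analytics")))

# Rows are terms, columns are documents, cells are counts
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)

# Total count per term across all documents, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)

# Data frame in the shape used by wordcloud(words = d$word, freq = d$freq)
d <- data.frame(word = names(v), freq = v)
print(d)
```

Note that tm's default settings ignore terms shorter than three characters, which is why the sample words are all longer than that.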

  • Theme 1. STEP 5

    # Fix the random seed so the layout is reproducible
    set.seed(1000)

    # Draw the word cloud
    wordcloud(words = d$word, freq = d$freq, min.freq = 2,
              max.words = 30, random.order = FALSE, rot.per = 0.35,
              colors = brewer.pal(8, "Dark2"))

  • Theme 2.

  • Theme 2.

    STEP 1, STEP 2, STEP 3, STEP 4, STEP 5

    http://andrew.ga/works/TextMining/

  • Theme 2. STEP 1

    http://www.appledaily.com.tw/realtimenews/article/life/20161016/968938/

  • Theme 2. STEP 2

    RStudio

    # Install the packages
    install.packages("rvest")       # web scraping
    install.packages("jiebaR")      # Chinese word segmentation
    install.packages("tm")          # text mining
    install.packages("wordcloud2")  # word cloud

    # Load the packages
    library("rvest")
    library("jiebaR")
    library("tm")
    library("wordcloud2")

  • Theme 2. STEP 3

    In Chrome, open the developer tools (press "F12").

  • Theme 2. STEP 3

    Select the element in the developer tools, then choose "Copy XPath".

  • Theme 2. STEP 3

    XPath copied from the page:

    //*[@id="summary"]

  • Theme 2. STEP 3

    # "source.page"
    source.page <- …

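  • Theme 2. STEP 3

    # Split a document on runs of whitespace
    space_tokenizer = function(x) {
      unlist(strsplit(as.character(x[[1]]), '[[:space:]]+'))
    }

    # Segment a document with jiebaR; mixseg is the segmenter passed to segment()
    jieba_tokenizer = function(d) {
      unlist(segment(d[[1]], mixseg))
    }

    # CNCorpus
    #### CNCorpus Function Start ####
    CNCorpus = function(d.vec) {
      doc <- …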
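A quick way to see what the two tokenizers above return is to call them on small inputs. In this sketch `space_tokenizer` is copied from the slide; the sample strings are made up, and the jiebaR part is guarded so it only runs when the package is installed. Creating `mixseg` with `worker()` is an assumption about how the slide's segmenter was set up.

```r
# Whitespace tokenizer as defined on the slide
space_tokenizer <- function(x) {
  unlist(strsplit(as.character(x[[1]]), "[[:space:]]+"))
}

# Works on anything indexable with [[1]], such as a list holding one string
demo.doc <- list("R  makes text\tmining easy")
tokens <- space_tokenizer(demo.doc)
print(tokens)

# Chinese text has no spaces between words, so it needs a segmenter instead
if (requireNamespace("jiebaR", quietly = TRUE)) {
  mixseg <- jiebaR::worker()                 # assumed default segmenter
  jieba.tokens <- jiebaR::segment("今天天氣很好", mixseg)
  print(jieba.tokens)
}
```

The whitespace version is enough for English text; the jiebaR version is what makes a Chinese word cloud possible.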

  • Theme 2.

    # Build content.corpus with CNCorpus
    content.corpus = CNCorpus(list(content.vec))
    content.corpus

  • Theme 2.

    frequency
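The frequency step is not shown in full. One simple way to get a word-frequency table from segmented tokens, using base R only, is sketched below; this is an assumption about the approach, not the slide's code. The token vector is invented, and the `wordcloud2()` call is guarded so the example still runs without that package.

```r
# Hypothetical tokens, as a segmenter might return them
tokens <- c("資料", "分析", "資料", "文字", "資料", "分析")

# Count each distinct token
counts <- table(tokens)

# Two-column data frame (word, freq), most frequent first
frequency <- data.frame(word = names(counts),
                        freq = as.vector(counts),
                        stringsAsFactors = FALSE)
frequency <- frequency[order(-frequency$freq), ]
rownames(frequency) <- NULL
print(frequency)

# wordcloud2() takes a data frame whose first two columns are word and count
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(frequency)
}
```

Printing `cloud` in RStudio should render the interactive word cloud in the Viewer pane.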

  • Reference

  • Reference

    http://datascienceandr.org/

    1. R - Wush Wu, Chih Cheng Liang, Johnson Hsieh

    2. R - http://goo.gl/18mwug

    3. R - https://goo.gl/NPdzzP

    1. DataCamp: https://www.datacamp.com/

    2. R for Data Science: http://r4ds.had.co.nz/

  • Reference

    R, Jared P. Lander

  • Reference

    Taiwan R User Group: https://www.facebook.com/Tw.R.User/

    https://www.facebook.com/twdsconf/

    Data Visualization: https://www.facebook.com/data.visualize/

  • Q & A