Introduction to RMeCab

2nd Japan.R

Text Analysis with RMeCab

@gepuro

About me

Atsushi Hayakawa

The University of Electro-Communications, Department of Systems Engineering, third-year student

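Things I do:

● Running a junk market at the school festival

● Launching fireworks at club retreats

● Climbing Mt. Fuji

● Writing for the club magazine

● Receiving a Special Prize in the FY2011 S-PLUS Student Research Encouragement Award

● Working part-time at DBCLS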

Interests

Text mining, data mining, statistics, quality control

About me

Blog: http://d.hatena.ne.jp/gepuro/

Twitter: @gepuro

What is RMeCab?

RMeCab is a tool for text mining: an interface for calling MeCab from R.

Installation

Download RMeCab_0.98_R_x86_64-unknown-linux-gnu.tar.gz from http://rmecab.jp/wiki/index.php?RMeCab and install it with:

> install.packages("RMeCab_0.98_R_x86_64-unknown-linux-gnu.tar.gz", destdir=".", repos=NULL)

See the site above for details.

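Morphological analysis

> rlt <- RMeCabC("お腹が空いた", 0)
> unlist(rlt)
  名詞   助詞   動詞 助動詞
"お腹"   "が" "空い"   "た"

> rlt <- RMeCabC("お腹が空いた", 1)
> unlist(rlt)
  名詞   助詞   動詞 助動詞
"お腹"   "が" "空く"   "た"

The input sentence means "I'm hungry". With the second argument set to 0, RMeCabC returns surface forms ("空い"); with 1, it returns base (dictionary) forms ("空く"). The names are parts of speech: 名詞 (noun), 助詞 (particle), 動詞 (verb), 助動詞 (auxiliary verb).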

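Because unlist() gives a character vector named by part of speech, words can be selected by filtering on those names. The following is a minimal sketch of that idea; to keep it runnable without MeCab installed, the vector is typed in by hand from the base-form output of RMeCabC("お腹が空いた", 1), and the variable names are illustrative.

```r
# Stand-in for unlist(RMeCabC("お腹が空いた", 1)), copied by hand from the
# slide's output so that the sketch runs without MeCab. Names are parts of speech.
rlt <- c("名詞" = "お腹", "助詞" = "が", "動詞" = "空く", "助動詞" = "た")

# Keep only nouns and verbs by filtering on the part-of-speech names
content_words <- rlt[names(rlt) %in% c("名詞", "動詞")]
print(content_words)

# Drop the names to get a plain character vector of words
words <- unname(content_words)
print(words)
```

With RMeCab installed, the first assignment could presumably be replaced by the real call, and the rest would work unchanged.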
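Building a term-document matrix

> novel <- docMatrix("novel", c("名詞", "形容詞"))
> novel[4:15,]
                  docs
terms              bocchan_NATUME hana_AKUTAGAWA kokoro_NATUME
  [[LESS-THAN-1]]               0              0             0
  [[TOTAL-TOKENS]]          12492           1646         34937
  am                            1              0             0
  glad                          1              0             0
  see                           1              0             0
  to                            1              0             0
  you                           1              0             0
  —— ？                         1              0             0
  あいつ                        5              0             0
  あした                        1              0             0
  あすこ                        3              0             2
  あそこ                        1              0             0

Here "novel" is a directory of text files (one column per file), and the second argument restricts the terms to nouns (名詞) and adjectives (形容詞).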
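Once the matrix exists, ordinary matrix operations apply to it. The sketch below assumes that docMatrix returns a regular numeric matrix with terms as row names, and that the rows whose names start with "[[" are summary rows rather than terms. To stay runnable without MeCab, it rebuilds a small part of the output above by hand; novel_small and the other variable names are illustrative.

```r
# A small stand-in for `novel`, with counts copied from the slide's output.
terms <- c("[[LESS-THAN-1]]", "[[TOTAL-TOKENS]]",
           "あいつ", "あした", "あすこ", "あそこ")
docs <- c("bocchan_NATUME", "hana_AKUTAGAWA", "kokoro_NATUME")
novel_small <- matrix(
  c(    0,    0,     0,
    12492, 1646, 34937,
        5,    0,     0,
        1,    0,     0,
        3,    0,     2,
        1,    0,     0),
  nrow = length(terms), byrow = TRUE,
  dimnames = list(terms = terms, docs = docs)
)

# Drop the summary rows (names starting with "[[") to keep real terms only
is_summary <- startsWith(rownames(novel_small), "[[")
tdm <- novel_small[!is_summary, , drop = FALSE]

# Total frequency of each term across all documents
term_totals <- rowSums(tdm)
print(term_totals)

# Terms that occur in more than one document
shared_terms <- rownames(tdm)[rowSums(tdm > 0) > 1]
print(shared_terms)
```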

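Arguments of docMatrix

● minFreq=n: output only terms that appear n or more times

● kigo=1: include symbols in the total token count

● weight: term weighting, e.g. "tf*idf" or "tf*idf*norm"

● dic: specify a user dictionary

● co: build a co-occurrence matrix

and so on.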
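To give a feel for what the "tf*idf" and "tf*idf*norm" weightings do, here is a rough base-R sketch on a toy count matrix. It is not RMeCab's implementation. It assumes the common definition idf = log2(N / df) + 1 and treats "norm" as scaling each document column to unit Euclidean length; the package's exact formulas may differ. The counts and term names are made up for illustration.

```r
# Toy term-document matrix of raw counts (rows: terms, columns: documents).
# The numbers and names are invented for illustration.
tf <- matrix(
  c(5, 0, 0,
    3, 0, 2,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(terms = c("t1", "t2", "t3"), docs = c("d1", "d2", "d3"))
)

n_docs <- ncol(tf)
df <- rowSums(tf > 0)           # number of documents containing each term
idf <- log2(n_docs / df) + 1    # assumed idf definition (see lead-in)

# tf * idf: idf has one value per row, so each row is scaled by its own idf
tfidf <- tf * idf

# "norm" (assumed): scale each document column to unit Euclidean length
tfidf_norm <- sweep(tfidf, 2, sqrt(colSums(tfidf^2)), "/")

print(round(tfidf, 3))
print(round(tfidf_norm, 3))
```

A term found in every document (t3) keeps its raw count, while a term found in only one document (t1) is weighted up. With RMeCab itself, the weighting would presumably be requested directly, for example docMatrix("novel", c("名詞", "形容詞"), weight = "tf*idf").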

References

Rによるテキストマイニング入門 (Introduction to Text Mining with R)

Author: Motohiro Ishida

Publisher: Morikita Publishing Co., Ltd.

RとLinuxと... http://rmecab.jp/wiki/index.php?RMeCab

Thank you for your attention.

If you know of a good tool for collecting corpora from the Web, or of textbooks or websites on data cleaning, I would appreciate your advice.
