Page 1: Jinseog Kim Dep. of Applied Statistics, Dongguk University …datamining.dongguk.ac.kr/lectures/2017-1/dm/R-note-1… ·  · 2016-03-08Jinseog Kim Dep. of Applied Statistics, Dongguk

R: Software for Statistics and Big Data Analysis

Jinseog Kim, Dep. of Applied Statistics, Dongguk University

Email: [email protected]

Jinseog Kim Dep. of Applied Statistics, Dongguk University Email: [email protected] for big data 1 / 98


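Downloading and Installing R

Introduction to R

Developed by Robert Gentleman and Ross Ihaka (University of Auckland, New Zealand); released as free software in 1995

Based on the S language, developed at AT&T Bell Laboratories in the mid-1970s

Free, open-source software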


Installing R

http://www.r-project.org/

Figure: R homepage


Running R

Double-click the R icon on the desktop.


Type a command after the “>” (command prompt).

Press Enter to run the command.

If the command is not yet complete, a “+” prompt appears; keep typing to finish the command.


Installing RStudio

An integrated development environment (IDE) for R: http://www.rstudio.com/


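R Objects

Basic Terms and Utilities

Object: in R, data, functions, operators, and so on are all objects, stored in memory.

Workspace: the collection of objects created during a working session.

ls(): lists the objects in the workspace

ls() # no objects have been created yet

## character(0)

rm(): removes R objects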

x <- 1

y <- 1:10

ls()

## [1] "x" "y"

rm(x,y)

ls()

## character(0)


R workspace: the collection of objects created while you work in R.

help() displays the documentation for an R object; ?name can be used instead of help(name).

help(ls)

?ls

Checking and changing the working directory

getwd()

## [1] "D:/Dropbox/bigdata/lectures"

# setwd("D:/share/lectures/R-note")  # change the working directory

The current workspace is saved with save.image(), which creates a file named .RData in the working directory.

save.image()
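A minimal sketch of a save-and-restore round trip, assuming a scratch file under tempdir() so that no real project folder is touched (the file name demo.RData is only an example). It also shows that setwd() returns the previous directory, which makes a directory change easy to undo.

```r
path <- file.path(tempdir(), "demo.RData")   # scratch file; the name is arbitrary

x <- 1:5
save.image(file = path)   # save every object in the global workspace to path
rm(x)                     # x no longer exists in the workspace
load(path)                # restore the saved objects
x                         # 1 2 3 4 5 again

# setwd() returns the previous directory, so a change can be undone
old_dir <- setwd(tempdir())
getwd()                   # now the temporary directory
setwd(old_dir)            # back to where we started
```

Without a file argument, save.image() writes .RData into the current working directory, as described above.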


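R packages extend R's functionality; additional packages are installed as needed. search() shows the search path, which lists the packages currently attached (loaded) in the session, not every installed package.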

search()

## [1] ".GlobalEnv" "package:knitr" "package:stats"

## [4] "package:graphics" "package:grDevices" "package:utils"

## [7] "package:datasets" "package:methods" "Autoloads"

## [10] "package:base"


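library(): with no arguments, lists all installed packages with a short description of each

library()

library(package_name): loads (attaches) an installed package into the current R session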

library(MASS)


install.packages(): installs a new package into R

install.packages("stringr")
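install.packages() and library() are often combined so that a script installs a package only when it is missing. A sketch of that pattern; the helper name use_package is hypothetical. The demo uses splines, which ships with R but is not attached at start-up, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then attach it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)   # attach the package to the search path
  invisible(TRUE)
}

use_package("splines")
"package:splines" %in% search()   # TRUE: splines is now attached
```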

help(), ?: help for functions and objects

help("ls")

?ls


R Objects

R objects come in the following kinds:

atomic (constant)

vector

matrix

list

data.frame

function

operator, ...

The objects used to store data (atomic, vector, matrix, data.frame) are called data objects.
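The kinds listed above can be checked with class(). A quick sketch with one made-up example object per kind; it also shows that an operator such as + is itself a function object.

```r
# One example of each kind of object (values are arbitrary)
objs <- list(
  atomic     = 3.14,
  vector     = c(1, 2, 3),
  matrix     = matrix(1:6, nrow = 2),
  list       = list(a = 1, b = "x"),
  data.frame = data.frame(x = 1:2, y = c("a", "b")),
  fun        = mean,
  operator   = `+`
)

# class() reports the kind of each object; [1] keeps only the first class
# (since R 4.0.0 a matrix has class c("matrix", "array"))
classes <- sapply(objs, function(o) class(o)[1])
classes

# An "atomic" constant is simply a vector of length 1
length(3.14)
```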


Types of data objects (as returned by typeof())

double (real numbers)

integer

character

logical

complex


Modes of data objects (as returned by mode())

numeric: double and integer

character

logical

complex


Classes of data objects

numeric (double)

integer

character

logical

factor: represents categorical data such as blood type or sex

matrix

list

data.frame


Examples of data objects

double / integer

typeof(10L);mode(10L)

## [1] "integer"

## [1] "numeric"

typeof(10);mode(10)

## [1] "double"

## [1] "numeric"

character

typeof("Hello World"); mode("Hello World")

## [1] "character"

## [1] "character"

logical

typeof(2 < 4); mode(2 < 4)

## [1] "logical"

## [1] "logical"
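The three functions can also be compared side by side for every type listed earlier, including complex numbers and factors. describe() is a hypothetical helper written only for this comparison. Note that a factor is stored as integer codes (typeof "integer") even though its class is "factor".

```r
# Hypothetical helper: internal type, mode, and (first) class of an object
describe <- function(x) c(typeof = typeof(x), mode = mode(x), class = class(x)[1])

tab <- rbind(
  integer   = describe(10L),
  double    = describe(10),
  character = describe("Hello World"),
  logical   = describe(2 < 4),
  complex   = describe(1 + 2i),
  factor    = describe(factor(c("A", "B")))
)
tab
```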


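Data Structures

Vectors

A vector is data made up of one or more elements.

All elements of a vector must have the same data type. A mixed vector such as c(1, 2, "a", "b") is therefore not kept as mixed: R coerces every element to the most general type, here character.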

x1 <- c(1,2,3,4)

x2 <- 1:3

x3 <- c("A", "B", "C")

y <- c(x1, 0, x2) # 1,2,3,4, 0, 1,2,3

c(...) combines values into a vector

The : operator creates a sequence of consecutive integers
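A short sketch of what happens when types are mixed: c() does not raise an error but coerces all elements to the most general type, in the order logical < integer < double < complex < character.

```r
mixed <- c(1, 2, "a", "b")
mixed                 # "1" "2" "a" "b": the numbers became strings
typeof(mixed)         # "character"

typeof(c(TRUE, 2L))   # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))    # "double"
typeof(c(2.5, 1i))    # "complex"

# The vectors built above
x1 <- c(1, 2, 3, 4)
x2 <- 1:3
y <- c(x1, 0, x2)
length(y)             # 8
```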


Other functions for creating vectors:

rep: repetition

rep(2, 10)

## [1] 2 2 2 2 2 2 2 2 2 2

rep(c(1,2), each=5)

## [1] 1 1 1 1 1 2 2 2 2 2

seq: arithmetic sequences

seq(0, 1, length=11)

## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

A vector from 1 to 9 in steps of 2:

seq(1, 9, by = 2)

## [1] 1 3 5 7 9

numeric(), double(), integer(), character(): allocate a vector of that type whose length is given in the parentheses (filled with 0, or "" for character)

integer(length = 10)

## [1] 0 0 0 0 0 0 0 0 0 0
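A few more allocation and sequence helpers from base R, as a sketch; vector(mode, length) is the general form behind numeric(), character() and similar functions.

```r
numeric(3)                         # 0 0 0
character(2)                       # "" ""
logical(2)                         # FALSE FALSE
vector("list", 2)                  # a list with two NULL components

rep(c(1, 2), times = 3)            # whole vector repeated: 1 2 1 2 1 2
rep(c(1, 2), each = 2, times = 2)  # 1 1 2 2 1 1 2 2
seq(10, 1, by = -3)                # counting down: 10 7 4 1
seq_len(5)                         # 1 2 3 4 5
```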


Classes of vectors

numeric: continuous

factor: categorical

ordered: ordered categorical

R code                mode(x)      class(x)
x <- c(1:10)          "numeric"    "integer"
x <- factor(1:10)     "numeric"    "factor"
x <- ordered(1:10)    "numeric"    "ordered" "factor"
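A small sketch of factor and ordered in use, with made-up blood-type and size values. For an unordered factor, levels are sorted alphabetically by default; for an ordered factor the level order is given explicitly and comparisons follow it.

```r
# Unordered factor: levels sorted alphabetically by default
blood <- factor(c("A", "O", "B", "A", "AB"))
levels(blood)     # "A" "AB" "B" "O"
table(blood)      # counts per level

# Ordered factor: the level order is set explicitly
size <- ordered(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
class(size)       # "ordered" "factor"
size < "large"    # TRUE FALSE TRUE: comparisons follow the level order
```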


Indexing

Components of an R data object are accessed by index or by component name:

object[ arg1, ... , argn ] # for vector, matrix, array

object[[ arg1, ... , argn ]] # for list

object$tag # for data.frame or named list
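The three access forms above, applied to small made-up objects. The sketch also shows the difference between [ ] and [[ ]] on a list: single brackets return a sub-list, double brackets return the component itself.

```r
v <- c(a = 10, b = 20, c = 30)
v[2]               # by position: b = 20
v["c"]             # by name: c = 30
v[-1]              # a negative index drops element 1

m <- matrix(1:6, nrow = 2)
m[2, 3]            # row 2, column 3: 6
m[, 1]             # the whole first column: 1 2

lst <- list(name = "fred", ages = c(4, 7))
lst[["ages"]]      # the component itself: 4 7
lst$name           # "fred"
class(lst["ages"]) # "list": single brackets return a sub-list

df <- data.frame(x = 1:3, y = c("p", "q", "r"))
df$y[2]            # "q"
df[df$x > 1, "y"]  # rows where x > 1, column y: "q" "r"
```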


Matrices

Using the matrix() function

X1 <- matrix(1:10, nrow=2, ncol=5); X1

## [,1] [,2] [,3] [,4] [,5]

## [1,] 1 3 5 7 9

## [2,] 2 4 6 8 10

Creating a diagonal matrix with diag()

X2 <- diag(1, 5); X2

## [,1] [,2] [,3] [,4] [,5]

## [1,] 1 0 0 0 0

## [2,] 0 1 0 0 0

## [3,] 0 0 1 0 0

## [4,] 0 0 0 1 0

## [5,] 0 0 0 0 1

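X2 <- diag(10)              # 10 x 10 identity matrix

X2 <- diag(1:10)            # 10 x 10 diagonal matrix with 1, ..., 10 on the diagonal

X2 <- diag(c(1,3,5,7,9))    # 5 x 5 diagonal matrix with 1, 3, 5, 7, 9 on the diagonal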
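matrix() fills column by column by default, as the X1 output above shows; byrow = TRUE fills row by row instead. A short sketch with arbitrary values, also showing the transpose, the matrix product, and diag() used to extract a diagonal.

```r
A <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows: 1 2 3 / 4 5 6
A
dim(A)          # 2 3
t(A)            # transpose: 3 x 2
A %*% t(A)      # matrix product, 2 x 2: 14 32 / 32 77

# Applied to a matrix, diag() extracts the diagonal
diag(matrix(1:9, nrow = 3))   # 1 5 9
```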


Combining matrices/vectors

Column-wise binding

x <- c(1,2,3); y <- c(4,5,6)

cbind(x,y)

## x y

## [1,] 1 4

## [2,] 2 5

## [3,] 3 6

Row-wise binding

rbind(x,y)

## [,1] [,2] [,3]

## x 1 2 3

## y 4 5 6


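Lists

A list is an object whose components (elements) can be different R objects.

Any R object, such as a numeric vector, logical value, matrix, character string, array, or function, can be a component of a list.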


Creating a list

Use list():

list(name_1=object_1, ..., name_m=object_m)

where name_1, ..., name_m are the component names and object_1, ..., object_m are the component values.

Lst <- list(name="fred", wife="mary", child.ages=c(4,7,9))

Lst

## $name

## [1] "fred"

##

## $wife

## [1] "mary"

##

## $child.ages

## [1] 4 7 9


Accessing components

By position, with [[ ]]

Lst[[1]]

## [1] "fred"

By name, if the components are named

Lst[["name"]] # or Lst$name

## [1] "fred"

Sub-list, with [ ]

Lst[2:3]

## $wife

## [1] "mary"

##

## $child.ages

## [1] 4 7 9

Number of components: length()

length(Lst)

## [1] 3
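Components can also be added, changed, or removed by assignment. A sketch using the Lst example; the extra city value is made up for illustration.

```r
Lst <- list(name = "fred", wife = "mary", child.ages = c(4, 7, 9))

Lst$child.ages[2]       # element 2 of a component: 7
Lst$city <- "Seoul"     # assigning to a new name adds a component
Lst$wife <- NULL        # assigning NULL removes a component
names(Lst)              # "name" "child.ages" "city"

class(Lst[1])           # "list": [ ] keeps the list wrapper
class(Lst[[1]])         # "character": [[ ]] extracts the component itself
```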


Combining lists

c(): works the same way as for creating or combining vectors

list1 <- list(a1=1, b1=1:3)

list2 <- list(a2=c("Kim", "Park"))

c(list1, list2)

## $a1

## [1] 1

##

## $b1

## [1] 1 2 3

##

## $a2

## [1] "Kim" "Park"


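Data frames

A data frame is a list with the following properties:

Its components can be vectors, factors, matrices, lists, or other data frames.

Matrices, lists, and data frames contribute as many variables to the new data frame as they have columns, elements, or variables, respectively.

Vectors (numeric, character, etc.) become columns of the data frame.

All variables (columns) in a data frame have the same length.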
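A sketch of these rules with made-up values: a two-column matrix contributes two variables, a length-1 value is recycled to fill its column, and lengths that cannot be reconciled by recycling make data.frame() stop with an error.

```r
m <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("m1", "m2")))

# The matrix contributes one variable per column; the scalar 0 is recycled
df <- data.frame(id = c("a", "b"), m, const = 0)
names(df)    # "id" "m1" "m2" "const"
dim(df)      # 2 4

# Lengths 3 and 2 cannot be matched by recycling, so this is an error
res <- tryCatch(data.frame(x = 1:3, y = 1:2),
                error = function(e) conditionMessage(e))
res          # the error message, a character string
```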


Creating a data frame

data.frame()

name <- c("kim","lee","park","Oh")

sex <- c('f','m','f','m')

income <- c(100,102,300,204)

d1 <- data.frame(name=name, gender=sex, incom=income)

d1

## name gender incom

## 1 kim f 100

## 2 lee m 102

## 3 park f 300

## 4 Oh m 204

as.data.frame(): converts a list or matrix to a data frame
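A sketch of as.data.frame() on the two input kinds mentioned, reusing names and incomes from the d1 example. A matrix without column names gets the default names V1, V2, ...; each component of a list becomes one column.

```r
# From a matrix: unnamed columns are called V1, V2, ...
df1 <- as.data.frame(matrix(1:6, nrow = 3))
names(df1)   # "V1" "V2"

# From a list of equal-length components: each component becomes a column
lst <- list(name = c("kim", "lee"), income = c(100, 102))
df2 <- as.data.frame(lst)
str(df2)
```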


Data frame functions

View the first rows

head(d1, 2)

## name gender incom

## 1 kim f 100

## 2 lee m 102

Print the variable names

names(d1)

## [1] "name" "gender" "incom"

Print the dimensions

nrow(d1) # number of rows

## [1] 4

ncol(d1) # number of columns

## [1] 3

dim(d1) # row and column dimension

## [1] 4 3


Type conversion functions

as.numeric()

as.character()

as.matrix()

as.data.frame()

unlist()
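A sketch of these conversion functions on made-up values, including a common pitfall: as.numeric() applied to a factor returns the internal level codes, so a factor of numbers stored as text must go through as.character() first.

```r
as.numeric("3.14")                    # 3.14
suppressWarnings(as.numeric("abc"))   # NA (R warns if not suppressed)
as.character(c(1, 2.5))               # "1" "2.5"

# Pitfall: as.numeric() on a factor gives the level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                         # 1 2 1
as.numeric(as.character(f))           # 10 20 10

unlist(list(a = 1, b = 2:3))          # named vector: a b1 b2
as.matrix(data.frame(x = 1:2, y = 3:4))  # 2 x 2 matrix
```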


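Reading Data from Files

External text files

The external file must have the following format:

The first line contains the variable names.

Each subsequent line contains one observation, with values in the same order as the variable names.

Example: an external file (titanic.txt) written in this format: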

Surv N Class Age Sex

20 23 Crew Adult Female

192 862 Crew Adult Male

1 1 First Child Female

5 5 First Child Male

13 13 Second Child Female

...


The read.table() function

Read the example data into a data frame (titanic):

titanic <- read.table("data/titanic.txt", header = TRUE)

head(titanic)

## Surv N Class Age Sex

## 1 20 23 Crew Adult Female

## 2 192 862 Crew Adult Male

## 3 1 1 First Child Female

## 4 5 5 First Child Male

## 5 140 144 First Adult Female

## 6 57 175 First Adult Male
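read.table() can also parse a character string through its text argument, which is handy for small experiments without a file. A sketch using the first rows of titanic.txt shown above; stringsAsFactors = TRUE is set explicitly because its default changed to FALSE in R 4.0.0.

```r
# First rows of titanic.txt, embedded as text so no file is needed
txt <- "Surv N Class Age Sex
20 23 Crew Adult Female
192 862 Crew Adult Male
1 1 First Child Female
5 5 First Child Male"

titanic_small <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
str(titanic_small)

# Survival rate per row, added as a new column
titanic_small$rate <- titanic_small$Surv / titanic_small$N
titanic_small
```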


The read.csv() function

If the source data were edited in Excel, they can be read into an R data.frame as follows.

Save the sheet in CSV (Comma Separated Values) format: File ⇒ Save As.

Then read the file with read.csv():

my.table=read.csv(file.choose()) ## using dialog box

my.table=read.csv("c:/xfile.csv") ## file name
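A self-contained round trip with write.csv() and read.csv(), assuming a temporary file in place of c:/xfile.csv; the data values are made up. row.names = FALSE keeps write.csv() from adding an extra index column, and stringsAsFactors = FALSE keeps the result independent of the R version.

```r
d1 <- data.frame(name = c("kim", "lee"), income = c(100.5, 102.25),
                 stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")       # temporary file instead of c:/xfile.csv
write.csv(d1, path, row.names = FALSE)   # no extra index column
readLines(path)                          # the raw CSV text

d2 <- read.csv(path, stringsAsFactors = FALSE)
all.equal(d1, d2)                        # TRUE if the round trip preserved the data
```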


The scan() function

Suppose a text data file (input.dat) contains the following:

52.00 54.75 57.50

57.50 59.75 111.0

128.0 101.0 131.0 93.0

scan() reads such a file into a single numeric vector, ignoring the line breaks:

inp <- scan("input.dat")
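scan() also accepts a text argument, so the same numbers can be tried out without creating input.dat. A sketch; quiet = TRUE only suppresses the "Read 10 items" message.

```r
# Same numbers as input.dat, passed as text instead of a file name
inp <- scan(text = "52.00 54.75 57.50
57.50 59.75 111.0
128.0 101.0 131.0 93.0", quiet = TRUE)

inp           # one numeric vector; line breaks are ignored
length(inp)   # 10
mean(inp)
```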


The edit() function

To modify existing data (olddata):

newdata <- edit(olddata)

To enter new data:

xnew <- edit(data.frame())


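Accessing Excel files with RODBC

RODBC provides two functions for connecting to an Excel file, depending on the Excel version: odbcConnectExcel() for .xls files and odbcConnectExcel2007() for .xlsx files (both are available on Windows only).

odbcConnectExcel(xls.file, readOnly = TRUE, ...)

odbcConnectExcel2007(xls.file, readOnly = TRUE, ...)

For example, the following code connects to the file API2.xls in W:/data/:

library(RODBC)

con <- odbcConnectExcel("W:/data/API2.xls", readOnly = FALSE)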


ODBC: checking Excel file information

> odbcGetInfo(con)

DBMS_Name DBMS_Ver Driver_ODBC_Ver Data_Source_Name Driver_Name Driver_Ver

"EXCEL" "08.00.0000" "03.51" "" "odbcjt32.dll" "04.00.6305"

ODBC_Ver Server_Name

"03.52.0000" "EXCEL"

> tbls=sqlTables(con)

TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE REMARKS

1 W:\\data\\API2 <NA> API$ SYSTEM TABLE <NA>

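sqlTables() shows information about the connected Excel file; in particular, the TABLE_NAME column gives the names of the worksheets in the file. Note that although the actual worksheet name is API, sqlTables() reports it as API$, and in SQL queries it must be written as [API$].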


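ODBC: retrieving data with SQL

sqlQuery(): the RODBC function for reading a worksheet of the Excel file; its argument is an SQL statement.

The SQL statement determines which data are extracted for the analysis.

The result of sqlQuery() is returned as an R data frame, which is then used for further analysis in R.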


ODBC: queries with sqlQuery()

> a=sqlQuery(con, "select * from [API$]", as.is=T)

> head(a)

id type name region api100 api99 diff nstud

1 01611190130229 H Alameda High Alameda 731 693 38 1090

2 01611190132878 H Encinal High Alameda 622 589 33 840

3 01611196000004 M Chipman Middle Alameda 622 572 50 472

4 01611196090005 E Lum (Donald D.) Alameda 774 732 42 272

5 01611196090013 E Edison Elementa Alameda 811 784 27 216

6 01611196090021 E Otis (Frank) El Alameda 780 725 55 247

> b=sqlQuery(con, "select id,region, api100,api99 from [API$] where type='H'", as.is=T)

> head(b)

id region api100 api99

1 01611190130229 Alameda 731 693

2 01611190132878 Alameda 622 589

3 01611270130450 Alameda 789 773

4 01611430131177 Alameda 716 728

5 01611500132225 Alameda 741 723

6 01611680132746 Alameda 491 443


ODBC

If the Excel file was opened with readOnly = FALSE, SQL UPDATE statements can also be used:

> sqlQuery(con, "update [API$] set type='H' where id='01611190130229'")

> odbcClose(con)   # close the connection when finished


XLConnect

XLConnect is a package for handling Microsoft Excel data from R; it runs on multiple operating systems (it relies on Java).

library(XLConnect)

df <- readWorksheetFromFile("<file name and extension>",

sheet=1,

startRow = 4,

endCol = 2)

wb <- loadWorkbook("<name and extension of your file>")

df <- readWorksheet(wb, sheet=1)

sheet: sheet name or index

startRow/startCol: first row/column of the data to import

endRow/endCol: last row/column of the data to import

region: a cell range to import (e.g. "A5:B5")


# Path to the demo Excel file shipped with XLConnect

demoExcelFile <- system.file("demoFiles/mtcars.xlsx",

package = "XLConnect")

# Load the workbook

wb <- loadWorkbook(demoExcelFile)

# Read the data from the 'mtcars' sheet

dt <- readWorksheet(wb, sheet = "mtcars")

head(dt)

## mpg cyl disp hp drat wt qsec vs am gear carb

## 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4

## 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4

## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1

## 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1

## 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2

## 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1


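A program that uses XLConnect to save the iris data.frame to an Excel file, with each species in a separate worksheet.

library(XLConnect)

# Load the workbook (create it if it does not exist)

wb <- loadWorkbook("iris.xlsx", create = TRUE)

Species <- as.character(unique(iris$Species))

for (sp in Species) {
  # Create a worksheet named after the species
  createSheet(wb, name = sp)
  # Write that species' rows to the worksheet (kept in memory, not yet in the file)
  writeWorksheet(wb, iris[iris$Species == sp, ], sheet = sp, header = TRUE)
}

# Only this call writes the workbook to disk

saveWorkbook(wb)

If a file named iris.xlsx already exists but is not a valid Excel file, loadWorkbook() fails with an error such as "IllegalArgumentException (Java): Your InputStream was neither an OLE2 stream, nor an OOXML stream"; delete or rename that file first.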


R Operators and Built-in Functions

R operators

Operator      Description
-, +, *, /    Minus, plus, multiplication, division
%%            Modulus (remainder)
%/%           Integer division (quotient)
<             Less than
>             Greater than
==            Equal to
>=            Greater than or equal to
<=            Less than or equal to
!             Unary not
^             Exponentiation
&             And, vectorized
&&            And (single values)
|             Or, vectorized
||            Or (single values)
<-            Left assignment
=             Left assignment
->            Right assignment
<<-           Superassignment: assigns to a variable outside the function (in an enclosing or the global environment)


R operator examples

x <- c(1, 10, 13, 3)

x %% 2

## [1] 1 0 1 1

x%/% 3

## [1] 0 3 4 1

x > 3

## [1] FALSE TRUE TRUE FALSE

y <- c(3, 5, 2, 1)

x>y

## [1] FALSE TRUE TRUE TRUE

z <- TRUE

!z

## [1] FALSE
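A sketch of three points from the operator table, with made-up values: & and | work element by element, while && and || take single values and short-circuit (since R 4.3.0, giving them a vector longer than one is an error); %/% and %% fit together as x == (x %/% y) * y + x %% y; and <<- modifies a variable defined outside the function.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
a & b            # element-wise AND: TRUE FALSE FALSE
a | b            # element-wise OR:  TRUE TRUE TRUE
TRUE && FALSE    # single values, short-circuit: FALSE

# Quotient and remainder recombine to the original values
x <- c(1, 10, 13, 3)
all(x == (x %/% 3) * 3 + x %% 3)   # TRUE

# <<- assigns to counter outside the function
counter <- 0
increment <- function() counter <<- counter + 1
increment(); increment()
counter          # 2
```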


x1 <- x%%2; x1

## [1] 1 0 1 1

y1 <- y%%2; y1

## [1] 1 1 0 1

x1 | y1

## [1] TRUE TRUE TRUE TRUE


Built-in functions

Function             R function
Square root          sqrt
Exponential          exp
Logarithm            log(5), log2(5), log10(5), log(5, base=3)
Maximum              max, pmax
Minimum              min, pmin
Sum                  sum
Mean                 mean
Absolute value       abs
Cumulative ops       cummax, cummin, cumprod, cumsum
Trigonometric        sin, cos, tan
Rounding             ceiling, round, trunc, floor


R built-in functions

a <- 1:5

sqrt(a)

## [1] 1.000000 1.414214 1.732051 2.000000 2.236068

exp(a)

## [1] 2.718282 7.389056 20.085537 54.598150 148.413159

out <- (a + sqrt(a))/(exp(2)+1); out

## [1] 0.2384058 0.4069842 0.5640743 0.7152175 0.8625604

x1 <- seq(-2, 4, by = .5); x1

## [1] -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

floor(x1)

## [1] -2 -2 -1 -1 0 0 1 1 2 2 3 3 4

trunc(x1)

## [1] -2 -1 -1 0 0 0 1 1 2 2 3 3 4
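The four rounding functions differ mainly on negative numbers and exact halves. A sketch with arbitrary values; note that round() sends an exact .5 to the even neighbour, and that the cumulative functions from the table return running totals or products.

```r
v <- c(-2.5, -1.5, 0.5, 1.5, 2.567)
ceiling(v)        # smallest integer >= v:    -2 -1  1  2  3
floor(v)          # largest integer <= v:     -3 -2  0  1  2
trunc(v)          # drop the fractional part: -2 -1  0  1  2
round(v)          # -2 -2  0  2  3: exact halves go to the even number
round(2.567, 2)   # 2.57

cumsum(1:5)       # running sum:     1 3 6 10 15
cumprod(1:5)      # running product: 1 2 6 24 120
```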


a <- c(1,-2,3,-4)

b <- c(-1,2,-3,4)

min(a,b)

## [1] -4

pmin(a,b)

## [1] -1 -2 -3 -4


Other built-in functions

print(): Prints a single R object

a <- c(5,3,6,2,4)

print(a)

## [1] 5 3 6 2 4

cat(): Prints multiple objects, one after the other

cat("mean of a is ",mean(a), "variance of a is ", var(a),"\n")

## mean of a is 4 variance of a is 2.5

unique(): Gives the vector of distinct values

x <- c(1,5,1,3,5,7,5)

unique(x)

## [1] 1 5 3 7


diff(): returns the vector of first differences, x[2] - x[1], x[3] - x[2], ...

diff(x)

## [1] 4 -4 2 2 2 -2

sort(): sorts the elements in increasing order; NAs are removed by default

order(): returns the indices that sort x, so x[order(x)] is x in increasing order; NAs are placed last

rev(): reverses the order of the elements

print(x)

## [1] 1 5 1 3 5 7 5

sort(x)

## [1] 1 1 3 5 5 5 7

order(x)

## [1] 1 3 4 2 5 7 6

rev(x)

## [1] 5 7 5 3 1 5 1
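A sketch of how missing values and the decreasing argument affect these functions, using a made-up vector with one NA.

```r
y <- c(3, NA, 1, 2)
sort(y)                     # 1 2 3: the NA is dropped
sort(y, na.last = TRUE)     # 1 2 3 NA
order(y)                    # 3 4 1 2: the NA's position (2) comes last
sort(y, decreasing = TRUE)  # 3 2 1
rev(sort(y))                # 3 2 1 as well
```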


R Data Handling

Indexing

x <- sample(1:10, 15, replace = TRUE)  # random draws: the values below differ on each run unless set.seed() is called first

x

## [1] 1 9 3 1 2 4 5 1 7 6 8 4 1 5 8

others <- (x > 1)

others

## [1] FALSE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE

## [12] TRUE FALSE TRUE TRUE

x[others]

## [1] 9 3 2 4 5 7 6 8 4 5 8

ind <- which(x > 1)

ind

## [1] 2 3 5 6 7 9 10 11 12 14 15

x[ind]

## [1] 9 3 2 4 5 7 6 8 4 5 8

x[!others]

## [1] 1 1 1 1

x[-ind]

## [1] 1 1 1 1
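One caveat on x[-ind]: when which() finds no match, negative indexing with the empty result drops every element rather than none, while the logical form x[!others] behaves as intended. A small sketch with made-up values:

```r
z <- c(5, 6, 7)
idx <- which(z > 100)   # no element matches: integer(0)
z[-idx]                 # numeric(0): negating an empty index selects nothing
z[!(z > 100)]           # 5 6 7: the logical form keeps every element
```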


subscripting(데이터에서 일부분을 추출)

USArrests data :This data set contains statistics, in arrests per 100,000 residents for assault, murder, andrape in each of the 50 US states in 1973. Also given is the percent of the population living inurban areas.

A data frame with 50 observations on 4 variables:

[,1] Murder numeric Murder arrests (per 100,000)

[,2] Assault numeric Assault arrests (per 100,000)

[,3] UrbanPop numeric Percent urban population

[,4] Rape numeric Rape arrests (per 100,000)

head(USArrests,3)

## Murder Assault UrbanPop Rape

## Alabama 13.2 236 58 21.2

## Alaska 10.0 263 48 44.5

## Arizona 8.1 294 80 31.0


subscripting

Numeric subscripts

# Row numbers of the five states with the highest murder rates

nidx <- order(USArrests$Murder, decreasing = TRUE)[1:5]

nidx

## [1] 10 24 9 18 40

USArrests[nidx,]

## Murder Assault UrbanPop Rape

## Georgia 17.4 211 60 25.8

## Mississippi 16.1 259 44 17.1

## Florida 15.4 335 80 31.9

## Louisiana 15.4 249 66 22.2

## South Carolina 14.4 279 48 22.5
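Subscripts can also be character strings matched against row and column names. A sketch on the built-in USArrests data; the states and columns picked here are arbitrary:

```r
# Character subscripts: rows by state name, columns by variable name
sel <- USArrests[c("Georgia", "Florida"), c("Murder", "UrbanPop")]
sel

# The same "top 5" as above, written with head() on the re-ordered data frame
top5 <- head(USArrests[order(USArrests$Murder, decreasing = TRUE), ], 5)
rownames(top5)
```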


subscripting

Logical subscripts

lidx <- USArrests$Murder < quantile(USArrests$Murder, 0.1)   # TRUE for states below the 10th percentile

head(lidx, 10)

## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

USArrests[lidx,]

## Murder Assault UrbanPop Rape

## Iowa 2.2 56 57 11.3

## Maine 2.1 83 51 7.8

## New Hampshire 2.1 57 56 9.5

## North Dakota 0.8 45 44 7.3

## Vermont 2.2 48 32 11.2


subset

The subset() function

subset(USArrests, UrbanPop > 85)

## Murder Assault UrbanPop Rape

## California 9.0 276 91 40.6

## New Jersey 7.4 159 89 18.8

## New York 11.1 254 86 26.1

## Rhode Island 3.4 174 87 8.3

subset(USArrests, UrbanPop < 40 & Murder < 10,

select = c(Assault, Rape))

## Assault Rape

## Vermont 48 11.2

## West Virginia 81 9.3
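subset() behaves much like logical row indexing plus column selection, with one difference worth knowing: rows where the condition is NA are dropped, whereas [ returns them as all-NA rows. The sketch below checks the equivalence on USArrests and then shows the NA behaviour on a small made-up data frame df:

```r
# subset() versus the equivalent bracket expression on USArrests
s1 <- subset(USArrests, UrbanPop < 40 & Murder < 10, select = c(Assault, Rape))
s2 <- USArrests[USArrests$UrbanPop < 40 & USArrests$Murder < 10,
                c("Assault", "Rape")]
all.equal(s1, s2)                  # TRUE

# Made-up data with a missing value
df <- data.frame(state = c("A", "B", "C"), murder = c(2.1, NA, 9.0))
df[df$murder < 5, ]                # 2 rows: row "A" plus an all-NA row
df[which(df$murder < 5), ]         # 1 row: which() drops the NA
subset(df, murder < 5)             # 1 row: subset() treats NA as FALSE
```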


Combining data

authors

## surname nationality

## 1 Tukey US

## 2 Venables Australia

## 3 Tierney US

## 4 Ripley UK

## 5 McNeil Australia

books

## name title

## 1 Tukey Exploratory Data Analysis

## 2 Venables Modern Applied Statistics ...

## 3 Tierney LISP-STAT

## 4 Ripley Spatial Statistics

## 5 Ripley Stochastic Simulation

## 6 McNeil Interactive Data Analysis

## 7 R Core An Introduction to R
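The slide prints authors and books without showing how they were created. One way to rebuild them from the printed output (the "..." in the second title is part of the printed text and is kept as-is):

```r
# Rebuilt from the printed output above
authors <- data.frame(
  surname     = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  stringsAsFactors = FALSE
)
books <- data.frame(
  name  = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
            "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis", "An Introduction to R"),
  stringsAsFactors = FALSE
)
dim(authors)   # 5 2
dim(books)     # 7 2
```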


merge(): combining data (2)

Join the two tables using "surname" in authors and "name" in books as the key

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

m1

## surname nationality title

## 1 McNeil Australia Interactive Data Analysis

## 2 Ripley UK Spatial Statistics

## 3 Ripley UK Stochastic Simulation

## 4 Tierney US LISP-STAT

## 5 Tukey US Exploratory Data Analysis

## 6 Venables Australia Modern Applied Statistics ...
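By default merge() performs an inner join, which is why the "R Core" book, having no matching author, is missing from m1. Setting all.y = TRUE keeps every row of books (a right outer join) and fills the missing author fields with NA; all.x and all work analogously. A self-contained sketch with the same two tables:

```r
authors <- data.frame(surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"),
                      nationality = c("US", "Australia", "US", "UK", "Australia"),
                      stringsAsFactors = FALSE)
books <- data.frame(name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley",
                             "McNeil", "R Core"),
                    title = c("Exploratory Data Analysis", "Modern Applied Statistics ...",
                              "LISP-STAT", "Spatial Statistics", "Stochastic Simulation",
                              "Interactive Data Analysis", "An Introduction to R"),
                    stringsAsFactors = FALSE)

m1 <- merge(authors, books, by.x = "surname", by.y = "name")                # inner join
m2 <- merge(authors, books, by.x = "surname", by.y = "name", all.y = TRUE)  # keep all books
nrow(m1)                        # 6
nrow(m2)                        # 7
m2[m2$surname == "R Core", ]    # nationality is NA
```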


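aggregate(): summarizing data

Splits the data into subsets and computes a summary statistic for each subset

aggregate(variables to summarize, list(grouping variables), summary function)

aggregate(x = testDF, by = list(fby1, fby2), FUN = "mean")   # general form; testDF, fby1, fby2 are placeholders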

aggregate(Sepal.Length~Species, data=iris, FUN=mean)

## Species Sepal.Length

## 1 setosa 5.006

## 2 versicolor 5.936

## 3 virginica 6.588
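The formula interface accepts several summary variables at once through cbind() on the left-hand side, and the non-formula form takes a named list of grouping variables. A sketch on iris:

```r
# Formula interface: two summary variables, one grouping variable
aggregate(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris, FUN = mean)

# Data-frame interface: x = columns to summarize, by = named list of groups
a <- aggregate(x = iris[, c("Sepal.Length", "Petal.Length")],
               by = list(Species = iris$Species), FUN = mean)
a
```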


apply(): summarizing data

Apply Functions Over Array Margins

apply(array, MARGIN, FUN, ...)

apply(iris[, 1:4], 2, mean)

## Sepal.Length Sepal.Width Petal.Length Petal.Width

## 5.843333 3.057333 3.758000 1.199333

apply(iris[, 1:4], 1, sum)[1:10]

## [1] 10.2 9.5 9.4 9.4 10.2 11.4 9.7 10.1 8.9 9.6
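For the common cases of column or row means and sums, base R also provides colMeans(), rowMeans(), colSums() and rowSums(), which give the same results as apply() and are usually faster. apply() remains the general tool for arbitrary functions, such as the column range width below:

```r
m <- iris[, 1:4]

all.equal(apply(m, 2, mean), colMeans(m))   # TRUE
head(unname(rowSums(m)))                    # 10.2 9.5 9.4 9.4 10.2 11.4

# A custom summary per column: max minus min
apply(m, 2, function(col) max(col) - min(col))
```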


lapply(): summarizing data by applying a function to each element

Apply a Function over a List or Vector

lapply(vector or list, FUN, ...)

lapply(iris[1:4], mean) # lapply(iris[,1:4], mean)

## $Sepal.Length

## [1] 5.843333

##

## $Sepal.Width

## [1] 3.057333

##

## $Petal.Length

## [1] 3.758

##

## $Petal.Width

## [1] 1.199333


lapply(): summarizing data

lapply(1:4, function(i) mean(iris[,i]))

## [[1]]

## [1] 5.843333

##

## [[2]]

## [1] 3.057333

##

## [[3]]

## [1] 3.758

##

## [[4]]

## [1] 1.199333
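lapply() always returns a list. When each result is a single number, sapply() simplifies the list to a named vector, and vapply() does the same while checking that every result matches a declared template (here numeric(1)):

```r
s <- sapply(iris[1:4], mean)               # named numeric vector
s
v <- vapply(iris[1:4], mean, numeric(1))   # same values, type-checked

# The index-based version, simplified to an unnamed vector
sapply(1:4, function(i) mean(iris[, i]))
```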


Creating a data.table object

data.table(...)

library(data.table)

DT <- data.table(x=c("b","b","a","a"),v=rnorm(4))

DT

## x v

## 1: b 1.1902136

## 2: b -1.4447011

## 3: a -0.3003707

## 4: a -0.2358570
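Once created, a data.table is queried as DT[i, j, by]: i filters rows, j computes, by groups. A sketch assuming the data.table package is installed; fixed values replace rnorm() so the results are reproducible:

```r
library(data.table)

DT <- data.table(x = c("b", "b", "a", "a"), v = c(1, 2, 3, 4))

a_rows <- DT[x == "a"]                      # filter rows; no DT$ prefix needed
sums   <- DT[, .(total = sum(v)), by = x]   # grouped sum, groups in order of appearance
setkey(DT, x)                               # sort DT by x and mark x as its key
b_rows <- DT["b"]                           # keyed lookup on x
```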


Creating a data.table object from a data.frame

CARS <- data.table(cars)

head(CARS)

## speed dist

## 1: 4 2

## 2: 4 10

## 3: 7 4

## 4: 7 22

## 5: 8 16

## 6: 9 10


Listing the data.table objects in memory

tables()

## NAME NROW NCOL MB COLS KEY

## [1,] CARS 50 2 1 speed,dist

## [2,] DT 4 2 1 x,v

## Total: 2MB


Group summary

Iris <- data.table(iris)

names(Iris)

## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

## [5] "Species"

Iris[, mean(Petal.Width), by="Species"]

## Species V1

## 1: setosa 0.246

## 2: versicolor 1.326

## 3: virginica 2.026

Iris[,lapply(.SD, mean),by=Species]

## Species Sepal.Length Sepal.Width Petal.Length Petal.Width

## 1: setosa 5.006 3.428 1.462 0.246

## 2: versicolor 5.936 2.770 4.260 1.326

## 3: virginica 6.588 2.974 5.552 2.026

tapply(iris$Petal.Width, iris$Species, mean)

## setosa versicolor virginica

## 0.246 1.326 2.026
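A data.table grouped call can also name its outputs, report the group size through the special symbol .N, and restrict .SD to chosen columns with .SDcols. A sketch, again assuming data.table is installed:

```r
library(data.table)
Iris <- as.data.table(iris)

# Named results plus the number of rows per group
res <- Iris[, .(mean_pw = mean(Petal.Width), n = .N), by = Species]
res

# Means of two selected columns only
res2 <- Iris[, lapply(.SD, mean), by = Species,
             .SDcols = c("Sepal.Length", "Sepal.Width")]
res2
```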


The sqldf package

sqldf: manipulate R data frames using SQL syntax

# Load the package

library(sqldf)

# Use the iris data set

sqldf('select count(*) `N`,
       avg("Sepal.Width") `Sepal.Width`
       from iris group by Species')

## N Sepal.Width
## 1 50 3.428
## 2 50 2.770
## 3 50 2.974


sqldf

system.time({a8r <- aggregate(iris[1:2], iris[5], mean)

})

## user system elapsed

## 0 0 0

system.time({a8s <- sqldf('select Species,

avg("Sepal.Length") `Sepal.Length`,

avg("Sepal.Width") `Sepal.Width`

from iris group by Species')

})

## user system elapsed

## 0 0 0
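Both timings above are 0 because a single call on 150 rows is too fast to register. A sketch that first checks the two approaches give the same group means and then times repeated calls instead; it assumes a recent sqldf/RSQLite in which column names such as "Sepal.Length" keep their dots, and the repetition count of 50 is arbitrary:

```r
library(sqldf)

a8r <- aggregate(iris[1:2], iris[5], mean)
a8s <- sqldf('select Species, avg("Sepal.Length") `Sepal.Length`,
              avg("Sepal.Width") `Sepal.Width`
              from iris group by Species')

# Line up the SQL result with the aggregate() result by species, then compare
a8s <- a8s[match(as.character(a8r$Species), as.character(a8s$Species)), ]
all.equal(a8r$Sepal.Length, a8s$Sepal.Length)   # TRUE

# Time 50 repetitions of each approach
t_r <- system.time(for (i in 1:50) aggregate(iris[1:2], iris[5], mean))
t_s <- system.time(for (i in 1:50)
  sqldf('select Species, avg("Sepal.Length") from iris group by Species'))
rbind(aggregate = t_r[1:3], sqldf = t_s[1:3])
```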


R programming

Conditional statements

A conditional statement takes one of the following three forms.

if ( cond ) expr

if ( cond ) expr1 else expr2

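if ( cond1 ) {
  expr1
} else if ( cond2 ) {
  expr2
} else {
  expr3
}

(At the top level, else must appear on the same line as the closing brace of the preceding branch; otherwise R treats the if as complete and reports a syntax error.)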
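A small illustration of the three-branch form; sign_label() is a made-up name. if() expects a single TRUE/FALSE value, so this function handles one number at a time; for whole vectors, ifelse() is the usual tool.

```r
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}
sign_label(3)    # "positive"
sign_label(-2)   # "negative"
sign_label(0)    # "zero"
```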


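Conditional statements

Compute the median of Sepal.Length in the iris data, then code each Sepal.Length value as "L" if it is larger than the median and "S" if it is smaller. (The code below tests with <, so values equal to the median are coded "L".)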

data(iris)
n <- length(iris$Sepal.Length)
Sepal.Length.Cat <- character(n)   # empty character vector of length n
Med <- median(iris$Sepal.Length)
for (i in 1:n) {
  if (iris$Sepal.Length[i] < Med) {
    Sepal.Length.Cat[i] <- "S"
  } else {
    Sepal.Length.Cat[i] <- "L"
  }
}
Sepal.Length.Cat[1:10]

## [1] "S" "S" "S" "S" "S" "S" "S" "S" "S" "S"
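The loop can be replaced by the vectorized ifelse(), which tests every element at once and picks "S" or "L" accordingly. A sketch (Sepal.Length.Cat2 is a new name chosen so it does not overwrite the loop result):

```r
Med <- median(iris$Sepal.Length)                                # 5.8
Sepal.Length.Cat2 <- ifelse(iris$Sepal.Length < Med, "S", "L")
Sepal.Length.Cat2[1:10]                                         # all "S", as with the loop
table(Sepal.Length.Cat2)
```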


Loops

R provides three looping constructs:

while ( cond ) expr

repeat expr

for ( var in list ) expr

break : exits a while, repeat or for loop

next : skips the remaining statements of the current iteration and moves on to the next one
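A sketch that exercises each construct once with arbitrary numbers: while repeats as long as its condition holds, repeat runs until a break is reached, and next skips the rest of a for-loop iteration.

```r
# while: halve a value until it drops below 1, counting the steps
val <- 100
steps <- 0
while (val >= 1) {
  val <- val / 2
  steps <- steps + 1
}
steps   # 7

# repeat: runs until break is reached
i <- 0
repeat {
  i <- i + 1
  if (i >= 5) break
}
i       # 5

# for + next: add up only the odd numbers from 1 to 10
s <- 0
for (k in 1:10) {
  if (k %% 2 == 0) next   # skip even k
  s <- s + k
}
s       # 25
```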


Loops

Generate 10 random numbers from the standard normal distribution and add up the positive ones

x <- rnorm(10)
sum.positive <- 0
for (i in seq_along(x)) {
  if (x[i] > 0) sum.positive <- sum.positive + x[i]
}
sum.positive

## [1] 3.869031
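The same total can be obtained without a loop by logical indexing, which is the more idiomatic R style. A sketch that fixes the seed (set.seed(1) is an arbitrary choice) so the two versions can be compared on identical data:

```r
set.seed(1)
x <- rnorm(10)

loop_sum <- 0
for (i in seq_along(x)) {
  if (x[i] > 0) loop_sum <- loop_sum + x[i]
}
vec_sum <- sum(x[x > 0])   # keep the positive values, then add them

all.equal(loop_sum, vec_sum)   # TRUE
```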


Writing functions

General form

function_name <- function(arg_1, arg_2, ...) {
  expression...
  return(...)
}

Example: a function that converts miles to kilometers (using the approximation 1 mile ≈ 8/5 = 1.6 km)

miles.to.km <- function(miles) miles*8/5

miles.to.km(175) # Approximate distance to Sydney, in miles

## [1] 280

- To convert 100, 200 and 300 miles to kilometers in one call (multiplication is vectorized, so the function accepts a whole vector):

miles.to.km(c(100,200,300))

## [1] 160 320 480
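8/5 = 1.6 is a rounded factor; one international mile is exactly 1.609344 km. A variant, under the hypothetical name miles_to_km, that turns the factor into an argument with a default value and checks its input:

```r
miles_to_km <- function(miles, factor = 1.609344) {
  if (!is.numeric(miles)) stop("'miles' must be numeric")
  return(miles * factor)
}
miles_to_km(175)                 # 281.6352
miles_to_km(175, factor = 8/5)   # 280, as with miles.to.km()
miles_to_km(c(100, 200, 300))    # 160.9344 321.8688 482.8032
```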


R graphics

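Basic graphics functions

High-level functions (each starts a new plot): plot(), barplot(), boxplot(), hist(), pie(), persp()

Low-level functions (each adds to the current plot)

Points: points()
Lines: lines(), abline(), arrows()
Text: text()
Shapes: rect(), polygon()
Axes: axis()
Grid: grid()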
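A sketch that combines one high-level call with each of the low-level functions listed above. It draws into a temporary PDF file so it also runs without a screen; the data and coordinates are arbitrary:

```r
out <- tempfile(fileext = ".pdf")
pdf(out)                                   # open a file-based graphics device

x <- 1:10
y <- x^2
plot(x, y, type = "n", axes = FALSE,       # high-level: empty frame, no axes yet
     xlab = "x", ylab = "y")
axis(1); axis(2)                           # axes
grid()                                     # reference grid
points(x, y, pch = 19)                     # points
lines(x, y, lty = 2)                       # connecting dashed line
abline(h = 50, col = "red")                # horizontal line at y = 50
arrows(3, 80, 5, 30)                       # arrow
text(3, 90, "y = x^2")                     # text label
rect(7, 5, 9, 25, border = "blue")         # rectangle
polygon(c(1, 3, 2), c(60, 60, 75), col = "grey")  # filled triangle

dev.off()                                  # close the device and write the file
```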


plot()

x <- rnorm(100, sd=2); y <- 0.3 + 2*x + rnorm(100, sd=1)

plot(x)

Figure : index plot of x (x-axis: Index; y-axis: x)


bar plot

#par(mai=c(2,1,0.5,0.5))

pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)

names(pie.sales) <- c("Blueberry", "Cherry", "Apple", "Boston Cream",

"Other", "Vanilla Cream")

barplot(pie.sales, las=2) # las=2: axis labels drawn perpendicular to the axis

Figure : bar plot of pie.sales, one bar per flavour (Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream); y-axis from 0.00 to 0.30


bar plot (2)

counts <- table(mtcars$vs, mtcars$gear)

#par(cex=1.5)

barplot(counts, main="Car Distribution by Gears and VS",

xlab="Number of Gears", col=c("darkblue","red"),

legend = rownames(counts), beside=TRUE)

Figure : Car Distribution by Gears and VS, side-by-side bars for vs = 0 and vs = 1 at 3, 4 and 5 gears (x-axis: Number of Gears)


bar plot (3)

#par(cex=1.5)

barplot(counts, main="Car Distribution by Gears and VS",

xlab="Number of Gears", col=c("darkblue","red"),

legend = rownames(counts))

Figure : Car Distribution by Gears and VS, stacked bars for vs = 0 and vs = 1 at 3, 4 and 5 gears (x-axis: Number of Gears)
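It helps to look at the table behind the bars. Naming the arguments of table() labels its dimensions, and barplot()'s args.legend passes extra settings (here a title) to the legend. A sketch written to a temporary PDF:

```r
counts <- table(vs = mtcars$vs, gear = mtcars$gear)
counts             # vs = 0: 12, 2, 4 cars; vs = 1: 3, 10, 1 cars (for 3, 4, 5 gears)
colSums(counts)    # cars per number of gears: 15 12 5

out <- tempfile(fileext = ".pdf")
pdf(out)
barplot(counts, beside = TRUE, col = c("darkblue", "red"),
        xlab = "Number of Gears", legend.text = rownames(counts),
        args.legend = list(title = "vs"))
dev.off()
```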


pie plot

pie(pie.sales)

Figure : pie chart of pie.sales with slices Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream
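Pie slices are hard to compare by eye, so it is common to print each share in the label. A sketch that computes rounded percentages from pie.sales (rebuilt from the bar plot code) and draws to a temporary PDF:

```r
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry", "Apple", "Boston Cream",
                      "Other", "Vanilla Cream")

pct  <- round(100 * pie.sales / sum(pie.sales))   # share of the total, in %
lbls <- paste0(names(pie.sales), " ", pct, "%")   # e.g. "Blueberry 12%"

out <- tempfile(fileext = ".pdf")
pdf(out)
pie(pie.sales, labels = lbls, col = gray.colors(length(pie.sales)))
dev.off()
```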


mtcars data

head(mtcars)

## mpg cyl disp hp drat wt qsec vs am gear carb

## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4

## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4

## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1

## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1

## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2

## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1


Dot Chart

dotchart(mtcars$mpg,labels=row.names(mtcars),

cex=0.7,

main="Gas Mileage \nfor Car Models",

xlab="Miles Per Gallon")

Figure : Gas Mileage for Car Models, a dot chart of mpg for the 32 car models in mtcars, in data order (x-axis: Miles Per Gallon, 10 to 30)


Dot Chart (2)

idx <- order(mtcars$mpg)

dotchart(mtcars$mpg[idx],labels=row.names(mtcars)[idx], cex=1,

main="Gas Mileage \nfor Car Models", xlab="Miles Per Gallon")

Figure : Gas Mileage for Car Models, the same dot chart with the models sorted by mpg, from Cadillac Fleetwood (lowest) to Toyota Corolla (highest) (x-axis: Miles Per Gallon)


par(): graphics parameters (mfrow = c(1,2) arranges the next plots in 1 row and 2 columns)

x<-rnorm(100)

par(mfrow=c(1,2))

hist(x)

plot(x)

Figure : two panels side by side: Histogram of x (x-axis: x; y-axis: Frequency) and an index plot of x (x-axis: Index; y-axis: x)
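Settings made with par() persist for the rest of the session on that device. par() returns the previous values of whatever it changes, so a common pattern is to save them and restore them afterwards. A sketch drawing to a temporary PDF:

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

x <- rnorm(100)
op <- par(mfrow = c(1, 2))    # change the layout, keeping the old value in op
hist(x)
plot(x)
par(op)                       # restore the previous layout
layout_after <- par("mfrow")  # back to a single panel: 1 1

dev.off()
```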


Summarizing data with graphs

The cars data give the speed of cars (speed, in mph) and the distance taken to stop (dist, in ft).

data(cars)

head(cars, 3)

## speed dist

## 1 4 2

## 2 4 10

## 3 7 4

tail(cars, 3)

## speed dist

## 48 24 93

## 49 24 120

## 50 25 85


Histogram

hist(cars$speed, nclass=8, main="Histogram", xlab="speed")

Figure : Histogram of cars$speed (x-axis: speed; y-axis: Frequency)
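nclass (like the more common breaks argument) is only a suggestion: hist() passes the number to pretty(), which chooses round break points, so the number of bars need not be 8. With plot = FALSE, hist() returns the computed breaks and counts without drawing:

```r
h <- hist(cars$speed, breaks = 8, plot = FALSE)
h$breaks        # break points chosen by pretty()
h$counts        # number of cars in each bin
sum(h$counts)   # 50: every observation falls in exactly one bin
```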


Box plot

boxplot(Sepal.Length~Species, data=iris, main="Box plot")

Figure : Box plot of Sepal.Length for setosa, versicolor and virginica
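The numbers behind each box can be retrieved with plot = FALSE. Each column of $stats holds, from top to bottom, the lower whisker, lower hinge, median, upper hinge and upper whisker for one species:

```r
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
b$stats        # one column per species
b$names        # "setosa" "versicolor" "virginica"
b$stats[3, ]   # medians: 5.0 5.9 6.5
b$out          # points beyond the whiskers, drawn individually
```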


Scatter plot

plot(cars)

Figure : scatter plot of dist against speed for the cars data


scatter plot

data(iris)

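plot(Petal.Length ~ Sepal.Length, data=iris, bty="l", pch=20)    # bty="l": L-shaped box; pch=20: small filled dots

abline(a=0, b=1, lty=2, lwd=2)                                   # dashed reference line y = x

abline(lm(Petal.Length ~ Sepal.Length, data=iris), lty=1, lwd=2) # solid least-squares regression line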

Figure : Sepal.Length vs. Petal.Length (dashed line: y = x; solid line: least-squares fit)
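The solid line is the least-squares fit passed to abline(). Its intercept and slope, and the share of variance it explains, can be read off the model object directly:

```r
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
coef(fit)                 # intercept about -7.10, slope about 1.86
summary(fit)$r.squared    # proportion of the variance in Petal.Length explained
```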


Scatter plot matrix with the pairs() function

pairs(iris[,1:4], main = "Fisher's Iris Data",

pch = 21,bg = c("red","green3","blue")[unclass(iris$Species)])

Figure : Fisher's Iris Data, a scatter plot matrix of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width drawn with pairs(), points filled by species


Normal probability plot (QQ plot)

qqnorm(cars$speed)

qqline(cars$speed)

Figure : Normal Q-Q Plot of cars$speed with the qqline() reference line (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
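qqnorm() can return the plotted coordinates instead of drawing them. Pairing the sorted sample with the sorted theoretical normal quantiles and computing their correlation gives a rough numeric summary of how straight the Q-Q plot is (values near 1 suggest approximate normality); a sketch:

```r
q <- qqnorm(cars$speed, plot.it = FALSE)   # compute the points without drawing
head(cbind(theoretical = q$x, sample = q$y))

# Straightness of the Q-Q plot as a correlation
cor(sort(q$x), sort(q$y))
```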


3d plot

A data frame with 31 observations on 3 variables.

[,1] Girth numeric Tree diameter in inches

[,2] Height numeric Height in ft

[,3] Volume numeric Volume of timber in cubic ft

head(trees)

## Girth Height Volume

## 1 8.3 70 10.3

## 2 8.6 65 10.3

## 3 8.8 63 10.2

## 4 10.5 72 16.4

## 5 10.7 81 18.8

## 6 10.8 83 19.7


3D plot

library(scatterplot3d)   # install.packages("scatterplot3d") first if it is not installed

scatterplot3d(trees, type="h", highlight.3d=TRUE,

angle=55, scale.y=0.7, pch=16, main="3 dimensional plot for trees data")

Figure : 3 dimensional plot for trees data, a 3D scatter plot of the trees data (axes: Girth, Height, Volume)
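scatterplot3d() returns an object whose plane3d() function can add a fitted regression plane to the existing plot. A sketch, assuming the scatterplot3d package is installed, that fits Volume on Girth and Height (matching the x, y, z order of the trees columns) and draws to a temporary PDF:

```r
library(scatterplot3d)

fit <- lm(Volume ~ Girth + Height, data = trees)   # regression plane
coef(fit)   # about -57.99 (intercept), 4.71 (Girth), 0.34 (Height)

out <- tempfile(fileext = ".pdf")
pdf(out)
s3d <- scatterplot3d(trees, type = "h", highlight.3d = TRUE, angle = 55,
                     scale.y = 0.7, pch = 16,
                     main = "3 dimensional plot for trees data")
s3d$plane3d(fit)   # add the fitted plane to the 3D plot
dev.off()
```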


3D pie chart for categorical data

slices <- c(18, 12, 4, 16, 8, 9, 12)

lbels <- c("US", "UK", "Australia", "Germany", "Canada", "India", "Korea")

library(plotrix)

pie3D(slices,labels=lbels,explode=0.1, main="3D Pie Chart", mar=c(4,0,3,0))

Figure : 3D Pie Chart with slices US, UK, Australia, Germany, Canada, India, Korea
