r language definition

33
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 安装 R 获取数据 统计绘图 统计计算 R 语言定义 R Language Definition OOP 黄湘云 Department of Statistics China University of Mining & Technology,Beijing May 4, 2016

Upload: -

Post on 08-Jan-2017

116 views

Category:

Software


3 download

TRANSCRIPT

Page 1: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

R Language DefinitionOOP

黄湘云

Department of StatisticsChina University of Mining & Technology,Beijing

May 4, 2016

Page 2: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

目录

1 安装 R 包

2 获取数据

3 统计绘图

4 统计计算

5 R 语言定义

Page 3: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

安装 R 包从指定网站安装 R 包install.packages("ggplot2",repos = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/")install.packages("pim",repos = "http://www.r-forge.r-project.org/",lib = "D:/library/")

一行命令安装 cran 上所有的 Packages

install.packages(setdiff(available.packages()[,1],.packages(all.available = TRUE)))

安装 Bioconductor 上的 R 包

source("https://bioconductor.org/biocLite.R")library(BiocInstaller)biocLite("ggtree")

安装 Github 上的 R 包

library(devtools)install_github("ropensci/aRxiv")

Page 4: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

安装注意几项

依赖问题 主要针对离线安装和源码编译安装平台问题 有些 R 包只能装在类 unix 操作系统上软件问题 有些 R 包要求相应 R 软件版本

特定库依赖 如 gputools 包依赖 cuda 库

# OS informationSys.info()

sysname release version nodename"Windows" "10 x64" "build 10586" "LENOVO-PC"

machine login user effective_user"x86-64" "Xiangyun Huang" "Xiangyun Huang" "Xiangyun Huang"

Page 5: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

R 软件信息

> version

_platform x86_64-w64-mingw32arch x86_64os mingw32system x86_64, mingw32statusmajor 3minor 2.5year 2016month 04day 14svn rev 70478language Rversion.string R version 3.2.5 (2016-04-14)nickname Very, Very Secure Dishes

Page 6: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

目录

1 安装 R 包

2 获取数据

3 统计绘图

4 统计计算

5 R 语言定义

Page 7: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

R 软件系统自带的数据集

# view R data setslength(data(package = .packages(all.available = TRUE))$results[,3])## [1] 20969

# extended packageslibrary(ggplot2)data("diamonds")head(diamonds)

## carat cut color clarity depth table price x y z## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48

Page 8: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

目录

1 安装 R 包

2 获取数据

3 统计绘图

4 统计计算

5 R 语言定义

Page 9: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

plot 函数–pch点

26 shaps of points

Page 10: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

plot 函数–lty–lwd线

0 10 20 30 40 50

−1.

5−

1.0

−0.

50.

00.

51.

01.

52.

0

Simple Use of Color In a Plot

Just a Whisper of a Label

Page 11: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

调色板–连续型

Blues

BuGn

BuPu

GnBu

Greens

Greys

Oranges

OrRd

PuBu

PuBuGn

PuRd

Purples

RdPu

Reds

YlGn

YlGnBu

YlOrBr

YlOrRd

Page 12: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

调色板–极端型

BrBG

PiYG

PRGn

PuOr

RdBu

RdGy

RdYlBu

RdYlGn

Spectral

Page 13: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

调色板–离散型

Accent

Dark2

Paired

Pastel1

Pastel2

Set1

Set2

Set3

Page 14: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

举个栗子

barplot(sample(seq(20),6,replace = T),col=brewer.pal(11, "BrBG")[3:8])0

510

1520

Page 15: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

再举个栗子

Set3 3 colors

VADeaths

Fre

quen

cy

10 20 30 40 50 60 70

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Set2 3 colors

VADeaths

Fre

quen

cy

0 20 40 60 800

24

68

Set1 3 colors

VADeaths

Fre

quen

cy

0 20 40 60 80

01

23

45

6

Set3 8 colors

VADeaths

Fre

quen

cy

0 20 40 60 80 100

05

1015

Greys 8 colors

VADeaths

Fre

quen

cy

0 20 40 60 80

01

23

45

6

Greens 8 colors

VADeaths

Fre

quen

cy

0 20 40 60 800

12

34

56

Page 16: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

继续举个栗子

Sepal.Length

2.0 2.5 3.0 3.5 4.0 0.5 1.0 1.5 2.0 2.5

4.5

5.5

6.5

7.5

2.0

3.0

4.0

Sepal.Width

Petal.Length

12

34

56

7

0.5

1.5

2.5

Petal.Width

4.5 5.5 6.5 7.5 1 2 3 4 5 6 7 1.0 1.5 2.0 2.5 3.0

1.0

2.0

3.0

Species

Page 17: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

三维图–persp

X

−10

−5

0

5

10

Y

−10

−5

0

5

10Z

−2

0

2

4

6

8

.

z = Sinc( x2 + y2)

Page 18: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

添加数学公式

−3 −2 −1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

Normal Probability Density Function

x

dnor

m(x

)

f(x) =1

2πσ2 e

−(x−µ)22σ2

−3 −2 −1 0 1 2 3

0.0

0.2

0.4

0.6

0.8

1.0

Normal Cumulative Distribution Function

x

cum

sum

(x)/

sum

(x)

φ(x) =1

2π ⌠⌡−∞

x

e(−t2 2)dt

Page 19: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

anscombe 数据

4 6 8 10 12 14

45

67

89

1011

x1

y 1

y1 = 3 + 0.5x1

4 6 8 10 12 14

34

56

78

9

x1

y 2

y2 = 3 + 0.5x1

4 6 8 10 12 14

68

1012

x1

y 3

y3 = 3 + 0.5x1

8 10 12 14 16 18

68

1012

x4

y 4

y4 = 3 + 0.5x4

σ̂2

= 1.531,Multiple R2 = 0.667,Adjusted R2 = 0.629

Page 20: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

barplot

cardiothoracic gastrointestinal neuro other trauma

Types of surgery

Abs

olut

e fr

eque

ncy

050

100

150

200

Page 21: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

barplot

0

10

20

30

40

cardiothoracic gastrointestinal neuro other traumasurgery

Rel

ativ

e fr

eque

ncy

in %

Types of surgery

Page 22: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

目录

1 安装 R 包

2 获取数据

3 统计绘图

4 统计计算

5 R 语言定义

Page 23: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

多元概率分布函数

The multivariate t distribution (MVT) is given by

T(a,b,Σ, ν) = 21−ν2

Γ(ν2 )

∫ ∞

0sν−1e−

s22 Φ(

sa√ν,

sb√ν,Σ)ds

multivariate normal distribution function (MVN)

Φ(a,b,Σ) = 1√|Σ|(2π)m

∫ b1

a1

∫ b2

a2

· · ·∫ bm

am

e−12

x⊤Σ−1xdx

x = (x1, x2, . . . , xm)⊤,−∞ ≤ ai ≤ bi ≤ ∞ for all i, and Σ is apositive semi-definite symmetric m × m matrix

Page 24: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

计算多元正态分布的概率

library(mvtnorm)library(matrixcalc)path<-"C:/Users/Xiangyun Huang/Desktop/LanguageDefinitionR/"setwd(path)sigma <- read.csv(file="data/sigma1.csv",header=F, sep=",")mat<-matrix(0,nrow = nrow(sigma),ncol = ncol(sigma))sigma <- as.matrix(sigma)attributes(sigma)<-attributes(mat)# str(sigma)# is.symmetric.matrix(sigma)# is.positive.definite(sigma)m = nrow(sigma)Fn = pmvnorm(lower=rep(-Inf, m), upper=rep(0, m),

mean=rep(0, m), sigma =sigma)Fn

## [1] 6.748008e-29## attr(,"error")## [1] 2.48407e-31## attr(,"msg")## [1] "Normal Completion"

Page 25: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

多元 t 分布分位数计算

n <- c(26, 24, 20, 33, 32)V <- diag(1/n); df <- 130C <- matrix(c(1,1,1,0,0,-1,0,0,1,0,

0,-1,0,0,1,0,0,0,-1,-1,0,0,-1,0,0),ncol=5)

cv <- C %*% V %*% t(C) ## covariance matrixdv <- t(1/sqrt(diag(cv)))cr <- cv * (t(dv) %*% dv) ## correlation matrixdelta <- rep(0,5)Tn<-qmvt(0.95, df = df, delta = delta, corr = cr,

abseps = 0.0001,maxpts = 100000, tail = "both")Tn

## $quantile## [1] 2.560946#### $f.quantile## [1] 1.523691e-07#### attr(,"message")## [1] "Normal Completion"

Page 26: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

目录

1 安装 R 包

2 获取数据

3 统计绘图

4 统计计算

5 R 语言定义

Page 27: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

谈谈对象

class(sigma) #Object Classes

## [1] "matrix"

str(sigma) #Compactly Display the Structure of an Arbitrary R Object

## num [1:83, 1:83] 0.3806 -0.0104 0 0 0 ...

mode(sigma) #The (Storage) Mode of an Object

## [1] "numeric"

typeof(sigma) #The Type of an Object

## [1] "double"

attributes(sigma) #Object Attribute Lists

## $dim## [1] 83 83

Page 28: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

pryr 包撬开 R 的底层

pryr: Useful tools to pry back the covers of R and understandthe language at a deeper level.

c(is.list(sigma),is.matrix(sigma))

## [1] FALSE TRUE

c(is.vector(sigma),is.numeric(sigma))

## [1] FALSE TRUE

c(is.array(sigma),is.data.frame(sigma))

## [1] TRUE FALSE

Page 29: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

撬开 tabulate 函数

tabulate

## function (bin, nbins = max(1L, bin, na.rm = TRUE))## {## if (!is.numeric(bin) && !is.factor(bin))## stop("'bin' must be numeric or a factor")## if (typeof(bin) != "integer")## bin <- as.integer(bin)## if (nbins > .Machine$integer.max)## stop("attempt to make a table with >= 2^31 elements")## nbins <- as.integer(nbins)## if (is.na(nbins))## stop("invalid value of 'nbins'")## .Internal(tabulate(bin, nbins))## }## <bytecode: 0x0000000016e96018>## <environment: namespace:base>

Page 30: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

撬开 tabulate 函数

library(pryr)show_c_source(.Internal(tabulate(bin,nbins)))SEXP attribute_hidden do_tabulate(SEXP call, SEXP op, SEXP args, SEXP rho){

checkArity(op, args);SEXP in = CAR(args), nbin = CADR(args);if (TYPEOF(in) != INTSXP) error("invalid input");R_xlen_t n = XLENGTH(in);/* FIXME: could in principle be a long vector */int nb = asInteger(nbin);if (nb == NA_INTEGER || nb < 0)

error(_("invalid '%s' argument"), "nbin");SEXP ans = allocVector(INTSXP, nb);int *x = INTEGER(in), *y = INTEGER(ans);if (nb) memset(y, 0, nb * sizeof(int));for(R_xlen_t i = 0 ; i < n ; i++)

if (x[i] != NA_INTEGER && x[i] > 0 && x[i] <= nb) y[x[i] - 1]++;return ans;

}

Page 31: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

撬开 plot

methods(plot)

## [1] plot,ANY-method plot,color-method plot.acf*## [4] plot.data.frame* plot.decomposed.ts* plot.default## [7] plot.dendrogram* plot.density* plot.ecdf## [10] plot.factor* plot.formula* plot.function## [13] plot.ggplot* plot.gtable* plot.hclust*## [16] plot.histogram* plot.HoltWinters* plot.isoreg*## [19] plot.lm* plot.medpolish* plot.mlm*## [22] plot.ppr* plot.prcomp* plot.princomp*## [25] plot.profile.nls* plot.raster* plot.spec*## [28] plot.stepfun plot.stl* plot.table*## [31] plot.ts plot.tskernel* plot.TukeyHSD*## see '?methods' for accessing help and source code

Page 32: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

visible from generic isS4plot,ANY-method TRUE graphics plot TRUEplot,color-method TRUE colorspace plot TRUEplot.acf FALSE registered S3method for plot plot FALSEplot.data.frame FALSE registered S3method for plot plot FALSEplot.decomposed.ts FALSE registered S3method for plot plot FALSEplot.default TRUE graphics plot FALSEplot.dendrogram FALSE registered S3method for plot plot FALSEplot.density FALSE registered S3method for plot plot FALSEplot.ecdf TRUE stats plot FALSEplot.factor FALSE registered S3method for plot plot FALSEplot.formula FALSE registered S3method for plot plot FALSEplot.function TRUE graphics plot FALSEplot.ggplot FALSE registered S3method for plot plot FALSEplot.gtable FALSE registered S3method for plot plot FALSEplot.hclust FALSE registered S3method for plot plot FALSEplot.histogram FALSE registered S3method for plot plot FALSEplot.HoltWinters FALSE registered S3method for plot plot FALSEplot.isoreg FALSE registered S3method for plot plot FALSEplot.lm FALSE registered S3method for plot plot FALSEplot.medpolish FALSE registered S3method for plot plot FALSEplot.mlm FALSE registered S3method for plot plot FALSEplot.ppr FALSE registered S3method for plot plot FALSEplot.prcomp FALSE registered S3method for plot plot FALSEplot.princomp FALSE registered S3method for plot plot FALSEplot.profile.nls FALSE registered S3method for plot plot FALSEplot.raster FALSE registered S3method for plot plot FALSEplot.spec FALSE registered S3method for plot plot FALSEplot.stepfun TRUE stats plot FALSEplot.stl FALSE registered S3method for plot plot FALSEplot.table FALSE registered S3method for plot plot FALSEplot.ts TRUE stats plot FALSEplot.tskernel FALSE registered S3method for plot plot FALSEplot.TukeyHSD FALSE registered S3method for plot plot FALSE

Page 33: R Language definition

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

...

.

安装 R 包 获取数据 统计绘图 统计计算 R 语言定义

统计计算环境

> devtools::session_info()$packages

package * version date sourcecolorspace 1.2-6 2015-03-11 CRAN (R 3.2.4)devtools 1.11.0 2016-04-12 CRAN (R 3.2.4)digest 0.6.9 2016-01-08 CRAN (R 3.2.4)evaluate 0.9 2016-04-29 CRAN (R 3.2.5)formatR 1.3 2016-03-05 CRAN (R 3.2.4)ggplot2 * 2.1.0 2016-03-01 CRAN (R 3.2.4)gtable 0.2.0 2016-02-26 CRAN (R 3.2.4)highr 0.5.1 2015-09-18 CRAN (R 3.2.4)knitr * 1.12.27 2016-05-01 Github (yihui/knitr@5a0a689)labeling 0.3 2014-08-23 CRAN (R 3.2.3)magrittr 1.5 2014-11-22 CRAN (R 3.2.3)matrixcalc * 1.0-3 2012-09-15 CRAN (R 3.2.3)memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)munsell 0.4.3 2016-02-13 CRAN (R 3.2.3)mvtnorm * 1.0-5 2016-02-02 CRAN (R 3.2.3)plyr 1.8.3 2015-06-12 CRAN (R 3.2.3)RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.2.3)Rcpp 0.12.4 2016-03-26 CRAN (R 3.2.4)scales 0.4.0 2016-02-26 CRAN (R 3.2.3)stringi 1.0-1 2015-10-22 CRAN (R 3.2.3)stringr 1.0.0 2015-04-30 CRAN (R 3.2.3)withr 1.0.1 2016-02-04 CRAN (R 3.2.3)