
Introduction to R

Ricardo Ehlers, ehlers@icmc.usp.br

Departamento de Matemática Aplicada e Estatística

Universidade de São Paulo

Introduction

“A big computer, a complex algorithm and a long time does not equal science.” Robert Gentleman

“Far better an approximate answer to the right question than the exact answer to the wrong question.” John Tukey

Some journals:

• Communications in Statistics - Simulation and Computation

• Computational Statistics

• Computational Statistics & Data Analysis

• Journal of Computational and Graphical Statistics

• Journal of Statistical Computation and Simulation

• Journal of Statistical Software

• The R Journal

• Statistics and Computing

The R Package

A free, open-source statistical package, available at

http://www.r-project.org

for Unix, Windows and Mac OS X systems.

• Programmable.

• Excellent graphics capabilities.

• Numerous statistical tools.

• Simulation from probability distributions.

• Numerical optimization.

• Fitting of many standard models (regression, GLMs, etc.).

• Runs precompiled Fortran and C routines.

• Etc.

Packages not included in the base distribution can be installed.

• Available packages are listed at http://CRAN.R-project.org/web/packages/

• Special topics (Task Views): http://CRAN.R-project.org/web/views/

• Several manuals are available at http://CRAN.R-project.org/manuals.html
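A minimal sketch of installing a package from CRAN and loading it, using MASS (whose datasets appear later in these notes) as the example. install.packages is only called if the package is missing, and the mirror URL given to repos is an assumption: any CRAN mirror should work.

```r
# Install MASS only if it is not already installed, then attach it.
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS", repos = "https://cloud.r-project.org")
}
library(MASS)

# Attached packages (and other environments) on the search path.
search()
```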

Some Simple Commands

> x = c(1,2,3,4,5,6)

> x

[1] 1 2 3 4 5 6

> y = c(x,0,0,x,1)

> y

[1] 1 2 3 4 5 6 0 0 1 2 3 4 5 6 1

> z = c(1,3,5,7,9,11)

> 3 * x + z

[1] 4 9 14 19 24 29

> range(x)

[1] 1 6

> length(z)

[1] 6

> seq(from=0, to=10, by=0.5)

[1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

[16] 7.5 8.0 8.5 9.0 9.5 10.0

> seq(from=0, to=10, length=10)

[1] 0.000000 1.111111 2.222222 3.333333 4.444444 5.555556 6.666667

[8] 7.777778 8.888889 10.000000
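With by, seq advances in fixed increments. With length (a partial match for the argument length.out) it fixes the number of points instead, so the spacing becomes (to - from)/(length.out - 1). A quick check of both calls above; the names s and step are illustrative:

```r
s <- seq(from = 0, to = 10, length.out = 10)
step <- (10 - 0) / (10 - 1)            # 10/9, about 1.111111
all.equal(diff(s), rep(step, 9))       # TRUE: the 10 points are equally spaced
length(seq(from = 0, to = 10, by = 0.5))   # 21 points: 0, 0.5, ..., 10
```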

> rep(x,times=2)

[1] 1 2 3 4 5 6 1 2 3 4 5 6

> rep(x,each=2)

[1] 1 1 2 2 3 3 4 4 5 5 6 6

> log(x)

[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595

> exp(x)

[1] 2.718282 7.389056 20.085537 54.598150 148.413159 403.428793

> sqrt(x)

[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490

> x > 4

[1] FALSE FALSE FALSE FALSE TRUE TRUE

> ! (x > 4)

[1] TRUE TRUE TRUE TRUE FALSE FALSE


> x[x < 4]

[1] 1 2 3

> (x+1)[x < 4]

[1] 2 3 4
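Logical indexing keeps the positions where the condition is TRUE, and which() turns the same condition into numeric positions. A condition computed on x can also index another vector of the same length, such as x + 1. A small sketch; the name keep is illustrative:

```r
x <- c(1, 2, 3, 4, 5, 6)
keep <- x < 4                        # TRUE TRUE TRUE FALSE FALSE FALSE
x[keep]                              # 1 2 3
which(keep)                          # 1 2 3: positions where keep is TRUE
identical(x[keep], x[which(keep)])   # TRUE: both select the same elements
(x + 1)[keep]                        # 2 3 4
```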

Numerical Representation

On 32-bit computers, an integer $u$ can be thought of as represented by

$$u = \sum_{i=1}^{32} x_i \, 2^{i-1} - 2^{31},$$

where $x = (x_1, \ldots, x_{32})$ is a vector of 0's and 1's. In this representation, the largest integer that can be stored is obtained by setting all $x_i = 1$,

$$u = \sum_{i=1}^{32} 2^{i-1} - 2^{31} = (2^{32} - 1) - 2^{31} = 2^{31} - 1 = 2147483647,$$

and the smallest integer is obtained by setting all $x_i = 0$,

$$u = -2^{31} = -2147483648.$$


> u <- function(x){

+ i = 1:32

+ aux = x*2^(i-1)

+ k = sum(aux) -2^31

+ return(k)

+ }

> u(rep(0,32))

[1] -2147483648

> u(rep(1,32))

[1] 2147483647

In R, the numerical characteristics of your machine are stored in the variable .Machine:

> class(.Machine)

[1] "list"

> names(.Machine)

[1] "double.eps" "double.neg.eps" "double.xmin"

[4] "double.xmax" "double.base" "double.digits"

[7] "double.rounding" "double.guard" "double.ulp.digits"

[10] "double.neg.ulp.digits" "double.exponent" "double.min.exp"

[13] "double.max.exp" "integer.max" "sizeof.long"

[16] "sizeof.longlong" "sizeof.longdouble" "sizeof.pointer"

> .Machine$integer.max

[1] 2147483647

> is.integer(.Machine$integer.max)

[1] TRUE

> .Machine$integer.max + as.integer(1)

[1] NA

Warning message:

In .Machine$integer.max + as.integer(1) : NAs produced by integer overflow
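The NA arises only in integer arithmetic. A numeric literal such as 1 is stored as a double, so adding it to .Machine$integer.max promotes the result to double and no overflow occurs; the L suffix creates an integer literal. A short sketch; the name big is illustrative:

```r
is.integer(1)                      # FALSE: numeric literals are doubles
is.integer(1L)                     # TRUE: the L suffix gives an integer
big <- .Machine$integer.max + 1    # integer + double gives a double
big                                # 2147483648
is.integer(big)                    # FALSE
```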

Floating-Point Arithmetic

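On a computer, only finitely many real numbers can be represented. Which of the infinitely many real numbers are they? A floating-point number has the form

$$(-1)^s \, (d_0 d_1 d_2 \ldots d_{t-1}) \, \beta^{e},$$

where $s$ gives the sign, $d_0 d_1 d_2 \ldots d_{t-1}$ is the mantissa, $\beta$ is the base and $e$ is the exponent, with $E_{\min} < e < E_{\max}$ and $0 \le d_i \le \beta - 1$. The number of digits $t$ is the precision.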
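On the platforms R usually runs on, doubles are IEEE 754 double-precision numbers, i.e. β = 2 and t = 53 in the notation above (an assumption about the hardware, which .Machine lets you check). Most decimal fractions are therefore rounded, so floating-point results are best compared with a tolerance. A sketch:

```r
.Machine$double.base               # 2: the base beta
.Machine$double.digits             # 53: the precision t, in binary digits
.Machine$double.eps == 2^-52       # TRUE: spacing of doubles just above 1
0.1 + 0.2 == 0.3                   # FALSE: all three values are rounded
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal up to a tolerance
1 + .Machine$double.eps / 4 == 1   # TRUE: the small increment is lost
```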

Finite numbers, infinite numbers and NaN

> pi / 0

[1] Inf

> is.finite(pi / 0)

[1] FALSE

> is.infinite(pi / 0)

[1] TRUE

> 0 / 0

[1] NaN

> is.nan(0/0)

[1] TRUE
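A few more cases of the rules above, including the difference between NaN (an undefined numerical result) and NA (a missing value):

```r
-1 / 0                          # -Inf
1 / Inf                         # 0
Inf - Inf                       # NaN: the result is undefined
is.na(NaN)                      # TRUE: NaN also counts as missing
is.nan(NA)                      # FALSE: a missing value is not NaN
is.finite(c(1, Inf, NaN, NA))   # TRUE FALSE FALSE FALSE
```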

Creating Matrices

> matrix(0,nrow=3,ncol=3)

[,1] [,2] [,3]

[1,] 0 0 0

[2,] 0 0 0

[3,] 0 0 0

> matrix(1:9,nrow=3,ncol=3)

[,1] [,2] [,3]

[1,] 1 4 7

[2,] 2 5 8

[3,] 3 6 9

> matrix(c(-1,2.2,3,4.1,5,6,7,8.32,9), nrow=3, ncol=3)

[,1] [,2] [,3]

[1,] -1.0 4.1 7.00

[2,] 2.2 5.0 8.32

[3,] 3.0 6.0 9.00


> matrix(1:9, nrow=3, ncol=4)

[,1] [,2] [,3] [,4]

[1,] 1 4 7 1

[2,] 2 5 8 2

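[3,] 3 6 9 3

(Since 9 is not a multiple of the number of columns, 4, R recycles 1:9 to fill the 12 cells and issues a warning.)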

> A = matrix(1:9, nrow=3, ncol=3, byrow=TRUE)

> A

[,1] [,2] [,3]

[1,] 1 2 3

[2,] 4 5 6

[3,] 7 8 9


> A = matrix(0, nrow=3, ncol=3)

> A[1,1] = 2

> A[1,3] = 4

> A[3,2] = 5

> A[3,3] = 6

> A[2,1] = 3

> A

[,1] [,2] [,3]

[1,] 2 0 4

[2,] 3 0 0

[3,] 0 5 6

Working with matrices

> nrow(A)

[1] 3

> ncol(A)

[1] 3

> A[1,]

[1] 2 0 4

> A[,2]

[1] 0 0 5


> A[1:2,2:3]

[,1] [,2]

[1,] 0 4

[2,] 0 0

> diag(A)

[1] 2 0 6

> diag(diag(A))

[,1] [,2] [,3]

[1,] 2 0 0

[2,] 0 0 0

[3,] 0 0 6

Transpose, determinant and inverse of a matrix

> t(A)

[,1] [,2] [,3]

[1,] 2 3 0

[2,] 0 0 5

[3,] 4 0 6

> det(A)

[1] 60

> solve(A)

[,1] [,2] [,3]

[1,] 0.00 0.3333333 0.0

[2,] -0.30 0.2000000 0.2

[3,] 0.25 -0.1666667 0.0
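To solve a linear system A x = b, solve also accepts the right-hand side directly, which avoids computing the inverse explicitly. A sketch using the matrix A above and a right-hand side b chosen only for illustration (b and xsol are hypothetical names):

```r
# Rebuild the matrix A used above.
A <- matrix(0, nrow = 3, ncol = 3)
A[1, 1] <- 2; A[1, 3] <- 4; A[2, 1] <- 3; A[3, 2] <- 5; A[3, 3] <- 6

b <- c(1, 2, 3)                      # illustrative right-hand side
xsol <- solve(A, b)                  # solution of A %*% xsol = b
xsol                                 # about 0.667, 0.7 and -0.0833
all.equal(as.vector(A %*% xsol), b)  # TRUE
all.equal(A %*% solve(A), diag(3))   # TRUE: A times its inverse is I
```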

Eigenvalues and eigenvectors of a matrix

> eigen(A)

eigen() decomposition

$values

[1] 7.468906+0.000000i 0.265547+2.821842i 0.265547-2.821842i

$vectors

[,1] [,2] [,3]

[1,] 0.5744240+0i 0.0559305+0.5943459i 0.0559305-0.5943459i

[2,] 0.2307262+0i 0.6318703+0.0000000i 0.6318703+0.0000000i

[3,] 0.7853677+0i -0.4435397-0.2182595i -0.4435397+0.2182595i
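A numerical check of the decomposition: each eigenpair satisfies A v = λ v, and the eigenvalues add up to the trace (8) and multiply to the determinant (60). Since the eigenvalues are complex here, Mod() is used to measure the residuals:

```r
A <- matrix(c(2, 3, 0, 0, 0, 5, 4, 0, 6), nrow = 3)  # same A as above, filled by column
e <- eigen(A)
v1 <- e$vectors[, 1]
max(Mod(A %*% v1 - e$values[1] * v1))   # close to 0: A v = lambda v
Mod(sum(e$values) - sum(diag(A)))       # close to 0: sum equals the trace
Mod(prod(e$values) - det(A))            # close to 0: product equals det(A)
```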

Matrix operations

> B = matrix(c(1,2.3,-3,4,5,6.7,7,-8,-9.1),

+ nrow=3, ncol=3)

> B

[,1] [,2] [,3]

[1,] 1.0 4.0 7.0

[2,] 2.3 5.0 -8.0

[3,] -3.0 6.7 -9.1

> A * B

[,1] [,2] [,3]

[1,] 2.0 0.0 28.0

[2,] 6.9 0.0 0.0

[3,] 0.0 33.5 -54.6


> A %*% B

[,1] [,2] [,3]

[1,] -10.0 34.8 -22.4

[2,] 3.0 12.0 21.0

[3,] -6.5 65.2 -94.6

> cbind(A,B)

[,1] [,2] [,3] [,4] [,5] [,6]

[1,] 2 0 4 1.0 4.0 7.0

[2,] 3 0 0 2.3 5.0 -8.0

[3,] 0 5 6 -3.0 6.7 -9.1

> rbind(A,B)

[,1] [,2] [,3]

[1,] 2.0 0.0 4.0

[2,] 3.0 0.0 0.0

[3,] 0.0 5.0 6.0

[4,] 1.0 4.0 7.0

[5,] 2.3 5.0 -8.0

[6,] -3.0 6.7 -9.1
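The distinction between * and %*%: A * B multiplies matching entries, while entry (i, j) of A %*% B is the sum of the products of row i of A and column j of B. A check of entry (1, 1), along with the dimensions produced by cbind and rbind:

```r
A <- matrix(c(2, 3, 0, 0, 0, 5, 4, 0, 6), nrow = 3)
B <- matrix(c(1, 2.3, -3, 4, 5, 6.7, 7, -8, -9.1), nrow = 3)
(A * B)[1, 1]          # 2 * 1 = 2
sum(A[1, ] * B[, 1])   # 2*1 + 0*2.3 + 4*(-3) = -10
(A %*% B)[1, 1]        # -10, the same value
dim(cbind(A, B))       # 3 6
dim(rbind(A, B))       # 6 3
```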

Lists

> lista = list(a=1:5, b=c("x","y","z"), c=c(-1,4,7), d=TRUE)

> names(lista)

[1] "a" "b" "c" "d"

> lista

$a

[1] 1 2 3 4 5

$b

[1] "x" "y" "z"

$c

[1] -1 4 7

$d

[1] TRUE
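Elements of a list are extracted with $ or [[ ]], which return the element itself, while single brackets [ ] return a sub-list. A sketch with the list above:

```r
lista <- list(a = 1:5, b = c("x", "y", "z"), c = c(-1, 4, 7), d = TRUE)
lista$b          # "x" "y" "z": the element itself
lista[["c"]]     # -1 4 7: same as lista$c
lista["c"]       # a list of length 1 that contains element c
length(lista)    # 4
lista$c[2]       # 4: indexing inside an element
```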

Reading Data

> data(package="MASS")

Data sets in package MASS:

Aids2 Australian AIDS Survival Data

Animals Brain and Body Weights for 28 Species

Boston Housing Values in Suburbs of Boston

Cars93 Data from 93 Cars on Sale in the USA in 1993

Cushings Diagnostic Tests on Patients with Cushings Syndrome

DDT DDT in Kale

GAGurine Level of GAG in Urine of Children

Insurance Numbers of Car Insurance claims

Melanoma Survival from Malignant Melanoma

OME Tests of Auditory Perception in Children with OME

Pima.te Diabetes in Pima Indian Women

Pima.tr Diabetes in Pima Indian Women

Pima.tr2 Diabetes in Pima Indian Women

Rabbit Blood Pressure in Rabbits

Rubber Accelerated Testing of Tyre Rubber

SP500 Returns of the Standard and Poors 500

Sitka Growth Curves for Sitka Spruce Trees in 1988

Sitka89 Growth Curves for Sitka Spruce Trees in 1989

Skye AFM Compositions of Aphyric Skye Lavas

Traffic Effect of Swedish Speed Limits on Accidents

...


> library(MASS)

> Animals

body brain

Mountain beaver 1.350 8.1

Cow 465.000 423.0

Grey wolf 36.330 119.5

Goat 27.660 115.0

Guinea pig 1.040 5.5

Dipliodocus 11700.000 50.0

Asian elephant 2547.000 4603.0

Donkey 187.100 419.0

Horse 521.000 655.0

Potar monkey 10.000 115.0

Cat 3.300 25.6

Giraffe 529.000 680.0


Gorilla 207.000 406.0

Human 62.000 1320.0

African elephant 6654.000 5712.0

Triceratops 9400.000 70.0

Rhesus monkey 6.800 179.0

Kangaroo 35.000 56.0

Golden hamster 0.120 1.0

Mouse 0.023 0.4

Rabbit 2.500 12.1

Sheep 55.500 175.0

Jaguar 100.000 157.0

Chimpanzee 52.160 440.0

Rat 0.280 1.9

Brachiosaurus 87000.000 154.5

Mole 0.122 3.0

Pig 192.000 180.0


> road

deaths drivers popden rural temp fuel

Alabama 968 158 64.0 66.0 62 119.0

Alaska 43 11 0.4 5.9 30 6.2

Arizona 588 91 12.0 33.0 64 65.0

Arkanas 640 92 34.0 73.0 51 74.0

Calif 4743 952 100.0 118.0 65 105.0

Colo 566 109 17.0 73.0 42 78.0

Conn 325 167 518.0 5.1 37 95.0

Dela 118 30 226.0 3.4 41 20.0

DC 115 35 12524.0 0.0 44 23.0

Florida 1545 298 91.0 57.0 67 216.0

Georgia 1302 203 68.0 83.0 54 162.0

Idaho 262 41 8.1 40.0 36 29.0

Ill 2207 544 180.0 102.0 33 350.0

Ind 1410 254 129.0 89.0 37 196.0

Iowa 833 150 49.0 100.0 30 109.0


Kansas 669 136 27.0 124.0 42 94.0

Kent 911 147 76.0 65.0 44 104.0

Louis 1037 146 72.0 40.0 65 109.0

Maine 1196 46 31.0 19.0 30 37.0

Maryl 616 157 314.0 29.0 44 113.0

Mass 766 255 655.0 17.0 37 166.0

Mich 2120 403 137.0 95.0 33 306.0

Minn 841 189 43.0 110.0 22 132.0

Miss 648 85 46.0 59.0 57 77.0

Mo 1289 234 63.0 100.0 40 180.0

Mont 259 38 4.6 72.0 29 31.0


> help(road)

road package:MASS R Documentation

Road Accident Deaths in US States

Description:

A data frame with the annual deaths in road accidents for half the

US states.

Usage:

road

Format:

Columns are:

"state" name.

"deaths" number of deaths.

"drivers" number of drivers (in 10,000s).

"popden" population density in people per square mile.

"rural" length of rural roads, in 1000s of miles.

"temp" average daily maximum temperature in January.

"fuel" fuel consumption in 10,000,000 US gallons per year.

Source:

Imperial College, London M.Sc. exercise


> class(road)

[1] "data.frame"

> rownames(road)

[1] "Alabama" "Alaska" "Arizona" "Arkanas" "Calif" "Colo" "Conn"

[8] "Dela" "DC" "Florida" "Georgia" "Idaho" "Ill" "Ind"

[15] "Iowa" "Kansas" "Kent" "Louis" "Maine" "Maryl" "Mass"

[22] "Mich" "Minn" "Miss" "Mo" "Mont"

> colnames(road)

[1] "deaths" "drivers" "popden" "rural" "temp" "fuel"

> dim(road)

[1] 26 6


> transform(road, temp2 = temp**2)

deaths drivers popden rural temp fuel temp2

Alabama 968 158 64.0 66.0 62 119.0 3844

Alaska 43 11 0.4 5.9 30 6.2 900

Arizona 588 91 12.0 33.0 64 65.0 4096

Arkanas 640 92 34.0 73.0 51 74.0 2601

Calif 4743 952 100.0 118.0 65 105.0 4225

...


> summary(road)

deaths drivers popden rural

Min. : 43.0 Min. : 11.0 Min. : 0.40 Min. : 0.00

1st Qu.: 571.5 1st Qu.: 86.5 1st Qu.: 31.75 1st Qu.: 30.00

Median : 799.5 Median :148.5 Median : 66.00 Median : 65.50

Mean :1000.7 Mean :191.2 Mean : 595.74 Mean : 60.71

3rd Qu.:1265.8 3rd Qu.:226.2 3rd Qu.: 135.00 3rd Qu.: 93.50

Max. :4743.0 Max. :952.0 Max. :12524.00 Max. :124.00

temp fuel

Min. :22.00 Min. : 6.20

1st Qu.:33.75 1st Qu.: 67.25

Median :41.50 Median :104.50

Mean :43.69 Mean :115.24

3rd Qu.:53.25 3rd Qu.:154.50

Max. :67.00 Max. :350.00

> colMeans(road)

deaths drivers popden rural temp fuel

1000.65385 191.19231 595.73462 60.70769 43.69231 115.23846


> ttemp = cut(road$temp, breaks=c(22,35,60,67))

> ttemp

[1] (60,67] (22,35] (60,67] (35,60] (60,67] (35,60] (35,60] (35,60] (35,60]

[10] (60,67] (35,60] (35,60] (22,35] (35,60] (22,35] (35,60] (35,60] (60,67]

[19] (22,35] (35,60] (35,60] (22,35] <NA> (35,60] (35,60] (22,35]

Levels: (22,35] (35,60] (60,67]

> levels(ttemp)=c("baixa","media","alta")

> ttemp

[1] alta baixa alta media alta media media media media alta media media

[13] baixa media baixa media media alta baixa media media baixa <NA> media

[25] media baixa

Levels: baixa media alta

> table(ttemp)

ttemp

baixa media alta

6 14 5
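The NA above is Minnesota (Minn), whose temp is 22: cut builds right-closed intervals (a, b], so a value equal to the lowest break is left out. A sketch with include.lowest = TRUE, which closes the first interval as [22, 35], passing the labels (low, medium, high) directly to cut; the name ttemp2 is illustrative:

```r
library(MASS)   # provides the road data frame
ttemp2 <- cut(road$temp, breaks = c(22, 35, 60, 67),
              labels = c("baixa", "media", "alta"),
              include.lowest = TRUE)
anyNA(ttemp2)   # FALSE: Minn (temp = 22) now falls in "baixa"
table(ttemp2)   # baixa 7, media 14, alta 5
```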

Use read.table to read a file or URL into a data.frame, and scan to read from the console, a file or a URL. For example, write road to a text file and read it back:

> write.table(road, file="road.txt")

> x = read.table(file="road.txt",header=TRUE)

> x

deaths drivers popden rural temp fuel

Alabama 968 158 64.0 66.0 62 119.0

Alaska 43 11 0.4 5.9 30 6.2

Arizona 588 91 12.0 33.0 64 65.0

Arkanas 640 92 34.0 73.0 51 74.0

Calif 4743 952 100.0 118.0 65 105.0

Colo 566 109 17.0 73.0 42 78.0

Conn 325 167 518.0 5.1 37 95.0

Dela 118 30 226.0 3.4 41 20.0

DC 115 35 12524.0 0.0 44 23.0

Florida 1545 298 91.0 57.0 67 216.0

Georgia 1302 203 68.0 83.0 54 162.0

Idaho 262 41 8.1 40.0 36 29.0

Ill 2207 544 180.0 102.0 33 350.0

Ind 1410 254 129.0 89.0 37 196.0

Iowa 833 150 49.0 100.0 30 109.0

Kansas 669 136 27.0 124.0 42 94.0

Kent 911 147 76.0 65.0 44 104.0

Louis 1037 146 72.0 40.0 65 109.0

Maine 1196 46 31.0 19.0 30 37.0

Maryl 616 157 314.0 29.0 44 113.0

Mass 766 255 655.0 17.0 37 166.0

Mich 2120 403 137.0 95.0 33 306.0

Minn 841 189 43.0 110.0 22 132.0

Miss 648 85 46.0 59.0 57 77.0

Mo 1289 234 63.0 100.0 40 180.0

Mont 259 38 4.6 72.0 29 31.0
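Writing to a fixed name such as road.txt leaves a file in the working directory. A sketch of the same round trip with a temporary file, followed by a check that the data came back intact; the name path is illustrative:

```r
library(MASS)
path <- tempfile(fileext = ".txt")    # file inside the session's temp directory
write.table(road, file = path)        # writes a header line and row names
x <- read.table(file = path, header = TRUE)  # first column becomes row names
identical(rownames(x), rownames(road))       # TRUE
all.equal(colMeans(x), colMeans(road))       # TRUE
unlink(path)                          # delete the temporary file
```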

Programming

> x = 1

> if (x>5) x else -x

[1] -1

> ifelse(x>5,x,-x)

[1] -1

> x= c(-3.2, 2,3,-4.5,5,6,0)

> log(x)

[1] NaN 0.6931472 1.0986123 NaN 1.6094379 1.7917595 -Inf

Warning message:

In log(x) : NaNs produced

> any(x<=0)

[1] TRUE


> if (any(x<=0)) x[x<=0]

[1] -3.2 -4.5 0.0

> which(x<=0)

[1] 1 4 7

> all(x==0)

[1] FALSE


> for (i in 1:5) print(i**2)

[1] 1

[1] 4

[1] 9

[1] 16

[1] 25

> i = 1

> while (i<=5) {

+ print(i**2)

+ i = i+1

+ }

[1] 1

[1] 4

[1] 9

[1] 16

[1] 25

However, it is good practice to avoid explicit loops in R, using vectorized operations or functions such as apply instead.

> args(apply)

function (X, MARGIN, FUN, ...)

NULL

> A

[,1] [,2] [,3]

[1,] 2 0 4

[2,] 3 0 0

[3,] 0 5 6

> apply(A, MARGIN = 1,FUN = mean)

[1] 2.000000 1.000000 3.666667

> apply(A, MARGIN = 2,FUN = sum)

[1] 5 5 10
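For the most common summaries, base R also has dedicated vectorized functions that need neither a loop nor apply; rowMeans and colSums give the same results as the two calls above:

```r
A <- matrix(c(2, 3, 0, 0, 0, 5, 4, 0, 6), nrow = 3)
rowMeans(A)   # 2.000000 1.000000 3.666667, as apply(A, 1, mean)
colSums(A)    # 5 5 10, as apply(A, 2, sum)
all.equal(rowMeans(A), apply(A, MARGIN = 1, FUN = mean))   # TRUE
```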


> apply(A, MARGIN = 1,FUN = function(x) ifelse(x>0,log(x),0))

[,1] [,2] [,3]

[1,] 0.6931472 1.098612 0.000000

[2,] 0.0000000 0.000000 1.609438

[3,] 1.3862944 0.000000 1.791759
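With MARGIN = 1 and a function that returns a vector, apply stores the result for each row as a column, so the output above is the transpose of A's layout (its first column holds row 1 of A). Transposing recovers the layout; and since ifelse and log are vectorized, the same matrix can be computed without apply. The names f, res and direct are illustrative:

```r
A <- matrix(c(2, 3, 0, 0, 0, 5, 4, 0, 6), nrow = 3)
f <- function(x) ifelse(x > 0, log(x), 0)
res <- t(apply(A, MARGIN = 1, FUN = f))   # row i of res matches row i of A
direct <- ifelse(A > 0, log(A), 0)        # log(0) = -Inf is discarded by ifelse
all.equal(res, direct)                    # TRUE
```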

Creating Functions

Let $X \sim \text{Exponential}(2)$. Its probability density function is

$$f(x) = 2 \exp(-2x) \, I(x \ge 0).$$

> f1 <- function(x) {

+ fx = ifelse(x < 0, 0, 2 * exp(-2 * x))

+ return(fx)

+ }

Leaving the parameter of the distribution $X \sim \text{Exponential}(\beta)$ free,

$$f(x) = \beta \exp(-\beta x) \, I(x \ge 0), \quad \beta > 0.$$

> dexp <- function(x,b) {

+ # Exponential(b) density; this name masks R's built-in stats::dexp

+ fdp = ifelse(x < 0, 0, b * exp(-b * x))

+ return(fdp)

+ }


> f1(x=3)

[1] 0.004957504

> dexp(x=3,b=2)

[1] 0.004957504

> integrate(dexp,-Inf,Inf,b=2)

1 with absolute error < 5e-07

> integrate(f1,0,2)

0.9816844 with absolute error < 1.1e-14

> integrate(f1,2,Inf)

0.01831564 with absolute error < 2.8e-06
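Base R already provides this distribution: stats::dexp(x, rate) is the density and stats::pexp(q, rate) the distribution function, so the integrals above can be checked against the closed form P(X ≤ 2) = 1 - exp(-4). Qualifying with stats:: avoids the user-defined dexp:

```r
stats::dexp(3, rate = 2)       # 0.004957504, as f1(3)
stats::pexp(2, rate = 2)       # 0.9816844 = 1 - exp(-4), as integrate(f1, 0, 2)
1 - stats::pexp(2, rate = 2)   # 0.01831564 = exp(-4), as integrate(f1, 2, Inf)
```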

Graphics

> par(mfrow=c(1,2))

> boxplot(road$deaths , xlab="deaths", border="red")

> boxplot(road$drivers, xlab="drivers", cex.lab=2)

[Figure: boxplots of deaths (red border) and drivers, side by side.]

> par(mfrow=c(1,2))

> boxplot(road$popden , xlab="popden" , outline=FALSE)

> boxplot(road$rural , xlab="rural", horizontal=TRUE)

[Figure: boxplot of popden with outliers omitted, and horizontal boxplot of rural.]

> par(mfrow=c(1,2))

> boxplot(road$temp, xlab="temp", width=2)

> boxplot(road$fuel, xlab="fuel", notch=TRUE)

[Figure: boxplot of temp and notched boxplot of fuel.]

> boxplot(road, outline=FALSE)

[Figure: boxplots of deaths, drivers, popden, rural, temp and fuel on a common axis, outliers omitted.]

> par(mfrow=c(1,3))

> hist(road[,1],main=colnames(road[1]),xlab="")

> hist(road[,2],main=colnames(road[2]),xlab="",col="grey")

> hist(road[,4],main=colnames(road[4]),xlab="",nclass=15)

[Figure: histograms of deaths, drivers and rural; y-axis: Frequency.]

> par(mfrow=c(1,3))

> for (i in 1:3) {

+ qqnorm(road[,i],main=colnames(road[i]))

+ qqline(road[,i])

+ }

[Figure: normal Q-Q plots of deaths, drivers and popden; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles.]

> head(airquality)

Ozone Solar.R Wind Temp Month Day

1 41 190 7.4 67 5 1

2 36 118 8.0 72 5 2

3 12 149 12.6 74 5 3

4 18 313 11.5 62 5 4

5 NA NA 14.3 56 5 5

6 28 NA 14.9 66 5 6

> names(airquality)

[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"

> dim(airquality)

[1] 153 6

> attach(airquality, pos=2)


> par(mfrow=c(1,2))

> plot(Month,Ozone)

> boxplot(Ozone~Month)

[Figure: Ozone plotted against Month, and boxplots of Ozone by Month (5 to 9).]

> par(mfrow=c(2,2), mar=c(3,3,1,0), mgp=c(1.7,1,0))

> plot(Temp,Ozone)

> plot(Ozone~Temp, subset = Month <= 6, main="Meses 5 e 6")

> plot(Ozone~Temp, subset = Month==7 | Month==8, main="Meses 7 e 8")

> plot(Ozone~Temp, subset = Month==9, main="Mes 9")

[Figure: Ozone against Temp for all months, and in separate panels for months 5 and 6, months 7 and 8, and month 9.]

> par(mfrow=c(2,2), mar=c(3,3,1,0), mgp=c(1.7,1,0))

> plot(Ozone, ty="l", xlab="")

> plot(Solar.R, ty="l", xlab="")

> plot(Wind, ty="l", xlab="")

> plot(Temp, ty="l", xlab="")

[Figure: line plots of Ozone, Solar.R, Wind and Temp against the observation index.]

> plot(airquality[,1:4])

[Figure: scatterplot matrix of Ozone, Solar.R, Wind and Temp.]

> library(lattice)

> barley[1:15,]

yield variety year site

1 27.00000 Manchuria 1931 University Farm

2 48.86667 Manchuria 1931 Waseca

3 27.43334 Manchuria 1931 Morris

4 39.93333 Manchuria 1931 Crookston

5 32.96667 Manchuria 1931 Grand Rapids

6 28.96667 Manchuria 1931 Duluth

7 43.06666 Glabron 1931 University Farm

8 55.20000 Glabron 1931 Waseca

9 28.76667 Glabron 1931 Morris

10 38.13333 Glabron 1931 Crookston

11 29.13333 Glabron 1931 Grand Rapids

12 29.66667 Glabron 1931 Duluth

13 35.13333 Svansota 1931 University Farm

14 47.33333 Svansota 1931 Waseca

15 25.76667 Svansota 1931 Morris


> table(barley$year,barley$site)

Grand Rapids Duluth University Farm Morris Crookston Waseca

1932 10 10 10 10 10 10

1931 10 10 10 10 10 10


> figura = barchart(yield ~ variety | site,

+ data = barley,groups = year,

+ layout = c(1,6),

+ ylab = "Barley Yield (bushels/acre)",

+ scales = list(x = list(abbreviate = TRUE,minlength = 5)))

> plot(figura)

[Figure: bar charts of barley yield (bushels/acre) by variety, one panel per site (Grand Rapids, Duluth, University Farm, Morris, Crookston, Waseca), with bars grouped by year.]

3D Plots

> x=seq(-pi,pi,length=50)

> y = x

> f = outer(x,y,function(x,y)cos(y)/(1+x^2))

> persp(x,y,f,theta=30,phi=40)

[Figure: perspective plot of f over the (x, y) grid, with axes x, y and f.]
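outer(x, y, FUN) evaluates FUN on every pair of values, so here f[i, j] = cos(y[j]) / (1 + x[i]^2). FUN must be vectorized, because outer calls it once on expanded vectors rather than once per pair. A check on one entry:

```r
x <- seq(-pi, pi, length.out = 50)
y <- x
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
dim(f)                                          # 50 50
all.equal(f[3, 7], cos(y[7]) / (1 + x[3]^2))    # TRUE
```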

See also:

CRAN Task View: Graphic Displays, etc.

The R Graph Gallery

R Graphics by Paul Murrell

Building Tables in LaTeX

> m = lm(deaths ~ temp + popden, data=road)

> summary(m)

Call:

lm(formula = deaths ~ temp + popden, data = road)

Residuals:

Min 1Q Median 3Q Max

-903.5 -519.4 -235.9 231.2 3236.0

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 82.35216 646.30806 0.127 0.900

temp 22.03153 14.16167 1.556 0.133

popden -0.07437 0.07559 -0.984 0.335

Residual standard error: 921.4 on 23 degrees of freedom

Multiple R-squared: 0.1287, Adjusted R-squared: 0.05295

F-statistic: 1.699 on 2 and 23 DF, p-value: 0.205


> library(xtable)

> tab = xtable(m,caption="Exemplo de regress\\~ao.",label="tab1",

+ digits=2)

> print(tab)

print(tab) writes the LaTeX code below; once compiled, it renders as:

                Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)        82.35       646.31      0.13       0.90
temp               22.03        14.16      1.56       0.13
popden             -0.07         0.08     -0.98       0.34

Table 1: Exemplo de regressão.

\begin{table}[ht]

\begin{center}

\begin{tabular}{rrrrr}

\hline

& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\

\hline

(Intercept) & 82.35 & 646.31 & 0.13 & 0.90 \\

temp & 22.03 & 14.16 & 1.56 & 0.13 \\

popden & -0.07 & 0.08 & -0.98 & 0.34 \\

\hline

\end{tabular}

\caption{Exemplo de regress\~ao.}

\label{tab1}

\end{center}

\end{table}

Probability Distributions

Example. Let $X \sim \text{Binomial}(n, \theta)$ with $n = 10$ and $\theta = 0.3$. Then

$$P(X = x) = p(x) = \binom{10}{x} \, 0.3^x \, (1 - 0.3)^{10 - x}, \quad x = 0, \ldots, 10.$$

The commands below compute the probabilities and the cumulative probabilities:

> x = 0:10

> px = choose(10, x) * (0.3)^x * (1-0.3)^(10-x)

> fx = dbinom(x,10,0.3)

> Fx = cumsum(px)


> cbind(x,px,fx,Fx)

x px fx Fx

[1,] 0 0.0282475249 0.0282475249 0.02824752

[2,] 1 0.1210608210 0.1210608210 0.14930835

[3,] 2 0.2334744405 0.2334744405 0.38278279

[4,] 3 0.2668279320 0.2668279320 0.64961072

[5,] 4 0.2001209490 0.2001209490 0.84973167

[6,] 5 0.1029193452 0.1029193452 0.95265101

[7,] 6 0.0367569090 0.0367569090 0.98940792

[8,] 7 0.0090016920 0.0090016920 0.99840961

[9,] 8 0.0014467005 0.0014467005 0.99985631

[10,] 9 0.0001377810 0.0001377810 0.99999410

[11,] 10 0.0000059049 0.0000059049 1.00000000
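The cumulative sums can also be obtained directly from pbinom, and the probabilities add up to 1 with mean nθ = 3. A quick check:

```r
x <- 0:10
px <- dbinom(x, size = 10, prob = 0.3)
all.equal(cumsum(px), pbinom(x, size = 10, prob = 0.3))  # TRUE: pbinom gives F(x)
sum(px)        # 1, up to rounding
sum(x * px)    # 3 = n * theta, the mean of the distribution
```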


> par(mfrow=c(1,2))

> plot(x, px, type="h")

> plot(x, Fx, type="s")

[Figure: probability function p(x) drawn as vertical lines, and cumulative distribution function F(x) drawn as a step function, for x = 0, ..., 10.]

Example. Let $X \sim N(\mu, \sigma^2)$, whose density function is

$$f(x) = (2\pi\sigma^2)^{-1/2} \exp\{-0.5 \, (x - \mu)^2 / \sigma^2\}, \quad x \in \mathbb{R}, \ \mu \in \mathbb{R}, \ \sigma^2 > 0.$$

We can write an R function for the density above,

> dnormal <- function(x,mu,sigma2){

+ logdens = -0.5*(log(2*pi*sigma2) + (x-mu)^2/sigma2)

+ return(exp(logdens))

+ }

or use the built-in function dnorm,

> args(dnorm)

function (x, mean = 0, sd = 1, log = FALSE)

NULL
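Note the different parametrizations: dnormal above takes the variance sigma2, while dnorm takes the standard deviation sd. A quick check that the two agree when sd = sqrt(sigma2), plus the log argument shown by args:

```r
dnormal <- function(x, mu, sigma2) {
  logdens <- -0.5 * (log(2 * pi * sigma2) + (x - mu)^2 / sigma2)
  exp(logdens)
}
x <- seq(-4, 4, length.out = 9)
all.equal(dnormal(x, mu = -1, sigma2 = 4), dnorm(x, mean = -1, sd = 2))  # TRUE
dnorm(0, log = TRUE)   # -0.9189385, i.e. -0.5 * log(2 * pi)
```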


> x = seq(-4,4,l=100)

> plot (x,dnorm(x,0,1),xlim=c(-4,3),ylim=c(0,0.8),type="l",

+ ylab=expression(f(x)))

> lines(x,dnorm(x,-1,2), col=2, lty=2)  # sd = 2, i.e. variance 4

> lines(x,dnorm(x, 1,0.5), col=4, lty=4, lwd=2)  # sd = 0.5, i.e. variance 0.25

> legend(-3.5,0.35,leg=c("N(0,1)","N(-1,4)","N(1,0.25)"),

+ col=c(1,2,4), lty=c(1,2,4), lwd=c(1,1,2), bty="n")

[Figure: density curves f(x) of N(0,1), N(-1,4) and N(1,0.25), with legend.]

> plot (x,pnorm(x),xlim=c(-3,3),type="l",ylab=expression(F(x)))

> lines(x,pnorm(x,-1,2),lty=2, lwd=2)

> lines(x,pnorm(x,1,.5),lty=3, lwd=2)

> legend(-2.5,.8,leg=c("N(0,1)","N(-1,4)","N(1,0.25)"),lty=1:3,bty="n")

[Figure: distribution functions F(x) of N(0,1), N(-1,4) and N(1,0.25), with legend.]
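pnorm gives F(x); its inverse, the quantile function, is qnorm, which takes the same mean and sd arguments. A few values:

```r
pnorm(1.96)                    # about 0.975
qnorm(0.975)                   # about 1.959964
qnorm(pnorm(1.5))              # 1.5: qnorm inverts pnorm
pnorm(0, mean = 1, sd = 0.5)   # P(X <= 0) for N(1, 0.25), equal to pnorm(-2)
```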
