Post on 04-Jul-2020
Introduction to R
Ricardo Ehlers
ehlers@icmc.usp.br
Departamento de Matemática Aplicada e Estatística
Universidade de São Paulo
Introduction
“A big computer, a complex algorithm and a long time does not equal science.” Robert Gentleman
“Far better an approximate answer to the right question than the exact answer to the wrong question.” John Tukey
Some journals:
• Communications in Statistics - Simulation and Computation
• Computational Statistics
• Computational Statistics & Data Analysis
• Journal of Computational and Graphical Statistics
• Journal of Statistical Computation and Simulation
• Journal of Statistical Software
• The R Journal
• Statistics and Computing
The R Package
Introduction to R
A free, open-source statistical package. Available at
http://www.r-project.org
for Unix, Windows and Mac OS X systems.
• Programmable.
• Excellent graphics capabilities.
• Numerous statistical tools.
• Simulation of probability distributions.
• Numerical optimization.
• Fitting of many standard models (regression, GLM, etc.).
• Runs precompiled Fortran and C routines.
• Etc.
Packages not included in the base distribution can be installed.
• The available packages are listed at http://CRAN.R-project.org/web/packages/
• Special topics (task views): http://CRAN.R-project.org/web/views/
• Several manuals are available at http://CRAN.R-project.org/manuals.html
Some Simple Commands
> x = c(1,2,3,4,5,6)
> x
[1] 1 2 3 4 5 6
> y = c(x,0,0,x,1)
> y
[1] 1 2 3 4 5 6 0 0 1 2 3 4 5 6 1
> z = c(1,3,5,7,9,11)
> 3 * x + z
[1] 4 9 14 19 24 29
> range(x)
[1] 1 6
> length(z)
[1] 6
> seq(from=0, to=10, by=0.5)
 [1]  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
[16]  7.5  8.0  8.5  9.0  9.5 10.0
> seq(from=0, to=10, length=10)
 [1]  0.000000  1.111111  2.222222  3.333333  4.444444  5.555556  6.666667
 [8]  7.777778  8.888889 10.000000
> rep(x,times=2)
[1] 1 2 3 4 5 6 1 2 3 4 5 6
> rep(x,each=2)
[1] 1 1 2 2 3 3 4 4 5 5 6 6
> log(x)
[1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595
> exp(x)
[1] 2.718282 7.389056 20.085537 54.598150 148.413159 403.428793
> sqrt(x)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490
> x > 4
[1] FALSE FALSE FALSE FALSE TRUE TRUE
> ! (x > 4)
[1] TRUE TRUE TRUE TRUE FALSE FALSE
> x[x < 4]
[1] 1 2 3
> (x+1)[x < 4]
[1] 2 3 4
Numerical Representation
On 32-bit computers, an integer $u$ can be thought of as having the representation
$$u = \sum_{i=1}^{32} x_i\, 2^{i-1} - 2^{31},$$
where $x = (x_1, \ldots, x_{32})$ is a vector of 0's and 1's. In this representation, the largest integer that can be stored is obtained by setting all $x_i = 1$,
$$u = \sum_{i=1}^{32} 2^{i-1} - 2^{31} = 2147483647,$$
and the smallest integer is obtained by setting all $x_i = 0$,
$$u = -2^{31} = -2147483648.$$
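The value of the largest integer follows from the geometric sum. A short derivation, where the symbol $u_{\max}$ is introduced here only for illustration:

```latex
\sum_{i=1}^{32} 2^{i-1} = 2^{32} - 1
\quad\Longrightarrow\quad
u_{\max} = (2^{32} - 1) - 2^{31} = 2^{31} - 1 = 2147483647 .
```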
> u <- function(x){
+ i = 1:32
+ aux = x*2^(i-1)
+ k = sum(aux) -2^31
+ return(k)
+ }
> u(rep(0,32))
[1] -2147483648
> u(rep(1,32))
[1] 2147483647
In R, the numerical characteristics of your machine are stored in the variable .Machine,
> class(.Machine)
[1] "list"
> names(.Machine)
[1] "double.eps" "double.neg.eps" "double.xmin"
[4] "double.xmax" "double.base" "double.digits"
[7] "double.rounding" "double.guard" "double.ulp.digits"
[10] "double.neg.ulp.digits" "double.exponent" "double.min.exp"
[13] "double.max.exp" "integer.max" "sizeof.long"
[16] "sizeof.longlong" "sizeof.longdouble" "sizeof.pointer"
> .Machine$integer.max
[1] 2147483647
> is.integer(.Machine$integer.max)
[1] TRUE
> .Machine$integer.max + as.integer(1)
[1] NA
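The NA above comes from integer overflow. A minimal sketch, assuming only base R, of how the same sum behaves when one operand is a double rather than an integer (the variable names are illustrative):

```r
# Largest representable integer on this machine (an integer-typed value).
imax <- .Machine$integer.max

# integer + integer overflows: R returns NA (with a warning, silenced here).
overflow <- suppressWarnings(imax + 1L)

# integer + double is computed in double precision, so nothing overflows.
as_double <- imax + 1

print(overflow)           # NA
print(as_double == 2^31)  # TRUE
print(class(as_double))   # "numeric"
```

Writing `1L` or `as.integer(1)` keeps the arithmetic in integer mode, while a plain `1` is a double.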
Floating-Point Arithmetic
A computer can represent only a finite number of real numbers. Which of the infinitely many real numbers can be represented? Those of the form
$$\underbrace{(-1)^s}_{\text{sign}}\;\underbrace{(d_0 d_1 d_2 \ldots d_{t-1})}_{\text{mantissa}}\;\underbrace{\beta}_{\text{base}}{}^{e}$$
where $E_{\min} < e < E_{\max}$, $0 \le d_i \le \beta - 1$, and the number of digits $t$ is the precision.
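One practical consequence of finite precision can be seen directly in R. The sketch below assumes standard double-precision arithmetic and uses only base R; the variable names are illustrative:

```r
# Machine epsilon: the gap between 1 and the next larger double.
eps <- .Machine$double.eps

# 0.1, 0.2 and 0.3 have no exact base-2 representation,
# so an exact comparison fails ...
exact <- (0.1 + 0.2 == 0.3)

# ... while a comparison with a tolerance succeeds.
approx <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# An increment well below eps is lost when added to 1; eps itself is not.
lost <- (1 + eps / 4 == 1)
kept <- (1 + eps != 1)

print(c(exact = exact, approx = approx, lost = lost, kept = kept))
```

For this reason `all.equal` is usually preferable to `==` when comparing computed real numbers.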
Finite numbers, infinite numbers and NaN
> pi / 0
[1] Inf
> is.finite(pi / 0)
[1] FALSE
> is.infinite(pi / 0)
[1] TRUE
> 0 / 0
[1] NaN
> is.nan(0/0)
[1] TRUE
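NaN should not be confused with the missing-value marker NA. A small sketch of how the base R predicates treat each special value (the vector `vals` is illustrative):

```r
# One ordinary number followed by the three special values.
vals <- c(1, NA, NaN, Inf)

na_flag     <- is.na(vals)      # NA and NaN are both flagged
nan_flag    <- is.nan(vals)     # only NaN is flagged
finite_flag <- is.finite(vals)  # only the ordinary number is finite

print(rbind(na_flag, nan_flag, finite_flag))
```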
Creating Matrices
> matrix(0,nrow=3,ncol=3)
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
> matrix(1:9,nrow=3,ncol=3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> matrix(c(-1,2.2,3,4.1,5,6,7,8.32,9), nrow=3, ncol=3)
[,1] [,2] [,3]
[1,] -1.0 4.1 7.00
[2,] 2.2 5.0 8.32
[3,] 3.0 6.0 9.00
> matrix(1:9, nrow=3, ncol=4)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 1
[2,] 2 5 8 2
[3,] 3 6 9 3
> A = matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
> A
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> A = matrix(0, nrow=3, ncol=3)
> A[1,1] = 2
> A[1,3] = 4
> A[3,2] = 5
> A[3,3] = 6
> A[2,1] = 3
> A
[,1] [,2] [,3]
[1,] 2 0 4
[2,] 3 0 0
[3,] 0 5 6
Working with matrices
> nrow(A)
[1] 3
> ncol(A)
[1] 3
> A[1,]
[1] 2 0 4
> A[,2]
[1] 0 0 5
> A[1:2,2:3]
[,1] [,2]
[1,] 0 4
[2,] 0 0
> diag(A)
[1] 2 0 6
> diag(diag(A))
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 0 0
[3,] 0 0 6
Transpose, determinant and inverse of a matrix
> t(A)
[,1] [,2] [,3]
[1,] 2 3 0
[2,] 0 0 5
[3,] 4 0 6
> det(A)
[1] 60
> solve(A)
[,1] [,2] [,3]
[1,] 0.00 0.3333333 0.0
[2,] -0.30 0.2000000 0.2
[3,] 0.25 -0.1666667 0.0
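A quick way to check an inverse is to multiply it back. The sketch below rebuilds the same matrix A and also shows `solve(A, b)` for a linear system; the right-hand side `b` is a hypothetical example, not taken from the slides:

```r
# Same matrix A as above, filled column by column.
A <- matrix(c(2, 3, 0,  0, 0, 5,  4, 0, 6), nrow = 3, ncol = 3)

# A times its inverse should be the identity, up to rounding error.
Ainv  <- solve(A)
check <- all.equal(A %*% Ainv, diag(3))

# To solve A x = b, pass b to solve() instead of forming the inverse.
b <- c(1, 2, 3)                    # hypothetical right-hand side
x <- solve(A, b)
residual <- max(abs(A %*% x - b))  # should be close to zero

print(check)
print(residual)
```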
Eigenvalues and eigenvectors of a matrix
> eigen(A)
eigen() decomposition
$values
[1] 7.468906+0.000000i 0.265547+2.821842i 0.265547-2.821842i
$vectors
[,1] [,2] [,3]
[1,] 0.5744240+0i 0.0559305+0.5943459i 0.0559305-0.5943459i
[2,] 0.2307262+0i 0.6318703+0.0000000i 0.6318703+0.0000000i
[3,] 0.7853677+0i -0.4435397-0.2182595i -0.4435397+0.2182595i
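The decomposition can be checked against the defining relation A v = λ v. A minimal sketch, assuming the same matrix A and base R only:

```r
A <- matrix(c(2, 3, 0,  0, 0, 5,  4, 0, 6), nrow = 3, ncol = 3)
e <- eigen(A)

# For each pair (lambda_j, v_j), A v_j - lambda_j v_j should be near zero.
# sweep() multiplies column j of the eigenvector matrix by lambda_j.
resid   <- A %*% e$vectors - sweep(e$vectors, 2, e$values, `*`)
max_err <- max(Mod(resid))

# The eigenvalues sum to the trace and multiply to the determinant.
trace_ok <- all.equal(Re(sum(e$values)), sum(diag(A)))
det_ok   <- all.equal(Re(prod(e$values)), det(A))

print(max_err)
print(c(trace_ok, det_ok))
```

Because A is not symmetric, the eigenvalues and eigenvectors are complex, which is why `Mod` and `Re` are used.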
Matrix operations
> B = matrix(c(1,2.3,-3,4,5,6.7,7,-8,-9.1),
+ nrow=3, ncol=3)
> B
[,1] [,2] [,3]
[1,] 1.0 4.0 7.0
[2,] 2.3 5.0 -8.0
[3,] -3.0 6.7 -9.1
> A * B
[,1] [,2] [,3]
[1,] 2.0 0.0 28.0
[2,] 6.9 0.0 0.0
[3,] 0.0 33.5 -54.6
> A %*% B
[,1] [,2] [,3]
[1,] -10.0 34.8 -22.4
[2,] 3.0 12.0 21.0
[3,] -6.5 65.2 -94.6
> cbind(A,B)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 0 4 1.0 4.0 7.0
[2,] 3 0 0 2.3 5.0 -8.0
[3,] 0 5 6 -3.0 6.7 -9.1
> rbind(A,B)
[,1] [,2] [,3]
[1,] 2.0 0.0 4.0
[2,] 3.0 0.0 0.0
[3,] 0.0 5.0 6.0
[4,] 1.0 4.0 7.0
[5,] 2.3 5.0 -8.0
[6,] -3.0 6.7 -9.1
Lists
> lista = list(a=1:5, b=c("x","y","z"), c=c(-1,4,7), d=TRUE)
> names(lista)
[1] "a" "b" "c" "d"
> lista
$a
[1] 1 2 3 4 5
$b
[1] "x" "y" "z"
$c
[1] -1 4 7
$d
[1] TRUE
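Components of a list can be extracted in several ways. A short sketch using the same list; the added component `e` is hypothetical:

```r
lista <- list(a = 1:5, b = c("x", "y", "z"), c = c(-1, 4, 7), d = TRUE)

by_name  <- lista$b            # component by name
second_c <- lista[["c"]][2]    # second element of component c
sub_list <- lista[1]           # single brackets return a list of length 1

lista$e <- "new"               # add a component (hypothetical name)
n_comp  <- length(lista)

print(by_name)
print(second_c)
print(n_comp)
```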
Reading Data
> data(package="MASS")
Data sets in package MASS:
Aids2 Australian AIDS Survival Data
Animals Brain and Body Weights for 28 Species
Boston Housing Values in Suburbs of Boston
Cars93 Data from 93 Cars on Sale in the USA in 1993
Cushings Diagnostic Tests on Patients with Cushings Syndrome
DDT DDT in Kale
GAGurine Level of GAG in Urine of Children
Insurance Numbers of Car Insurance claims
Melanoma Survival from Malignant Melanoma
OME Tests of Auditory Perception in Children with OME
Pima.te Diabetes in Pima Indian Women
Pima.tr Diabetes in Pima Indian Women
Pima.tr2 Diabetes in Pima Indian Women
Rabbit Blood Pressure in Rabbits
Rubber Accelerated Testing of Tyre Rubber
SP500 Returns of the Standard and Poors 500
Sitka Growth Curves for Sitka Spruce Trees in 1988
Sitka89 Growth Curves for Sitka Spruce Trees in 1989
Skye AFM Compositions of Aphyric Skye Lavas
Traffic Effect of Swedish Speed Limits on Accidents
...
> library(MASS)
> Animals
body brain
Mountain beaver 1.350 8.1
Cow 465.000 423.0
Grey wolf 36.330 119.5
Goat 27.660 115.0
Guinea pig 1.040 5.5
Dipliodocus 11700.000 50.0
Asian elephant 2547.000 4603.0
Donkey 187.100 419.0
Horse 521.000 655.0
Potar monkey 10.000 115.0
Cat 3.300 25.6
Giraffe 529.000 680.0
Gorilla 207.000 406.0
Human 62.000 1320.0
African elephant 6654.000 5712.0
Triceratops 9400.000 70.0
Rhesus monkey 6.800 179.0
Kangaroo 35.000 56.0
Golden hamster 0.120 1.0
Mouse 0.023 0.4
Rabbit 2.500 12.1
Sheep 55.500 175.0
Jaguar 100.000 157.0
Chimpanzee 52.160 440.0
Rat 0.280 1.9
Brachiosaurus 87000.000 154.5
Mole 0.122 3.0
Pig 192.000 180.0
> road
deaths drivers popden rural temp fuel
Alabama 968 158 64.0 66.0 62 119.0
Alaska 43 11 0.4 5.9 30 6.2
Arizona 588 91 12.0 33.0 64 65.0
Arkanas 640 92 34.0 73.0 51 74.0
Calif 4743 952 100.0 118.0 65 105.0
Colo 566 109 17.0 73.0 42 78.0
Conn 325 167 518.0 5.1 37 95.0
Dela 118 30 226.0 3.4 41 20.0
DC 115 35 12524.0 0.0 44 23.0
Florida 1545 298 91.0 57.0 67 216.0
Georgia 1302 203 68.0 83.0 54 162.0
Idaho 262 41 8.1 40.0 36 29.0
Ill 2207 544 180.0 102.0 33 350.0
Ind 1410 254 129.0 89.0 37 196.0
Iowa 833 150 49.0 100.0 30 109.0
Kansas 669 136 27.0 124.0 42 94.0
Kent 911 147 76.0 65.0 44 104.0
Louis 1037 146 72.0 40.0 65 109.0
Maine 1196 46 31.0 19.0 30 37.0
Maryl 616 157 314.0 29.0 44 113.0
Mass 766 255 655.0 17.0 37 166.0
Mich 2120 403 137.0 95.0 33 306.0
Minn 841 189 43.0 110.0 22 132.0
Miss 648 85 46.0 59.0 57 77.0
Mo 1289 234 63.0 100.0 40 180.0
Mont 259 38 4.6 72.0 29 31.0
> help(road)
road package:MASS R Documentation
Road Accident Deaths in US States
Description:
A data frame with the annual deaths in road accidents for half the
US states.
Usage:
road
Format:
Columns are:
"state" name.
"deaths" number of deaths.
"drivers" number of drivers (in 10,000s).
"popden" population density in people per square mile.
"rural" length of rural roads, in 1000s of miles.
"temp" average daily maximum temperature in January.
"fuel" fuel consumption in 10,000,000 US gallons per year.
Source:
Imperial College, London M.Sc. exercise
> class(road)
[1] "data.frame"
> rownames(road)
 [1] "Alabama" "Alaska"  "Arizona" "Arkanas" "Calif"   "Colo"    "Conn"
 [8] "Dela"    "DC"      "Florida" "Georgia" "Idaho"   "Ill"     "Ind"
[15] "Iowa"    "Kansas"  "Kent"    "Louis"   "Maine"   "Maryl"   "Mass"
[22] "Mich"    "Minn"    "Miss"    "Mo"      "Mont"
> colnames(road)
[1] "deaths" "drivers" "popden" "rural" "temp" "fuel"
> dim(road)
[1] 26 6
> transform(road, temp2 = temp**2)
deaths drivers popden rural temp fuel temp2
Alabama 968 158 64.0 66.0 62 119.0 3844
Alaska 43 11 0.4 5.9 30 6.2 900
Arizona 588 91 12.0 33.0 64 65.0 4096
Arkanas 640 92 34.0 73.0 51 74.0 2601
Calif 4743 952 100.0 118.0 65 105.0 4225
...
> summary(road)
deaths drivers popden rural
Min. : 43.0 Min. : 11.0 Min. : 0.40 Min. : 0.00
1st Qu.: 571.5 1st Qu.: 86.5 1st Qu.: 31.75 1st Qu.: 30.00
Median : 799.5 Median :148.5 Median : 66.00 Median : 65.50
Mean :1000.7 Mean :191.2 Mean : 595.74 Mean : 60.71
3rd Qu.:1265.8 3rd Qu.:226.2 3rd Qu.: 135.00 3rd Qu.: 93.50
Max. :4743.0 Max. :952.0 Max. :12524.00 Max. :124.00
temp fuel
Min. :22.00 Min. : 6.20
1st Qu.:33.75 1st Qu.: 67.25
Median :41.50 Median :104.50
Mean :43.69 Mean :115.24
3rd Qu.:53.25 3rd Qu.:154.50
Max. :67.00 Max. :350.00
> colMeans(road)
deaths drivers popden rural temp fuel
1000.65385 191.19231 595.73462 60.70769 43.69231 115.23846
> ttemp = cut(road$temp, breaks=c(22,35,60,67))
> ttemp
 [1] (60,67] (22,35] (60,67] (35,60] (60,67] (35,60] (35,60] (35,60] (35,60]
[10] (60,67] (35,60] (35,60] (22,35] (35,60] (22,35] (35,60] (35,60] (60,67]
[19] (22,35] (35,60] (35,60] (22,35] <NA>    (35,60] (35,60] (22,35]
Levels: (22,35] (35,60] (60,67]
> levels(ttemp)=c("baixa","media","alta")
> ttemp
 [1] alta  baixa alta  media alta  media media media media alta  media media
[13] baixa media baixa media media alta  baixa media media baixa <NA>  media
[25] media baixa
Levels: baixa media alta
Levels: baixa media alta
> table(ttemp)
ttemp
baixa media alta
6 14 5
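The single <NA> arises because the intervals produced by cut are open on the left, so the minimum temperature, 22, falls outside (22,35]. A sketch of one way around this, using a handful of temperatures from the road data and the `include.lowest` argument of `cut`:

```r
# A few January temperatures from the road data, including the minimum (22).
temp <- c(62, 30, 64, 22, 57)

# Default: intervals are (a, b], so the value 22 is not covered.
default_cut <- cut(temp, breaks = c(22, 35, 60, 67))

# include.lowest = TRUE closes the first interval on the left: [22, 35].
closed_cut <- cut(temp, breaks = c(22, 35, 60, 67), include.lowest = TRUE)

print(sum(is.na(default_cut)))  # 1
print(sum(is.na(closed_cut)))   # 0
print(levels(closed_cut))
```

Choosing a first break below the minimum (for example 21) would have the same effect.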
Use the read.table command to read data from a file or URL into a data.frame. Use scan to read from the console, a file or a URL.
> write.table(road, file="road.txt")
> x = read.table(file="road.txt",header=TRUE)
> x
deaths drivers popden rural temp fuel
Alabama 968 158 64.0 66.0 62 119.0
Alaska 43 11 0.4 5.9 30 6.2
Arizona 588 91 12.0 33.0 64 65.0
Arkanas 640 92 34.0 73.0 51 74.0
Calif 4743 952 100.0 118.0 65 105.0
Colo 566 109 17.0 73.0 42 78.0
Conn 325 167 518.0 5.1 37 95.0
Dela 118 30 226.0 3.4 41 20.0
DC 115 35 12524.0 0.0 44 23.0
Florida 1545 298 91.0 57.0 67 216.0
Georgia 1302 203 68.0 83.0 54 162.0
Idaho 262 41 8.1 40.0 36 29.0
Ill 2207 544 180.0 102.0 33 350.0
Ind 1410 254 129.0 89.0 37 196.0
Iowa 833 150 49.0 100.0 30 109.0
Kansas 669 136 27.0 124.0 42 94.0
Kent 911 147 76.0 65.0 44 104.0
Louis 1037 146 72.0 40.0 65 109.0
Maine 1196 46 31.0 19.0 30 37.0
Maryl 616 157 314.0 29.0 44 113.0
Mass 766 255 655.0 17.0 37 166.0
Mich 2120 403 137.0 95.0 33 306.0
Minn 841 189 43.0 110.0 22 132.0
Miss 648 85 46.0 59.0 57 77.0
Mo 1289 234 63.0 100.0 40 180.0
Mont 259 38 4.6 72.0 29 31.0
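The write and read round trip can be tried without touching the working directory. The sketch below uses a temporary file and a two-row data frame with values copied from the road data; `scan` is shown reading from a string rather than a file:

```r
# Small data frame with row names (values copied from the road data).
df <- data.frame(deaths = c(968, 43), temp = c(62, 30),
                 row.names = c("Alabama", "Alaska"))

# Write to a temporary file and read it back.
path <- tempfile(fileext = ".txt")
write.table(df, file = path)
df2 <- read.table(file = path, header = TRUE)

# scan() reads a plain vector; quiet = TRUE suppresses the "Read n items" note.
v <- scan(text = "1 2 3", quiet = TRUE)

unlink(path)  # remove the temporary file
print(df2)
print(v)
```

Because the header line written by `write.table` has one field fewer than the data lines, `read.table` takes the first column as row names.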
Programming
> x = 1
> if (x>5) x else -x
[1] -1
> ifelse(x>5,x,-x)
[1] -1
> x= c(-3.2, 2,3,-4.5,5,6,0)
> log(x)
[1]       NaN 0.6931472 1.0986123       NaN 1.6094379 1.7917595      -Inf
> any(x<=0)
[1] TRUE
> if (any(x<=0)) x[x<=0]
[1] -3.2 -4.5 0.0
> which(x<=0)
[1] 1 4 7
> all(x==0)
[1] FALSE
> for (i in 1:5) print(i**2)
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
> i = 1
> while (i<=5) {
+ print(i**2)
+ i = i+1
+ }
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
However, it is a good idea to avoid loops in R.
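As an illustration, the squares printed by the loops above can be computed without an explicit loop. A minimal sketch in base R (the variable names are illustrative):

```r
# Loop version, storing the squares instead of printing them.
sq_loop <- numeric(5)
for (i in 1:5) sq_loop[i] <- i^2

# Vectorized version: arithmetic operators act on whole vectors.
sq_vec <- (1:5)^2

# sapply() applies a function to each element and collects the results.
sq_apply <- sapply(1:5, function(i) i^2)

print(sq_vec)
```

All three give the same values; the vectorized form is the shortest and usually the fastest.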
> args(apply)
function (X, MARGIN, FUN, ...)
NULL
> A
[,1] [,2] [,3]
[1,] 2 0 4
[2,] 3 0 0
[3,] 0 5 6
> apply(A, MARGIN = 1,FUN = mean)
[1] 2.000000 1.000000 3.666667
> apply(A, MARGIN = 2,FUN = sum)
[1] 5 5 10
> apply(A, MARGIN = 1,FUN = function(x) ifelse(x>0,log(x),0))
[,1] [,2] [,3]
[1,] 0.6931472 1.098612 0.000000
[2,] 0.0000000 0.000000 1.609438
[3,] 1.3862944 0.000000 1.791759
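Note that the result above is transposed relative to A: with MARGIN = 1, the values computed for each row are returned as a column. A sketch of how to recover the original layout, with the helper name `g` chosen here for illustration:

```r
A <- matrix(c(2, 3, 0,  0, 0, 5,  4, 0, 6), nrow = 3, ncol = 3)
g <- function(x) ifelse(x > 0, log(x), 0)

# With MARGIN = 1, each row's result becomes a column of the output.
by_row <- apply(A, MARGIN = 1, FUN = g)

# Transposing restores the layout of A. Since g is vectorized,
# applying it to the whole matrix gives the same values directly.
same_layout <- t(by_row)
direct      <- g(A)

# Row means and column sums also have dedicated functions.
row_means <- rowMeans(A)
col_sums  <- colSums(A)

print(same_layout)
```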
Creating Functions
Let X ∼ Exponential(2). Its probability density function is
$$f(x) = 2 \exp(-2x)\, I(x \ge 0).$$
> f1 <- function(x) {
+ fx = ifelse(x < 0, 0, 2 * exp(-2 * x))
+ return(fx)
+ }
Leaving the distribution parameter free, X ∼ Exponential(β),
$$f(x) = \beta \exp(-\beta x)\, I(x \ge 0), \quad \beta > 0.$$
> dexp <- function(x,b) {
+ fdp = ifelse(x<0, 0, b *exp(-b*x))
+ return(fdp)
+ }
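A function called dexp already exists in R's stats package, and defining a new one with that name masks it in the workspace. The sketch below writes the density $f(x) = \beta \exp(-\beta x) I(x \ge 0)$ under a hypothetical name, `dexp_b`, and compares it with the built-in, whose parameter is called `rate`:

```r
# Exponential density under a hypothetical name, so that the built-in
# stats::dexp stays visible.
dexp_b <- function(x, b) {
  ifelse(x < 0, 0, b * exp(-b * x))
}

x       <- c(-1, 0.5, 3)
mine    <- dexp_b(x, b = 2)
builtin <- stats::dexp(x, rate = 2)

print(all.equal(mine, builtin))
```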
> f1(x=3)
[1] 0.004957504
> dexp(x=3,b=2)
[1] 0.004957504
> integrate(dexp,-Inf,Inf,b=2)
1 with absolute error < 5e-07
> integrate(f1,0,2)
0.9816844 with absolute error < 1.1e-14
> integrate(f1,2,Inf)
0.01831564 with absolute error < 2.8e-06
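These numerical integrals can be compared with the closed form of the exponential distribution function, $P(X \le 2) = 1 - e^{-4}$. A minimal sketch using base R's `pexp` (the variable names are illustrative):

```r
f1 <- function(x) ifelse(x < 0, 0, 2 * exp(-2 * x))

# Numerical integral of the density over [0, 2] ...
num <- integrate(f1, 0, 2)$value

# ... the closed form P(X <= 2) = 1 - exp(-2 * 2) ...
closed <- 1 - exp(-4)

# ... and the built-in distribution function.
builtin <- pexp(2, rate = 2)

print(c(num = num, closed = closed, builtin = builtin))
```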
Graphics
> par(mfrow=c(1,2))
> boxplot(road$deaths , xlab="deaths", border="red")
> boxplot(road$drivers, xlab="drivers", cex.lab=2)
[Figure: boxplots of deaths (left, red border) and drivers (right).]
> par(mfrow=c(1,2))
> boxplot(road$popden , xlab="popden" , outline=FALSE)
> boxplot(road$rural , xlab="rural", horizontal=TRUE)
[Figure: boxplot of popden with outliers omitted (left) and horizontal boxplot of rural (right).]
> par(mfrow=c(1,2))
> boxplot(road$temp, xlab="temp", width=2)
> boxplot(road$fuel, xlab="fuel", notch=TRUE)
[Figure: boxplots of temp (left) and fuel (right, notched).]
> boxplot(road, outline=FALSE)
[Figure: side-by-side boxplots of deaths, drivers, popden, rural, temp and fuel, with outliers omitted.]
> par(mfrow=c(1,3))
> hist(road[,1],main=colnames(road[1]),xlab="")
> hist(road[,2],main=colnames(road[2]),xlab="",col="grey")
> hist(road[,4],main=colnames(road[4]),xlab="",nclass=15)
[Figure: histograms of deaths, drivers and rural; vertical axis: Frequency.]
> par(mfrow=c(1,3))
> for (i in 1:3) {
+ qqnorm(road[,i],main=colnames(road[i]))
+ qqline(road[,i])
+ }
[Figure: normal Q-Q plots of deaths, drivers and popden; horizontal axis: Theoretical Quantiles, vertical axis: Sample Quantiles.]
> head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
> names(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
> dim(airquality)
[1] 153 6
> attach(airquality, pos=2)
> par(mfrow=c(1,2))
> plot(Month,Ozone)
> boxplot(Ozone~Month)
[Figure: scatter plot of Ozone against Month (left) and boxplots of Ozone by Month (right).]
> par(mfrow=c(2,2), mar=c(3,3,1,0), mgp=c(1.7,1,0))
> plot(Temp,Ozone)
> plot(Ozone~Temp, subset = Month <= 6, main="Meses 5 e 6")
> plot(Ozone~Temp, subset = Month==7 | Month==8, main="Meses 7 e 8")
> plot(Ozone~Temp, subset = Month==9, main="Mes 9")
[Figure: scatter plots of Ozone against Temp in four panels: all months, "Meses 5 e 6", "Meses 7 e 8" and "Mes 9".]
> par(mfrow=c(2,2), mar=c(3,3,1,0), mgp=c(1.7,1,0))
> plot(Ozone, ty="l", xlab="")
> plot(Solar.R, ty="l", xlab="")
> plot(Wind, ty="l", xlab="")
> plot(Temp, ty="l", xlab="")
[Figure: line plots of Ozone, Solar.R, Wind and Temp against observation index, in four panels.]
> plot(airquality[,1:4])
[Figure: scatter plot matrix of Ozone, Solar.R, Wind and Temp.]
> library(lattice)
> barley[1:15,]
yield variety year site
1 27.00000 Manchuria 1931 University Farm
2 48.86667 Manchuria 1931 Waseca
3 27.43334 Manchuria 1931 Morris
4 39.93333 Manchuria 1931 Crookston
5 32.96667 Manchuria 1931 Grand Rapids
6 28.96667 Manchuria 1931 Duluth
7 43.06666 Glabron 1931 University Farm
8 55.20000 Glabron 1931 Waseca
9 28.76667 Glabron 1931 Morris
10 38.13333 Glabron 1931 Crookston
11 29.13333 Glabron 1931 Grand Rapids
12 29.66667 Glabron 1931 Duluth
13 35.13333 Svansota 1931 University Farm
14 47.33333 Svansota 1931 Waseca
15 25.76667 Svansota 1931 Morris
> table(barley$year,barley$site)
Grand Rapids Duluth University Farm Morris Crookston Waseca
1932 10 10 10 10 10 10
1931 10 10 10 10 10 10
> figura = barchart(yield ~ variety | site,
+ data = barley,groups = year,
+ layout = c(1,6),
+ ylab = "Barley Yield (bushels/acre)",
+ scales = list(x = list(abbreviate = TRUE,minlength = 5)))
> plot(figura)
[Figure: bar charts of Barley Yield (bushels/acre) by variety, one panel per site (Grand Rapids, Duluth, University Farm, Morris, Crookston, Waseca), with bars grouped by year.]
3D Plot
> x=seq(-pi,pi,length=50)
> y = x
> f = outer(x,y,function(x,y)cos(y)/(1+x^2))
> persp(x,y,f,theta=30,phi=40)
[Figure: perspective plot of the surface f over x and y.]
See also:
CRAN Task View: Graphic Displays, etc.
The R Graph Gallery
R Graphics by Paul Murrell
Building Tables in LaTeX
> m = lm(deaths ~ temp + popden, data=road)
> summary(m)
Call:
lm(formula = deaths ~ temp + popden, data = road)
Residuals:
Min 1Q Median 3Q Max
-903.5 -519.4 -235.9 231.2 3236.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.35216 646.30806 0.127 0.900
temp 22.03153 14.16167 1.556 0.133
popden -0.07437 0.07559 -0.984 0.335
Residual standard error: 921.4 on 23 degrees of freedom
Multiple R-squared: 0.1287, Adjusted R-squared: 0.05295
F-statistic: 1.699 on 2 and 23 DF, p-value: 0.205
> library(xtable)
> tab = xtable(m,caption="Exemplo de regress~ao.",label="tab1",
+ digits=2)
> print(tab)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    82.35     646.31    0.13     0.90
temp           22.03      14.16    1.56     0.13
popden         -0.07       0.08   -0.98     0.34
Table 1: Exemplo de regressão.
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrrr}
\hline
& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\
\hline
(Intercept) & 82.35 & 646.31 & 0.13 & 0.90 \\
temp & 22.03 & 14.16 & 1.56 & 0.13 \\
popden & -0.07 & 0.08 & -0.98 & 0.34 \\
\hline
\end{tabular}
\caption{Exemplo de regress~ao linear.}
\label{tab1}
\end{center}
\end{table}
Probability Distributions
Example. Let X ∼ Binomial(n, θ) with n = 10 and θ = 0.3. Then
$$P(X = x) = p(x) = \binom{10}{x}\, 0.3^{x} (1 - 0.3)^{10 - x}, \quad x = 0, \ldots, 10.$$
The commands below compute the probabilities and the cumulative probabilities,
> x = 0:10
> px = choose(10, x) * (0.3)^x * (1-0.3)^(10-x)
> fx = dbinom(x,10,0.3)
> Fx = cumsum(px)
> cbind(x,px,fx,Fx)
x px fx Fx
[1,] 0 0.0282475249 0.0282475249 0.02824752
[2,] 1 0.1210608210 0.1210608210 0.14930835
[3,] 2 0.2334744405 0.2334744405 0.38278279
[4,] 3 0.2668279320 0.2668279320 0.64961072
[5,] 4 0.2001209490 0.2001209490 0.84973167
[6,] 5 0.1029193452 0.1029193452 0.95265101
[7,] 6 0.0367569090 0.0367569090 0.98940792
[8,] 7 0.0090016920 0.0090016920 0.99840961
[9,] 8 0.0014467005 0.0014467005 0.99985631
[10,] 9 0.0001377810 0.0001377810 0.99999410
[11,] 10 0.0000059049 0.0000059049 1.00000000
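The cumulative probabilities can also be obtained directly from the distribution function. A short sketch, assuming only base R's `pbinom`:

```r
x  <- 0:10
Fx <- cumsum(dbinom(x, size = 10, prob = 0.3))

# pbinom gives P(X <= x) directly.
Fx_builtin <- pbinom(x, size = 10, prob = 0.3)
same <- all.equal(Fx, Fx_builtin)

# Upper tail P(X > 3), without subtracting by hand.
upper <- pbinom(3, size = 10, prob = 0.3, lower.tail = FALSE)

print(same)
print(upper)
```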
> par(mfrow=c(1,2))
> plot(x, px, type="h")
> plot(x, Fx, type="s")
[Figure: probabilities px against x as vertical lines (left) and cumulative probabilities Fx against x as a step function (right).]
Example. Let X ∼ N(µ, σ²), with density function
$$f(x) = (2\pi\sigma^2)^{-1/2} \exp\{-0.5\,(x-\mu)^2/\sigma^2\}, \quad x \in \mathbb{R},\ \mu \in \mathbb{R},\ \sigma^2 > 0.$$
We can create an R function with the density above,
> dnormal <- function(x,mu,sigma2){
+ logdens = -0.5*(log(2*pi*sigma2) + (x-mu)^2/sigma2)
+ return(exp(logdens))
+ }
or use the built-in function dnorm,
> args(dnorm)
function (x, mean = 0, sd = 1, log = FALSE)
NULL
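The two functions are parameterized differently: dnormal takes the variance, while dnorm takes the standard deviation. A minimal check that they agree once this is accounted for (the grid `x` is an arbitrary choice):

```r
dnormal <- function(x, mu, sigma2) {
  logdens <- -0.5 * (log(2 * pi * sigma2) + (x - mu)^2 / sigma2)
  exp(logdens)
}

x <- seq(-4, 4, length.out = 9)

# Variance 4 in dnormal corresponds to sd = sqrt(4) = 2 in dnorm.
ours   <- dnormal(x, mu = -1, sigma2 = 4)
theirs <- dnorm(x, mean = -1, sd = sqrt(4))

print(all.equal(ours, theirs))
```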
> x = seq(-4,4,l=100)
> plot (x,dnorm(x,0,1),xlim=c(-4,3),type="l",
+ ylab=expression(f(x)))
> lines(x,dnorm(x,-1,2), col=2, lty=2)
> lines(x,dnorm(x, 1,1), col=4, lty=4, lwd=2)
> legend(-3.5,0.35,leg=c("N(0,1)","N(-1,4)","N(1,1)"),
+ col=c(1,2,4), lty=c(1,2,4), lwd=c(1,1,2), bty="n")
[Figure: the three normal densities f(x) plotted against x, with a legend identifying each curve.]
> plot (x,pnorm(x),xlim=c(-3,3),type="l",ylab=expression(F(x)))
> lines(x,pnorm(x,-1,2),lty=2, lwd=2)
> lines(x,pnorm(x,1,.5),lty=3, lwd=2)
> legend(-2.5,.8,leg=c("N(0,1)","N(-1,4)","N(1,0.25)"),lty=1:3,bty="n")
[Figure: distribution functions F(x) of N(0,1), N(-1,4) and N(1,0.25) plotted against x.]