-
Notifications
You must be signed in to change notification settings - Fork 11
r_keyuxi
James Chen edited this page Apr 19, 2018
·
1 revision
#R notes
Author: 18-Yuxi Ke
以正态分布为例,
- dnorm() : p.d.f.
- pnorm(): c.d.f.
- qnorm(): quantile, 分位点
- rnorm(): 根据分布生成随机数
例子:
dnorm(x, mean=0, sd=1, log = FALSE)# default
pnorm(q, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean=0, sd=1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean=0, sd=1)
t.test
var.test
wilcox.test# Wilcoxon signed rank test
binom.test
prop.test
fisher.test# Fisher's exact test (2 binomial distributions)
chisq.test
- Sign Test: 检验中位数,可以使用’binom.test‘:
#sample size = 200, H1: m < 0.5
binom.test( sum(x>0), 200, 0.5, alternative = "1" )
- Signed Rank Test: 检验样本是否关于一个点对称.
Wilcoxon signed rank statistic:
wilcox.test(x, y, "t", paired = "T")
- Rank sum test: 双样本
Rankpool两组样本之后排序,求统计量,并分别求和
Mann-Whitney统计量:$$\sum_i\sum_j \psi (Y_j - X_i)$$
- t test
- Sign test (binom.test)
- Wilcoxon signed rank test
Populations
- Normal
- Unform
- Cauchy
- Laplace
# 1. cutoff points
alpha<-0.05
qt(alpha,19,lower.tail=FALSE)#自由度为19的t test
qbinom(alpha,20,0.5,lower.tail=FALSE)
qsignrank(alpha,20,lower.tail=FALSE)
# declare
m0<-0
n_power<-matrix(data=0,nrow=151,ncol=3)
u_power<-matrix(data=0,nrow=151,ncol=3)
c_power<-matrix(data=0,nrow=151,ncol=3)
l_power<-matrix(data=0,nrow=151,ncol=3)
sample<-matrix(NA,nrow=4,ncol=20)
reject<-matrix(data=0,nrow=4,ncol=3)
# cutoff points
alpha<-0.05
t_cut<-qt(alpha,19,lower.tail=FALSE)
b_cut<-qbinom(alpha,20,0.5,lower.tail=FALSE)
w_cut<-qsignrank(alpha,20,lower.tail=FALSE)
# simulate
for (i in 0:150)
{
m<-3*i/150
for ( rtrial in 1:1000 )
{
# generate random samples
sample[1,]<-rnorm(20,mean=m,sd=1)
sample[2,]<-runif(20,min=m-0.5,max=m+0.5)
sample[3,]<-rcauchy(20,location=m,scale=1)
for (num in 1:20)
{
x<-rexp(1,rate=1)
y<-rbinom(1,1,0.5)
if (y)
x<-x+m
else
x<-m-x
sample[4,num]<-x
}
# tests for each distribution
for (dis in 1:4)
{
if ((mean(sample[dis,])-m0)/sqrt(var(sample[dis,])/20)>t_cut)
reject[dis,1]<-reject[dis,1]+1
if (sum(sample[dis,]>m0)>b_cut)
reject[dis,2]<-reject[dis,2]+1
sample_rank<-rank(abs(sample[dis,]) )
if ( sum(sample_rank*(sample[dis,]>m0)) > w_cut )
reject[dis,3]<-reject[dis,3]+1
}
}
# calculate power function
reject<-reject/1000
n_power[i+1,]<-reject[1,]
u_power[i+1,]<-reject[2,]
c_power[i+1,]<-reject[3,]
l_power[i+1,]<-reject[4,]
}
# x-axis
xaxis=seq(0,3,by=3/150)
# normal
plot(xaxis,n_power[,1],type="l",col="red",xlab="theta",ylab="power",ylim=c(0,1))
lines(xaxis,n_power[,2],col="green")
lines(xaxis,n_power[,3],col="blue")
title(main="normal")
legend(2,0.3,c("T test", "B test", "W test"),col = c(2,3,4),lty = c(1, 1, 1))
# uniform
plot(xaxis,u_power[,1],type="l",col="red",xlab="theta",ylab="power",ylim=c(0,1))
lines(xaxis,u_power[,2],col="green")
lines(xaxis,u_power[,3],col="blue")
title(main="uniform")
legend(2,0.3,c("T test", "B test", "W test"),col = c(2,3,4),lty = c(1, 1, 1))
# Cauchy
plot(xaxis,c_power[,1],type="l",col="red",xlab="theta",ylab="power",ylim=c(0,1))
lines(xaxis,c_power[,2],col="green")
lines(xaxis,c_power[,3],col="blue")
title(main="Cauchy")
legend(2,0.3,c("T test", "B test", "W test"),col = c(2,3,4),lty = c(1, 1, 1))
# Laplace
plot(xaxis,l_power[,1],type="l",col="red",xlab="theta",ylab="power",ylim=c(0,1))
lines(xaxis,l_power[,2],col="green")
lines(xaxis,l_power[,3],col="blue")
title(main="Laplace")
legend(2,0.3,c("T test", "B test", "W test"),col = c(2,3,4),lty = c(1, 1, 1))
Population | Ranking |
---|---|
Normal | |
Uniform | |
Cauchy | |
Laplace |
可以看到Wilcoxon signed rank test 一直有较好的表现;t test在normal或uniform分布下表现较好,但在Cauchy分布下很不理想,$\theta$ 较大时也难以正确地拒绝原假设;Sign test在normal或uniform分布下表现一般,但对于Cauchy分布这种比较奇特的情况效果较好。
应考虑到样本具体的性质,合理用概率模型建模,譬如当分布显然很skew的时候就不宜使用t test。
Q: quantile
评估真实数据是否符合某一理论分布
p.s. 也可以用percentile画P-P plot
# generate Q-Q plot against normal hypothesis
# adapted from http://bbs.pinggu.org/thread-2732625-1-1.html
# AirPassengers is built-in data in R
expeQ <- quantile(AirPassengers,seq(0,1,0.01))
theoQ <- qnorm(seq(0,1,0.01),mean(AirPassengers),sd(AirPassengers))
plot(theoQ, expeQ, ylim=c(0,max(AirPassengers)))
abline(a=0, b=1, col="red", lwd = 2)
plot.ecdf(AirPassengers)
可以与pnorm生成的理论cdf相比较
Reference: slides from Statistical Methods and Applications