Compute a p-value from a null distribution and observed statistic.
Learn more in vignette("infer").
get_p_value(x, obs_stat, direction)
# S3 method for default
get_p_value(x, obs_stat, direction)
get_pvalue(x, obs_stat, direction)
# S3 method for infer_dist
get_p_value(x, obs_stat, direction)
x  A null distribution. For simulation-based inference, a data frame
containing a distribution of 

obs_stat  A data frame containing the observed statistic (in a

direction  A character string. Options are 
A tibble containing the following columns:
term: The explanatory variable (or intercept) in question. Only supplied if the input had been previously passed to fit().
p_value: A value in [0, 1] giving the probability that a statistic/coefficient as or more extreme than the observed statistic/coefficient would occur if the null hypothesis were true.
get_pvalue() is an alias of get_p_value().
p_value is a deprecated alias of get_p_value().
Though a true p-value of 0 is impossible, get_p_value() may return 0 in some cases. This is due to the simulation-based nature of the {infer} package; the output of this function is an approximation based on the number of reps chosen in the generate() step. When the observed statistic is very unlikely given the null hypothesis, and only a small number of reps have been generated to form a null distribution, it is possible that the observed statistic will be more extreme than every test statistic generated to form the null distribution, resulting in an approximate p-value of 0. In this case, the true p-value is a small value likely less than 3/reps (based on a Poisson approximation).
In the case that a p-value of zero is reported, a warning message will be raised to caution the user against reporting a p-value exactly equal to 0.
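The situation described above can be reproduced with a small base R sketch. It does not call {infer}; the null distribution is a stand-in drawn with rnorm(), and the observed statistic of 10 is an arbitrary value chosen to lie far beyond every simulated replicate. The last lines illustrate the reasoning behind the 3/reps bound: if the true p-value were 3/reps, seeing no replicate as extreme as the observed statistic would happen only about 5% of the time.

```r
set.seed(1)
reps <- 100
null_stats <- rnorm(reps)  # stand-in for a simulated null distribution
obs_stat <- 10             # assumed value, far beyond every simulated replicate

# No replicate is as extreme as obs_stat, so the estimated p-value is exactly 0
p_hat <- mean(null_stats >= obs_stat)

# Upper bound suggested by the text: 3 / reps
upper_bound <- 3 / reps

# Chance of seeing zero extreme replicates if the true p-value were 3 / reps
p_zero_exact <- (1 - upper_bound)^reps  # binomial calculation
p_zero_poisson <- exp(-3)               # Poisson approximation, roughly 0.05
```

Increasing reps tightens the bound, which is one reason to generate more replicates before reporting a very small p-value.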
Other auxiliary functions:
get_confidence_interval()
# using a simulation-based null distribution
# find the point estimate: mean number of hours worked per week
point_estimate <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")
# starting with the gss dataset
gss %>%
  # ...we're interested in the number of hours worked per week
  specify(response = hours) %>%
  # hypothesizing that the mean is 40
  hypothesize(null = "point", mu = 40) %>%
  # generating data points for a null distribution
  generate(reps = 1000, type = "bootstrap") %>%
  # finding the null distribution
  calculate(stat = "mean") %>%
  get_p_value(obs_stat = point_estimate, direction = "two-sided")
#> # A tibble: 1 × 1
#> p_value
#> <dbl>
#> 1 0.032
# using a theoretical null distribution 
# calculate the observed statistic
obs_stat <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")
# define a null distribution
null_dist <- gss %>%
  specify(response = hours) %>%
  assume("t")
# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")
#> # A tibble: 1 × 1
#> p_value
#> <dbl>
#> 1 0.0376
# using a model fitting workflow 
# fit a linear model predicting number of hours worked per
# week using respondent age and degree status.
observed_fit <- gss %>%
  specify(hours ~ age + college) %>%
  fit()
observed_fit
#> # A tibble: 3 × 2
#> term estimate
#> <chr> <dbl>
#> 1 intercept 40.6
#> 2 age 0.00596
#> 3 collegedegree 1.53
# fit 100 models to resamples of the gss dataset, where the response
# `hours` is permuted in each. note that this code is the same as
# the above except for the addition of the `generate` step.
null_fits <- gss %>%
  specify(hours ~ age + college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  fit()
null_fits
#> # A tibble: 300 × 3
#> # Groups: replicate [100]
#> replicate term estimate
#> <int> <chr> <dbl>
#> 1 1 intercept 40.7
#> 2 1 age 0.00753
#> 3 1 collegedegree 2.78
#> 4 2 intercept 41.8
#> 5 2 age 0.000256
#> 6 2 collegedegree 1.08
#> 7 3 intercept 42.7
#> 8 3 age 0.0426
#> 9 3 collegedegree 1.23
#> 10 4 intercept 42.6
#> # … with 290 more rows
get_p_value(null_fits, obs_stat = observed_fit, direction = "two-sided")
#> # A tibble: 3 × 2
#> term p_value
#> <chr> <dbl>
#> 1 age 0.92
#> 2 collegedegree 0.26
#> 3 intercept 0.68
# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}