Compute a confidence interval around a summary statistic. Both
simulation-based and theoretical methods are supported, though only
type = "se" is supported for theoretical methods.
Learn more in vignette("infer").
get_confidence_interval(x, level = 0.95, type = NULL, point_estimate = NULL)
get_ci(x, level = 0.95, type = NULL, point_estimate = NULL)
x  A distribution. For simulation-based inference, a data frame
containing a distribution of 

level  A numerical value between 0 and 1 giving the confidence level. Default value is 0.95. 
type  A string giving which method should be used for creating the
confidence interval. The default is "percentile"; "se" and "bias-corrected" are also available.
point_estimate  A data frame containing the observed statistic (in a

A tibble containing the following columns:
term: The explanatory variable (or intercept) in question. Only
supplied if the input had been previously passed to fit().
lower_ci, upper_ci: The lower and upper bounds of the confidence
interval, respectively.
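As a rough illustration of what the lower and upper bounds represent for a simulation-based distribution, the sketch below computes a percentile interval and a standard-error interval by hand in base R. It is not the package's implementation: the simulated data and the names x, boot_stats, and point_est are made up for this example, and the use of a normal quantile as the multiplier for the standard-error interval is an assumption.

```r
# Base R only; the data and all object names here are illustrative.
set.seed(1)
x <- rnorm(200, mean = 41, sd = 14)   # stand-in for a numeric sample
point_est <- mean(x)                  # observed statistic

# Bootstrap distribution of the sample mean
boot_stats <- replicate(1000, mean(sample(x, replace = TRUE)))

level <- 0.95
alpha <- 1 - level

# Percentile method: the middle `level` share of the bootstrap statistics
percentile_ci <- unname(
  quantile(boot_stats, probs = c(alpha / 2, 1 - alpha / 2))
)

# Standard error method (assumed form): point estimate plus or minus
# a normal quantile times the bootstrap standard deviation
multiplier <- qnorm(1 - alpha / 2)
se_ci <- point_est + c(-1, 1) * multiplier * sd(boot_stats)

percentile_ci
se_ci
```

The percentile interval needs only the distribution, while the standard-error interval also needs the observed statistic, which is why a point estimate has to be supplied for that method.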
A null hypothesis is not required to compute a confidence interval. However,
including hypothesize() in a pipeline leading to get_confidence_interval()
will not break anything. This can be useful when computing a confidence
interval using the same distribution used to compute a p-value.
Theoretical confidence intervals (i.e. calculated by supplying the output
of assume() to the x argument) require that the point estimate lies on
the scale of the data. The distribution defined in assume() will be
recentered and rescaled to align with the point estimate, as can be seen
in the output of visualize() when paired with shade_confidence_interval().
Confidence intervals are implemented for the following distributions and
point estimates:
distribution = "t": point_estimate should be the output of
calculate() with stat = "mean" or stat = "diff in means"
distribution = "z": point_estimate should be the output of
calculate() with stat = "prop" or stat = "diff in props"
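To make the recentering and rescaling idea concrete for the "t" case with a single mean, here is a minimal base R sketch. It assumes the interval takes the usual textbook form (point estimate plus t quantiles times the estimated standard error); this is not confirmed to be the package's internal code. The data and the names x, point_est, and theory_ci are illustrative.

```r
# Base R only; the data and all object names here are illustrative.
set.seed(1)
x <- rnorm(200, mean = 41, sd = 14)   # stand-in for a numeric sample
n <- length(x)
point_est <- mean(x)                  # observed statistic, on the data scale
level <- 0.95

# Quantiles of the standard t distribution with n - 1 degrees of freedom
t_quantiles <- qt(c((1 - level) / 2, 1 - (1 - level) / 2), df = n - 1)

# Rescale by the estimated standard error, then recenter at the point estimate
theory_ci <- point_est + t_quantiles * sd(x) / sqrt(n)

theory_ci
```

For a single mean this matches the interval reported by stats::t.test(x, conf.level = level).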
get_ci() is an alias of get_confidence_interval().
conf_int() is a deprecated alias of get_confidence_interval().
Other auxiliary functions: get_p_value()
# load the package; gss is a dataset shipped with infer
library(infer)

boot_dist <- gss %>%
  # We're interested in the number of hours worked per week
  specify(response = hours) %>%
  # Generate bootstrap samples
  generate(reps = 1000, type = "bootstrap") %>%
  # Calculate mean of each bootstrap sample
  calculate(stat = "mean")

boot_dist %>%
  # Calculate the confidence interval around the point estimate
  get_confidence_interval(
    # At the 95% confidence level; percentile method
    level = 0.95
  )
#> # A tibble: 1 × 2
#> lower_ci upper_ci
#> <dbl> <dbl>
#> 1 40.2 42.7
# for type = "se" or type = "bias-corrected" we need a point estimate
sample_mean <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

boot_dist %>%
  get_confidence_interval(
    point_estimate = sample_mean,
    # At the 95% confidence level
    level = 0.95,
    # Using the standard error method
    type = "se"
  )
#> # A tibble: 1 × 2
#> lower_ci upper_ci
#> <dbl> <dbl>
#> 1 40.1 42.7
# using a theoretical distribution
# define a sampling distribution
sampling_dist <- gss %>%
  specify(response = hours) %>%
  assume("t")

# get the confidence interval; note that the
# point estimate is required here
get_confidence_interval(
  sampling_dist,
  level = .95,
  point_estimate = sample_mean
)
#> # A tibble: 1 × 2
#> lower_ci upper_ci
#> <dbl> <dbl>
#> 1 40.1 42.7
# using a model fitting workflow
# fit a linear model predicting number of hours worked per
# week using respondent age and degree status.
observed_fit <- gss %>%
  specify(hours ~ age + college) %>%
  fit()

observed_fit
#> # A tibble: 3 × 2
#> term estimate
#> <chr> <dbl>
#> 1 intercept 40.6
#> 2 age 0.00596
#> 3 collegedegree 1.53
# fit 100 models to resamples of the gss dataset, where the response
# `hours` is permuted in each. note that this code is the same as
# the above except for the addition of the `hypothesize` and
# `generate` steps.
null_fits <- gss %>%
  specify(hours ~ age + college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 100, type = "permute") %>%
  fit()

null_fits
#> # A tibble: 300 × 3
#> # Groups: replicate [100]
#> replicate term estimate
#> <int> <chr> <dbl>
#> 1 1 intercept 44.2
#> 2 1 age 0.0765
#> 3 1 collegedegree 0.676
#> 4 2 intercept 41.5
#> 5 2 age 0.000968
#> 6 2 collegedegree 0.329
#> 7 3 intercept 41.4
#> 8 3 age 0.0131
#> 9 3 collegedegree 1.50
#> 10 4 intercept 42.0
#> # … with 290 more rows
get_confidence_interval(
  null_fits,
  point_estimate = observed_fit,
  level = .95
)
#> # A tibble: 3 × 3
#> term lower_ci upper_ci
#> <chr> <dbl> <dbl>
#> 1 age 0.0846 0.0856
#> 2 collegedegree 2.10 2.81
#> 3 intercept 38.1 44.7
# more in-depth explanation of how to use the infer package
if (FALSE) {
  vignette("infer")
}