This function allows the user to define a null distribution based on
theoretical methods. In many infer pipelines, assume() can be used in
place of generate() and calculate() to create a null distribution. Rather
than outputting a data frame containing a distribution of test statistics
calculated from resamples of the observed data, assume() outputs a more
abstract type of object just containing the distributional details
supplied in the distribution and df arguments. However, assume() output
can be passed to visualize(), get_p_value(), and
get_confidence_interval() in the same way that simulation-based
distributions can.
To define a theoretical null distribution (for use in hypothesis testing),
be sure to provide a null hypothesis via hypothesize(). To define a
theoretical sampling distribution (for use in confidence intervals),
provide the output of specify(). Sampling distributions (only implemented
for t and z) lie on the scale of the data, and will be recentered and
rescaled to match the corresponding stat given in calculate() to
calculate the observed statistic.
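As a rough illustration of what recentering and rescaling means for a t sampling distribution, the sketch below builds a 95% confidence interval for a mean by hand in base R. It does not use infer and is not a description of its internals; the built-in mtcars data and the names x_bar, se, and t_crit are assumptions chosen for the example.

```r
# Illustrative data (assumption): miles per gallon from the built-in mtcars
x <- mtcars$mpg
n <- length(x)

# Point estimate and its standard error
x_bar <- mean(x)
se <- stats::sd(x) / sqrt(n)

# Critical value from a t distribution with n - 1 degrees of freedom
t_crit <- stats::qt(0.975, df = n - 1)

# Recentre the standardized t distribution at x_bar and rescale it by se
ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
ci
```

The interval should agree with the one reported by stats::t.test(x)$conf.int, which is a convenient cross-check.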
assume(x, distribution, df = NULL, ...)
x  The output of specify() or hypothesize().

distribution  The distribution in question, as a string. One of "F", "Chisq", "t", or "z".

df  Optional. The degrees of freedom parameter(s) for the distribution supplied.
...  Currently ignored.
An infer theoretical distribution that can be passed to helpers like
visualize(), get_p_value(), and get_confidence_interval().
Note that the assumption being expressed here, for use in theory-based
inference, only extends to distributional assumptions: the null
distribution in question and its parameters. Statistical inference with
infer, whether carried out via simulation (i.e. based on pipelines using
generate() and calculate()) or theory (i.e. with assume()), always
involves the condition that observations are independent of each other.
infer only supports theoretical tests on one or two means via the t
distribution and one or two proportions via the z distribution.
For tests comparing two means, if n1 is the group size for one level of
the explanatory variable, and n2 is that for the other level, infer
will recognize the following degrees of freedom (df) arguments:
min(n1 - 1, n2 - 1)
n1 + n2 - 2
The "parameter" entry of the analogous stats::t.test() call
The "parameter" entry of the analogous stats::t.test() call with var.equal = TRUE
By default, the package will use the "parameter" entry of the analogous
stats::t.test() call with var.equal = FALSE (the default).
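The candidate df values for a two-mean comparison can be computed directly in base R. This is a minimal sketch, assuming a two-group comparison on the built-in mtcars data (mpg split by am), which is not part of the original examples; all variable names are illustrative.

```r
# Illustrative two-group data (assumption): mpg by transmission type
x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]
n1 <- length(x)
n2 <- length(y)

# Conservative choice: the smaller group size minus one
df_conservative <- min(n1 - 1, n2 - 1)

# Pooled choice: total sample size minus two
df_pooled <- n1 + n2 - 2

# Welch approximation, from t.test() with var.equal = FALSE (its default)
df_welch <- unname(stats::t.test(x, y)$parameter)

# Equal-variance t-test, from t.test() with var.equal = TRUE
df_equal_var <- unname(stats::t.test(x, y, var.equal = TRUE)$parameter)

c(conservative = df_conservative, pooled = df_pooled,
  welch = df_welch, equal_var = df_equal_var)
```

The Welch value lies between the conservative and pooled values, and the equal-variance value coincides with n1 + n2 - 2.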
# construct theoretical distributions

# F distribution
# with the `partyid` explanatory variable
gss %>%
  specify(age ~ partyid) %>%
  assume(distribution = "F")
#> Dropping unused factor levels DK from the supplied explanatory variable 'partyid'.
#> An F distribution with 3 and 496 degrees of freedom.

# Chi-squared goodness of fit distribution
# on the `finrela` variable
gss %>%
  specify(response = finrela) %>%
  hypothesize(null = "point",
              p = c("far below average" = 1/6,
                    "below average" = 1/6,
                    "average" = 1/6,
                    "above average" = 1/6,
                    "far above average" = 1/6,
                    "DK" = 1/6)) %>%
  assume("Chisq")
#> A Chi-squared distribution with 5 degrees of freedom.

# Chi-squared test of independence
# on the `finrela` and `sex` variables
gss %>%
  specify(formula = finrela ~ sex) %>%
  assume(distribution = "Chisq")
#> A Chi-squared distribution with 5 degrees of freedom.

# T distribution
gss %>%
  specify(age ~ college) %>%
  assume("t")
#> A T distribution with 423 degrees of freedom.

# Z distribution
gss %>%
  specify(response = sex, success = "female") %>%
  assume("z")
#> A Z distribution.
if (FALSE) {
# each of these distributions can be passed to infer helper
# functions alongside observed statistics!

# for example, a 1-sample t-test
# calculate the observed statistic
obs_stat <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")

# construct a null distribution
null_dist <- gss %>%
  specify(response = hours) %>%
  assume("t")

# juxtapose them visually
visualize(null_dist) +
  shade_p_value(obs_stat, direction = "both")

# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")

# or, an F test
# calculate the observed statistic
obs_stat <- gss %>%
  specify(age ~ partyid) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

# construct a null distribution
null_dist <- gss %>%
  specify(age ~ partyid) %>%
  assume(distribution = "F")

# juxtapose them visually
visualize(null_dist) +
  shade_p_value(obs_stat, direction = "both")

# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")
}