This function allows the user to define a null distribution based on
theoretical methods. In many infer pipelines, `assume()`

can be
used in place of `generate()`

and `calculate()`

to create a null
distribution. Rather than outputting a data frame containing a
distribution of test statistics calculated from resamples of the observed
data, `assume()`

outputs a more abstract type of object just containing
the distributional details supplied in the `distribution`

and `df`

arguments.
However, `assume()`

output can be passed to `visualize()`

, `get_p_value()`

,
and `get_confidence_interval()`

in the same way that simulation-based
distributions can.

To define a theoretical null distribution (for use in hypothesis testing),
be sure to provide a null hypothesis via `hypothesize()`

. To define a
theoretical sampling distribution (for use in confidence intervals),
provide the output of `specify()`

. Sampling distributions (only
implemented for `t`

and `z`

) lie on the scale of the data, and will be
recentered and rescaled to match the corresponding `stat`

given in
`calculate()`

to calculate the observed statistic.

## Arguments

- x
The output of

`specify()`

or`hypothesize()`

, giving the observed data, variable(s) of interest, and (optionally) null hypothesis.- distribution
The distribution in question, as a string. One of

`"F"`

,`"Chisq"`

,`"t"`

, or`"z"`

.- df
Optional. The degrees of freedom parameter(s) for the

`distribution`

supplied, as a numeric vector. For`distribution = "F"`

, this should have length two (e.g.`c(10, 3)`

). For`distribution = "Chisq"`

or`distribution = "t"`

, this should have length one. For`distribution = "z"`

, this argument is not required. The package will supply a message if the supplied`df`

argument is different from recognized values. See the Details section below for more information.- ...
Currently ignored.

## Value

An infer theoretical distribution that can be passed to helpers
like `visualize()`

, `get_p_value()`

, and `get_confidence_interval()`

.

## Details

Note that the assumption being expressed here, for use in theory-based
inference, only extends to *distributional* assumptions: the null
distribution in question and its parameters. Statistical inference with
infer, whether carried out via simulation (i.e. based on pipelines
using `generate()`

and `calculate()`

) or theory (i.e. with `assume()`

),
always involves the condition that observations are independent of
each other.

`infer`

only supports theoretical tests on one or two means via the
`t`

distribution and one or two proportions via the `z`

.

For tests comparing two means, if `n1`

is the group size for one level of
the explanatory variable, and `n2`

is that for the other level, `infer`

will recognize the following degrees of freedom (`df`

) arguments:

`min(n1 - 1, n2 - 1)`

`n1 + n2 - 2`

The

`"parameter"`

entry of the analogous`stats::t.test()`

callThe

`"parameter"`

entry of the analogous`stats::t.test()`

call with`var.equal = TRUE`

By default, the package will use the `"parameter"`

entry of the analogous
`stats::t.test()`

call with `var.equal = FALSE`

(the default).

## Examples

```
# construct theoretical distributions ---------------------------------
# F distribution
# with the `partyid` explanatory variable
gss %>%
specify(age ~ partyid) %>%
assume(distribution = "F")
#> Dropping unused factor levels DK from the supplied explanatory variable
#> 'partyid'.
#> An F distribution with 3 and 496 degrees of freedom.
# Chi-squared goodness of fit distribution
# on the `finrela` variable
gss %>%
specify(response = finrela) %>%
hypothesize(null = "point",
p = c("far below average" = 1/6,
"below average" = 1/6,
"average" = 1/6,
"above average" = 1/6,
"far above average" = 1/6,
"DK" = 1/6)) %>%
assume("Chisq")
#> A Chi-squared distribution with 5 degrees of freedom.
# Chi-squared test of independence
# on the `finrela` and `sex` variables
gss %>%
specify(formula = finrela ~ sex) %>%
assume(distribution = "Chisq")
#> A Chi-squared distribution with 5 degrees of freedom.
# T distribution
gss %>%
specify(age ~ college) %>%
assume("t")
#> A T distribution with 423 degrees of freedom.
# Z distribution
gss %>%
specify(response = sex, success = "female") %>%
assume("z")
#> A Z distribution.
if (FALSE) {
# each of these distributions can be passed to infer helper
# functions alongside observed statistics!
# for example, a 1-sample t-test -------------------------------------
# calculate the observed statistic
obs_stat <- gss %>%
specify(response = hours) %>%
hypothesize(null = "point", mu = 40) %>%
calculate(stat = "t")
# construct a null distribution
null_dist <- gss %>%
specify(response = hours) %>%
assume("t")
# juxtapose them visually
visualize(null_dist) +
shade_p_value(obs_stat, direction = "both")
# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")
# or, an F test ------------------------------------------------------
# calculate the observed statistic
obs_stat <- gss %>%
specify(age ~ partyid) %>%
hypothesize(null = "independence") %>%
calculate(stat = "F")
# construct a null distribution
null_dist <- gss %>%
specify(age ~ partyid) %>%
assume(distribution = "F")
# juxtapose them visually
visualize(null_dist) +
shade_p_value(obs_stat, direction = "both")
# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")
}
```