Given the output of `specify()` and/or `hypothesize()`, this function will
return the observed statistic specified with the `stat` argument. Some test
statistics, such as `Chisq`, `t`, and `z`, require a null hypothesis. If
provided the output of `generate()`, the function will calculate the
supplied `stat` for each `replicate`.

Learn more in `vignette("infer")`.

```
calculate(
  x,
  stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means",
    "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z",
    "ratio of props", "odds ratio"),
  order = NULL,
  ...
)
```

x | The output from |
---|---|

stat | A string giving the type of the statistic to calculate. Current
options include |

order | A string vector specifying the order in which the levels of
the explanatory variable should be ordered for subtraction (or division
for ratio-based statistics), where |

... | To pass options like |

A tibble containing a `stat` column of calculated statistics.

In some cases, when bootstrapping with small samples, some generated bootstrap samples will have only one level of the explanatory variable present. For some test statistics, the calculated statistic in these cases will be NaN. The package will omit non-finite values from visualizations (with a warning) and raise an error in p-value calculations.
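To see where the `NaN` comes from, here is a minimal base R sketch. It is not infer's implementation: the toy data and the helper `diff_in_means()` are made up for illustration. A resample that happens to contain only one level leaves the other group empty, and the mean of an empty vector is `NaN`.

```r
# Hypothetical toy data: a small sample with two levels of an explanatory variable.
toy <- data.frame(
  y = c(1.2, 3.4, 2.2, 5.1),
  g = c("a", "a", "a", "b")
)

# Hypothetical helper (not part of infer): difference in group means, "a" minus "b".
diff_in_means <- function(d) {
  mean(d$y[d$g == "a"]) - mean(d$y[d$g == "b"])
}

# A bootstrap resample that happened to draw only rows from level "a".
one_level <- toy[c(1, 2, 2, 3), ]

stat_full <- diff_in_means(toy)        # both levels present, so the result is finite
stat_one  <- diff_in_means(one_level)  # level "b" is empty, so the result is NaN

# Flags which of the two statistics are finite.
is.finite(c(stat_full, stat_one))
```

A check such as `is.finite()` on the `stat` column is one way to spot these replicates before passing results on to other functions.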

When using the infer package for research, or in other cases when exact
reproducibility is a priority, be sure to set the seed for R’s random
number generator. infer will respect the random seed specified in the
`set.seed()` function, returning the same result when `generate()`ing
data given an identical seed. For instance, we can calculate the
difference in mean `age` by `college` degree status using the `gss`
dataset from 5 versions of the `gss` resampled with permutation using
the following code.

```
set.seed(1)
gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute") %>%
  calculate("diff in means", order = c("degree", "no degree"))
```

```
## Response: age (numeric)
## Explanatory: college (factor)
## Null Hypothesis: independence
## # A tibble: 5 × 2
##   replicate   stat
##       <int>  <dbl>
## 1         1 -0.531
## 2         2 -2.35
## 3         3  0.764
## 4         4  0.280
## 5         5  0.350
```

Setting the seed to the same value again and rerunning the same code will produce the same result.

```
# set the seed
set.seed(1)
gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 5, type = "permute") %>%
  calculate("diff in means", order = c("degree", "no degree"))
```

```
## Response: age (numeric)
## Explanatory: college (factor)
## Null Hypothesis: independence
## # A tibble: 5 × 2
##   replicate   stat
##       <int>  <dbl>
## 1         1 -0.531
## 2         2 -2.35
## 3         3  0.764
## 4         4  0.280
## 5         5  0.350
```

Please keep this in mind when writing infer code that utilizes
resampling with `generate()`.
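One way to check this programmatically, rather than comparing printed output by eye, is to wrap the pipeline in a function and compare two runs. This is only a sketch: `permuted_diffs()` is a hypothetical helper written for this example, not part of infer, and the seed value is arbitrary.

```r
library(infer)

# Hypothetical helper (not part of infer): set the seed, then run the
# permutation pipeline used in the example above.
permuted_diffs <- function(seed) {
  set.seed(seed)
  gss %>%
    specify(age ~ college) %>%
    hypothesize(null = "independence") %>%
    generate(reps = 5, type = "permute") %>%
    calculate("diff in means", order = c("degree", "no degree"))
}

first  <- permuted_diffs(1)
second <- permuted_diffs(1)

# Compare the calculated statistics from the two runs.
identical(first$stat, second$stat)
```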

See `visualize()`, `get_p_value()`, and `get_confidence_interval()`
to extract value from this function's outputs.
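As a rough illustration of how the output of `calculate()` feeds into these functions, the sketch below computes an observed statistic and a permutation null distribution, then passes both to `get_p_value()`. The seed, the number of replicates, and the object names `obs_stat`, `null_dist`, and `p_val` are arbitrary choices for this example.

```r
library(infer)

set.seed(1)

# Observed difference in mean age, degree minus no degree.
obs_stat <- gss %>%
  specify(age ~ college) %>%
  calculate("diff in means", order = c("degree", "no degree"))

# Null distribution of the same statistic under independence,
# built from 200 permutation resamples.
null_dist <- gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 200, type = "permute") %>%
  calculate("diff in means", order = c("degree", "no degree"))

# Two-sided p-value for the observed statistic against the null distribution.
p_val <- get_p_value(null_dist, obs_stat = obs_stat, direction = "two-sided")
p_val
```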

Other core functions: `generate()`, `hypothesize()`, `specify()`

```
# calculate a null distribution of hours worked per week under
# the null hypothesis that the mean is 40
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 200, type = "bootstrap") %>%
  calculate(stat = "mean")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 200 × 2
#> replicate stat
#> <int> <dbl>
#> 1 1 39.2
#> 2 2 39.4
#> 3 3 40.1
#> 4 4 39.6
#> 5 5 40.8
#> 6 6 39.9
#> 7 7 39.9
#> 8 8 40.8
#> 9 9 39.6
#> 10 10 41.0
#> # … with 190 more rows
# calculate the corresponding observed statistic
gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")
#> Response: hours (numeric)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 41.4
# calculate a null distribution assuming independence between age
# of respondent and whether they have a college degree
gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 200, type = "permute") %>%
  calculate("diff in means", order = c("degree", "no degree"))
#> Response: age (numeric)
#> Explanatory: college (factor)
#> Null Hypothesis: independence
#> # A tibble: 200 × 2
#> replicate stat
#> <int> <dbl>
#> 1 1 -2.48
#> 2 2 -0.699
#> 3 3 -0.0113
#> 4 4 0.579
#> 5 5 0.553
#> 6 6 1.84
#> 7 7 -2.31
#> 8 8 -0.320
#> 9 9 -0.00250
#> 10 10 -1.78
#> # … with 190 more rows
# calculate the corresponding observed statistic
gss %>%
  specify(age ~ college) %>%
  calculate("diff in means", order = c("degree", "no degree"))
#> Response: age (numeric)
#> Explanatory: college (factor)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 0.941
# some statistics require a null hypothesis
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 2.09
# more in-depth explanation of how to use the infer package
if (FALSE) {
  vignette("infer")
}
```