This function is a wrapper that calls specify(), hypothesize(), and
calculate() in sequence to compute observed statistics from data.
hypothesize() is called only if a point null hypothesis parameter
(p, mu, med, or sigma) is supplied.
Learn more in vignette("infer").
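As a rough illustration of the wrapper described above, the sketch below chains the core verbs by hand. The helper name observe_sketch() is hypothetical and is not part of infer. It supports only a point null on the mean via mu (always using null = "point" when mu is given) and omits the other arguments, so treat it as a simplified model of the behavior, not as infer's implementation.

```r
library(infer)

# Hypothetical helper (not part of infer): a simplified model of observe()
# that only supports a point null on the mean via `mu`.
observe_sketch <- function(x, formula, stat, mu = NULL, order = NULL) {
  res <- specify(x, formula)
  # add a hypothesis only when a point null parameter is supplied
  if (!is.null(mu)) {
    res <- hypothesize(res, null = "point", mu = mu)
  }
  calculate(res, stat = stat, order = order)
}

obs_mean <- observe_sketch(gss, hours ~ NULL, stat = "mean")
obs_t <- observe_sketch(gss, hours ~ NULL, stat = "t", mu = 40)
```

The values in obs_mean and obs_t should match the corresponding observe() calls in the Examples section.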
Usage
observe(
x,
formula,
response = NULL,
explanatory = NULL,
success = NULL,
null = NULL,
p = NULL,
mu = NULL,
med = NULL,
sigma = NULL,
stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means",
"diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z",
"ratio of props", "odds ratio"),
order = NULL,
...
)
Arguments
- x
A data frame that can be coerced into a tibble.
- formula
A formula with the response variable on the left and the explanatory variable on the right. Alternatively, response and explanatory arguments can be supplied.
- response
The variable name in x that will serve as the response. This is an alternative to using the formula argument.
- explanatory
The variable name in x that will serve as the explanatory variable. This is an alternative to using the formula argument.
- success
The level of response that will be considered a success, as a string. Needed for inference on one proportion, a difference in proportions, and corresponding z stats.
- null
The null hypothesis. Options include "independence", "point", and "paired independence".
  - independence: Should be used with both a response and explanatory variable. Indicates that the values of the specified response variable are independent of the associated values in explanatory.
  - point: Should be used with only a response variable. Indicates that a point estimate based on the values in response is associated with a parameter. Sometimes requires supplying one of p, mu, med, or sigma.
  - paired independence: Should be used with only a response variable giving the pre-computed difference between paired observations. Indicates that the order of subtraction between paired values does not affect the resulting distribution.
- p
The true proportion of successes (a number between 0 and 1). To be used with point null hypotheses when the specified response variable is categorical.
- mu
The true mean (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.
- med
The true median (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.
- sigma
The true standard deviation (any numerical value). To be used with point null hypotheses.
- stat
A string giving the type of the statistic to calculate. Current options include "mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq" (or "chisq"), "F" (or "f"), "t", "z", "ratio of props", "slope", "odds ratio", "ratio of means", and "correlation". infer only supports theoretical tests on one or two means via the "t" distribution and one or two proportions via the "z" distribution.
- order
A string vector specifying the order in which the levels of the explanatory variable should be ordered for subtraction (or division for ratio-based statistics), where order = c("first", "second") means ("first" - "second"), or the analogue for ratios. Needed for inference on differences in means, medians, proportions, ratios, t, and z statistics.
- ...
To pass options like na.rm = TRUE into functions like mean(), sd(), etc. Can also be used to supply hypothesized null values for the "t" statistic or additional arguments to stats::chisq.test().
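For a categorical response, the success and p arguments work together. The sketch below assumes the gss data shipped with infer, whose college column has the levels "degree" and "no degree" (as in the examples on this page); the hypothesized proportion p = 0.3 is an arbitrary value chosen only for illustration.

```r
library(infer)

# observed proportion of respondents with a college degree
prop_hat <- gss |>
  observe(college ~ NULL, success = "degree", stat = "prop")

# supplying p means hypothesize() is called, so a z statistic can be
# computed against the (illustrative) hypothesized proportion p = 0.3
z_hat <- gss |>
  observe(college ~ NULL, success = "degree", stat = "z",
          null = "point", p = 0.3)
```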
See also
Other wrapper functions:
chisq_stat(),
chisq_test(),
prop_test(),
t_stat(),
t_test()
Other functions for calculating observed statistics:
chisq_stat(),
t_stat()
Examples
# calculating the observed mean number of hours worked per week
gss |>
observe(hours ~ NULL, stat = "mean")
#> Response: hours (numeric)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 41.4
# equivalently, calculating the same statistic with the core verbs
gss |>
specify(response = hours) |>
calculate(stat = "mean")
#> Response: hours (numeric)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 41.4
# calculating a t statistic for hypothesized mu = 40 hours worked/week
gss |>
observe(hours ~ NULL, stat = "t", null = "point", mu = 40)
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 2.09
# equivalently, calculating the same statistic with the core verbs
gss |>
specify(response = hours) |>
hypothesize(null = "point", mu = 40) |>
calculate(stat = "t")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 2.09
# similarly for a difference in means in age based on whether
# the respondent has a college degree
observe(
gss,
age ~ college,
stat = "diff in means",
order = c("degree", "no degree")
)
#> Response: age (numeric)
#> Explanatory: college (factor)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 0.941
# equivalently, calculating the same statistic with the core verbs
gss |>
specify(age ~ college) |>
calculate("diff in means", order = c("degree", "no degree"))
#> Response: age (numeric)
#> Explanatory: college (factor)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 0.941
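The order argument plays the same role for proportion-based statistics. As a sketch, assuming the sex column of gss has the levels "female" and "male", the following compares the proportion of degree holders between the two groups. Because order fixes the direction of subtraction, swapping its entries should flip the sign of the difference.

```r
library(infer)

# difference in the proportion of degree holders: female minus male
d_fm <- gss |>
  observe(college ~ sex, success = "degree",
          stat = "diff in props", order = c("female", "male"))

# reversing `order` computes male minus female instead
d_mf <- gss |>
  observe(college ~ sex, success = "degree",
          stat = "diff in props", order = c("male", "female"))
```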
# for a more in-depth explanation of how to use the infer package
if (FALSE) { # \dontrun{
vignette("infer")
} # }
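The stat argument also accepts "Chisq" when both the response and explanatory variables are categorical. A minimal sketch, assuming the finrela column of gss (a categorical variable in the bundled data) as the explanatory variable. No point null parameter is supplied, so observe() goes straight from specify() to calculate(); stats::chisq.test() may warn about the approximation if some cells are sparse.

```r
library(infer)

# observed chi-squared statistic for college degree status vs. finrela
chisq_hat <- gss |>
  observe(college ~ finrela, stat = "Chisq")
```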
