Skip to content

Declare a null hypothesis about variables selected in specify().

Learn more in vignette("infer").

Usage

hypothesize(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)

hypothesise(x, null, p = NULL, mu = NULL, med = NULL, sigma = NULL)

Arguments

x

A data frame that can be coerced into a tibble.

null

The null hypothesis. Options include "independence", "point", and "paired independence".

  • independence: Should be used with both a response and explanatory variable. Indicates that the values of the specified response variable are independent of the associated values in explanatory.

  • point: Should be used with only a response variable. Indicates that a point estimate based on the values in response is associated with a parameter. Sometimes requires supplying one of p, mu, med, or sigma.

  • paired independence: Should be used with only a response variable giving the pre-computed difference between paired observations. Indicates that the order of subtraction between paired values does not affect the resulting distribution.

p

The true proportion of successes (a number between 0 and 1). To be used with point null hypotheses when the specified response variable is categorical.

mu

The true mean (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.

med

The true median (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.

sigma

The true standard deviation (any numerical value). To be used with point null hypotheses.

Value

A tibble containing the response (and explanatory, if specified) variable data with parameter information stored as well.

See also

Other core functions: calculate(), generate(), specify()

Examples

# hypothesize independence of two variables
gss %>%
 specify(college ~ partyid, success = "degree") %>%
 hypothesize(null = "independence")
#> Dropping unused factor levels DK from the supplied explanatory variable
#> 'partyid'.
#> Response: college (factor)
#> Explanatory: partyid (factor)
#> Null Hypothesis: independence
#> # A tibble: 500 × 2
#>    college   partyid
#>    <fct>     <fct>  
#>  1 degree    ind    
#>  2 no degree rep    
#>  3 degree    ind    
#>  4 no degree ind    
#>  5 degree    rep    
#>  6 no degree rep    
#>  7 no degree dem    
#>  8 degree    ind    
#>  9 degree    rep    
#> 10 no degree dem    
#> # ℹ 490 more rows

# hypothesize a mean number of hours worked per week of 40
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40)
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 500 × 1
#>    hours
#>    <dbl>
#>  1    50
#>  2    31
#>  3    40
#>  4    40
#>  5    40
#>  6    53
#>  7    32
#>  8    20
#>  9    40
#> 10    40
#> # ℹ 490 more rows

# more in-depth explanation of how to use the infer package
if (FALSE) {
vignette("infer")
}