These functions extend the functionality of dplyr::sample_n() and dplyr::slice_sample() by allowing for repeated sampling of data. This operation is especially helpful while creating sampling distributions—see the examples below!

rep_sample_n(tbl, size, replace = FALSE, reps = 1, prob = NULL)

rep_slice_sample(.data, n = 1, replace = FALSE, weight_by = NULL, reps = 1)

Arguments

tbl, .data

Data frame of population from which to sample.

size, n

Sample size of each sample.

replace

Should sampling be with replacement?

reps

Number of samples of size n = size to take.

prob, weight_by

A vector of sampling weights for each of the rows in tbl—must have length equal to nrow(tbl).

Value

A tibble of size rep * size rows corresponding to reps samples of size size from tbl, grouped by replicate.

Details

The dplyr::sample_n() function (to which rep_sample_n() was originally a supplement) has been superseded by dplyr::slice_sample(). rep_slice_sample() provides a light wrapper around rep_sample_n() that has a more similar interface to slice_sample().

Examples

#> #> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’: #> #> filter, lag
#> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union
library(ggplot2) # take 1000 samples of size n = 50, without replacement slices <- gss %>% rep_sample_n(size = 50, reps = 1000) slices
#> # A tibble: 50,000 x 12 #> # Groups: replicate [1,000] #> replicate year age sex college partyid hompop hours income class finrela #> <int> <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> #> 1 1 2000 32 fema… no deg… dem 2 40 $1500… lowe… below … #> 2 1 1982 45 fema… no deg… ind 3 40 $2000… work… average #> 3 1 1984 37 fema… no deg… dem 4 15 $2500… work… average #> 4 1 2008 47 fema… degree ind 1 37 $2500… midd… above … #> 5 1 1991 40 fema… no deg… dem 2 15 $1500… work… below … #> 6 1 1998 32 male degree dem 4 40 $2500… work… average #> 7 1 1984 27 male no deg… rep 3 50 $1500… midd… average #> 8 1 1994 30 fema… no deg… dem 3 44 $2500… work… average #> 9 1 1988 40 male no deg… dem 5 50 $2500… work… below … #> 10 1 2018 37 male degree ind 4 40 $2500… work… above … #> # … with 49,990 more rows, and 1 more variable: weight <dbl>
# compute the proportion of respondents with a college # degree in each replicate p_hats <- slices %>% group_by(replicate) %>% summarize(prop_college = mean(college == "degree"))
#> `summarise()` ungrouping output (override with `.groups` argument)
# plot sampling distribution ggplot(p_hats, aes(x = prop_college)) + geom_density() + labs( x = "p_hat", y = "Number of samples", title = "Sampling distribution of p_hat" )
# sampling with probability weights. Note probabilities are automatically # renormalized to sum to 1 library(tibble) df <- tibble( id = 1:5, letter = factor(c("a", "b", "c", "d", "e")) ) rep_sample_n(df, size = 2, reps = 5, prob = c(.5, .4, .3, .2, .1))
#> # A tibble: 10 x 3 #> # Groups: replicate [5] #> replicate id letter #> <int> <int> <fct> #> 1 1 1 a #> 2 1 3 c #> 3 2 1 a #> 4 2 3 c #> 5 3 3 c #> 6 3 4 d #> 7 4 1 a #> 8 4 2 b #> 9 5 2 b #> 10 5 1 a