· r-2 rstats dplyr

R: dplyr - Select 'random' rows from a data frame

Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn't enough.

Let's say we start with the following data frame:


data = data.frame(
    letter = sample(LETTERS, 50000, replace = TRUE),
    number = sample (1:10, 50000, replace = TRUE)
    )

And we'd like to sample 10 rows to see what it contains. We'll start by generating 10 random numbers to represent row numbers using the runif function:


> randomRows = sample(1:length(data[,1]), 10, replace=T)
> randomRows
 [1]  8723 18772  4964 36134 27467 31890 16313 12841 49214 15621

We can then pass that list of row numbers into dplyr's slice function like so:


> data %>% slice(randomRows)
   letter number
1       Z      4
2       F      1
3       Y      6
4       R      6
5       Y      4
6       V     10
7       R      6
8       D      6
9       J      7
10      E      2

If we're using that code throughout our code then we might want to pull out a function like so:


pickRandomRows = function(df, numberOfRows = 10) {
  df %>% slice(runif(numberOfRows,0, length(df[,1])))
}

And then call it like so:


> data %>% pickRandomRows()
   letter number
1       W      5
2       Y      3
3       E      6
4       Q      8
5       M      9
6       H      9
7       E     10
8       T      2
9       I      5
10      V      4

> data %>% pickRandomRows(7)
  letter number
1      V      7
2      N      4
3      W      1
4      N      8
5      G      7
6      V      1
7      N      7

Update

Antonios pointed out via email that we could just make use of the in-built sample_n function which I didn't know about until now:


> data %>% sample_n(10)
      letter number
29771      U      1
48666      T     10
30635      A      1
34865      X      7
20140      A      3
41715      T     10
43786      E     10
18284      A      7
21406      S      8
35542      J      8
  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket