# R: dplyr - Select 'random' rows from a data frame

Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn't enough.

Let's say we start with the following data frame:

```
data = data.frame(
letter = sample(LETTERS, 50000, replace = TRUE),
number = sample (1:10, 50000, replace = TRUE)
)
```

And we'd like to sample 10 rows to see what it contains. We'll start by generating 10 random numbers to represent row numbers using the runif function:

```
> randomRows = sample(1:length(data[,1]), 10, replace=T)
> randomRows
[1] 8723 18772 4964 36134 27467 31890 16313 12841 49214 15621
```

We can then pass that list of row numbers into dplyr's slice function like so:

```
> data %>% slice(randomRows)
letter number
1 Z 4
2 F 1
3 Y 6
4 R 6
5 Y 4
6 V 10
7 R 6
8 D 6
9 J 7
10 E 2
```

If we're using that code throughout our code then we might want to pull out a function like so:

```
pickRandomRows = function(df, numberOfRows = 10) {
df %>% slice(runif(numberOfRows,0, length(df[,1])))
}
```

And then call it like so:

```
> data %>% pickRandomRows()
letter number
1 W 5
2 Y 3
3 E 6
4 Q 8
5 M 9
6 H 9
7 E 10
8 T 2
9 I 5
10 V 4
> data %>% pickRandomRows(7)
letter number
1 V 7
2 N 4
3 W 1
4 N 8
5 G 7
6 V 1
7 N 7
```

### Update

Antonios pointed out via email that we could just make use of the in-built sample_n function which I didn't know about until now:

```
> data %>% sample_n(10)
letter number
29771 U 1
48666 T 10
30635 A 1
34865 X 7
20140 A 3
41715 T 10
43786 E 10
18284 A 7
21406 S 8
35542 J 8
```

##### About the author

Mark Needham is a Developer Relations Engineer for Neo4j, the world's leading graph database.