# R: Calculating the difference between ordered factor variables

In my continued exploration of Wimbledon data I wanted to work out whether a player had done as well as their seeding suggested they should.

I therefore wanted to work out the difference between the round they reached and the round they were expected to reach. A 'round' in the dataset is an ordered factor variable.

These are all the possible values:

```
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
```

And if we want to factorise a couple of strings into this factor we would do it like this:

```
round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)
> round
[1] Finals
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
> expected
[1] Winner
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
```

In this case the difference between the actual round and expected round should be -1 - the player was expected to win the tournament but lost in the final. We can calculate that differnce by calling the unclass function on each variable:

```
> unclass(round) - unclass(expected)
[1] -1
attr(,"levels")
[1] "Did not enter" "Round of 128" "Round of 64" "Round of 32" "Round of 16" "Quarter-Finals"
[7] "Semi-Finals" "Finals" "Winner"
```

That still seems to have some remnants of the factor variable so to get rid of that we can cast it to a numeric value:

```
> as.numeric(unclass(round) - unclass(expected))
[1] -1
```

And that's it! We can now go and apply this calculation to all seeds to see how they got on.

##### About the author

Mark Needham is a Developer Relations Engineer for Neo4j, the world's leading graph database.