# R: Calculating the difference between ordered factor variables

In my continued exploration of Wimbledon data, I wanted to work out whether a player had done as well as their seeding suggested they should.

I therefore wanted to work out the difference between the round they reached and the round they were expected to reach. A 'round' in the dataset is an ordered factor variable.

These are all the possible values:

``````
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
``````

If we want to convert a couple of strings into values of this ordered factor, we can do it like this:

``````
round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)

> round
[1] Finals
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner

> expected
[1] Winner
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
``````
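Because the factor is ordered, R also lets us compare the two values directly with the usual comparison operators, and the comparison follows the order of the levels rather than alphabetical order. A minimal check, reusing the same `rounds` levels:

``````r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
           "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)

# Comparisons use the level order: "Finals" comes before "Winner"
round < expected   # TRUE
round >= expected  # FALSE
``````

This tells us *whether* a player fell short of their seeding, but not *by how much*, which is why we need the numeric difference.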

In this case the difference between the actual round and the expected round should be -1: the player was expected to win the tournament but lost in the final. We can calculate that difference by calling the `unclass` function on each variable, which exposes the underlying integer position of each value in the levels:

``````
> unclass(round) - unclass(expected)
[1] -1
attr(,"levels")
[1] "Did not enter"  "Round of 128"   "Round of 64"    "Round of 32"    "Round of 16"    "Quarter-Finals"
[7] "Semi-Finals"    "Finals"         "Winner"
``````

The result still carries the `levels` attribute left over from the factor, so to get rid of it we can convert it to a plain numeric value with `as.numeric`:

``````
> as.numeric(unclass(round) - unclass(expected))
[1] -1
``````
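An alternative that skips the clean-up step is `as.integer()`, which, when applied to a factor, returns just the integer level positions without any attributes. A small sketch of that approach (note that the result is an integer rather than a double):

``````r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
           "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)

# as.integer() gives each value's position in the levels: Finals = 8, Winner = 9
as.integer(round)                         # 8
as.integer(round) - as.integer(expected)  # -1, with no levels attribute attached
``````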

And that's it! We can now go and apply this calculation to all seeds to see how they got on.
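Since `factor`, `unclass` and arithmetic are all vectorised, the same calculation works on whole columns at once. The data frame below is a hypothetical stand-in with invented players and rounds, not the real Wimbledon data, and it assumes one row per seed with the actual and expected rounds stored as strings in columns named `round` and `expected`:

``````r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
           "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

# Hypothetical example data: names and rounds are made up for illustration
seeds = data.frame(
  player   = c("Player A", "Player B", "Player C"),
  round    = c("Finals", "Winner", "Round of 32"),
  expected = c("Winner", "Semi-Finals", "Quarter-Finals"),
  stringsAsFactors = FALSE
)

# Convert both columns to ordered factors that share the same levels
seeds$round = factor(seeds$round, levels = rounds, ordered = TRUE)
seeds$expected = factor(seeds$expected, levels = rounds, ordered = TRUE)

# Positive: did better than the seeding suggested; negative: did worse
seeds$difference = as.numeric(unclass(seeds$round) - unclass(seeds$expected))
seeds
``````

For these made-up rows the `difference` column comes out as -1, 2 and -2.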