30 Oct 2011

Canonical Identifiers

Duncan and I had an interesting problem recently where we had to make it possible to search within an 'item' to find possible sub items that exist inside it.

The URI for the item was something like this:

/items/234

Let’s say Item 234 contains the following sub items:

Mark
duncan

We have a search box on the page which allows us to type in the name of a sub item and go the sub item’s page if it exists or see an error message if it doesn’t.

If the user types in the sub item name exactly right then there’s no problem:

items/234?subItem=Mark

redirects to:

items/234/subItem/Mark

It becomes more interesting if the user gets the case of the sub item wrong e.g. they type 'mark' instead of 'Mark'.

It’s not very friendly in terms of user experience to give the user an error message if they do that so I suggested that we just make the look up of the sub item case insensitive

items/234/subItem/mark

would therefore find us the 'Mark' sub item.

Duncan pointed out that we’d now have more than 1 URI for the same document which isn’t particularly great since theoretically there should be a one to one mapping between a URI and a given document.

He pointed out that we could do a look up to find the 'canonical identifier' before we did the redirect such that if you typed in 'mark':

items/234?subItem=mark

would redirect to:

items/234/subItem/Mark

The logic for checking the existence of a sub item would be the bit that’s case insensitive and makes it more user friendly.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.