Encoding user entered data
I previously wrote about protecting websites from cross site scripting in the ASP.NET MVC framework by encoding user input when we are going to display it in the browser.
We can either choose to encode data like this or we can encode it straight away when we get it.
There did not seem to be a consensus on the best approach in a discussion on the ASP.NET forums but we believe it is far better to encode the data when it is outgoing rather than incoming.
If we encode the data when it comes in then we need to remember which data was pre-encoded and then un-encode it when it is going to be displayed in a non-HTML format.
If we had a user enter their name as "John O’Shea' for example then we would end up storing the ' in the name as its HTML representation and might accidentally end up sending them correspondence referring to them as the wrong name.
Dave suggested 'fixing it [encoding issue] as close to the problem as possible' as a useful guideline to follow.
In this case our problem is that when the data we received from the user is displayed there could be some potentially dangerous code in there and we don’t want to allow it to be executed on our page.
We then work off the assumption that any web applications that use our data should know how to encode the data since that falls under their responsibility. Any other applications can use the data as is.
I drew this diagram to clear it up in my head:
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.