R: String to Date or NA
I’ve been trying to clean up a CSV file which contains some rows with dates and some not - I only want to keep the cells which do have dates so I’ve been trying to work out how to do that.
My first thought was that I’d try and find a function which would convert the contents of the cell into a date if it was in date format and NA if not. I could then filter out the NA values using the http://www.statmethods.net/input/missingdata.html function.
I started out with the http://www.statmethods.net/input/dates.html function...
> as.Date("2014-01-01")
[1] "2014-01-01"
> as.Date("foo")
Error in charToDate(x) :
character string is not in a standard unambiguous format
...but that throws an error if we have a non date value so it’s not so useful in this case.
Instead we can make use of the http://stackoverflow.com/questions/14755425/what-are-the-standard-unambiguous-date-formats function which does exactly what we want:
> strptime("2014-01-01", "%Y-%m-%d")
[1] "2014-01-01 GMT"
> strptime("foo", "%Y-%m-%d")
[1] NA
We can then feed those values into is.na..
> strptime("2014-01-01", "%Y-%m-%d") %>% is.na()
[1] FALSE
> strptime("foo", "%Y-%m-%d") %>% is.na()
[1] TRUE
...and we have exactly the behaviour we were looking for.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.