R: dplyr - mutate with strptime (incompatible size/wrong result size)
Having worked out how to translate a string into a date or NA if it wasn’t the appropriate format the next thing I wanted to do was store the result of the transformation in my data frame.
I started off with this:
data = data.frame(x = c("2014-01-01", "2014-02-01", "foo"))
> data
x
1 2014-01-01
2 2014-02-01
3 foo
And when I tried to do the date translation ran into the following error:
> data %>% mutate(y = strptime(x, "%Y-%m-%d"))
Error: wrong result size (11), expected 3 or 1
As I understand it this error is telling us that we are trying to put a value into the data frame which represents 11 rows rather than 3 rows or 1 row.
It turns out that storing POSIXlts in a data frame isn’t such a good idea! In this case we can use the as.character function to create a character vector which can be stored in the data frame:</a>
> data %>% mutate(y = strptime(x, "%Y-%m-%d") %>% as.character())
x y
1 2014-01-01 2014-01-01
2 2014-02-01 2014-02-01
3 foo <NA>
We can then get rid of the NA row by using the is.na function:
> data %>% mutate(y = strptime(x, "%Y-%m-%d") %>% as.character()) %>% filter(!is.na(y))
x y
1 2014-01-01 2014-01-01
2 2014-02-01 2014-02-01
And a final tweak so that we have 100% pipelining goodness:
> data %>%
mutate(y = x %>% strptime("%Y-%m-%d") %>% as.character()) %>%
filter(!is.na(y))
x y
1 2014-01-01 2014-01-01
2 2014-02-01 2014-02-01
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.