R: Deriving a new data frame column based on a contained string
I’ve been playing around with R data frames a bit more, and one thing I wanted to do was derive a new column based on the text contained in an existing column.
I started with something like this:
> x = data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
> x
             name
1  Java Hackathon
2 Intro to Graphs
3 Hands on Cypher
I wanted to derive a new column indicating whether or not each session was a practical one. The grepl function, which returns TRUE for every element that matches a regular expression and FALSE otherwise, seemed to be the best tool for the job:
> grepl("Hackathon|Hands on|Hands On", x$name)
[1]  TRUE FALSE  TRUE
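The pattern above spells out both "Hands on" and "Hands On" to cover two capitalisations. As a minimal sketch, assuming case-insensitive matching is acceptable for your session names, grepl's ignore.case argument lets a single alternative cover every capitalisation:

```r
# Same sample data as in the post
x <- data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))

# ignore.case = TRUE makes the match case-insensitive, so one
# "hands on" alternative covers "Hands on", "Hands On", "HANDS ON", ...
practical <- grepl("hackathon|hands on", x$name, ignore.case = TRUE)

print(practical)
```

This is looser than the original pattern (it would also match "HACKATHON", for example), so it is only a drop-in replacement if that extra leniency is what you want.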
We can then add a column to our data frame with that output:
> x$practical = grepl("Hackathon|Hands on|Hands On", x$name)
And we end up with the following:
> x
             name practical
1  Java Hackathon      TRUE
2 Intro to Graphs     FALSE
3 Hands on Cypher      TRUE
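Once the logical column exists, it can drive further steps. The sketch below is one possible follow-up rather than anything I needed at the time: it uses ifelse to turn the flag into a text label and uses the flag to keep only the practical rows. The session_type column name and the "practical"/"talk" labels are made up for illustration.

```r
# Rebuild the data frame and the derived logical column from the post
x <- data.frame(name = c("Java Hackathon", "Intro to Graphs", "Hands on Cypher"))
x$practical <- grepl("Hackathon|Hands on|Hands On", x$name)

# Hypothetical label column: ifelse picks a value per row from the flag
x$session_type <- ifelse(x$practical, "practical", "talk")

# Logical indexing keeps only the rows where practical is TRUE
practical_sessions <- x[x$practical, ]

print(x)
print(practical_sessions)
```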
Not too tricky, but it took me a bit too long to figure it out, so I thought I’d save future Mark some time!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.