String operations in R
Checking and changing strings by hand in complex data sets is often very time consuming, but R has a rich set of functions for managing strings (e.g. sentences or paragraphs). In this session we will focus on the stringr package and introduce a few of its key functions. You will need:
install.packages("stringr") # if you don't have it already
library(stringr) # load the package so that str_count() and str_replace() are available
The example data we will use can be downloaded here!
# Sample code from the introductory session
text.dat <- read.csv("textExample.csv") #or point R to where you saved the file from the above link
head(text.dat) # to have a look at the start of the data...
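# Let's count the number of times a certain string appears in the vegetation strata description
# Note: matching is case-sensitive, so "Acacia" with a capital A is not counted by the pattern "acacia"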
text.dat$acacia <- str_count(text.dat$VegStrataDescr, pattern = "acacia")
text.dat$grass <- str_count(text.dat$VegStrataDescr, pattern = "grass")
text.dat$tree <- str_count(text.dat$VegStrataDescr, pattern = "tree")
# You may want to try this with other terms as well...
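Because the counts above depend on letter case, it can help to see the difference on a small example. The following is a minimal sketch, assuming a made-up vector `descr` in place of `text.dat$VegStrataDescr`; it uses stringr's `regex()` helper with `ignore_case = TRUE` to count matches regardless of case.

```r
library(stringr)

# Hypothetical descriptions standing in for text.dat$VegStrataDescr
descr <- c("Acacia woodland over tussock grass",
           "acacia shrubland with Acacia regrowth",
           "open grassland")

# Default matching is case-sensitive: "Acacia" is not counted
sensitive <- str_count(descr, pattern = "acacia")
print(sensitive)

# Wrapping the pattern in regex(..., ignore_case = TRUE) counts both spellings
insensitive <- str_count(descr, pattern = regex("acacia", ignore_case = TRUE))
print(insensitive)
```

An alternative is to convert the column to lower case first and then count, which gives the same counts for this example.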
# We often need to change from lower case to UPPER CASE or the other way around. tolower() is a useful function for this (toupper() does the reverse).
text.dat$VegStrataDescr <- tolower(text.dat$VegStrataDescr) # Converts the whole string to lower case
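stringr has its own case-conversion functions that behave like base R's `tolower()` and `toupper()` for plain English text. The sketch below assumes a small made-up vector `descr` rather than the example data.

```r
library(stringr)

# Hypothetical mixed-case descriptions
descr <- c("Acacia Woodland", "OPEN grassland")

lower <- str_to_lower(descr)  # stringr counterpart of tolower()
upper <- str_to_upper(descr)  # stringr counterpart of toupper()

print(lower)
print(upper)

# For these strings the base R functions give the same result
print(identical(lower, tolower(descr)))
```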
# Let's make a copy of the description column and do some string replacement on the copy
text.dat$NewStrataDescr <- text.dat$VegStrataDescr
# str_replace() changes only the first match in each string. We will look at using regular expressions, or even grep, for this in a later seminar.
text.dat$NewStrataDescr <- str_replace(text.dat$NewStrataDescr, pattern = "grass", replacement = "grassland")
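Two details of this replacement are worth checking on a small example: `str_replace()` changes only the first match in each string, and the pattern "grass" also matches inside "grassland". The sketch below assumes a made-up vector `descr`; the word-boundary pattern `\\b` is a small preview of regular expressions and is only one possible way to restrict the match to the whole word.

```r
library(stringr)

# Hypothetical descriptions, already in lower case
descr <- c("grass and more grass", "open grassland")

# Only the first "grass" in each string is replaced
first_only <- str_replace(descr, pattern = "grass", replacement = "grassland")
print(first_only)

# str_replace_all() replaces every match, including the one inside "grassland"
all_matches <- str_replace_all(descr, pattern = "grass", replacement = "grassland")
print(all_matches)

# \\b marks a word boundary, so only the standalone word "grass" is replaced
whole_word <- str_replace_all(descr, pattern = "\\bgrass\\b", replacement = "grassland")
print(whole_word)
```

Inspecting a few rows with `head()` after a replacement like this is a quick way to catch unintended changes such as "grasslandland".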