Aurélien Ginolhac
November 27, 2023
Opening the help pages of functions you use regularly is worth it. Often, one discovers hidden gems among the function arguments. Here is an example with the str_extract() function from the stringr package.
Let’s look at some strings that encode a code as six digits, plus some other strings that are not relevant here, like INFO.
library(stringr)
codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")
codes
[1] "140102" "210301" "220501" "220502" "230101" "INFO" "230102"
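From those we want to obtain the second group of two digits (3rd and 4th characters) of each code, and NA for INFO: "01", "03", "05", "05", "01", NA, "01".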
If we use str_sub(), we can extract the second group of two digits by position, but it does not work for INFO, which should be NA.
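The original call is not shown here, so the following is only a sketch of the positional approach, assuming the digits of interest sit at positions 3 and 4. The name by_position is introduced just for this illustration.

```r
library(stringr)

codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")

# str_sub() slices by position and knows nothing about the content,
# so INFO yields its 3rd and 4th letters instead of NA
by_position <- str_sub(codes, 3, 4)
by_position
```

The sixth element comes back as "FO" rather than NA, which is the problem described above.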
We need a regular expression that specifies six digits and takes the second group of two digits.
str_extract() takes a regular expression as argument. A first attempt is a regex that surrounds the two digits with two arbitrary characters (.) on each side.
This is fine for detecting INFO and excluding it, but it does not extract the two digits alone, since the two preceding and two following characters (.) are part of the match.
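The exact pattern used at this step is not preserved, so the one below is an assumption that fits the description: two arbitrary characters, two digits, two arbitrary characters. The name whole_match is illustrative only.

```r
library(stringr)

codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")

# Assumed pattern: any 2 characters, then 2 digits, then any 2 characters.
# INFO does not match (NA), but each match is the full 6-character string.
whole_match <- str_extract(codes, ".{2}\\d{2}.{2}")
whole_match
```

INFO is correctly turned into NA, yet every other element is returned whole, because str_extract() returns the complete match by default.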
Look-arounds are extremely powerful but very complex. I never get them right the first time, and I always need to look up the help pages.
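As a rough sketch of what a look-around solution might look like (not necessarily the pattern originally used), a look-behind and a look-ahead can require two digits on each side without including them in the match. The name look_around is illustrative, and the pattern is left unanchored, which is enough for the six-digit codes here.

```r
library(stringr)

codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")

# (?<=\\d{2}) : two digits must precede the match (look-behind)
# (?=\\d{2})  : two digits must follow the match (look-ahead)
# Only the middle \\d{2} is returned.
look_around <- str_extract(codes, "(?<=\\d{2})\\d{2}(?=\\d{2})")
look_around
```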
But already we see grouping popping up.
This is where I usually got stuck and went back to look-arounds.
But str_extract() has a group argument that extracts only the capture group of interest.
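A minimal sketch of the difference, assuming a stringr version that provides the group argument (to my knowledge, 1.5.0 or later) and a pattern of six digits with the middle pair in parentheses. The names no_group and with_group are illustrative.

```r
library(stringr)

codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")

# Six digits, with the 3rd and 4th wrapped in a capture group
pattern <- "\\d{2}(\\d{2})\\d{2}"

# Without group: the parentheses change nothing, the whole match is returned
no_group <- str_extract(codes, pattern)

# With group = 1: only the first capture group is returned
with_group <- str_extract(codes, pattern, group = 1)
with_group
```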
Works! And easy to understand.
Other groups are reachable the same way, for example the second group (5th and 6th characters) when the pattern captures two groups.
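To illustrate the second group, here is a sketch assuming the pattern is extended with a second pair of parentheses around the last two digits. The name second_group is illustrative.

```r
library(stringr)

codes <- c("140102", "210301", "220501", "220502", "230101", "INFO", "230102")

# Two capture groups: characters 3-4 (group 1) and characters 5-6 (group 2)
second_group <- str_extract(codes, "\\d{2}(\\d{2})(\\d{2})", group = 2)
second_group
```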
Reading help pages helps!