I have a dataset created by a survey field house, which introduced curly apostrophes in some labels. When I import it with the haven package, those curly apostrophes are kept. I would like to replace them all with straight apostrophes. Ideally this would be done for every such apostrophe, but when I do that it changes the order of the levels, which is something I want to avoid.
So as a second solution, I want to change all factor levels such as "Don’t know" to "Don't know" (or, for ease of use right now, to "DK").
When I use mutate_if, I get a warning message for any factor where that level doesn't exist (which is fair enough).
library(dplyr)
library(forcats)
tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))
tbl_dat <- tbl_dat %>%
  mutate_if(is.factor,
            fct_recode,
            "DK" = "Don’t know")
Warning:
Warning in mutate():
ℹ In argument: `var01 = (function (.f, ...) ...`.
Caused by warning:
! Unknown levels in `f`: Don’t know
How can I keep the factor levels (i.e. not change it to character and then back to factor) but do it across all variables without having to pre-select the variables that have that level? In an ideal case I don't want to ignore or suppress warnings but rather avoid them in the first place.
You can get around this by checking whether the level exists in each factor variable before recoding, using if and else statements:
library(dplyr)
library(forcats)
tbl_dat %>%
  mutate(across(where(is.factor), ~ if ("Don’t know" %in% levels(.)) {
    fct_recode(., "DK" = "Don’t know")
  } else {
    .
  }))
#   var01 var02
#   <fct> <fct>
# 1 DK    No answer
# 2 Cat 1 Cat 2
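If several labels need the same treatment, the guard can be wrapped in a small helper. This is only a sketch: `recode_present()`, the `mapping` vector and the "NoAns" label are hypothetical names chosen for illustration, not part of forcats. The idea is to pass to fct_recode() only those old levels that actually occur in the factor, so the unknown-level warning should never be triggered.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: recode only the levels that are present in `f`.
# `mapping` is a named character vector of the form c(new_label = "old label").
recode_present <- function(f, mapping) {
  present <- mapping[mapping %in% levels(f)]
  if (length(present) == 0) {
    return(f)  # nothing to recode, return the factor untouched
  }
  fct_recode(f, !!!present)  # splice the named vector into fct_recode()
}

tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))

# Assumed example mapping; adjust to the labels in your own data.
mapping <- c("DK" = "Don’t know", "NoAns" = "No answer")

out <- tbl_dat %>%
  mutate(across(where(is.factor), \(f) recode_present(f, mapping)))

levels(out$var01)
levels(out$var02)
```

Because fct_recode() renames levels in place, the position of each level within the factor stays the same.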
A more general approach is to replace single and double curly quotes with straight quotes in all levels of every factor:
library(dplyr)
library(forcats)
tbl_dat |>
  mutate(across(where(is.factor),
                \(v) fct_relabel(v, \(l) chartr("‘’“”", "''\"\"", l))))
# A tibble: 2 × 2
#   var01      var02
#   <fct>      <fct>
# 1 Don't know No answer
# 2 Cat 1      Cat 2
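Since the question is specifically about keeping the level order, here is a small check, as a sketch on a made-up factor (`f`, `g` and `h` are illustrative names). It compares fct_relabel(), which only rewrites the level labels, with a round trip through character, which rebuilds the levels in sorted order.

```r
library(forcats)

# Levels deliberately in a non-alphabetical, survey-style order.
f <- factor(c("Yes", "Don’t know", "No"),
            levels = c("Yes", "No", "Don’t know"))

# chartr() maps each curly quote to the straight quote at the same position.
fix_quotes <- function(l) chartr("‘’“”", "''\"\"", l)

# Relabel the levels only: order and underlying integer codes are kept.
g <- fct_relabel(f, fix_quotes)
levels(g)
#> [1] "Yes"        "No"         "Don't know"
all(as.integer(f) == as.integer(g))
#> [1] TRUE

# Round trip through character: the levels are re-sorted alphabetically.
h <- factor(fix_quotes(as.character(f)))
levels(h)
#> [1] "Don't know" "No"         "Yes"
```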
tbl_dat[] <- lapply(tbl_dat, function(x) {levels(x)[levels(x) %in% c('DK', 'Don’t know')] <- 'Don\'t know'; x})
– rawr Commented Jan 7 at 2:47
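A spelled-out variant of the base R idea in the comment above, as a sketch rather than a drop-in replacement: it adds an is.factor() check so that non-factor columns are passed through unchanged. The extra numeric column `n` is an assumption added only to show that behaviour.

```r
tbl_dat <- data.frame(
  var01 = factor(c("Don’t know", "Cat 1")),
  var02 = factor(c("No answer", "Cat 2")),
  n     = c(1, 2)  # assumed non-factor column, for illustration only
)

tbl_dat[] <- lapply(tbl_dat, function(x) {
  if (is.factor(x)) {
    # Rename the level in place; if it is absent, nothing is assigned
    # and no warning is raised.
    levels(x)[levels(x) == "Don’t know"] <- "Don't know"
  }
  x
})

levels(tbl_dat$var01)
levels(tbl_dat$var02)
```

Assigning into levels(x) by position renames the label without reordering the levels, which is the same property the forcats answers rely on.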