r - Recode Factor if Level is Available - Stack Overflow

Posted by admin, 2025-04-29

I have a dataset created by a survey field house, and they introduced curly apostrophes (’) in some labels. When I import it with the haven package, those curly apostrophes are kept. I would like to replace them all with straight apostrophes ('). Ideally this would be done for every such apostrophe, but when I do that, the order of all the levels changes, which I want to avoid.

So, as a fallback, I want to change specific factor levels such as "Don’t know" to "Don't know" (or, for now, simply to "DK").

When I use mutate_if(), I get a warning whenever a factor does not contain that level (which is fair enough):

library(dplyr)
library(forcats)

tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))

tbl_dat <- tbl_dat %>%
  mutate_if(is.factor,
            fct_recode,
            "DK" = "Don’t know")

Warning:

Warning in mutate():
ℹ In argument: var01 = (function (.f, ...) ....
Caused by warning:
! Unknown levels in f: Don’t know
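The warning is expected here: mutate_if() passes every factor column to fct_recode(), including columns that have no "Don’t know" level. As a small diagnostic sketch (the name has_dk is hypothetical), you can check which columns actually contain the level:

```r
library(dplyr)  # re-exports tibble()

tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))

# One logical per column: TRUE if it is a factor whose levels include the label
has_dk <- vapply(tbl_dat,
                 function(x) is.factor(x) && "Don’t know" %in% levels(x),
                 logical(1))
has_dk
# var01 is TRUE, var02 is FALSE: only var02 would trigger the warning
```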

How can I keep the factor levels (i.e., without converting to character and then back to factor) while applying the recode across all variables, without having to pre-select the variables that contain that level? Ideally, I don't want to ignore or suppress the warnings but rather avoid them in the first place.


Asked Jan 7 at 1:32 by C More; edited Jan 7 at 1:36 by jpsmith.
  • Avoid refactoring; that is unnecessary. Just change the labels: tbl_dat[] <- lapply(tbl_dat, function(x) {levels(x)[levels(x) %in% c('DK', 'Don’t know')] <- 'Don\'t know'; x}) – rawr, commented Jan 7 at 2:47
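A runnable sketch of the comment's idea, for the "Don’t know" level only. One assumption is added: an is.factor() guard, so that non-factor columns, if any, are returned unchanged (the comment's one-liner calls levels<- on every column). Assigning through levels() only changes the text of the matching level, so the level order stays as it was:

```r
library(dplyr)  # re-exports tibble()

tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))

tbl_dat[] <- lapply(tbl_dat, function(x) {
  if (is.factor(x)) {
    # Rename the matching level in place; if the label is absent,
    # the logical index is all FALSE and nothing is changed
    levels(x)[levels(x) == "Don’t know"] <- "Don't know"
  }
  x
})

levels(tbl_dat$var01)  # "Cat 1" "Don't know"
```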

2 Answers


You can get around this by first checking whether each factor variable contains the level, and recoding only when it does, using an if/else statement:

library(dplyr)
library(forcats)

tbl_dat %>%
  mutate(across(where(is.factor), ~ if ("Don’t know" %in% levels(.)) {
    fct_recode(., "DK" = "Don’t know")
  } else {.}
  ))

#   var01 var02    
#   <fct> <fct>    
# 1 DK    No answer
# 2 Cat 1 Cat 2  
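The same check can be wrapped in a reusable helper that also handles several recodes at once. This is a sketch under assumptions: recode_present() and recode_map are hypothetical names, and "NR" is only an illustrative label. The helper keeps just the old levels that exist in each factor and splices them into fct_recode() with !!!, so fct_recode() is never asked about a missing level and no warning is raised:

```r
library(dplyr)
library(forcats)

tbl_dat <- tibble(var01 = factor(c("Don’t know", "Cat 1")),
                  var02 = factor(c("No answer", "Cat 2")))

# Hypothetical helper: names of `map` are new labels, values are old labels.
# Only pairs whose old label is a level of `f` are passed to fct_recode().
recode_present <- function(f, map) {
  keep <- map[map %in% levels(f)]
  if (length(keep) == 0) return(f)  # nothing to recode in this factor
  fct_recode(f, !!!keep)
}

recode_map <- c(DK = "Don’t know", NR = "No answer")  # hypothetical mapping

out <- tbl_dat %>%
  mutate(across(where(is.factor), ~ recode_present(.x, recode_map)))

levels(out$var01)  # "Cat 1" "DK"
levels(out$var02)  # "Cat 2" "NR"
```

Because fct_recode() only renames levels, each factor keeps its original level order.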

A more general approach replaces curly single and double quotes with straight quotes in all levels. Since fct_relabel() only rewrites the level labels, the level order is preserved:

library(dplyr)
library(forcats)

tbl_dat |> 
  mutate(across(where(is.factor), \(v) fct_relabel(v, \(l) chartr("‘’“”", "''\"\"", l))))

# A tibble: 2 × 2
  var01      var02    
  <fct>      <fct>    
1 Don't know No answer
2 Cat 1      Cat 2  
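To check the level-order concern from the question, here is a small sketch with a hypothetical factor whose levels are deliberately not in alphabetical order (as with a questionnaire's answer order). The curly quotes are written as \u escapes (\u2019 is ’) to avoid depending on how the script file is encoded. fct_relabel() keeps the original order, whereas converting to character and back with factor() re-sorts the levels alphabetically:

```r
library(forcats)

# Hypothetical factor with a meaningful, non-alphabetical level order
f <- factor(c("Yes", "Don\u2019t know", "No"),
            levels = c("Yes", "No", "Don\u2019t know"))

# Map curly single/double quotes to straight ones (\(l) needs R >= 4.1)
straighten <- \(l) chartr("\u2018\u2019\u201c\u201d", "''\"\"", l)

# Relabel in place: each level keeps its position, only its text changes
relabelled <- fct_relabel(f, straighten)
levels(relabelled)  # "Yes" "No" "Don't know"

# Round trip through character: factor() sorts the levels again
round_trip <- factor(straighten(as.character(f)))
levels(round_trip)  # "Don't know" "No" "Yes"
```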
When reposting, please credit the original URL: http://anycun.com/QandA/1745935315a91341.html