r - Replace multiple string values in dataframe with alternative strings - Stack Overflow

admin2025-04-22  2

I have a large dataframe df1 with string values in multiple columns:

df1 <- 
    data.frame(col1 = rep(c("A", "B", "C"),3),
               col2 = rep(c("C", "A", "B"),3),
               col3 = 1:9)

  col1 col2 col3
1    A    C    1
2    B    A    2
3    C    B    3
4    A    C    4
5    B    A    5
6    C    B    6
7    A    C    7
8    B    A    8
9    C    B    9

I want to replace some of the string values with alternate values.

I have a second dataframe df2 with the values to be changed in col1 and the replacement values in col2.

df2 <- data.frame(col1 = c("A", "B"), 
                  col2 = c("D", "E"))

  col1 col2
1    A    D
2    B    E

So in this example I want all instances of "A" and "B" appearing in df1 to be replaced with "D" and "E" respectively, as per df2.

The final output would look like:

  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9

I have tried code using across() and lapply(), but I am having trouble linking to the second dataframe, which is where I would normally use a join.

asked Jan 21 at 14:52 by Sean

6 Answers

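You could use the superseded dplyr function recode(), splicing the lookup table in as old = new pairs with !!!deframe(df2). This is one of the uses that I have been unable to replicate with case_match()/case_when(), and recode() makes it easy. Note that you could also reverse the columns of df2 (rev(df2)) and use forcats::fct_recode(), which is not superseded; it expects new = old pairs and returns a factor.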

library(dplyr)
library(tibble)  # for deframe()

df1 %>% 
   mutate(across(col1:col2, ~recode(.x, !!!deframe(df2))))

  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9

NB: If anyone knows how to replicate this using case_match()/case_when(), please go ahead and add a solution.
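The fct_recode() route mentioned in this answer can be sketched as follows, assuming forcats is installed. rev(df2) swaps the two columns so that deframe() produces new = old pairs, and as.character() turns the factor returned by fct_recode() back into a character column:

```r
library(dplyr)
library(tibble)   # deframe()
library(forcats)  # fct_recode()

df1 <- data.frame(col1 = rep(c("A", "B", "C"), 3),
                  col2 = rep(c("C", "A", "B"), 3),
                  col3 = 1:9)
df2 <- data.frame(col1 = c("A", "B"), col2 = c("D", "E"))

# deframe(rev(df2)) gives c(D = "A", E = "B"): names are the new levels,
# values are the old levels, which is the order fct_recode() expects
res <- df1 %>%
  mutate(across(col1:col2,
                ~ as.character(fct_recode(.x, !!!deframe(rev(df2))))))
res
```

If a key in df2$col1 does not occur in a column, fct_recode() warns about unknown levels, so this route is best suited to lookup tables whose keys are known to be present.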


You could also use str_replace_all() from stringr. Although it works in this scenario, it is not advisable in general: the names of the replacement vector are treated as regular expressions, so it might end up replacing portions of strings instead of whole strings:

library(dplyr)
library(stringr)
library(tibble)  # for deframe()

df1 %>% 
  mutate(across(col1:col2, ~str_replace_all(.x, deframe(df2))))

  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9
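To illustrate the caveat, a small sketch with made-up values (not taken from df1): unanchored patterns also rewrite substrings, while wrapping each key in ^...$ restricts the replacement to whole strings. This assumes the keys contain no regex metacharacters; otherwise they would need escaping first.

```r
library(stringr)

x <- c("A", "AB", "BA")   # hypothetical values for illustration

# Unanchored patterns: "A" and "B" are also replaced inside longer strings
partial <- str_replace_all(x, c(A = "D", B = "E"))

# Anchored patterns built from df2: only whole-string matches are replaced
df2 <- data.frame(col1 = c("A", "B"), col2 = c("D", "E"))
anchored <- str_replace_all(x, setNames(df2$col2, paste0("^", df2$col1, "$")))

partial   # c("D", "DE", "ED")
anchored  # c("D", "AB", "BA")
```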

This is easy with package data.table:

library(data.table)
setDT(df1)
setDT(df2)

#reshape to long format
df1 <- melt(df1, id.vars = "col3")

#update join
df1[df2, value := i.col2, on = c("value==col1")]

#reshape to wide format
dcast(df1, col3 ~ variable)

#Key: <col3>
#    col3   col1   col2
#   <int> <char> <char>
#1:     1      D      C
#2:     2      E      D
#3:     3      C      E
#4:     4      D      C
#5:     5      E      D
#6:     6      C      E
#7:     7      D      C
#8:     8      E      D
#9:     9      C      E
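If reshaping is not wanted, a sketch of the same update join applied to each target column in place. It assumes the columns to recode are col1 and col2, and uses data.table's named on vector (x column = i column); dt1 and dt2 are just local copies of the example data:

```r
library(data.table)

dt1 <- data.table(col1 = rep(c("A", "B", "C"), 3),
                  col2 = rep(c("C", "A", "B"), 3),
                  col3 = 1:9)
dt2 <- data.table(col1 = c("A", "B"), col2 = c("D", "E"))

for (col in c("col1", "col2")) {
  # match dt1[[col]] against the keys in dt2$col1
  on_cols <- setNames("col1", col)
  # i.col2 is dt2's replacement column; unmatched rows are left unchanged
  dt1[dt2, (col) := i.col2, on = on_cols]
}
dt1
```

Each column is updated by reference, so no melt()/dcast() round trip is needed and the row order is preserved.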

It is surprising that there is no convenient helper function for this, but we can easily write one:

find_replace = function(x, find, replace) {
  stopifnot(length(find) == length(replace))
  for(i in seq_along(find)) {
    x[x == find[i]] = replace[i]
  }
  x
}

library(dplyr)  # for mutate() and across()

df1 |>
  mutate(across(c(col1, col2), \(x) 
                find_replace(x, find = df2$col1, replace = df2$col2)
        ))
#   col1 col2 col3
# 1    D    C    1
# 2    E    D    2
# 3    C    E    3
# 4    D    C    4
# 5    E    D    5
# 6    C    E    6
# 7    D    C    7
# 8    E    D    8
# 9    C    E    9
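One caveat: because the loop handles one pair at a time, a value produced by an earlier pair can be replaced again by a later pair. The sketch below shows this with made-up values, alongside a hypothetical single-pass variant, find_replace_once(), that looks every value up once with match():

```r
# Helper from this answer: one pass per find/replace pair
find_replace = function(x, find, replace) {
  stopifnot(length(find) == length(replace))
  for (i in seq_along(find)) {
    x[x == find[i]] = replace[i]
  }
  x
}

# Hypothetical single-pass variant: each value is looked up once,
# so replacement results are never matched again
find_replace_once = function(x, find, replace) {
  stopifnot(length(find) == length(replace))
  idx = match(x, find)   # position in `find`, NA if the value is not listed
  hit = !is.na(idx)
  x[hit] = replace[idx[hit]]
  x
}

x = c("A", "D", "C")
chained = find_replace(x, find = c("A", "D"), replace = c("D", "F"))
single  = find_replace_once(x, find = c("A", "D"), replace = c("D", "F"))
chained  # c("F", "F", "C"): the "D" produced from "A" was replaced again
single   # c("D", "F", "C")
```

For the lookup in the question ("A" -> "D", "B" -> "E") both versions agree, since no replacement value is also a key.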

You can use a named vector as a dictionary for the lookup, and then coalesce() to keep the original value wherever there is no match:

library(dplyr)  # for mutate(), across() and coalesce()

dict <- with(df2, setNames(col2, col1))
df1 %>%
  mutate(across(1:2, ~ coalesce(dict[.x], .x)))

which gives

  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9
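Why the coalesce() step is needed, as a minimal sketch on a made-up vector: indexing a named vector by a name it does not contain returns NA, and coalesce() then falls back to the original value.

```r
library(dplyr)

dict <- c(A = "D", B = "E")   # same as with(df2, setNames(col2, col1))
x <- c("A", "B", "C")

# "C" is not a name in dict, so its lookup is NA
looked_up <- unname(dict[x])       # c("D", "E", NA)

# coalesce() keeps the first non-missing value, i.e. falls back to x
filled <- coalesce(looked_up, x)   # c("D", "E", "C")
```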

I would use unlist() + match() + [ and stay with base R. You might want to generalise this.

v0.1

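df1[c("col1", "col2")] = local({
  # stack both target columns into one vector
  U = unlist(df1[c("col1", "col2")], use.names = FALSE)
  i = match(U, df2$col1, nomatch = 9999L) # 9999L := Placeholder
  j = i == 9999L    # values without a match in df2$col1
  M = df2$col2[i]   # replacement values (NA where there is no match)
  M[j] = U[j]       # keep the original value where there is no match
  M 
})
> df1
  col1 col2 col3
1    D    C    1
2    E    D    2
3    C    E    3
4    D    C    4
5    E    D    5
6    C    E    6
7    D    C    7
8    E    D    8
9    C    E    9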

local({ ... }) is convenient when we neither want to create a custom function nor clutter the environment with variables that are used only once.
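A sketch of one possible generalisation (replace_cols and its arguments are hypothetical names). It takes the columns and the lookup table as arguments and relies on match()'s default NA for "no match" instead of the 9999L placeholder, which would collide with a genuine match if df2 ever had 9999 or more rows. It assumes the target columns are character vectors:

```r
df1 = data.frame(col1 = rep(c("A", "B", "C"), 3),
                 col2 = rep(c("C", "A", "B"), 3),
                 col3 = 1:9)
df2 = data.frame(col1 = c("A", "B"), col2 = c("D", "E"))

# Hypothetical helper: `cols` are the columns to recode,
# `from`/`to` name the key and replacement columns of `lookup`
replace_cols = function(df, cols, lookup, from = "col1", to = "col2") {
  U = unlist(df[cols], use.names = FALSE)  # all target cells, column by column
  i = match(U, lookup[[from]])             # NA where a value has no mapping
  M = lookup[[to]][i]
  M[is.na(i)] = U[is.na(i)]                # unmatched values stay unchanged
  # split the stacked vector back into one chunk per column
  df[cols] = unname(split(M, rep(seq_along(cols), each = nrow(df))))
  df
}

res = replace_cols(df1, c("col1", "col2"), df2)
res
```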

For a small number of replacement options, you could use the vectorized case_when() function from the dplyr package:

library(dplyr)

df1$col1 <- case_when(
    df1$col1 == "A" ~ "D",   # A -> D
    df1$col1 == "B" ~ "E",   # B -> E
    TRUE ~ df1$col1          # otherwise don't change the value
)

# same logic for df1$col2
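Rather than repeating the case_when() call for each column, a sketch that applies it to both columns at once with across(); the mapping is still hard-coded, so this only suits a short, fixed list of replacements:

```r
library(dplyr)

df1 <- data.frame(col1 = rep(c("A", "B", "C"), 3),
                  col2 = rep(c("C", "A", "B"), 3),
                  col3 = 1:9)

# .x is the current column; the conditions are written only once
df1 <- df1 %>%
  mutate(across(c(col1, col2), ~ case_when(
    .x == "A" ~ "D",   # A -> D
    .x == "B" ~ "E",   # B -> E
    TRUE ~ .x          # otherwise keep the value
  )))
df1
```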
Please credit the original source when reposting: http://anycun.com/QandA/1745302393a90595.html