I have this simple data frame,
var_1 <- c(10, 5, 6, 0)
var_2 <- c(0, 0, 3, 0)
var_3 <- c(2, 0, 9, 0)
df <- data.frame(var_1, var_2, var_3)
df
> df
var_1 var_2 var_3
1 10 0 2
2 5 0 0
3 6 3 9
4 0 0 0
I want to create a column gathering information from the columns named with var_
. In the first row, I have 10 under var_1 and 2 under var_3. So the entry in the resulting column should be `var_1 (10), var_3(2) and so on. The rows with all zeros should be recorded as NA. The resulting data frame should look like this following:
var_1 var_2 var_3 res
1 10 0 2 var_1 (10), var_3 (2)
2 5 0 0 var_1 (5)
3 6 3 9 Var_2 (3)
4 0 0 0 <NA>
I can somehow gather the variable names but cannot get the values with it. Any idea how to achieve that res
column?
I have this simple data frame,
var_1 <- c(10, 5, 6, 0)
var_2 <- c(0, 0, 3, 0)
var_3 <- c(2, 0, 9, 0)
df <- data.frame(var_1, var_2, var_3)
df
> df
var_1 var_2 var_3
1 10 0 2
2 5 0 0
3 6 3 9
4 0 0 0
I want to create a column gathering information from the columns named with var_
. In the first row, I have 10 under var_1 and 2 under var_3. So the entry in the resulting column should be `var_1 (10), var_3(2) and so on. The rows with all zeros should be recorded as NA. The resulting data frame should look like this following:
var_1 var_2 var_3 res
1 10 0 2 var_1 (10), var_3 (2)
2 5 0 0 var_1 (5)
3 6 3 9 Var_2 (3)
4 0 0 0 <NA>
I can somehow gather the variable names but cannot get the values with it. Any idea how to achieve that res
column?
res <- mapply(sprintf, names(df), df, fmt = "%s (%d)")
res[df == 0] <- NA
res <- apply(res, 1, \(x) paste(na.omit(x), collapse = ", "))
res[res == ""] <- NA
df$res <- res
df
# var_1 var_2 var_3 res
#1 10 0 2 var_1 (10), var_3 (2)
#2 5 0 0 var_1 (5)
#3 6 3 9 var_1 (6), var_2 (3), var_3 (9)
#4 0 0 0 <NA>
You can try
idx <- which(df > 0, TRUE)
v <- sprintf("%s (%s)", names(df)[idx[, "col"]], df[idx])
df$res <- c(by(v, factor(idx[, "row"], levels = 1:nrow(df)), toString))
which gives
> df
var_1 var_2 var_3 res
1 10 0 2 var_1 (10), var_3 (2)
2 5 0 0 var_1 (5)
3 6 3 9 var_1 (6), var_2 (3), var_3 (9)
4 0 0 0 <NA>
Apply the indicated anonymous function to each row. It sets y to the non-zero elements in the current row and if that has nonzero length returns the indicated sprintf
and otherwise NA.
If there can be other columns not to be included then replace the first argument of apply
with pick(starts_with("var_"))
.
library(dplyr)
df %>%
mutate(res = apply(., 1, function(x, y = x[x != 0])
if (length(y)) toString(sprintf("%s (%d)", names(y), y)) else NA
))
giving
var_1 var_2 var_3 res
1 10 0 2 var_1 (10), var_3 (2)
2 5 0 0 var_1 (5)
3 6 3 9 var_1 (6), var_2 (3), var_3 (9)
4 0 0 0 <NA>
Similar but different. Base R
transform(df, res = {
n = names(df)
apply(df, 1, \(x) {
i = x > 0
if(any(i)) toString(sprintf('%s (%i)', n[i], x[i])) else NA
})
})
var_1 var_2 var_3 res
1 10 0 2 var_1 (10), var_3 (2)
2 5 0 0 var_1 (5)
3 6 3 9 var_1 (6), var_2 (3), var_3 (9)
4 0 0 0 <NA>
Keep it as a list column, instead of converting to collapsed character column:
df$res <- apply(df[, grep("^var", colnames(df), value = TRUE) ], 1, function(i) i[ i != 0 ])
#df
# var_1 var_2 var_3 res
# 1 10 0 2 10, 2
# 2 5 0 0 5
# 3 6 3 9 6, 3, 9
# 4 0 0 0
df$res[[ 1 ]]
# var_1 var_3
# 10 2
sapply(df$res, "[", "var_1")
# var_1 var_1 var_1 <NA>
# 10 5 6 NA