r - Ignoring NA cases when getting column index of lowest value in row - Stack Overflow

admin2025-04-17  2

I have an array of retail item prices, and am trying to create a new column that indicates which store has the lowest price.

  • The rows are the items, while the columns are the retailer names.

  • Not all of the retailers carry each product, so I need to ignore cells where the value is NA (instead of treating NA values as the lowest-cost item).

Setting everything up, I have

library(dplyr)

# Data

itemName <- c('Apples', 'Bananas', 'Milk', 'Eggs', 'Cereal')
Aldi <- c(NA, 0.55, 1.95, 157.00, 1.10)
Walmart <- c(0.75, 0.69, 2.15, 192.00, 1.10)
Costco <- c(NA, NA, 0.16, 75.00, NA)

Creating the df with a column that identifies the lowest price

retailPRICE <- data.frame(itemName, Aldi, Walmart, Costco)%>%
mutate(cheapestPRICE = pmin(Aldi, Walmart, Costco, na.rm = TRUE))

This is where I run into trouble

I'm trying to create a column that looks at all the prices, and pulls in the name of the store with the cheapest price.

retailPRICE$cheapestSTORE <- names(retailPRICE)
 [minCol(replace(retailPRICE, is.na(retailPRICE), 0), 
 ties.method = "first")]

What I want

| itemName | Aldi   | Walmart  |  Costco  | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | -------------   --------------
| Apples   | NA     | 0.75     | NA       | 0.75          | Walmart       |
| Bananas  | 0.55   | 0.69     | NA       | 0.55          | Aldi          |
| Milk     | 1.95   | 2.15     | 0.16     | 0.16          | Costco        |
| Eggs     | 157.00 | 192.00   | 75.00    | 75.00         | Costco        |
| Cereal   | 1.10   | 1.10     | NA       | 1.10          | Aldi          |

Problem the first

Attempting to create the cheapestSTORE column gives the following error: Error in FUN(left) : invalid argument to unary operator

  • Why? Is this because itemName is a character column?
  • Previous iterations of this code did not include the itemName column and I did not have this issue (don't worry, I had plenty of others).
  • I do need to keep the itemName to do eventually do a join, once I get this mess sorted out

Problem the second

Before I ran into the Error in FUN issue, I had a problem with how minCol was identifying the lowest value column. It was seeing NAs as the lowest value, so it would pull in the name of that store. I need to be able to exclude any NA entries from the assessment of which column is being indexed.

An example of what I was getting

| itemName | Aldi   | Walmart  |  Costco  | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | -------------   --------------
| Apples   | NA     | 0.75     | NA       | 0.75          | Aldi          |
| Bananas  | 0.55   | 0.69     | NA       | 0.55          | Costco        |
| Milk     | 1.95   | 2.15     | 0.16     | 0.16          | Costco        |
| Eggs     | 157.00 | 192.00   | 75.00    | 75.00         | Costco        |
| Cereal   | 1.10   | 1.10     | NA       | 1.10          | Costco        |

As you can see, the only row that is being evaluated correctly is Eggs. All of the other items are out of stock in at least one store, which is causing the cheapestSTORE column to identify wrong location as being the cheapest.

Can you please help me figure out what I should be doing when creating the cheapestSTORE column? Thank you!!

I have an array of retail item prices, and am trying to create a new column that indicates which store has the lowest price.

  • The rows are the items, while the columns are the retailer names.

  • Not all of the retailers carry each product, so I need to ignore cells where the value is NA (instead of treating NA values as the lowest-cost item).

Setting everything up, I have

library(dplyr)

# Data

itemName <- c('Apples', 'Bananas', 'Milk', 'Eggs', 'Cereal')
Aldi <- c(NA, 0.55, 1.95, 157.00, 1.10)
Walmart <- c(0.75, 0.69, 2.15, 192.00, 1.10)
Costco <- c(NA, NA, 0.16, 75.00, NA)

Creating the df with a column that identifies the lowest price

retailPRICE <- data.frame(itemName, Aldi, Walmart, Costco)%>%
mutate(cheapestPRICE = pmin(Aldi, Walmart, Costco, na.rm = TRUE))

This is where I run into trouble

I'm trying to create a column that looks at all the prices, and pulls in the name of the store with the cheapest price.

retailPRICE$cheapestSTORE <- names(retailPRICE)
 [minCol(replace(retailPRICE, is.na(retailPRICE), 0), 
 ties.method = "first")]

What I want

| itemName | Aldi   | Walmart  |  Costco  | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | -------------   --------------
| Apples   | NA     | 0.75     | NA       | 0.75          | Walmart       |
| Bananas  | 0.55   | 0.69     | NA       | 0.55          | Aldi          |
| Milk     | 1.95   | 2.15     | 0.16     | 0.16          | Costco        |
| Eggs     | 157.00 | 192.00   | 75.00    | 75.00         | Costco        |
| Cereal   | 1.10   | 1.10     | NA       | 1.10          | Aldi          |

Problem the first

Attempting to create the cheapestSTORE column gives the following error: Error in FUN(left) : invalid argument to unary operator

  • Why? Is this because itemName is a character column?
  • Previous iterations of this code did not include the itemName column and I did not have this issue (don't worry, I had plenty of others).
  • I do need to keep the itemName to do eventually do a join, once I get this mess sorted out

Problem the second

Before I ran into the Error in FUN issue, I had a problem with how minCol was identifying the lowest value column. It was seeing NAs as the lowest value, so it would pull in the name of that store. I need to be able to exclude any NA entries from the assessment of which column is being indexed.

An example of what I was getting

| itemName | Aldi   | Walmart  |  Costco  | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | -------------   --------------
| Apples   | NA     | 0.75     | NA       | 0.75          | Aldi          |
| Bananas  | 0.55   | 0.69     | NA       | 0.55          | Costco        |
| Milk     | 1.95   | 2.15     | 0.16     | 0.16          | Costco        |
| Eggs     | 157.00 | 192.00   | 75.00    | 75.00         | Costco        |
| Cereal   | 1.10   | 1.10     | NA       | 1.10          | Costco        |

As you can see, the only row that is being evaluated correctly is Eggs. All of the other items are out of stock in at least one store, which is causing the cheapestSTORE column to identify wrong location as being the cheapest.

Can you please help me figure out what I should be doing when creating the cheapestSTORE column? Thank you!!

Share Improve this question asked Jan 31 at 17:22 Amanda BallengerAmanda Ballenger 234 bronze badges 2
  • 3 You have ties for Cereal, do you expect to get just one cheapest store for this or both? – margusl Commented Jan 31 at 17:42
  • Future me would love if this would return both values, as it would help us consolidate our purchase orders to a smaller number of vendors. Current me needs to have this done in the next hour, so I'll come back to edit that later if you have a suggestion! :) – Amanda Ballenger Commented Jan 31 at 19:11
Add a comment  | 

6 Answers 6

Reset to default 2

Because your data has prices in multiple columns, it is not tidy in the sense of Wickham 2014. I would suggest you reshape your data to long format first, so that each column represent a single variable: product, store and price.

library(dplyr); library(tidyr)

retailPRICE <- data.frame(itemName, Aldi, Walmart, Costco)

retailPRICElong <- retailPRICE |> 
  pivot_longer(-itemName, names_to = 'Store', values_to = 'Price') |> 
  drop_na()

We simply drop all the not-in-stock items by dropping NA values.

Then these operations become much easier, and lead to code that is simple and easy to read:

summarise(
  retailPRICElong,
  cheapestPrice = min(Price),
  cheapestStore = Store[which.min(Price)],   # in the case of ties, first one
  .by = itemName
)
# A tibble: 5 × 3
  itemName cheapestPrice cheapestStore
  <chr>            <dbl> <chr>        
1 Apples            0.75 Walmart      
2 Bananas           0.55 Aldi         
3 Milk              0.16 Costco       
4 Eggs             75    Costco       
5 Cereal            1.1  Aldi

For this case in particular, all you would need is:

slice_min(retailPRICElong, Price, by = itemName)

which also includes ties by default

# A tibble: 6 × 3
  itemName Store   Price
  <chr>    <chr>   <dbl>
1 Apples   Walmart  0.75
2 Bananas  Aldi     0.55
3 Milk     Costco   0.16
4 Eggs     Costco  75   
5 Cereal   Aldi     1.1 
6 Cereal   Walmart  1.1

In base R you can use apply() and which.min() here:

retailPRICE$cheapestSTORE <- apply(retailPRICE[, c("Aldi", "Walmart", "Costco")], 1, \(x) {
  if (all(is.na(x))) NA
  else names(x)[which.min(x)]  
})

#   itemName   Aldi Walmart Costco cheapestPRICE cheapestSTORE
# 1   Apples     NA    0.75     NA          0.75       Walmart
# 2  Bananas   0.55    0.69     NA          0.55          Aldi
# 3     Milk   1.95    2.15   0.16          0.16        Costco
# 4     Eggs 157.00  192.00  75.00         75.00        Costco
# 5   Cereal   1.10    1.10     NA          1.10          Aldi

Though this matches your desired outcome, as others mentioned ties may be an issue. If you wanted to concatenate all stores with the lowest price, you could do:

retailPRICE$cheapestSTORE <- apply(retailPRICE[, c("Aldi", "Walmart", "Costco")], 1, function(x) {
  if (all(is.na(x))) {NA}
  else{
    paste(names(x)[x %in% min(x, na.rm = TRUE)], collapse = ", ")
    }
})

#   itemName   Aldi Walmart Costco cheapestPRICE cheapestSTORE
# 1   Apples     NA    0.75     NA          0.75       Walmart
# 2  Bananas   0.55    0.69     NA          0.55          Aldi
# 3     Milk   1.95    2.15   0.16          0.16        Costco
# 4     Eggs 157.00  192.00  75.00         75.00        Costco
# 5   Cereal   1.10    1.10     NA          1.10 Aldi, Walmart

If you like to stick with current presentation of your data, do Rfast::rowMins():

i = Rfast::rowMins(as.matrix(DF[-1]))
DF$cheapestPrice = DF[-1][cbind(seq(nrow(DF)), i)] 
DF$cheapestStore = names(DF[-1])[i]
  itemName   Aldi Walmart Costco cheapestPrice cheapestStore
1   Apples     NA    0.75     NA          0.75       Walmart
2  Bananas   0.55    0.69     NA          0.55          Aldi
3     Milk   1.95    2.15   0.16          0.16        Costco
4     Eggs 157.00  192.00  75.00         75.00        Costco
5   Cereal   1.10    1.10     NA          1.10          Aldi

This assumes that you are okay with selecting only the first minimum when there are several, which aligns with the currently desired output.

Data

itemName <- c('Apples', 'Bananas', 'Milk', 'Eggs', 'Cereal')
Aldi <- c(NA, 0.55, 1.95, 157.00, 1.10)
Walmart <- c(0.75, 0.69, 2.15, 192.00, 1.10)
Costco <- c(NA, NA, 0.16, 75.00, NA)
DF = data.frame(itemName, Aldi, Walmart, Costco)

Try pmin + max.col like below

df %>%
    mutate(
        cheapestPrice = pmin(Aldi, Costco, Walmart, na.rm = TRUE),
        cheapestStore = names(.[-1])[max.col(-replace_na(.[-1], Inf), ties.method = "first")]
    )

which gives

  itemName   Aldi Walmart Costco cheapestPrice cheapestStore
1   Apples     NA    0.75     NA          0.75       Walmart
2  Bananas   0.55    0.69     NA          0.55          Aldi
3     Milk   1.95    2.15   0.16          0.16        Costco
4     Eggs 157.00  192.00  75.00         75.00        Costco
5   Cereal   1.10    1.10     NA          1.10          Aldi
retailPRICE$cheapestSTORE = colnames(retailPRICE)[apply(retailPRICE, 1, which.min)]

This should do it. You will get some warnings due to the names and missing values but would get the result you want.

Try this oneliner that replaces NA with Inf and then uses max.col on the negative values.

> names(retailPRICE)[2:4][-replace(retailPRICE[2:4], is.na(retailPRICE[2:4]), Inf) |> max.col()]
[1] "Walmart" "Aldi"    "Costco"  "Costco"  "Aldi" 

Note, that max.col uses random ties method as default, but you may also set it to e.g. 'first', see ?max.col.

Also note that this will fail if no store stocks the respective product and only a random store is selected. In this case, replace the values with NA.

retailPRICE$cheapestPRICE[rowSums(is.na(retailPRICE[2:4])) == 3] <- NA
转载请注明原文地址:http://anycun.com/QandA/1744854506a88556.html