I have a matrix of raw data, which we can call supermarketproducts, with several details about supermarket products, such as the name, brand and description. I used readxl to import the raw data from an Excel sheet.
I want to create a list of these products, broken down by their respective properties or functions.
For example, I tried to create an empty list H of product properties such as "Heavy-duty", "Floral", "Quick-drying", etc.
If an item in the matrix, let's say Detergent A, mentions the word "floral" anywhere in its description, I want its exact name, 'Detergent A', to be copied into the "Floral" element of my list H.
I faced two issues. First, I am not sure how to write a for loop that goes through only the third column, 'Description', of my matrix. Second, even when running my for loop through the entire matrix, I am unable to copy the names from the matrix into my new list H properly.
I am a total beginner in R, so please bear with me; I appreciate any help. Thanks
H = list('Heavy-duty' = character(),
         'Floral' = character(),
         'Quick-drying' = character()
)
for (i in 1:dim(supermarketproducts)[1]) {
  for (j in 1:dim(supermarketproducts)[2]) {
    if (supermarketproducts[i, j] == 'Floral') {
      H[i] <- supermarketproducts[i, j]
    }
  }
}
As requested, here is the dput() of my supermarketproducts:
structure(list(Name = c("Scentclean", "Fluent", "Detergentwash",
"Washtime", "Simplysuds", "Surftide"), Brand = c("X", "Y", "Z",
"A", "Brand", "C"), Description = c("Say goodbye to tough stains and hello to fresh, clean laundry",
"There are two things that let you know your clothes are clean: they smell good and they look good, with a nice floral scent",
"Thanks to a formula that dissolves completely every time, clothes come out looking bright and fresh, even when washed in cold water.",
"Combining concentrated detergents, powerful stain removers and color protectors into one convenient laundry pac, providing you the ease of drop-in-and-done. Along with the refreshing, invigorating scent of Spring Meadow.",
"Say goodbye to tough stains and hello to fresh, clean laundry for your delicates",
"Containing 10 concentrated cleaning actives, the heavy-duty cleaning agent gets between fibers to clean hidden dirt you didn’t even know was there. Available in Tide’s beloved Original scent that infuses your laundry with floral and fruity notes."
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
Ultimately, I hope to obtain a list H like this:
[Heavy-duty] "Surftide"
[Floral] "Surftide" "Fluent"
[Quick-drying]
This gives your expected output for your test data, assuming the test data is in an object named df.
library(tidyverse)
h <- c("Heavy-duty", "Floral", "Quick-drying")
answer <- lapply(
h,
function(term) {
df %>%
filter(str_detect(Description, fixed(term, ignore_case = TRUE))) %>%
pull(Name)
}
)
names(answer) <- h
answer
$`Heavy-duty`
[1] "Surftide"
$Floral
[1] "Fluent" "Surftide"
$`Quick-drying`
character(0)
You need ignore_case = TRUE because the strings you need to detect do not match the case used in your Descriptions.
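The same case issue exists in base R. As a small illustration (the two strings in `desc` are shortened from the question's data), `grepl()` is case-sensitive unless `ignore.case = TRUE` is passed; lower-casing both sides with `tolower()` is another option that also works with a literal `fixed = TRUE` match:

```r
# Two descriptions, shortened from the question's data (illustrative only)
desc <- c(
  "with a nice floral scent",
  "the heavy-duty cleaning agent gets between fibers"
)

# Case-sensitive (the default): "Floral" does not match "floral"
grepl("Floral", desc)
#> [1] FALSE FALSE

# Case-insensitive match
grepl("Floral", desc, ignore.case = TRUE)
#> [1]  TRUE FALSE

# Alternative: lower-case both sides, then do a literal (fixed) match
grepl(tolower("Heavy-duty"), tolower(desc), fixed = TRUE)
#> [1] FALSE  TRUE
```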
R is a vectorised language (meaning that it is designed to work with objects of length greater than one by default). That means "if I'm thinking of using a for loop, there's probably a better way" is a useful maxim, and that is the case here. We can examine every element of Description in a single function call, and we loop over the various search terms using lapply.
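To make the vectorisation point concrete, here is a sketch on a hypothetical three-row data frame `products` (descriptions shortened from the question's data): one `grepl()` call tests the whole column, and logical indexing then picks out the matching names, with no explicit loop.

```r
# Hypothetical cut-down version of the question's data
products <- data.frame(
  Name = c("Scentclean", "Fluent", "Surftide"),
  Description = c(
    "Say goodbye to tough stains and hello to fresh, clean laundry",
    "they smell good and they look good, with a nice floral scent",
    "the heavy-duty cleaning agent gets between fibers, with floral and fruity notes"
  ),
  stringsAsFactors = FALSE
)

# One call tests every description; the result has one element per row
is_floral <- grepl("floral", products$Description, ignore.case = TRUE)
is_floral
#> [1] FALSE  TRUE  TRUE

# Logical indexing selects the names of the matching rows
products$Name[is_floral]
#> [1] "Fluent"   "Surftide"
```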
I believe that lapply usually has many advantages over for loops, not least that it uses forced rather than lazy evaluation and it removes the need for pre-initialisation of the return value. There are plenty of other Q&As on SO that go into more detail about the differences between lapply (and its siblings) and for.
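For comparison, and to address the two problems raised in the question, here is a sketch of a working for loop on a hypothetical cut-down data frame `products`. It looks only at the Description column, uses `grepl()` for a case-insensitive substring match instead of `==` (which compares the whole cell), and appends to `H[[term]]` by name instead of assigning to `H[i]` by row number. Note that it has to pre-initialise `H`, which is the step `lapply()` makes unnecessary.

```r
# Hypothetical cut-down version of the question's data
products <- data.frame(
  Name = c("Fluent", "Washtime", "Surftide"),
  Description = c(
    "they smell good and they look good, with a nice floral scent",
    "powerful stain removers and color protectors in one convenient laundry pac",
    "the heavy-duty cleaning agent gets between fibers, with floral and fruity notes"
  ),
  stringsAsFactors = FALSE
)

terms <- c("Heavy-duty", "Floral", "Quick-drying")

# Pre-initialise H: one empty character vector per search term
H <- setNames(rep(list(character()), length(terms)), terms)

# Loop over the rows, but only ever look at the Description column
for (i in seq_len(nrow(products))) {
  desc <- products$Description[i]
  for (term in terms) {
    # Substring match ignoring case, rather than whole-cell equality
    if (grepl(term, desc, ignore.case = TRUE)) {
      # Append this product's name to the element named after the term
      H[[term]] <- c(H[[term]], products$Name[i])
    }
  }
}

H
```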
I do not really understand why you are calling your supermarket a matrix, since you have probably read supermarket with a read function from the {tidyverse}. Hence, its class is
> class(supermarket)
[1] "tbl_df" "tbl" "data.frame"
A whole {tidyverse} is not needed for the reading. That said, you do not seem to be trying to use {tidyverse} to solve your problem. Here is how your approach can be realised in base R:
lapply(setNames(c("Heavy-duty", "Floral", "Quick-drying"), c("Heavy-duty", "Floral", "Quick-drying")),
\(x) supermarket$Name[grep(x, supermarket$Description, ignore.case = TRUE)])
giving
$`Heavy-duty`
[1] "Surftide"
$Floral
[1] "Fluent" "Surftide"
$`Quick-drying`
character(0)
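Since the question asks specifically how to address only the third column, here is a sketch of the usual ways to pull one column out of a data frame as a plain character vector; the small `products` data frame is a hypothetical stand-in. With a tibble such as the one readxl returns, `$` and `[[` behave the same way, but single-bracket `x[, 3]` always returns a one-column tibble rather than a vector.

```r
# Hypothetical stand-in for the question's data
products <- data.frame(
  Name = c("Scentclean", "Fluent"),
  Brand = c("X", "Y"),
  Description = c(
    "Say goodbye to tough stains and hello to fresh, clean laundry",
    "they smell good and they look good, with a nice floral scent"
  ),
  stringsAsFactors = FALSE
)

# Three equivalent ways to get the Description column as a character vector
by_dollar   <- products$Description
by_name     <- products[["Description"]]
by_position <- products[[3]]   # third column, by position

identical(by_dollar, by_name) && identical(by_name, by_position)
#> [1] TRUE

# Single brackets with drop = FALSE keep a one-column data frame,
# which is what x[, 3] gives on a tibble
class(products[, 3, drop = FALSE])
#> [1] "data.frame"
```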
The call
setNames(c("Heavy-duty", "Floral", "Quick-drying"), c("Heavy-duty", "Floral", "Quick-drying"))
is just a trick to avoid cluttering our environment with variables we only need once. If that is not a concern, we can change it to
H = c("Heavy-duty", "Floral", "Quick-drying")
names(H) = H
# lapply(H, \(x) .. )
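A quick check, on a hypothetical `terms` vector, that the two ways of naming the vector are equivalent, and that `lapply()` carries those names through to its result (which is why the output list is labelled by search term):

```r
terms <- c("Heavy-duty", "Floral", "Quick-drying")

# Option 1: name the vector in place (creates a variable H)
H <- terms
names(H) <- H

# Option 2: build the same named vector inline
identical(H, setNames(terms, terms))
#> [1] TRUE

# lapply() keeps the names of its input on the result
names(lapply(H, nchar))
#> [1] "Heavy-duty"   "Floral"       "Quick-drying"
```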
A possible solution uses outer + grepl (though it is not as time- and memory-efficient as @Fride's or @Limey's solutions):
with(
supermarket,
`dimnames<-`(
outer(H, Description, Vectorize(grepl), ignore.case = TRUE),
list(H, Name)
)
)
which gives
             Scentclean Fluent Detergentwash Washtime Simplysuds Surftide
Heavy-duty        FALSE  FALSE         FALSE    FALSE      FALSE     TRUE
Floral            FALSE   TRUE         FALSE    FALSE      FALSE     TRUE
Quick-drying      FALSE  FALSE         FALSE    FALSE      FALSE    FALSE
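If you start from a logical matrix like this one, you can still get the list the question asks for by taking, for each row, the column names where the value is TRUE. In this sketch the matrix `m` is typed in by hand from the printed output so that the block runs on its own:

```r
# Logical matrix copied by hand from the outer() output
# (terms in rows, product names in columns)
m <- matrix(
  c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
    FALSE, TRUE,  FALSE, FALSE, FALSE, TRUE,
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Heavy-duty", "Floral", "Quick-drying"),
    c("Scentclean", "Fluent", "Detergentwash", "Washtime", "Simplysuds", "Surftide")
  )
)

# For each term (row), keep the product names (columns) that are TRUE;
# iterating over a named vector makes the result list named by term
terms <- rownames(m)
H <- lapply(setNames(terms, terms), function(term) colnames(m)[m[term, ]])
H
```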
Another implementation, similar to the other existing solutions, is
lapply(
setNames(H, H),
\(v) with(supermarket, Name[grepl(v, Description, ignore.case = TRUE)])
)
gives
$`Heavy-duty`
[1] "Surftide"
$Floral
[1] "Fluent" "Surftide"
$`Quick-drying`
character(0)
supermarket <- structure(list(Name = c(
"Scentclean", "Fluent", "Detergentwash",
"Washtime", "Simplysuds", "Surftide"
), Brand = c(
"X", "Y", "Z",
"A", "Brand", "C"
), Description = c(
"Say goodbye to tough stains and hello to fresh, clean laundry",
"There are two things that let you know your clothes are clean: they smell good and they look good, with a nice floral scent",
"Thanks to a formula that dissolves completely every time, clothes come out looking bright and fresh, even when washed in cold water.",
"Combining concentrated detergents, powerful stain removers and color protectors into one convenient laundry pac, providing you the ease of drop-in-and-done. Along with the refreshing, invigorating scent of Spring Meadow.",
"Say goodbye to tough stains and hello to fresh, clean laundry for your delicates",
"Containing 10 concentrated cleaning actives, the heavy-duty cleaning agent gets between fibers to clean hidden dirt you didn’t even know was there. Available in Tide’s beloved Original scent that infuses your laundry with floral and fruity notes."
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(
NA,
-6L
))
H <- c("Heavy-duty", "Floral", "Quick-drying")
readxl creates a tibble (a data.frame). Please give us a sample of your data by providing the output of dput(supermarketproducts). If that is too large, give us dput(head(supermarketproducts)). Then, show us the expected output, because that is unclear from your description. I suspect the split function could be useful to you. – Roland, commented Jan 31 at 7:28
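Roland's suggestion of split() can be made to work even though one product can have several properties: first pair each product with every term it matches, then split the matching names by term. A base-R sketch, assuming a hypothetical cut-down data frame `products`; wrapping the terms in a factor() with all of them as levels keeps an empty element for terms that match nothing.

```r
# Hypothetical cut-down version of the question's data
products <- data.frame(
  Name = c("Scentclean", "Fluent", "Surftide"),
  Description = c(
    "Say goodbye to tough stains and hello to fresh, clean laundry",
    "they smell good and they look good, with a nice floral scent",
    "the heavy-duty cleaning agent gets between fibers, with floral and fruity notes"
  ),
  stringsAsFactors = FALSE
)
terms <- c("Heavy-duty", "Floral", "Quick-drying")

# Every (term, product) combination
pairs <- expand.grid(term = terms, Name = products$Name, stringsAsFactors = FALSE)

# Look up each pair's description and test it against the pair's term
desc_for_pair <- products$Description[match(pairs$Name, products$Name)]
hit <- mapply(grepl, pairs$term, desc_for_pair,
              MoreArgs = list(ignore.case = TRUE))

# Keep only the matching pairs
pairs <- pairs[hit, ]

# Group names by term; unused factor levels give empty elements
H <- split(pairs$Name, factor(pairs$term, levels = terms))
H
```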