r - Visualise success rates for multiple metrics - Stack Overflow


I have 4 columns specifying whether each metric was fulfilled in each trial or not.

# First four trials shown; the real data continues (...), e.g. up to 27 000 trials
mydata <- data.frame(trial   = c(1, 2, 3, 4),
                     metricA = c('success', 'failed', 'failed', 'success'),
                     metricB = c('failed', 'success', 'success', 'success'),
                     metricC = c('failed', 'failed', 'success', 'failed'),
                     metricD = c('success', 'success', 'failed', 'success'))

The metric columns have the same length as the trial column, so for each trial it is known whether it succeeded or failed on each metric.

Now I would like to visualise how many trials succeeded or failed on each metric and across metrics, e.g. that 10% of the trials that succeeded on metric A failed on metric C, and so on. I want to visualise this with a Venn diagram. This is the code I have produced:

library(ggVennDiagram)

mydata <- read.csv("trials-metrics.csv")

mA <- mydata$metricA
mB <- mydata$metricB
mC <- mydata$metricC
mD <- mydata$metricD

x <- list(
  A = mA,
  B = mB,
  C = mC,
  D = mD
)

ggVennDiagram(x, category.names = c("A", "B", "C", "D"))

This produces the following plot.

Most likely, this type of Venn diagram only compares shared values between groups. Therefore, I assume I need to produce a unique value for each combination of metric outcomes. How can I implement this? Or am I missing something else?
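The "one value per combination of outcomes" idea is still useful as a table, even though it is not the input a Venn diagram needs. A minimal base-R sketch, using the four example trials above as hypothetical data: each trial gets a pattern string (the S/F coding is an illustrative choice, not anything ggVennDiagram expects), and the patterns are counted. Each pattern other than "FFFF" corresponds to exactly one region of a four-set Venn diagram of successes.

```r
# Hypothetical example data: the four trials shown in the question
mydata <- data.frame(
  trial   = 1:4,
  metricA = c("success", "failed",  "failed",  "success"),
  metricB = c("failed",  "success", "success", "success"),
  metricC = c("failed",  "failed",  "success", "failed"),
  metricD = c("success", "success", "failed",  "success")
)

# Logical matrix: TRUE where a trial succeeded on a metric
ok <- mydata[-1] == "success"

# One string per trial, e.g. "SFFS" = success on A, failed on B and C, success on D
pattern <- apply(ok, 1, function(r) paste(ifelse(r, "S", "F"), collapse = ""))

# Number and share of trials per combination (at most 2^4 = 16 patterns)
pattern_counts <- table(pattern)
pattern_share  <- prop.table(pattern_counts)

print(pattern_counts)
print(pattern_share)
```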

I have found this similar entry, where the same kind of Venn diagram was produced successfully from the same kind of binary TRUE/FALSE data:

Making a Venn Diagram from a Dataframe
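Converting the outcome strings to TRUE/FALSE, as in the linked entry, also makes the per-metric numbers easy to compute. A base-R sketch on the same hypothetical four-trial data: success counts and rates per metric, and the share of trials that succeeded on A but failed on C (the kind of "10%" figure mentioned above).

```r
# Hypothetical example data: the four trials shown in the question
mydata <- data.frame(
  trial   = 1:4,
  metricA = c("success", "failed",  "failed",  "success"),
  metricB = c("failed",  "success", "success", "success"),
  metricC = c("failed",  "failed",  "success", "failed"),
  metricD = c("success", "success", "failed",  "success")
)

# Logical matrix: TRUE where a trial succeeded on a metric
ok <- mydata[-1] == "success"

# Number of successful trials and success rate per metric
success_n    <- colSums(ok)
success_rate <- colMeans(ok)

# Among trials that succeeded on A, the share that failed on C
a_ok     <- mydata$metricA == "success"
p_c_fail <- mean(mydata$metricC[a_ok] == "failed")

print(success_n)
print(success_rate)
print(p_c_fail)
```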

I am incredibly new to R, so the most parsimonious code solution would be greatly appreciated.


Asked Jan 2 at 11:52 by Cobalamin; edited Jan 2 at 15:37 by M--.
  • When there are too many combinations, consider using an UpSet plot instead of a Venn diagram: r-graph-gallery.com/upset-plot.html – Yacine Hajji, Jan 2 at 12:37

1 Answer


As you already guessed: since we are dealing with sets, the elements have to be unique. You can use the trial column for this and, for each metric, keep only the trials that succeeded.
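To see why the original plot looks wrong, compare what the sets contain in each approach. A small base-R sketch on hypothetical four-trial data: built from outcome labels, every set collapses to the two values "success" and "failed", so the sets are (almost) identical; built from trial IDs, each set keeps exactly the trials that succeeded.

```r
# Hypothetical example data: the four trials shown in the question
mydata <- data.frame(
  trial   = 1:4,
  metricA = c("success", "failed",  "failed",  "success"),
  metricB = c("failed",  "success", "success", "success"),
  metricC = c("failed",  "failed",  "success", "failed"),
  metricD = c("success", "success", "failed",  "success")
)

# Question's approach: the set elements are outcome labels, so duplicates vanish
label_sets <- lapply(mydata[-1], unique)

# Answer's approach: the set elements are IDs of trials that succeeded
id_sets <- lapply(mydata[-1], function(x) mydata$trial[x == "success"])

print(label_sets)
print(id_sets)
```

With 27 000 trials, each label set would almost certainly contain both values, which is why all four sets overlap completely in the original plot.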

Using some random example data:

n <- 1000
set.seed(123)

mydata <- data.frame(
  trial = seq_len(n), # eg. up to 27 000
  metricA = sample(c("success", "failed"), n, replace = TRUE),
  metricB = sample(c("success", "failed"), n, replace = TRUE),
  metricC = sample(c("success", "failed"), n, replace = TRUE),
  metricD = sample(c("success", "failed"), n, replace = TRUE)
)

library(ggVennDiagram)

x <- list(
  A = mydata$trial[mydata$metricA == "success"],
  B = mydata$trial[mydata$metricB == "success"],
  C = mydata$trial[mydata$metricC == "success"],
  D = mydata$trial[mydata$metricD == "success"]
)
ggVennDiagram(x, category.names = LETTERS[1:4])

Or, instead of creating the list manually, you might consider using lapply (the \(x) shorthand for function(x) requires R 4.1.0 or later):

xx <- lapply(mydata[-1], \(x) mydata$trial[x == "success"])
ggVennDiagram(xx, category.names = LETTERS[seq_along(xx)])
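If you want the numbers behind a particular region, you can compute them from the same list with base set operations. A sketch assuming a list like xx above (trial IDs as elements), rebuilt here from hypothetical four-trial data so the block runs on its own; renaming the elements to A to D and picking the region "A and D but not B or C" are illustrative choices.

```r
# Hypothetical example data: the four trials shown in the question
mydata <- data.frame(
  trial   = 1:4,
  metricA = c("success", "failed",  "failed",  "success"),
  metricB = c("failed",  "success", "success", "success"),
  metricC = c("failed",  "failed",  "success", "failed"),
  metricD = c("success", "success", "failed",  "success")
)

xx <- lapply(mydata[-1], function(x) mydata$trial[x == "success"])
names(xx) <- LETTERS[seq_along(xx)]  # A, B, C, D

# Trials that succeeded on every metric (the centre of the diagram)
all_four <- Reduce(intersect, xx)

# One exclusive region: success on A and D, failure on B and C
a_d_only <- setdiff(intersect(xx$A, xx$D), union(xx$B, xx$C))

# Share of A-successes that failed on C
# (assumes every trial is either "success" or "failed", with no NAs)
p_c_fail_given_a <- 1 - length(intersect(xx$A, xx$C)) / length(xx$A)

print(all_four)
print(a_d_only)
print(p_c_fail_given_a)
```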
