math - how to compare lists with their frequency - Stack Overflow

admin2025-04-28  2

I'm interested in finding similarities between two lists. I have the count of duplicates in the first column, and the pattern is in the second column. What would be the most logical way to compare these two lists so I don't need to do it manually?

List1 is:
```
11 | 55
4  | 31
4  | 1
3  | 22
2  | 13
1  | 81
```

List2 is:
```
7  | 31
6  | 22
6  | 13
4  | 88
3  | 14
1  | 55
```

Asked Jan 8 at 13:41 by rollTHERoad
  • Sort them on the pattern column, then you can compare them in one pass to find where a pattern exists in one list and not the other, or where their counts differ. Is that the kind of thing you were meaning? – Simon Goater Commented Jan 8 at 14:06
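The sort-then-single-pass idea from this comment could look roughly like the Python sketch below. It assumes each list is held as `(count, pattern)` tuples, that a pattern appears at most once per list (as in the question), and that patterns are treated as opaque strings; `compare_sorted` is a hypothetical helper name, not an existing library function.

```python
# Each row is (count, pattern), copied from the question.
# Assumption: patterns are opaque labels, so they are stored as strings.
list1 = [(11, "55"), (4, "31"), (4, "1"), (3, "22"), (2, "13"), (1, "81")]
list2 = [(7, "31"), (6, "22"), (6, "13"), (4, "88"), (3, "14"), (1, "55")]


def compare_sorted(rows1, rows2):
    """Return (pattern, count1, count2) for every pattern whose counts differ.

    A pattern missing from one list is reported with a count of 0 there.
    """
    # Sort both lists on the pattern column: O(n log n).
    a = sorted(rows1, key=lambda row: row[1])
    b = sorted(rows2, key=lambda row: row[1])
    i = j = 0
    diffs = []
    # Single merge-style pass: always advance the side with the smaller pattern.
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][1] < b[j][1]):
            diffs.append((a[i][1], a[i][0], 0))   # pattern only in list 1
            i += 1
        elif i == len(a) or b[j][1] < a[i][1]:
            diffs.append((b[j][1], 0, b[j][0]))   # pattern only in list 2
            j += 1
        else:
            if a[i][0] != b[j][0]:                # in both lists, counts differ
                diffs.append((a[i][1], a[i][0], b[j][0]))
            i += 1
            j += 1
    return diffs


for row in compare_sorted(list1, list2):
    print(row)
```

After sorting, the walk itself is linear because each step consumes at least one row from one of the lists.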

1 Answer


Firstly, these could be stored better: as they are, searching a list for a given pattern takes O(n) time. A better structure is a Bag (a multiset, i.e. a set that allows duplicates) of patterns, or a dictionary whose keys are the patterns and whose values are the counts. These are usually implemented with a balanced binary tree or a hash table, giving O(log n) or average-case O(1) lookups.
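In Python, for example, `collections.Counter` is a built-in hash-table-backed bag. The sketch below assumes the lists arrive as text rows in the question's `count | pattern` layout; `parse_bag` is a hypothetical helper, and patterns are kept as strings on the assumption that they are opaque labels.

```python
from collections import Counter


def parse_bag(text):
    """Turn lines like '11 | 55' into a Counter mapping pattern -> count."""
    bag = Counter()
    for line in text.strip().splitlines():
        count, pattern = (field.strip() for field in line.split("|"))
        bag[pattern] += int(count)  # += also merges a pattern listed twice
    return bag


list1_text = """
11 | 55
4  | 31
4  | 1
3  | 22
2  | 13
1  | 81
"""

bag1 = parse_bag(list1_text)
print(bag1["31"])  # 4: a hash lookup, O(1) on average
print(bag1["88"])  # 0: a missing pattern reads as zero instead of raising KeyError
```

Returning 0 for an absent pattern is convenient for the comparison, because a pattern present in only one list simply compares against a count of zero.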

You will need to iterate over the patterns stored in both bags. A language-agnostic way to achieve this is to form a set (so you don't get duplicates) of all the patterns in both bags. However, in many languages you can avoid storing a new set by writing a custom iterator or generator with knowledge of both bags.
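Both options might look like this in Python, assuming the two lists are already held as `Counter` bags; `union_patterns` is a hypothetical generator name.

```python
from collections import Counter

bag1 = Counter({"55": 11, "31": 4, "1": 4, "22": 3, "13": 2, "81": 1})
bag2 = Counter({"31": 7, "22": 6, "13": 6, "88": 4, "14": 3, "55": 1})

# Language-agnostic option: materialise a set of every pattern in either bag.
all_patterns = set(bag1) | set(bag2)


# Generator option: yields each pattern exactly once without building a new set.
# Everything in `a` is yielded first; patterns from `b` are skipped if `a`
# already contains them (an average O(1) hash lookup per pattern).
def union_patterns(a, b):
    yield from a
    for pattern in b:
        if pattern not in a:
            yield pattern


print(sorted(all_patterns))
print(list(union_patterns(bag1, bag2)))
```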

How you interpret the results of the comparison depends on how detailed an output you want. Do you just want to know whether there is any difference, the total count of differences, which patterns differ, or the total difference per pattern? And do you need to know which list has more or fewer of each pattern?
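Several of these outputs fall out of the same bags with very little code. The Python sketch below assumes `Counter` bags built from the question's lists; the variable names are illustrative only.

```python
from collections import Counter

bag1 = Counter({"55": 11, "31": 4, "1": 4, "22": 3, "13": 2, "81": 1})
bag2 = Counter({"31": 7, "22": 6, "13": 6, "88": 4, "14": 3, "55": 1})

# Is there any difference at all?
any_difference = bag1 != bag2

# Which patterns differ, and by how much? Positive: List1 has more; negative: List2 has more.
patterns = set(bag1) | set(bag2)
signed = {p: bag1[p] - bag2[p] for p in sorted(patterns) if bag1[p] != bag2[p]}

# Counter subtraction keeps only positive counts, so it splits the
# differences by direction: what List1 has extra, and what List2 has extra.
extra_in_1 = bag1 - bag2
extra_in_2 = bag2 - bag1

print(any_difference)
print(signed)
print(extra_in_1)
print(extra_in_2)
```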

Assuming you just want the total difference, the algorithm would be:

  • loop over all the patterns
  • look up the count in each of the 2 bags
  • find the difference of the 2 counts
  • add the absolute difference to a running total
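The four steps above translate almost line for line into Python. This is a sketch assuming `Counter` bags (where an absent pattern counts as 0); `total_difference` is a hypothetical helper name.

```python
from collections import Counter


def total_difference(bag1, bag2):
    """Sum of |count1 - count2| over every pattern in either bag."""
    total = 0
    for pattern in set(bag1) | set(bag2):      # loop over all the patterns
        c1, c2 = bag1[pattern], bag2[pattern]  # look up both counts (0 if absent)
        total += abs(c1 - c2)                  # add the absolute difference
    return total


bag1 = Counter({"55": 11, "31": 4, "1": 4, "22": 3, "13": 2, "81": 1})
bag2 = Counter({"31": 7, "22": 6, "13": 6, "88": 4, "14": 3, "55": 1})

print(total_difference(bag1, bag2))  # 32

# Equivalent shortcut with Counter arithmetic: the two one-sided
# differences have disjoint keys, so their combined values sum to the total.
via_counters = sum(((bag1 - bag2) + (bag2 - bag1)).values())
print(via_counters)  # 32
```

Identical bags give a total of 0, so the same function also answers the simpler "is there any difference" question.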