r - Unexpected output of dplyr::top_n - Stack Overflow

admin2025-04-22  1

This is the expected output of dplyr::top_n!

To select the top 2:

> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(2, mpg)

                mpg cyl disp hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4 78.7 66 4.08 2.200 19.47  1  1    4    1

To select the top 3:

> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(3, mpg)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

But why do I get this when I select the top 4?

> mtcars %>% dplyr::arrange(desc(mpg)) %>% dplyr::top_n(4, mpg)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

I expected this:

                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
Fiat X1-9      27.3   4 79.0  66 4.08 1.935 18.90  1  1    4    1

Can anybody please explain what I am missing?

asked Jan 21 at 15:23 by Mohammad Tanvir Ahamed; edited Jan 22 at 15:57 by Adrian Mole
  • This function is superseded; use slice_max instead, which has options for handling ties: dplyr.tidyverse.org/reference/slice.html – zx8754, Jan 21 at 15:27
  • Thanks @zx8754. On "superseded", the lifecycle documentation (lifecycle.r-lib.org/articles/stages.html#superseded) says: "A superseded function will not emit a warning (since there’s no risk if you keep using it)". – Mohammad Tanvir Ahamed, Jan 21 at 15:32
  • Thanks @Onyambu. If I understand correctly, ranking "naturally" versus "densely" is subjective. I was expecting the numerical outcome I showed in the question. – Mohammad Tanvir Ahamed, Jan 21 at 15:36
  • With top_n(4) you'd expect 4, and that is what the code gives you. For example, when two Olympians share a gold, the next closest gets a bronze, i.e. there is no silver. So the ranks are 1, 1, 3, not 1, 1, 2. – Onyambu, Jan 21 at 15:47
  • Try mtcars %>% left_join(distinct(., mpg) %>% top_n(4), ., by = "mpg") – G. Grothendieck, Jan 21 at 15:50
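zx8754's slice_max() suggestion refers to its with_ties argument. A minimal sketch, assuming dplyr 1.0.0 or later (where slice_max() was introduced): with the default with_ties = TRUE, every row tied at the cut-off is kept, so asking for 3 rows can return more; with with_ties = FALSE, exactly n rows come back.

```r
library(dplyr)

# Default with_ties = TRUE: the two cars tied at 30.4 mpg share rank 3,
# so both are kept and 4 rows are returned for n = 3.
with_ties <- slice_max(mtcars, mpg, n = 3)

# with_ties = FALSE: exactly n rows, dropping one of the tied cars.
no_ties <- slice_max(mtcars, mpg, n = 3, with_ties = FALSE)

nrow(with_ties)  # 4
nrow(no_ties)    # 3
```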

1 Answer

top_n is superseded and should not be used; use slice_max instead.

That said, slice_max(mtcars, mpg, n = 4) selects the same rows as top_n(mtcars, mpg, n = 4). This is because, under the hood, both use dplyr::min_rank to calculate ranks: slice_max(mtcars, mpg, n = 4) is equivalent (up to row order) to mtcars %>% filter(min_rank(desc(mpg)) <= 4).
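A quick check of this equivalence, as a sketch assuming dplyr 1.0.0 or later. The rows are compared by row name because slice_max() returns them sorted by mpg, while filter() and top_n() keep the original mtcars order.

```r
library(dplyr)

via_slice  <- slice_max(mtcars, mpg, n = 4)
via_filter <- filter(mtcars, min_rank(desc(mpg)) <= 4)
via_top_n  <- top_n(mtcars, 4, mpg)

# Same set of cars, regardless of row order.
same_rows <- function(a, b) setequal(rownames(a), rownames(b))

same_rows(via_slice, via_filter)  # TRUE
same_rows(via_top_n, via_filter)  # TRUE
nrow(via_slice)                   # 4
```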

min_rank handles ties like so (see ?min_rank):

min_rank() gives every tie the same (smallest) value so that c(10, 20, 20, 30) gets ranks c(1, 2, 2, 4). It's the way that ranks are usually computed in sports and is equivalent to rank(ties.method = "min").
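The quoted rule can be checked directly. A small sketch comparing dplyr's min_rank() with base R's rank(ties.method = "min") on the vector from the documentation, plus the desc() form from the filter() equivalent above, which gives the largest value rank 1.

```r
library(dplyr)

x <- c(10, 20, 20, 30)

min_rank(x)                   # 1 2 2 4
rank(x, ties.method = "min")  # 1 2 2 4  (base R equivalent)

# Ranking largest-first, as top_n()/slice_max() need:
min_rank(desc(x))             # 4 2 2 1
```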

In your case of n = 4, the call returns 4 rows, because that's what it should return. min_rank(desc(c(33.9, 32.4, 30.4, 30.4, 27.3))) returns 1 2 3 3 5: the two tied cars share rank 3, so rank 4 is skipped and the fifth observation gets rank 5, which is not <= 4, so it is dropped.
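The same arithmetic on the five largest mpg values, as a sketch (the vector is typed in by hand from the question's output):

```r
library(dplyr)

top5  <- c(33.9, 32.4, 30.4, 30.4, 27.3)
ranks <- min_rank(desc(top5))  # 1 2 3 3 5

ranks <= 4                     # TRUE TRUE TRUE TRUE FALSE
sum(ranks <= 4)                # 4 rows kept
```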


How do you get the result you wanted? Use dense_rank, which handles ties differently: tied values still share a rank, but no integer gaps are left between ranks, so the car after the tie gets rank 4 instead of 5.

mtcars %>% filter(dense_rank(desc(mpg)) <= 4)

#                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
# Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
# Fiat X1-9      27.3   4 79.0  66 4.08 1.935 18.90  1  1    4    1
# Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
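A sketch that reproduces these five rows and cross-checks them without dense_rank(), assuming dplyr is installed. The base R version is an alternative added here for illustration, not part of the original answer: it keeps every car whose mpg is among the four largest distinct values, which is exactly what dense_rank(desc(mpg)) <= 4 selects.

```r
library(dplyr)

top5 <- c(33.9, 32.4, 30.4, 30.4, 27.3)
dense_rank(desc(top5))  # 1 2 3 3 4  (no gap after the tie)

wanted <- filter(mtcars, dense_rank(desc(mpg)) <= 4)
nrow(wanted)            # 5

# Base R cross-check: the 4 largest distinct mpg values, then every car
# whose mpg matches one of them.
top4_values <- head(sort(unique(mtcars$mpg), decreasing = TRUE), 4)
base_wanted <- mtcars[mtcars$mpg %in% top4_values, ]

setequal(rownames(wanted), rownames(base_wanted))  # TRUE
```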