c++ - Why is GCC not autovectorising this code unless I explicitly specify one of the possible cost models? - Stack Overflow


I have a minimal sample algorithm (please ignore whether the algorithm itself makes sense or could be written differently; it's just a contrived sample to demonstrate what I'm seeing).

When I compile it on any recent GCC with the flags below, it doesn't autovectorise (https://godbolt.org/z/KvhKP9bsE).

If I add any of -fvect-cost-model=dynamic, -fvect-cost-model=cheap, -fvect-cost-model=very-cheap or -fvect-cost-model=unlimited, then it does vectorise. This doesn't make sense to me: -fvect-cost-model=dynamic is the default and should already be implied, so I don't understand why setting it explicitly would change anything.

Why isn't this autovectorising without this flag and why does this flag change that? Please help me understand!

Compile flags:

-std=c++20 -O3 -fopt-info-all-vec -ffast-math -march=core-avx2

Algorithm:

#include <algorithm>
#include <cstddef>
#include <iostream>
void foo(const size_t n, float* __restrict__ a, float* __restrict__ b, float* __restrict__ c)
{
    float total=0.0f;
    float sum=0.0f;
    float max=0.0f;
    size_t count=0;

    for (size_t i = 0; i < n; ++i) {
        float temp = *b;
        const bool not_zero = temp != 0.0;
        if (i % 4 == 0) {                 // only every fourth element feeds sum and count
            sum += *b * not_zero;
            count += not_zero;
        }
        max = std::max(temp, max);        // running maximum of the original b values
        *b = temp * *c;
        total += *b;
        *a *= *b;
        a += 1;
        b += 1;
        c += 1;
    }

    std::cout << total  << sum << max << count;
}
Asked Feb 1 at 12:38 by Malcolm MacLeod (edited Feb 1 at 13:31 by Darth-CodeX). 4 comments:
  • I can't see any difference in the output of your own Compiler Explorer link with and without -fvect-cost-model=dynamic. Are you certain you're looking at the right thing? – Useless, commented Feb 1 at 13:44
  • Not tried it, but this could be helpful: stackoverflow.com/questions/50586468/… – gerum, commented Feb 1 at 13:46
  • @Useless Yeah, pretty sure. Without setting the cost model (godbolt.org/z/3bsz1Yv59) the compiler output says "main.cpp:11:26: missed: couldn't vectorize loop" and "main.cpp:12:22: missed: not vectorized: no vectype for stmt: _1 = *b_61;". With it set (godbolt.org/z/eoE8v5Yzo) the compiler output says "main.cpp:11:26: optimized: loop vectorized using 32 byte vectors", "main.cpp:11:26: optimized: loop vectorized using 16 byte vectors" and "main.cpp:4:6: note: vectorized 1 loops in function." – Malcolm MacLeod, commented Feb 1 at 13:49
  • Note that CMake's RelWithDebInfo isn't just release with debug symbols; it reduces the optimisation settings too. – Alan Birtles, commented Feb 1 at 15:50

1 Answer


It turns out I was mistaken: only three of the four cost models vectorise this loop; -fvect-cost-model=very-cheap does not. very-cheap happens to be the default on Godbolt, which I discovered by passing -Q --help=optimizers to check.

For some reason Godbolt is configured to default to very-cheap as the cost model. This differs from the standard GCC settings, as well as from the settings on every machine where I've checked.

I'm not sure why this is; perhaps it's because they want to make it more obvious when parts of a loop still remain scalar (speculating here, though).
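To repeat this check on another machine, a minimal sketch along the following lines may help. It reuses the loop from the question but returns the four reductions in a struct, so builds made with different cost models can be compared against each other. The names Stats and foo_stats, the indexed access style and the file name kernel.cpp are assumptions made for illustration and are not part of the original code. The commands in the leading comment use only flags already mentioned in this thread, and whether this variant vectorises in exactly the same way as the original is not guaranteed.

```cpp
// Assumed build command (kernel.cpp is a hypothetical file name):
//   g++ -std=c++20 -O3 -ffast-math -march=core-avx2 -fopt-info-all-vec \
//       -fvect-cost-model=dynamic -c kernel.cpp
// To see which cost model this GCC installation uses when none is given:
//   g++ -O3 -Q --help=optimizers | grep vect-cost-model
#include <algorithm>
#include <cstddef>

// Hypothetical result type holding the four reductions from the question.
struct Stats {
    float total;
    float sum;
    float max;
    std::size_t count;
};

// Same arithmetic as the question's foo, written with indices instead of
// pointer increments and returning the reductions instead of printing them.
Stats foo_stats(std::size_t n,
                float* __restrict__ a,
                float* __restrict__ b,
                const float* __restrict__ c)
{
    Stats s{0.0f, 0.0f, 0.0f, 0};
    for (std::size_t i = 0; i < n; ++i) {
        const float temp = b[i];
        const bool not_zero = temp != 0.0f;
        if (i % 4 == 0) {            // only every fourth element feeds sum and count
            s.sum += temp * not_zero;
            s.count += not_zero;
        }
        s.max = std::max(temp, s.max);
        b[i] = temp * c[i];
        s.total += b[i];
        a[i] *= b[i];
    }
    return s;
}
```

Compiling this once per cost model and reading the -fopt-info output shows which settings report "loop vectorized" and which report "missed" for the loop.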
