I was working on highly vectorizable code and noticed that Clang handles the C++ __restrict keyword (a compiler extension) differently from GCC, and less usefully, even in a simple case.

For the compiler-generated code, the slowdown is about 15x (in my specific case, not the example below).

Here is the code (also available at https://godbolt.org/z/sdGd43x75):

struct Param {
    int *x;
};

int foo(int *a, int *b) {
    *a = 5;
    *b = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int foo(Param a, Param b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a.x + *b.x;
}

/////////////////////

struct ParamR {
    // "Restricted pointers assert that members point to disjoint storage"
    // (https://en.cppreference.com/w/c/language/restrict). Does C's
    // interpretation of restrict also apply in C++ (and to __restrict)?
    int *__restrict x;
};

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

int rfoo(ParamR *__restrict a, ParamR *__restrict b) {
    *a->x = 5;
    *b->x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a->x + *b->x;
}

This happens in both C++ (with __restrict) and C (with the standard restrict keyword).

How can I make Clang understand that the pointers will always point to disjoint storage?

  • Does this answer your question? Why does clang ignore __restrict__? Commented Dec 13, 2021 at 18:50
  • It appears the bug still exists, or perhaps it's a new variation of the same one. Commented Dec 13, 2021 at 18:56
  • It's somewhat of a duplicate; both seem to be about a Clang TBAA bug. In my case I use __restrict on a member variable, which Clang does not notice; in stackoverflow.com/questions/50365141/…, Clang fails for an even simpler case (__restrict on a function argument). If it's a variation of the same bug, it has been 3 years since it was publicly noticed and it's still not fixed. Commented Dec 13, 2021 at 21:01
  • It looks like LLVM just doesn't care; TBAA is designed to help memcpy, not to handle this. noalias seems to be the way they really implement it, but that doesn't apply to member variables. Commented Dec 13, 2021 at 21:41
  • Yet another Clang bug/missed opportunity: godbolt.org/z/s8qzr3P3v, even though this recent LLVM dev meeting talk (youtube.com/watch?v=08XwXB3GHck) says that Clang should see through these simple cases. Commented Jan 8, 2022 at 18:45

1 Answer

It appears to be a bug. Well, I don't know if I should call it a bug, as the generated program still behaves correctly; let's say it is a missed opportunity in the optimizer.

I have tried a few workarounds, and the only thing that worked is to always pass the pointers as restrict parameters, like so:

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

// change this:
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

// to this:
int rfoo2(ParamR a, ParamR b) {
    return rfoo(a.x, b.x);
}

Output from clang 12.0.0:

rfoo(ParamR, ParamR):                       # @rfoo(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, dword ptr [rdi]
        add     eax, 6
        ret
rfoo2(ParamR, ParamR):                      # @rfoo2(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, 11
        ret

Now this is terribly inconvenient, especially for more complex code, but if the performance difference is that large and important and you can't switch to GCC, it might be worth considering.
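
For loop-heavy code like the vectorizable case in the question, the same forwarding idea might look roughly like the sketch below. The names add_kernel, add_arrays and self_check are hypothetical and not from the question. The assumption, based only on the rfoo/rfoo2 output above, is that Clang honors __restrict on function parameters; whether a given Clang version then actually vectorizes the loop depends on the version and flags, so check the generated code on Compiler Explorer.

```cpp
#include <cassert>
#include <cstddef>

// Same shape as ParamR in the question: a struct wrapping a restrict pointer.
struct ParamR {
    int *__restrict x;
};

// Hypothetical kernel: __restrict sits on the function parameters, which is
// the form Clang is seen to use in the rfoo(int*, int*) case above.
inline void add_kernel(int *__restrict dst, const int *__restrict src,
                       std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// Thin wrapper that keeps the struct-based interface and only forwards
// the member pointers to the kernel.
inline void add_arrays(ParamR dst, ParamR src, std::size_t n) {
    add_kernel(dst.x, src.x, n);
}

// Small usage example; the two arrays are distinct, as restrict requires.
inline int self_check() {
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    add_arrays(ParamR{a}, ParamR{b}, 4);
    assert(a[0] == 11 && a[3] == 44);
    return a[0] + a[1] + a[2] + a[3];
}
```

The wrapper costs nothing once inlined, but the caller is still responsible for the no-overlap promise: passing overlapping arrays is undefined behavior in the kernel.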

2 Comments

I would dare call this a bug in the optimizer as it misses a significant opportunity in a simple program.
For a sound compiler to perform an optimizing transform, it must not only ensure that the transform would likely improve performance, but also that all possible corner cases where the transform might adversely affect program behavior have been adequately considered and handled. Even if all corner cases are in fact handled, the fact that a compiler writer blocks optimizations in cases which would be difficult to prove sound is hardly a bug.
