Skip to content

Conversation

magnumripper
Copy link
Member

Found when reviewing all uses of barriers in our kernels.

I recall we've had problems with intermittent failures from these two formats. The global barrier was added by me in good faith at some point in time. It did help a little but still not safe. Before that, there wasn't even the barrier...

I believe all other kernels' use of barriers are just fine.

@solardiz
Copy link
Member

solardiz commented Sep 2, 2025

Looks like these specific formats have just one cmp_all-alike flag, rather than anything per computed hash. If so, should we maybe prefer to reset this flag in cmp_all rather than in crypt_all? Otherwise (as with this PR's changes), cmp_all probably just keeps returning true if there are matches for any of the loaded hashes, whereas it could only return true if called for loaded hashes that match any of the computed hashes.

Edit: OTOH, clearing the flag just once per all cmp_all calls is probably quicker when there are no successful cracks by the entire crypt_all call.

@magnumripper
Copy link
Member Author

Looks like these specific formats have just one cmp_all-alike flag, rather than anything per computed hash.

Yes and that's also the reason for race conditions. Looking at it today I can't understand why it was made this way, but it was very early in our steps towards mastering GPU formats.

If so, should we maybe prefer to reset this flag in cmp_all rather than in crypt_all? Otherwise (as with this PR's changes), cmp_all probably just keeps returning true if there are matches for any of the loaded hashes, whereas it could only return true if called for loaded hashes that match any of the computed hashes.

How would cmp_all know there aren't more matching binaries?

@solardiz
Copy link
Member

solardiz commented Sep 2, 2025

How would cmp_all know there aren't more matching binaries?

It wouldn't, but it doesn't need to since it's called again for each loaded hash. That's when bitmap+hash table are not in use (for the current salt if applicable) - when they are, cmp_all is skipped.

@magnumripper
Copy link
Member Author

Oh, I see what you mean now. In my mind the cmp kernel was called from within crypt_all. The result should obviously be cleared prior to calling the cmp kernel, so in cmp_all. I'll fix that.

Only gid 0 cleared the "result" buffer, then a global barrier. The barrier
only synchronizes within a single workgroup so this is not safe. The only
safe and sensible way of clearing it is on host side.

Then multiple threads (potentially) wrote to the same "result" location
without atomic operations. This is a data race and result is undefined in
OpenCL. We have to use atomic operations.
@solardiz
Copy link
Member

solardiz commented Sep 2, 2025

BTW, why do you also add volatile? Does it matter?

@magnumripper
Copy link
Member Author

magnumripper commented Sep 2, 2025

BTW, why do you also add volatile? Does it matter?

It's required for atomic functions. IIRC there's not always a warning if you forget, and it may even (intermittently) work just fine although perhaps unsafe so could possibly be a bug really hard to spot.

@magnumripper magnumripper merged commit 43b0806 into openwall:bleeding-jumbo Sep 2, 2025
32 of 33 checks passed
@magnumripper magnumripper deleted the sha512-free-race branch September 2, 2025 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants