MultiHeadAttention Layer Bug Fix #3998

kumarutkarsh1248 · 2025-08-29T19:47:07Z

This PR addresses issue

Fixed the forward pass implementation in the MultiHeadAttention layer to produce the correct output.
Made relevant changes in the tests, except for the gradient tests.

Next Steps:
Get feedback on the changes and extend the fixes to the backward pass and gradient computation.


rcurtin · 2025-08-31T20:09:31Z

I pushed changes for the backward and gradient pass. While debugging those, I realized that the forward pass on the softmax didn't work correctly when the masks used the intended values of -Inf. So, I implemented a custom forward masked softmax pass manually. It's not particularly pretty but it works correctly.

I believe there are a ton of optimizations that can be made to this implementation (mostly in memory allocations), but we can return to that later---correctness is the important part here.
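For readers following along, here is a minimal sketch of what a masked softmax that tolerates -Inf mask values could look like. It is not the code from this PR: the function name `MaskedSoftmax`, the use of `std::vector<double>` instead of Armadillo types, and the choice to return all zeros for a fully masked row are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not mlpack's implementation): softmax over
// scores[i] + mask[i], where a masked position has mask[i] == -Inf.
std::vector<double> MaskedSoftmax(const std::vector<double>& scores,
                                  const std::vector<double>& mask)
{
  assert(scores.size() == mask.size());
  const double negInf = -std::numeric_limits<double>::infinity();

  // Largest masked logit, subtracted later for numerical stability.
  double maxVal = negInf;
  for (std::size_t i = 0; i < scores.size(); ++i)
    maxVal = std::max(maxVal, scores[i] + mask[i]);

  std::vector<double> out(scores.size(), 0.0);

  // Every position is masked: return zeros (an assumption of this sketch)
  // instead of evaluating exp(-Inf - (-Inf)), which would be NaN.
  if (maxVal == negInf)
    return out;

  double sum = 0.0;
  for (std::size_t i = 0; i < scores.size(); ++i)
  {
    // For a masked position this is exp(-Inf), which is exactly 0.
    out[i] = std::exp(scores[i] + mask[i] - maxVal);
    sum += out[i];
  }

  // sum >= 1 here, because the maximal element contributes exp(0).
  for (double& v : out)
    v /= sum;

  return out;
}
```

Calling `MaskedSoftmax({1.0, 2.0, 3.0}, {0.0, -Inf, 0.0})` gives a weight of exactly zero for the middle position and renormalizes the other two so that they sum to one.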

kumarutkarsh1248 · 2025-09-01T09:35:18Z

Using std::numeric_limits<T>::lowest() is actually risky, I just realized. When I was testing the forward pass locally, I set the attention mask to all zeros, so I didn’t notice this problem at the time.
Do you think there’s any way we can still use softmax.Forward() later when we optimize the implementation?

rcurtin · 2025-09-01T13:54:54Z

Yeah, the idea was to actually use -Inf as the masking value, not -DBL_MAX.

I'm not sure if using softmax.Forward() will be possible---I think the implementation here will change a lot when it gets optimized. We'll see?
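The thread does not spell out exactly how the large finite mask value misbehaves, so the following is only one plausible illustration, using plain IEEE-754 double arithmetic. The helper names `UnnormalizedWeight` and `CombineMasks` are hypothetical and do not come from mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: the unnormalized weight that a max-subtracting
// softmax assigns to one position, exp(score + maskValue - rowMax).
double UnnormalizedWeight(const double score,
                          const double maskValue,
                          const double rowMax)
{
  assert(!std::isnan(score));
  return std::exp(score + maskValue - rowMax);
}

// Hypothetical helper: adding two additive masks together, for example an
// attention mask and a padding mask.
double CombineMasks(const double a, const double b)
{
  return a + b;
}
```

With an ordinary row maximum, both mask values give a masked position a weight of zero. The difference shows up at the edges. If every position in a row is masked with `lowest()`, the row maximum is `lowest()` too, and `UnnormalizedWeight(1.0, lowest, lowest)` evaluates to `exp(0) == 1`, so the masked positions end up sharing the attention uniformly. With -Inf the same call produces NaN unless the softmax handles that case explicitly. Separately, `CombineMasks(lowest, lowest)` overflows to -Inf, so a finite mask value does not reliably stay finite once masks are summed.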

rcurtin

I think everything is good here---wouldn't mind getting more eyes on the review if anyone has the time. @kumarutkarsh1248 do you think you can add a note to HISTORY.md?

github-actions

Second approval provided automatically after 24 hours. 👍

kumarutkarsh1248 and others added 12 commits June 17, 2024 11:22

Revert "removed unnecessary deletion of layer inside destructor of mu…

8cbac28

…ltilayer class" This reverts commit 0428af5.

Merge branch 'mlpack:master' into master

9432857

Merge branch 'mlpack:master' into master

81bfcfe

Merge branch 'mlpack:master' into master

4565f27

multihead_attention issue 3969 resolved in forward method

0b3d237

multiHeadAttention forward pass bug fixed

ddddee8

fixed multiHeadAttention forward pass bug

475cb71

Merge branch 'mlpack:master' into fix-multihead-attention

9fad52f

style fix

3b905fa

Fix forward pass of MultiheadAttention.

3f74656

Temporary working commit, partially debugged.

3925378

Use custom masked softmax implementation and fix gradient computation.

512eb07

Remove unnecessary changes.

4fbf666

Fix style.

4d7bea5

rcurtin approved these changes Sep 2, 2025


github-actions bot approved these changes Sep 4, 2025
