Skip to content

Conversation

kumarutkarsh1248
Copy link
Contributor

@kumarutkarsh1248 kumarutkarsh1248 commented Aug 29, 2025

This PR address issue

Fixed the forward pass implementation in the MultiHeadAttention layer to produce the correct output.
Made relevant changes in the tests, except for the gradient tests.

Next Steps:
Get feedback on the changes and extend the fixes to the backward pass and gradient computation.

@rcurtin
Copy link
Member

rcurtin commented Aug 31, 2025

I pushed changes for the backward and gradient pass. While debugging those, I realized that the forward pass on the softmax didn't work correctly when the masks used the intended values of -Inf. So, I implemented a custom forward masked softmax pass manually. It's not particularly pretty but it works correctly.

I believe there are a ton of optimizations that can be made to this implementation (mostly in memory allocations), but we can return to that later---correctness is the important part here.

@kumarutkarsh1248
Copy link
Contributor Author

kumarutkarsh1248 commented Sep 1, 2025

Using std::numeric_limits::lowest() is actually risky, I just realized. When I was testing the forward pass locally, I set the attention mask to all zeros, so I didn’t notice this problem at the time.
Do you think there’s any way we can still use softmax.forward() later when we optimize the implementation?

@rcurtin
Copy link
Member

rcurtin commented Sep 1, 2025

Yeah, the idea was to actually use -Inf as the masking value, not -DBL_MAX.

I'm not sure if using softmax.Forward() will be possible---I think the implementation here will change a lot when it gets optimized. We'll see?

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything is good here---wouldn't mind getting more eyes on the review if anyone has the time. @kumarutkarsh1248 do you think you can add a note to HISTORY.md?

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants