MultiHeadAttention Layer Bug Fix #3998
base: master
Conversation
…ltilayer class" This reverts commit 0428af5.
I pushed changes for the backward and gradient pass. While debugging those, I realized that the softmax in the forward pass didn't work correctly when the masks used the intended values. I believe there are a ton of optimizations that can be made to this implementation (mostly in memory allocations), but we can return to that later; correctness is the important part here.
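For context, here is a minimal sketch of the additive-mask-plus-softmax step being discussed, written against Armadillo (which mlpack builds on). The function name and the `-inf` mask convention are illustrative assumptions, not the layer's actual code:

```cpp
#include <armadillo>

// Illustrative masked softmax (assumption: not mlpack's actual code).
// `scores` holds raw attention scores, one column per query position;
// `mask` is additive: 0 for visible entries and a very large negative
// value (e.g. -std::numeric_limits<double>::infinity()) for masked ones.
arma::mat MaskedSoftmax(const arma::mat& scores, const arma::mat& mask)
{
  arma::mat masked = scores + mask;

  // Subtract each column's max before exponentiating, for numerical
  // stability; with -inf masks, exp() then maps masked entries to 0.
  masked.each_row() -= arma::max(masked, 0);

  arma::mat expScores = arma::exp(masked);
  expScores.each_row() /= arma::sum(expScores, 0);
  return expScores;
}
```

One subtlety worth noting: if a column is entirely masked, the column max is `-inf`, the exponentials all become NaN or 0, and the normalization divides by zero, so a real implementation has to handle that edge case explicitly.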
Yeah, the idea was to actually use …. I'm not sure if using ….
I think everything is good here; wouldn't mind getting more eyes on the review if anyone has the time. @kumarutkarsh1248 do you think you can add a note to HISTORY.md?
Second approval provided automatically after 24 hours. 👍
This PR addresses the issue.
Fixed the forward pass implementation in the MultiHeadAttention layer to produce the correct output (a sketch of the computation appears after this list).
Made the relevant changes to the tests, except for the gradient tests.
Next Steps:
Get feedback on the changes and extend the fixes to the backward pass and gradient computation.
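To make the scope of the forward-pass fix concrete, below is a hedged sketch of standard single-head scaled dot-product attention in Armadillo-style C++. Everything here, including the names, the column-major `embedDim x seqLen` layout, and the additive mask, is an assumption for illustration rather than mlpack's actual layer code; the multi-head layer additionally projects the inputs and runs this computation once per head on the split projections.

```cpp
#include <armadillo>
#include <cmath>

// Illustrative single-head scaled dot-product attention
// (assumed layout: each matrix is embedDim x seqLen, column-major).
arma::mat ScaledDotProductAttention(const arma::mat& query,
                                    const arma::mat& key,
                                    const arma::mat& value,
                                    const arma::mat& mask)
{
  // Scale by 1 / sqrt(d_k) so score magnitudes stay comparable
  // across embedding dimensions.
  const double scale = 1.0 / std::sqrt((double) query.n_rows);

  // scores(i, j): similarity of key position i with query position j,
  // plus the additive mask (0 = visible, large negative = masked).
  arma::mat scores = scale * (key.t() * query) + mask;

  // Numerically stable column-wise softmax over key positions.
  scores.each_row() -= arma::max(scores, 0);
  arma::mat weights = arma::exp(scores);
  weights.each_row() /= arma::sum(weights, 0);

  // Each output column is the attention-weighted sum of value columns.
  return value * weights;
}
```

The key ordering constraint, consistent with the mask discussion earlier in the thread, is that the additive mask must be applied to the scores before the softmax normalization; applying it afterwards, or with the wrong sentinel value, silently skews the attention weights.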