Embedding layer #3999
base: master
Conversation
Thanks for adapting this layer @kumarutkarsh1248! I have a bunch of comments---let me know what you think. As implemented, does this work in the mlpack-onnx networks you have put it into?
 * @file methods/ann/layer/lookup.hpp
 * @author Marcus Edel
 *
 * Definition of the Lookup (embedding) layer class.
Do you think we can just rename the layer Embedding, because that's the more commonly used name?
@@ -0,0 +1,151 @@
/**
 * @file methods/ann/layer/lookup.hpp
If you adapted this out of not_adapted/, can you remove those files? Or, alternately, use git mv to move the old files to this directory and update them (that will preserve the git history for the files).
@@ -0,0 +1,151 @@
/**
 * @file methods/ann/layer/lookup.hpp
 * @author Marcus Edel
Also feel free to add your name here :)
    const MatType& gy,
    MatType& g)
{
  Log::Fatal << "Lookup cannot be used as an intermediate layer." << std::endl;
Suggested change:
- Log::Fatal << "Lookup cannot be used as an intermediate layer." << std::endl;
+ Log::Fatal << "Lookup cannot be used as an intermediate layer;"
+     << " it must be the first layer in a network!" << std::endl;
Just a little clarification; this makes the error message a bit more actionable for the user (who may not immediately realize the meaning of the term "intermediate layer").
  MakeAlias(const_cast<CubeType&>(errorTemp), error, seqLength, embeddingSize,
      batchSize, 0, false);

  gradient.set_size(arma::size(weights));
Suggested change (remove this line):
- gradient.set_size(arma::size(weights));
The gradient should already be set to the right size.
  module.Parameters().randu();

  // Test the Forward function.
  input = arma::zeros(seqLength, batchSize);
Shouldn't we test input with different values? You could use arma::randi<>. It would also be good to add a test ensuring that an exception is thrown if an input has a value greater than the vocabulary size.
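For illustration, here is a rough sketch of what both checks could look like, reusing the test's module, seqLength, batchSize, embeddingSize, and vocabSize variables; the output shape and the std::logic_error exception type are assumptions, not something this PR confirms:

  // Sketch only: exercise Forward() with varied in-range indices
  // (using the zero-indexed convention suggested further down).
  arma::mat input = arma::conv_to<arma::mat>::from(
      arma::randi<arma::umat>(seqLength, batchSize,
          arma::distr_param(0, (int) vocabSize - 1)));
  arma::mat output(seqLength * embeddingSize, batchSize);
  module.Forward(input, output);

  // An index at or above vocabSize should be rejected rather than silently
  // reading past the end of the embedding matrix (exception type assumed).
  input(0, 0) = vocabSize;
  REQUIRE_THROWS_AS(module.Forward(input, output), std::logic_error);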
  // } function;

  // REQUIRE(CheckGradient(function) <= 1e-6);
  // }
We should definitely uncomment this function before merge; does it not currently work?
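In case it helps, these gradient tests usually have roughly the following shape. This is only a sketch: the FFN/loss choices and the ResetData/EvaluateWithGradient calls mirror the pattern used by other layer tests, and the input/target setup for this particular layer is an assumption.

  // Sketch of a typical gradient-check test; not the PR's actual code.
  struct GradientFunction
  {
    GradientFunction() :
        input(arma::zeros(seqLength, batchSize)),
        target(arma::ones(1, batchSize))
    {
      model = new FFN<NegativeLogLikelihood, RandomInitialization>();
      model->ResetData(input, target);
      model->Add<Lookup>(vocabSize, embeddingSize);
      model->Add<LogSoftMax>();
    }

    ~GradientFunction() { delete model; }

    double Gradient(arma::mat& gradient) const
    {
      return model->EvaluateWithGradient(model->Parameters(), 0, gradient,
          batchSize);
    }

    arma::mat& Parameters() { return model->Parameters(); }

    FFN<NegativeLogLikelihood, RandomInitialization>* model;
    arma::mat input, target;
  } function;

  REQUIRE(CheckGradient(function) <= 1e-6);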
  // input.set_size(seqLength, batchSize);
  // for (size_t i = 0; i < input.n_elem; ++i)
  // {
  //   input(i) = math::RandInt(1, vocabSize);
So it looks here like the semantic is that the inputs should take values in [1, vocabSize]. But everything else in mlpack (and in C++) is zero-indexed---so I would expect that the index should take values in [0, vocabSize). Can you make that change? That would also remove the need for the - 1 that you are doing in the code, which could cause underflows if the user passes a 0 (which is actually what the test above is doing).
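Concretely, the test-side part of that change might look like this (a sketch only, keeping the surrounding variable names):

  // Fill the test input with zero-based indices in [0, vocabSize).
  input.set_size(seqLength, batchSize);
  for (size_t i = 0; i < input.n_elem; ++i)
    input(i) = math::RandInt(0, vocabSize);  // Upper bound is exclusive.

  // The layer can then use input(i) to index the embedding matrix directly,
  // with no "- 1" adjustment and no underflow when input(i) == 0.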
@@ -44,6 +44,7 @@
 #include <mlpack/methods/ann/layer/linear_recurrent.hpp>
 #include <mlpack/methods/ann/layer/linear3d.hpp>
 #include <mlpack/methods/ann/layer/log_softmax.hpp>
+#include <mlpack/methods/ann/layer/lookup.hpp>
Don't forget to add the layer type to the list of serializable layers in serialization.hpp too. 👍
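If it helps, the addition would presumably be a single registration line inside serialization.hpp. This is a hypothetical sketch assuming the layer keeps a regularizer template parameter and follows the same registration pattern as the other layers; adjust the class-template name to whatever the PR ends up using (LookupType, Embedding, etc.):

  // Hypothetical line in src/mlpack/methods/ann/layer/serialization.hpp:
  CEREAL_REGISTER_TYPE(mlpack::LookupType<__VA_ARGS__, mlpack::NoRegularizer>); \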
   * regularizer).
   */
  Lookup(const size_t vocabSize = 0, const size_t embeddingSize = 0,
         RegularizerType regularizer = RegularizerType());
Is it ever valid to have a vocab size of 0 or an embedding size of 0? I think it might be better to remove those default parameters, and also remove the default constructor, so that the user must specify both the vocab size and the embedding dimension.
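In other words, something along these lines (a sketch only; whether a default constructor still needs to be kept around for serialization is worth double-checking):

  // Require the user to specify both sizes; no zero-size defaults.
  Lookup(const size_t vocabSize,
         const size_t embeddingSize,
         RegularizerType regularizer = RegularizerType());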