modify mlm example to support all backends #1856
base: master
Conversation
@@ -336,37 +376,48 @@ def metrics(self):

def create_masked_language_bert_model():
Here, shall I subclass `MaskedLanguageModel` instead, and get rid of the TF-specific custom training step?
I believe you should be able to remove the `train_step()` and instead implement the `compute_loss()` method. If you write the `compute_loss()` method using only `keras.ops` functions, it will work with all backends.
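For reference, a minimal sketch of what that could look like (this is not the PR's code; the class name and the per-token `sample_weight` masking are assumptions):

```python
import keras
from keras import ops


class MaskedLanguageModel(keras.Model):
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        # Per-token cross-entropy over the vocabulary.
        loss = keras.losses.sparse_categorical_crossentropy(y, y_pred)
        if sample_weight is not None:
            # Count only the masked positions toward the loss.
            sample_weight = ops.cast(sample_weight, loss.dtype)
            return ops.sum(loss * sample_weight) / ops.maximum(
                ops.sum(sample_weight), 1.0
            )
        return ops.mean(loss)
```

Because only `keras.ops` reductions are used, the same class runs under TensorFlow, JAX, and PyTorch.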
Given that no additional logic is required here, and `SparseCategoricalCrossentropy` is used as the loss function, does it still require an override of `compute_loss()`?
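If so, a plain `compile()` may indeed be enough: when the data pipeline yields `(x, y, sample_weight)` triples, Keras applies the weights to the built-in loss automatically. A sketch under that assumption (`model` and `mlm_ds` are hypothetical names):

```python
# Rely on the built-in loss; per-token sample weights (1 at masked
# positions, 0 elsewhere) come from the dataset itself.
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(),
    weighted_metrics=["sparse_categorical_accuracy"],
)
model.fit(mlm_ds, epochs=3)  # mlm_ds yields (x, y, sample_weight)
```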
Thanks for the PR!
```python
import os

# set backend ["tensorflow", "jax", "torch"]
os.environ["KERAS_BACKEND"] = "tensorflow"
```
If it works with all backends, you can just remove this line.
> I believe you should be able to remove the `train_step()` and instead implement the `compute_loss()` method. If you write the `compute_loss()` method using only `keras.ops` functions, it will work with all backends.
Added the subclassing implementation. Certain obstacles remain, as mentioned:

```python
mlm_model = models.load_model(
    "bert_mlm.keras", custom_objects={"MaskedLanguageModel": MaskedLanguageModel}
)
# mlm_model.layers:
# [<Embedding name=word_embedding, built=True>,
#  <PositionEmbedding name=position_embedding, built=True>,
#  <BertEncoderLayer name=bert_encoder_layer, built=True>,
#  <Dense name=mlm_cls, built=True>]

mlm_model.get_layer("bert_encoder_layer").get_layer("encoder_0_ffn_layernormalization")
```

This throws `AttributeError: 'BertEncoderLayer' object has no attribute 'get_layer'`.
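One possible workaround (a sketch, not part of the PR): `get_layer()` is a `Model` method, so a subclassed `Layer` won't have it; if `BertEncoderLayer` stores its sublayers as attributes, they can be reached directly. The attribute name below is hypothetical:

```python
encoder = mlm_model.get_layer("bert_encoder_layer")
# Assuming BertEncoderLayer keeps a handle like
#   self.ffn_layernorm = layers.LayerNormalization(
#       name="encoder_0_ffn_layernormalization"
#   )
# the sublayer is available as a plain attribute:
ffn_layernorm = encoder.ffn_layernorm
```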
The second obstacle concerns extracting the pretrained encoder:

```python
# Load pretrained bert model
mlm_model = models.load_model(
    "bert_mlm_imdb.keras",
    custom_objects={"MaskedLanguageBertModel": MaskedLanguageBertModel},
)
pretrained_bert_model = Model(
    mlm_model.input,
    mlm_model.get_layer("encoder_0_ffn_layernormalization").output,
)
```

This was possible with the functional declaration of the model, but not with subclassing (as of the last commit). Kindly suggest how to handle this w.r.t. the last commit. Thanks!
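For the subclassed version, one option (a sketch, not the PR's code; the attribute names and the embedding arithmetic are assumptions) is to expose the intermediate activation through a dedicated method rather than slicing a functional graph:

```python
class MaskedLanguageBertModel(keras.Model):
    # ... __init__ / call as in the PR ...

    def encode(self, token_ids):
        # Forward pass up to the encoder output, i.e. the features the
        # functional version read from "encoder_0_ffn_layernormalization".
        x = self.word_embedding(token_ids)
        x = x + self.position_embedding(x)
        return self.bert_encoder_layer(x)


features = mlm_model.encode(token_ids)
```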
With reference to the draft PR, here's an attempt to make the example truly backend-agnostic. In the process, I raised the following issues: keras-team/keras#18410 and keras-team/keras#19665. While the latter got resolved, the former still persists. Kindly review the changes, and let me know how we should proceed with the issues mentioned.
cc: @fchollet