Commit: Reorder README sections and add a section about loading TF/Flax checkpoints

- add a section about loading TF/Flax checkpoints
- reorder README sections: in the examples, the section "Distributed training" is now at the top 
- add logo to README
regisss authored Apr 8, 2022
1 parent 635281b commit bed44f9
Showing 3 changed files with 30 additions and 14 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -14,6 +14,11 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

<p align="center">
<img src="readme_logo.png" />
</p>


# Optimum Habana

🤗 Optimum Habana is the interface between the 🤗 Transformers library and [Habana's Gaudi processor](https://docs.habana.ai/en/latest/index.html).
39 changes: 25 additions & 14 deletions examples/README.md
@@ -18,6 +18,31 @@ limitations under the License.
This folder contains actively maintained examples showing how to use 🤗 Optimum Habana for question answering and text classification.


## Distributed training

All the PyTorch scripts in this repository work out of the box with distributed training. To launch one of them on _n_ HPUs,
use the following command:

```bash
python gaudi_spawn.py \
    --world_size number_of_hpu_you_have --use_mpi \
    path_to_script.py --args1 --args2 ... --argsN
```
where each `--argsX` is an argument of the script to run in a distributed way.
Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/main/examples/question-answering/README.md#multi-card-training) and for text classification [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-classification#multi-card-training).
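
For instance, a multi-card run could look like the sketch below. This is an illustrative placeholder rather than a command from the linked READMEs: the script path, model and dataset are example values, any Gaudi-specific flags the script requires are omitted, and each example's README gives the exact flags it expects.

```bash
# Launch the question-answering example on 8 HPUs (illustrative values).
python gaudi_spawn.py \
    --world_size 8 --use_mpi \
    question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --output_dir /tmp/squad_output
```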


## Loading from a TensorFlow/Flax checkpoint file instead of a PyTorch model

If a model also has TensorFlow or Flax checkpoints, you can load them instead of a PyTorch checkpoint by specifying `from_tf=True` or `from_flax=True` when instantiating the model.
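
For instance, here is a minimal sketch of the change (it assumes TensorFlow is installed, and uses `bert-base-uncased` as an example of a Hub model that ships TF and Flax weights alongside the PyTorch ones):

```python
from transformers import AutoModelForSequenceClassification

# Load the TF weights (tf_model.h5) instead of the PyTorch ones;
# pass from_flax=True instead to start from the Flax weights.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    from_tf=True,
)
```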

You can try it for SQuAD [here](https://github.com/huggingface/optimum-habana/blob/688a857d5308a87a502eec7657f744429125d6f1/examples/question-answering/run_qa.py#L310) or for MRPC [here](https://github.com/huggingface/optimum-habana/blob/688a857d5308a87a502eec7657f744429125d6f1/examples/text-classification/run_glue.py#L338).

You can check whether a model has such checkpoints on the [Hub](https://huggingface.co/models). You can also specify a URL or a path to a TensorFlow/Flax checkpoint in `model_args.model_name_or_path`.

> Resuming from a checkpoint will only work with a PyTorch checkpoint.

## Running quick tests

Most examples are equipped with a mechanism to truncate the number of dataset samples to the desired length. This is useful for debugging, for example to quickly check that all stages of a program can run to completion before launching the same setup on the full dataset, which may take hours.
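
For example, assuming the scripts keep the sample-truncation flags of the upstream 🤗 Transformers examples (`--max_train_samples`, `--max_eval_samples`), a quick smoke test could look like:

```bash
# Illustrative smoke test: truncate the dataset so every stage finishes
# in minutes; drop the two --max_*_samples flags for a full run.
python question-answering/run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train --do_eval \
    --max_train_samples 10 \
    --max_eval_samples 10 \
    --output_dir /tmp/debug_squad
```
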
@@ -56,17 +81,3 @@ A few notes on this integration:

- you will need to be logged in to the Hugging Face website locally for it to work. The easiest way to achieve this is to run `huggingface-cli login` and then type your username and password when prompted. You can also pass your authentication token with the `--hub_token` argument (see the sketch after this list).
- the `output_dir` you pick will need to be either a new folder or a local clone of the remote repository you are using.
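
A minimal sketch of the two options above (the token value and the trailing arguments are placeholders):

```bash
# Option 1: log in once; the credentials are cached locally for later runs.
huggingface-cli login

# Option 2: pass the token explicitly to a script that accepts --hub_token.
python question-answering/run_qa.py --hub_token <your_token> ...
```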


## Distributed training

All the PyTorch scripts in this repository work out of the box with distributed training. To launch one of them on _n_ HPUs,
use the following command:

```bash
python gaudi_spawn.py \
    --world_size number_of_hpu_you_have --use_mpi \
    path_to_script.py --args1 --args2 ... --argsN
```
where each `--argsX` is an argument of the script to run in a distributed way.
Examples are given for question answering [here](https://github.com/huggingface/optimum-habana/blob/main/examples/question-answering/README.md#multi-card-training) and for text classification [here](https://github.com/huggingface/optimum-habana/tree/main/examples/text-classification#multi-card-training).
Binary file added readme_logo.png
