FastAI XLA Extensions Library

The FastAI XLA Extensions library package allows your fastai/Pytorch models to run on TPUs using the Pytorch-XLA library.

Documentation Site

You can view the documentation here: https://butchland.github.io/fastai_xla_extensions

Install

pip install fastai_xla_extensions

How to use

Configure TPU Environment Access

The Pytorch XLA package requires an environment supporting TPUs (Kaggle kernels, GCP or Colab environments required).

Nominally, Pytorch XLA also supports GPUs so please see the Pytorch XLA site for more instructions.

If running on Colab, make sure the Runtime Type is set to TPU.

Install fastai

Use the latest fastai and fastcore versions

#hide_output
#colab
!pip install -Uqq fastcore --upgrade
!pip install -Uqq fastai --upgrade

Install Pytorch XLA

This is the official way to install Pytorch-XLA 1.7 as per the instructions here

#hide_output
#colab
!pip install -Uqq cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp36-cp36m-linux_x86_64.whl

Check if XLA is available

Import the libraries

Import the fastai and fastai_xla_extensions libraries

#colab
#hide_output
import fastai_xla_extensions.core

from fastai.vision.all import *

Example

Build a MNIST classifier -- adapted from fastai course Lesson 4 notebook

Load MNIST dataset

path = untar_data(URLs.MNIST_TINY)

Create Fastai DataBlock

datablock = DataBlock(
    blocks=(ImageBlock,CategoryBlock),
    get_items=get_image_files,
    splitter=GrandparentSplitter(),
    get_y=parent_label,
    item_tfms=Resize(28),
    batch_tfms=aug_transforms(do_flip=False,min_scale=0.8)
)

#colab
datablock.summary(path)

Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/mnist_tiny
Found 1428 items
2 datasets of sizes 709,699
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/mnist_tiny/train/7/7770.png
    applying PILBase.create gives
      PILImage mode=RGB size=28x28
  Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
      /root/.fastai/data/mnist_tiny/train/7/7770.png
    applying parent_label gives
      7
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
      TensorCategory(1)

Final sample: (PILImage mode=RGB size=28x28, TensorCategory(1))


Collecting items from /root/.fastai/data/mnist_tiny
Found 1428 items
2 datasets of sizes 709,699
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Warp -- {'magnitude': 0.2, 'p': 1.0, 'draw_x': None, 'draw_y': None, 'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'batch': False, 'align_corners': True, 'mode_mask': 'nearest'} -> RandomResizedCropGPU -- {'size': None, 'min_scale': 0.8, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=28x28, TensorCategory(1))
    applying Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} gives
      (PILImage mode=RGB size=28x28, TensorCategory(1))
    applying ToTensor gives
      (TensorImage of size 3x28x28, TensorCategory(1))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Warp -- {'magnitude': 0.2, 'p': 1.0, 'draw_x': None, 'draw_y': None, 'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'batch': False, 'align_corners': True, 'mode_mask': 'nearest'} -> RandomResizedCropGPU -- {'size': None, 'min_scale': 0.8, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}
    starting from
      (TensorImage of size 4x3x28x28, TensorCategory([1, 1, 1, 1]))
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorImage of size 4x3x28x28, TensorCategory([1, 1, 1, 1]))
    applying Warp -- {'magnitude': 0.2, 'p': 1.0, 'draw_x': None, 'draw_y': None, 'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'batch': False, 'align_corners': True, 'mode_mask': 'nearest'} gives
      (TensorImage of size 4x3x28x28, TensorCategory([1, 1, 1, 1]))
    applying RandomResizedCropGPU -- {'size': None, 'min_scale': 0.8, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'p': 1.0} gives
      (TensorImage of size 4x3x27x27, TensorCategory([1, 1, 1, 1]))
    applying Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False} gives
      (TensorImage of size 4x3x27x27, TensorCategory([1, 1, 1, 1]))

Create the dataloader

dls = datablock.dataloaders(path)

#colab
dls.show_batch()

Create a Fastai CNN Learner

learner = cnn_learner(dls, resnet18, metrics=accuracy)

#colab
learner.summary()

epoch	train_loss	valid_loss	accuracy	time
0	None	None	00:00

Sequential (Input shape: ['64 x 3 x 28 x 28'])
================================================================
Layer (type)         Output Shape         Param #    Trainable 
================================================================
Conv2d               64 x 64 x 14 x 14    9,408      True      
________________________________________________________________
BatchNorm2d          64 x 64 x 14 x 14    128        True      
________________________________________________________________
ReLU                 64 x 64 x 14 x 14    0          False     
________________________________________________________________
MaxPool2d            64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
ReLU                 64 x 64 x 7 x 7      0          False     
________________________________________________________________
Conv2d               64 x 64 x 7 x 7      36,864     True      
________________________________________________________________
BatchNorm2d          64 x 64 x 7 x 7      128        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     73,728     True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False     
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     8,192      True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
ReLU                 64 x 128 x 4 x 4     0          False     
________________________________________________________________
Conv2d               64 x 128 x 4 x 4     147,456    True      
________________________________________________________________
BatchNorm2d          64 x 128 x 4 x 4     256        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     294,912    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False     
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     32,768     True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
ReLU                 64 x 256 x 2 x 2     0          False     
________________________________________________________________
Conv2d               64 x 256 x 2 x 2     589,824    True      
________________________________________________________________
BatchNorm2d          64 x 256 x 2 x 2     512        True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     1,179,648  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False     
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     131,072    True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
ReLU                 64 x 512 x 1 x 1     0          False     
________________________________________________________________
Conv2d               64 x 512 x 1 x 1     2,359,296  True      
________________________________________________________________
BatchNorm2d          64 x 512 x 1 x 1     1,024      True      
________________________________________________________________
AdaptiveAvgPool2d    64 x 512 x 1 x 1     0          False     
________________________________________________________________
AdaptiveMaxPool2d    64 x 512 x 1 x 1     0          False     
________________________________________________________________
Flatten              64 x 1024            0          False     
________________________________________________________________
BatchNorm1d          64 x 1024            2,048      True      
________________________________________________________________
Dropout              64 x 1024            0          False     
________________________________________________________________
Linear               64 x 512             524,288    True      
________________________________________________________________
ReLU                 64 x 512             0          False     
________________________________________________________________
BatchNorm1d          64 x 512             1,024      True      
________________________________________________________________
Dropout              64 x 512             0          False     
________________________________________________________________
Linear               64 x 2               1,024      True      
________________________________________________________________

Total params: 11,704,896
Total trainable params: 11,704,896
Total non-trainable params: 0

Optimizer used: <function Adam at 0x7f1399bb1730>
Loss function: FlattenedLoss of CrossEntropyLoss()

Callbacks:
  - TrainEvalCallback
  - XLAOptCallback
  - Recorder
  - ProgressCallback

Set Learner to XLA mode

This will setup the learner to use the XLA Device

#colab
learner.to_xla()

<fastai.learner.Learner at 0x7fb6809e7f28>

Using the lr_find works

#colab
learner.lr_find()

SuggestedLRs(lr_min=0.02089296132326126, lr_steep=0.0010000000474974513)

Run one cycle training.

#colab
learner.fit_one_cycle(5,lr_max=slice(1e-4,0.02))

epoch	train_loss	valid_loss	accuracy	time
0	0.749318	0.348496	0.861230	00:12
1	0.634834	0.676399	0.791130	00:03
2	0.621724	0.506193	0.834049	00:03
3	0.553649	0.503763	0.824034	00:03
4	0.500875	0.435466	0.846924	00:03

Further fine-tuning

#colab
learner.fit_one_cycle(5,slice(7e-4, 1e-3))

epoch	train_loss	valid_loss	accuracy	time
0	0.355362	0.312893	0.889843	00:07
1	0.361367	0.271736	0.895565	00:03
2	0.324752	0.205274	0.932761	00:03
3	0.282965	0.197236	0.925608	00:03
4	0.246091	0.187480	0.939914	00:03

Model params are using TPU

#colab
one_param(learner.model).device

device(type='xla', index=1)

Plot loss seems to be working fine.

#colab
learner.recorder.plot_loss()

Performance troubleshooting

To check if your model is hiting an aten operation (an operation that is not handled by accelerator device and returned to CPU for default implementation) you can check it with ands then you can report to pytorch xla team.

from fastai_xla_extensions.utils import print_aten_ops

print_aten_ops()

Samples

Other examples of fastai notebooks using the fastai_xla_extensions package are also available here:

More samples will be added in the future as we fix issues and implement more capabilities.

Status

The fastai XLA extensions library is still in very early development phase (not even alpha) which means there's still a lot of things not working.

Use it at your own risk.

If you wish to contribute to the project, fork it and make pull request.

This project uses nbdev -- a jupyter notebook first development environment and is being developed on Colab.

Name		Name	Last commit message	Last commit date
Latest commit History 222 Commits
.github/workflows		.github/workflows
archive_nbs		archive_nbs
docs		docs
explore_nbs		explore_nbs
fastai_xla_extensions		fastai_xla_extensions
nbs		nbs
samples		samples
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
settings.ini		settings.ini
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FastAI XLA Extensions Library

Documentation Site

Install

How to use

Configure TPU Environment Access

Install fastai

Install Pytorch XLA

Check if XLA is available

Import the libraries

Example

Set Learner to XLA mode

Performance troubleshooting

Samples

Status

About

Releases

Packages

Languages

License

tyoc213/fastai_xla_extensions

Folders and files

Latest commit

History

Repository files navigation

FastAI XLA Extensions Library

Documentation Site

Install

How to use

Configure TPU Environment Access

Install fastai

Install Pytorch XLA

Check if XLA is available

Import the libraries

Example

Set Learner to XLA mode

Performance troubleshooting

Samples

Status

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages