Neural Architecture Search with Controller RNN

Basic implementation of the Controller RNN from the papers Neural Architecture Search with Reinforcement Learning and Learning Transferable Architectures for Scalable Image Recognition.

Uses Keras to define and train the child (generated) networks, which are specified by the Controller RNN (itself written in TensorFlow).
Define a state space by using StateSpace, a manager which adds states and handles communication between the Controller RNN and the user.
Controller manages the training and evaluation of the Controller RNN.
NetworkManager handles the training and reward computation of a Keras model.
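
To make the role of a state space concrete, here is a minimal sketch of what such a manager has to do: register named discrete choices, one-hot encode a chosen value for the controller, and decode the controller's per-layer indices back into concrete hyperparameters. `ToyStateSpace`, `size`, `one_hot` and `decode` are hypothetical names for illustration only; this is not the repository's StateSpace class, and the real one may store and encode states differently.

```python
import numpy as np


class ToyStateSpace:
    """Hypothetical, simplified stand-in for StateSpace (not the repo's class)."""

    def __init__(self):
        self.states = []  # (name, values) pairs in the order they were added

    def add_state(self, name, values):
        # register a discrete choice and return its integer id
        self.states.append((name, list(values)))
        return len(self.states) - 1

    def size(self, state_id):
        # number of options the controller can pick for this state
        return len(self.states[state_id][1])

    def one_hot(self, state_id, value):
        # encode a concrete value as a one-hot row vector (controller input)
        values = self.states[state_id][1]
        vec = np.zeros((1, len(values)), dtype=np.float32)
        vec[0, values.index(value)] = 1.0
        return vec

    def decode(self, indices, num_layers):
        # map flat controller outputs (one index per state per layer)
        # back to the concrete values used to build the child network
        n = len(self.states)
        assert len(indices) == n * num_layers
        decoded = []
        for layer in range(num_layers):
            for state_id, (name, values) in enumerate(self.states):
                decoded.append((name, values[indices[layer * n + state_id]]))
        return decoded


space = ToyStateSpace()
space.add_state('kernel', [1, 3])
space.add_state('filters', [16, 32, 64])
print(space.decode([1, 2, 0, 1], num_layers=2))
# [('kernel', 3), ('filters', 64), ('kernel', 1), ('filters', 32)]
```

The states are cycled once per layer, so a 4-layer network with the two states above corresponds to 8 controller decisions.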

Usage

At a high level (for full training details, please see train.py):

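# classes defined in this repository's controller.py and manager.py
from controller import Controller, StateSpace
from manager import NetworkManager

# construct a state space
state_space = StateSpace()

# add states
state_space.add_state(name='kernel', values=[1, 3])
state_space.add_state(name='filters', values=[16, 32, 64])

# create the managers (tf_session, num_layers, dataset, max_epochs,
# batchsize and num_trials are set up by the caller; see train.py)
controller = Controller(tf_session, num_layers, state_space)
manager = NetworkManager(dataset, epochs=max_epochs, batchsize=batchsize)

# one controller update per trial
for trial in range(num_trials):
    sample_state = ...  # current state fed to the controller
    actions = controller.get_actions(sample_state)
    reward = manager.get_reward(actions)
    controller.train()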
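
The controller update in this loop is a policy-gradient step; the original NAS paper trains its controller with REINFORCE and a moving-average baseline. The sketch below illustrates only that idea under heavy simplifying assumptions: the RNN is replaced by one independent softmax per decision, and the child-network reward is replaced by a synthetic `toy_reward` that favours the largest option for every decision. All names and values (`toy_reward`, `lr`, `beta`, the trial count) are hypothetical and are not the repository's Controller.

```python
import numpy as np

# Hypothetical simplification, NOT the repository's Controller:
# one independent softmax per decision replaces the RNN, and a synthetic
# reward replaces training a child network.
rng = np.random.default_rng(0)

num_layers = 4
num_choices = [2, 3] * num_layers  # per layer: kernel in {1, 3}, filters in {16, 32, 64}
logits = [np.zeros(n) for n in num_choices]
lr = 0.2     # policy learning rate
beta = 0.9   # smoothing for the moving-average baseline
baseline = 0.0


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def toy_reward(actions):
    # stand-in for the child network's score: fraction of decisions that
    # picked the last (largest) option
    hits = sum(int(a == n - 1) for a, n in zip(actions, num_choices))
    return hits / len(num_choices)


for trial in range(1000):
    probs = [softmax(logit) for logit in logits]
    actions = [int(rng.choice(len(p), p=p)) for p in probs]
    reward = toy_reward(actions)
    advantage = reward - baseline  # baseline uses only past rewards
    baseline = beta * baseline + (1 - beta) * reward
    # REINFORCE: d log softmax(logit)[a] / d logit = one_hot(a) - p
    for logit, p, a in zip(logits, probs, actions):
        grad = -p
        grad[a] += 1.0
        logit += lr * advantage * grad  # in-place update of the list entry

greedy = [int(np.argmax(logit)) for logit in logits]
print(greedy)
```

Because the baseline is computed from earlier rewards only, it does not depend on the current sample and so reduces the variance of the update without biasing it. Replacing `toy_reward` with a real child-network score is what makes each trial expensive.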

Implementation details

This is a very limited project.

Note: The controller eventually overfits and always produces a single model (generally a high-performance model, but not always the best one). Avoiding this overfitting seems to require a pool of controllers, which would need distributed training. However, since the exploration rate is so high, a large set of very good models is generated during the training process, and the controller itself also yields a fairly strong model.
It doesn't support skip connections via 'anchor points' (though these may not be hard to implement as a special state).
Many details of the Controller came from slides rather than the paper, so I had to take many liberties when reimplementing it.
Learning rate, regularization strength, discount factor, exploration rate, the beta value for the EWA (exponentially weighted average) of accuracy, clipping range, etc. are all arbitrary values that seemed reasonable to me.
Single-GPU only. Multi-GPU training would require many modifications (and I have just one GPU).
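
Several of the values listed above (EWA beta, clipping range, discount factor) shape the reward the controller learns from. A minimal sketch of one plausible arrangement, assuming the reward is the child's accuracy minus a bias-corrected exponentially weighted average of earlier accuracies, clipped to a fixed range, and that a discount factor turns a sequence of rewards into returns. `RewardTracker`, `discounted_rewards` and the default values are hypothetical and are not the repository's code or settings.

```python
import numpy as np


class RewardTracker:
    """Hypothetical sketch of an EWA-baseline reward (not the repo's code)."""

    def __init__(self, beta=0.8, clip=0.05):
        self.beta = beta  # smoothing factor of the accuracy EWA
        self.clip = clip  # rewards are clipped to [-clip, clip]
        self.ewa = 0.0
        self.steps = 0

    def reward(self, accuracy):
        if self.steps == 0:
            baseline = accuracy  # no history yet, so the first reward is 0
        else:
            # bias-corrected EWA of all previous accuracies
            baseline = self.ewa / (1.0 - self.beta ** self.steps)
        r = float(np.clip(accuracy - baseline, -self.clip, self.clip))
        # fold the new accuracy into the running average
        self.ewa = self.beta * self.ewa + (1.0 - self.beta) * accuracy
        self.steps += 1
        return r


def discounted_rewards(rewards, gamma=0.99):
    # discounted return for each step: R_t = r_t + gamma * R_{t+1}
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


tracker = RewardTracker(beta=0.8, clip=0.05)
print([tracker.reward(acc) for acc in (0.60, 0.62, 0.90)])
print(discounted_rewards([1.0, 0.0, 1.0], gamma=0.5))
```

Clipping keeps a single unusually good (or bad) child network from producing a huge policy update; the trade-off is that large improvements are rewarded no more than moderate ones.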

Implementation details were taken from:

http://rll.berkeley.edu/deeprlcoursesp17/docs/quoc_barret.pdf

Result

I tried a toy CNN with 4 convolutional layers, each choosing a number of filters (16, 32, 64) and a kernel size (1, 3), to maximise the score after 10 epochs of training on CIFAR-10.

After 50 steps, it converges to the "state space" of (3x3, 64)-(3x3, 64)-(3x3, 32)-(3x3, 64). Interestingly, this model performs very slightly better than a 4 x (3x3, 64) model, at least in the first 10 epochs.
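
For scale, this toy search space is small enough to enumerate. Assuming each of the 4 layers independently picks one kernel size and one filter count, there are 2 x 3 = 6 options per layer and 6^4 = 1296 candidate networks. A short sketch; `describe` is a hypothetical helper that renders an architecture in the notation used above.

```python
import itertools

kernels = [1, 3]
filters = [16, 32, 64]
num_layers = 4

# each layer independently picks one (kernel, filters) pair
layer_options = list(itertools.product(kernels, filters))
search_space = list(itertools.product(layer_options, repeat=num_layers))


def describe(arch):
    # render an architecture as e.g. (3x3, 64)-(3x3, 64)-...
    return "-".join(f"({k}x{k}, {f})" for k, f in arch)


converged = ((3, 64), (3, 64), (3, 32), (3, 64))
print(len(layer_options), len(search_space))  # 6 1296
print(describe(converged), converged in search_space)
```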

Requirements

Keras >= 1.2.1
Tensorflow-gpu >= 1.2

Acknowledgements

Code heavily inspired by wallarm/nascell-automl

titu1994/neural-architecture-search

Folders and files

Latest commit

History

Repository files navigation

Neural Architecture Search with Controller RNN

Usage

Implementation details

Result

Requirements

Acknowledgements

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages