This is the implementation of a network built from the best normal and reduction cells reported in the DARTS paper.
The normal cell preserves both the spatial size and the number of channels of its input feature maps. It uses depthwise separable convolutions and dilated separable convolutions; all operations use stride = 1, kernel size = 3, and padding chosen to preserve the spatial size.
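The two stride-1 operations can be sketched as below. This is a minimal illustration, not the repository's code: the class names (`SepConv`, `DilSepConv`) and the BatchNorm/ReLU placement are assumptions.

```python
import torch
import torch.nn as nn

class SepConv(nn.Module):
    """Depthwise separable 3x3 conv: a depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            # depthwise: groups=channels, padding=1 keeps the spatial size at stride 1
            nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                      padding=1, groups=channels, bias=False),
            # pointwise: mixes channels without changing spatial size
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)

class DilSepConv(nn.Module):
    """Dilated separable 3x3 conv: dilation=2 requires padding=2 to keep the spatial size."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1,
                      padding=2, dilation=2, groups=channels, bias=False),
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.op(x)

x = torch.randn(1, 16, 32, 32)
print(SepConv(16)(x).shape)     # torch.Size([1, 16, 32, 32])
print(DilSepConv(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```

Both operations map a `C x H x W` input to an output of the same shape, which is what lets the normal cell be stacked without any adapter layers.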
The reduction cell aims to halve the spatial size and double the number of channels of its inputs: max pooling reduces the spatial size, and a convolution with a 1x1 kernel doubles the channels.
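That reduction step can be sketched as follows; the class name `Reduce` and the exact pooling/conv ordering are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn as nn

class Reduce(nn.Module):
    """Halve H and W with max pooling, then double the channels with a 1x1 conv."""
    def __init__(self, in_channels):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)       # H, W -> H/2, W/2
        self.expand = nn.Conv2d(in_channels, 2 * in_channels,   # C -> 2C
                                kernel_size=1, bias=False)

    def forward(self, x):
        return self.expand(self.pool(x))

x = torch.randn(1, 16, 32, 32)
print(Reduce(16)(x).shape)  # torch.Size([1, 32, 16, 16])
```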
Normal and reduction cells are stacked to form a network, which is trained on the CIFAR-10 dataset.
In the figure above, the cell with stride 1 is the normal cell and the one with stride 2 is the reduction cell. In this implementation N = 2, so the network has 8 layers: 2 reduction cells and 6 normal cells.
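Assuming the NASNet-style stacking pattern from the figure (N normal cells before each reduction cell, and N after the last one), the 8-layer layout for N = 2 works out as:

```python
# Layer layout for N = 2: two normal cells, a reduction cell, two normal
# cells, a reduction cell, then two final normal cells (8 cells total).
# The pattern itself is an assumption based on the figure described above.
N = 2
layout = (["normal"] * N + ["reduction"]) * 2 + ["normal"] * N
print(layout)
# ['normal', 'normal', 'reduction', 'normal', 'normal', 'reduction', 'normal', 'normal']
```

This gives the stated totals: 6 normal cells and 2 reduction cells.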
As in the paper, 1x1 convolutions are inserted to make the inputs compatible with subsequent operations in the cells.
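For example, when a cell receives an input whose channel count differs from what its operations expect (as happens right after a reduction cell), a 1x1 convolution can adapt it; this snippet is a hedged illustration of that idea, not code from the repository.

```python
import torch
import torch.nn as nn

# A 1x1 conv changes only the channel dimension, leaving H and W untouched,
# so it is a cheap way to make two feature maps compatible.
adapt = nn.Conv2d(16, 32, kernel_size=1, bias=False)

x = torch.randn(1, 16, 32, 32)
print(adapt(x).shape)  # torch.Size([1, 32, 32, 32])
```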
The network is trained with the hyperparameters used by the authors, including the learning rate, optimizer, and scheduler.
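A minimal sketch of that setup, assuming the SGD + cosine-annealing combination and the default values reported in the DARTS paper (learning rate 0.025, momentum 0.9, weight decay 3e-4); the actual values used in this implementation may differ, and the model here is only a placeholder.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 10, kernel_size=3)  # placeholder for the cell-based network

# SGD with momentum and weight decay, as in the DARTS paper's CIFAR-10 training
optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                            momentum=0.9, weight_decay=3e-4)

# Cosine annealing of the learning rate over the training run
# (T_max=8 matches the 8 epochs mentioned below; an assumption)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=8)
```

In a training loop, `scheduler.step()` is called once per epoch after the optimizer updates.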
Due to limited compute, the network is trained for only 8 epochs, after which it achieves 78% accuracy on the test data.