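Combines a recurrent neural network (RNN) with a convolutional layer to detect a keyword in audio data. The target keyword is "雅婷姊". Two layer stacks are listed below; the second differs from the first only in its GRU layers.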
I. One convolutional layer
II. Two GRU layers
III. Several dropout and batch normalization layers
IV. Fully connected layer with sigmoid activation
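
As a rough illustration of how data could flow through the first layer stack, here is a minimal shape trace in plain Python. Everything numeric in it is an assumption, not a value from this project: the input is assumed to be a spectrogram of shape (time steps, frequency bins), the convolution is assumed to be 1-D with no padding, both GRU layers are assumed to return the full sequence, and the final layer is assumed to emit one sigmoid probability per time step. The names `conv1d_output_steps` and `trace_shapes` are hypothetical helpers.

```python
from typing import List, Tuple

Shape = Tuple[int, int]  # (time steps, features)


def conv1d_output_steps(time_steps: int, kernel_size: int, stride: int) -> int:
    """Output length of a 1-D convolution with no padding."""
    if time_steps < kernel_size:
        raise ValueError("input is shorter than the convolution kernel")
    return (time_steps - kernel_size) // stride + 1


def trace_shapes(
    time_steps: int,
    n_freq: int,
    conv_filters: int,
    kernel_size: int,
    stride: int,
    gru_units: int,
) -> List[Tuple[str, Shape]]:
    """List the (time steps, features) shape after each shape-changing layer."""
    shapes = [("input spectrogram", (time_steps, n_freq))]

    # I. The convolution shortens the time axis and sets the feature count.
    steps = conv1d_output_steps(time_steps, kernel_size, stride)
    shapes.append(("I. convolution", (steps, conv_filters)))

    # II. A GRU that returns its full sequence keeps the time axis and
    # outputs `gru_units` features per step.
    shapes.append(("II. GRU 1", (steps, gru_units)))
    shapes.append(("II. GRU 2", (steps, gru_units)))

    # III. Dropout and batch normalization do not change the shape,
    # so they are not listed separately.

    # IV. One sigmoid unit per time step (assumed, see the note above).
    shapes.append(("IV. fully connected + sigmoid", (steps, 1)))
    return shapes


if __name__ == "__main__":
    # Hypothetical example sizes, chosen only for illustration.
    for name, shape in trace_shapes(5511, 101, 196, 15, 4, 128):
        print(f"{name:32s}{shape}")
```

With these example sizes the convolution reduces 5511 input steps to 1375, so the labels used for training would need one target value per 1375 output steps under the per-step assumption.
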
I. One convolutional layer
II. Two GRU layers, with the first modified to be bidirectional and residual
III. Several dropout and batch normalization layers
IV. Fully connected layer with sigmoid activation
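
The bidirectional and residual change to the first GRU layer can be sketched in plain Python to show the wiring only. This is a sketch under stated assumptions, not the project's implementation: `toy_step` is a hypothetical stand-in for a real GRU cell, the forward and backward outputs are assumed to be concatenated (summing them is another common choice), and the residual connection is assumed to be a plain element-wise add of the layer input.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]  # (previous state, input) -> new state


def toy_step(state: Vector, x: Vector) -> Vector:
    """Stand-in for a GRU cell: blend the old state with the input."""
    return [0.5 * s + 0.5 * xi for s, xi in zip(state, x)]


def run_recurrent(step: Step, inputs: Sequence[Vector], state_size: int) -> List[Vector]:
    """Apply `step` along the time axis and return the state at every step."""
    state = [0.0] * state_size
    outputs = []
    for x in inputs:
        state = step(state, x)
        outputs.append(state)
    return outputs


def bidirectional_residual(
    inputs: Sequence[Vector], forward: Step, backward: Step, state_size: int
) -> List[Vector]:
    """Bidirectional recurrent layer followed by a residual (skip) add."""
    # Concatenating both directions yields 2 * state_size features, so the
    # input must have the same width for the element-wise add to be valid.
    if any(len(x) != 2 * state_size for x in inputs):
        raise ValueError("residual add needs inputs with 2 * state_size features")

    fwd = run_recurrent(forward, inputs, state_size)
    # Run the second direction on the reversed sequence, then restore time order.
    bwd = run_recurrent(backward, inputs[::-1], state_size)[::-1]

    merged = [f + b for f, b in zip(fwd, bwd)]  # per-step list concatenation
    return [[m + xi for m, xi in zip(row, x)] for row, x in zip(merged, inputs)]


if __name__ == "__main__":
    sequence = [[1.0, 1.0], [3.0, 3.0]]  # 2 time steps, 2 features
    print(bidirectional_residual(sequence, toy_step, toy_step, state_size=1))
```

The width check reflects a practical constraint of this design: if the preceding layer's feature count does not equal twice the GRU state size, a projection layer would be needed before the add.
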