TensorFlow Keras Implementation of Switch Transformers for Text Classification.
This repo contains a Switch Transformer model for text classification.
Credits: Khalid Salama - Original Author
HF Contribution: Rishav Chandra Varma
Background Information
Introduction
In this example, we demonstrate an implementation of the Switch Transformer model for text classification. For this example, we use the IMDB reviews dataset available through the Keras datasets module.
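A minimal sketch of the data-loading step is shown below. The vocabulary size of 20,000 words and the sequence length of 200 tokens are illustrative values, not taken from the original example.

```python
import tensorflow as tf
from tensorflow import keras

vocab_size = 20000            # only consider the top 20k words (assumed value)
num_tokens_per_example = 200  # pad/truncate each review to this length (assumed value)

# Load the IMDB reviews dataset that ships with Keras.
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)

# Pad/truncate the integer-encoded reviews to a fixed length.
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=num_tokens_per_example)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=num_tokens_per_example)

print(len(x_train), "training sequences,", len(x_val), "validation sequences")
```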
What is special about the Switch Transformer?
The Switch Transformer replaces the feed-forward network (FFN) layer in the standard Transformer with a Mixture of Experts (MoE) routing layer, where each expert operates independently on the tokens in the sequence. This allows increasing the model size without increasing the computation needed to process each example.
Note that, for training the Switch Transformer efficiently, data and model parallelism need to be applied, so that expert modules can run simultaneously, each on its own accelerator. While the implementation described in the paper uses the TensorFlow Mesh framework for distributed training, this example presents a simple, non-distributed implementation of the Switch Transformer model for demonstration purposes.
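The sketch below illustrates the core idea in a simple, non-distributed form: a top-1 "switch" routing layer that stands in for the Transformer FFN, sending each token to the single expert with the highest router probability. For clarity it evaluates all experts densely and masks their outputs, and it omits the expert capacity constraint and load-balancing auxiliary loss used in the paper; the layer and hyperparameter names are illustrative, not the original example's.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def create_expert(embed_dim, ff_dim):
    # A standard position-wise feed-forward network (one "expert").
    return keras.Sequential(
        [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
    )


class SwitchLayer(layers.Layer):
    def __init__(self, embed_dim=64, ff_dim=64, num_experts=4, **kwargs):
        super().__init__(**kwargs)
        self.experts = [create_expert(embed_dim, ff_dim) for _ in range(num_experts)]
        self.router = layers.Dense(num_experts)  # per-token routing logits
        self.num_experts = num_experts

    def call(self, inputs):
        # inputs: (batch, seq_len, embed_dim)
        router_probs = tf.nn.softmax(self.router(inputs), axis=-1)      # (b, s, e)
        expert_index = tf.argmax(router_probs, axis=-1)                 # top-1 expert per token
        expert_mask = tf.one_hot(expert_index, depth=self.num_experts)  # (b, s, e)
        gate = tf.reduce_max(router_probs, axis=-1, keepdims=True)      # chosen expert's probability

        outputs = tf.zeros_like(inputs)
        for i, expert in enumerate(self.experts):
            # Keep only the tokens routed to expert i; other tokens contribute zero.
            mask_i = expert_mask[..., i : i + 1]                        # (b, s, 1)
            outputs += mask_i * expert(inputs)
        # Scale each token's output by its routing probability.
        return gate * outputs


# Usage: drop the layer into a Transformer block in place of the usual FFN.
x = tf.random.normal((2, 10, 64))  # (batch, seq_len, embed_dim)
print(SwitchLayer()(x).shape)      # -> (2, 10, 64)
```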