Yet Another (JAX) Transformer#

Following along with this book, you will implement every component of the Transformer architecture in JAX. There will be no tricks or custom components; we will stick to the design detailed in Attention Is All You Need.

Warning

This document is currently a work in progress 🙃

Why YAJT?#

Excellent walkthroughs of the Transformer architecture already exist (e.g., [1], [2]). YAJT builds on them, but we also try to:

  1. improve the educational impact. Along the way, we will touch on fundamental ML/NLP topics such as gradient-based optimization and text tokenization, and let you train your brand-new Transformers on language modeling and machine translation. We also briefly touch on social implications and gender bias.

  2. implement everything from scratch, in low-level JAX (see the sketch after this list for a taste of the style).
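
As a preview of what "low-level JAX" means here, the snippet below sketches scaled dot-product attention from Attention Is All You Need. It is only an illustrative, self-contained example: the function name and toy shapes are our own choices, and the book builds its own implementation step by step in later chapters.

```python
import jax
import jax.numpy as jnp


def scaled_dot_product_attention(q, k, v):
    # softmax(q @ k.T / sqrt(d_k)) @ v, as defined in Attention Is All You Need
    d_k = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d_k)           # (seq_q, seq_k) similarity scores
    weights = jax.nn.softmax(scores, axis=-1)  # normalize over the key dimension
    return weights @ v                         # weighted sum of value vectors


# Toy usage: 4 positions with 8-dimensional queries, keys, and values.
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (4, 8))
k = jax.random.normal(key, (4, 8))
v = jax.random.normal(key, (4, 8))
print(scaled_dot_product_attention(q, k, v).shape)  # (4, 8)
```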

Credits#

Authors: Giuseppe Attanasio and Moreno La Quatra.

The content of this book was originally devised as the NLP tutorial for the second Mediterranean Machine Learning Summer School by the AI Education Foundation.

Index#