* Add grad checkpointing and PE in the transformer example * Remove other frameworks from LM example * Remove the other frameworks from MNIST example * Improve the transformer LM example * Fix black and change LR