DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models • Paper • arXiv:2309.14509 • Published Sep 25, 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond • Paper • arXiv:2309.16583 • Published Sep 28, 2023