Noam Shazeer

Papers

2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

2020

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

2022

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
ST-MoE: Designing Stable and Transferable Sparse Expert Models