Noam Shazeer

Papers

2017

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

2020

  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

2022

  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  • ST-MoE: Designing Stable and Transferable Sparse Expert Models
