
Noam Shazeer

Papers

2017

  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

2020

  • GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

2022

  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  • ST-MoE: Designing Stable and Transferable Sparse Expert Models