· · ·
Mixtral of Experts
Mixtral 8x7B — Mistral AI — 2024-01
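Mistral AI's sparse mixture-of-experts (SMoE) model: every layer holds eight feed-forward expert blocks, and a router selects two of them for each token. The report introduced an open-weights MoE architecture, released under Apache 2.0, in which each token uses roughly 13B of the model's 47B total parameters; the authors report that it matches or outperforms Llama 2 70B and GPT-3.5 on the benchmarks they evaluated.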
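The two-of-eight routing can be illustrated with a minimal sketch. It is not Mistral's implementation: the experts here are toy scaling functions standing in for the real feed-forward blocks, the router logits are made-up numbers, and the names `top_k_gate`, `moe_layer`, and `make_expert` are hypothetical. It assumes the gate is a softmax taken over the two largest router logits, as the report describes.

```python
import math

NUM_EXPERTS = 8  # expert blocks per layer
TOP_K = 2        # experts active per token


def top_k_gate(logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k largest router logits.

    The weights are a softmax over the selected logits only, so they sum to 1.
    """
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_layer(x, router_logits, experts):
    """Sum the outputs of the selected experts, weighted by the gate."""
    out = [0.0] * len(x)
    for idx, weight in top_k_gate(router_logits):
        y = experts[idx](x)  # only the chosen experts are ever evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out


def make_expert(scale):
    """Toy stand-in for an expert block (assumption): multiplies its input by `scale`."""
    return lambda x: [scale * v for v in x]


experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]  # made-up values
token = [1.0, -2.0]

gates = top_k_gate(router_logits)
output = moe_layer(token, router_logits, experts)
print(gates)
print(output)
```

Because only the selected experts run, per-token compute scales with the two active experts rather than all eight, which is where the low active-parameter cost comes from.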
References
- arXiv: arxiv.org/abs/2401.04088
- Org page: Mistral AI
- Released: 2024-01