themanybuilders.com

· · ·

Mixtral of Experts

Mixtral 8x7B — Mistral AI — 2024-01

Mistral AI's sparse mixture-of-experts model: each layer holds eight expert feed-forward blocks, and a router selects two of them for every token. The report introduced an open-weights MoE architecture that delivers strong performance while activating only a fraction of its total parameters for each token.
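The routing described above can be sketched as a top-2 gate. A router scores all eight experts for a token, keeps the two highest scores, renormalizes them with a softmax, and sums the two selected experts' outputs weighted by those gates. This is a minimal pure-Python sketch under stated assumptions, not Mistral's implementation. The function names (`top_k_route`, `moe_layer`), the toy scaling experts, and the random router weights are hypothetical stand-ins; Mixtral's actual experts are SwiGLU feed-forward blocks.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize over just those k.

    Returns (expert_index, gate_weight) pairs; the gate weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_layer(x: Vector, router: Sequence[Vector], experts: Sequence[Expert], k: int = 2) -> Vector:
    """Sparse MoE forward pass for a single token vector x.

    router[e] is a hypothetical weight row producing expert e's logit. Only the
    k selected experts are evaluated; the others are skipped entirely.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    out = [0.0] * len(x)
    for e, gate in top_k_route(logits, k):
        y = experts[e](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    dim, n_experts = 4, 8
    # Hypothetical toy experts: expert e scales its input by (e + 1). Real Mixtral
    # experts are SwiGLU feed-forward blocks; this stand-in only shows the routing.
    experts = [lambda v, s=e + 1: [s * vi for vi in v] for e in range(n_experts)]
    router = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_experts)]
    token = [1.0, 0.5, -0.5, 2.0]
    print(moe_layer(token, router, experts, k=2))
```

Because only the two selected experts run for a given token, per-token compute scales with the active parameters rather than with the weights of all eight experts, which is where the low active-parameter cost comes from.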

References

Credited authors (26)