DeepSeek-V3 Technical Report
DeepSeek-V3 — DeepSeek — 2024-12
DeepSeek's mixture-of-experts large language model with 671B total parameters, of which 37B are activated per token. The report details efficiency innovations including FP8 mixed-precision training and the Multi-head Latent Attention (MLA) architecture.
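As a rough illustration of how a mixture-of-experts layer activates only a fraction of its total parameters per token, below is a minimal top-k routing sketch in PyTorch. The expert count, hidden size, and top-k value are illustrative placeholders, and the gating shown is generic top-k softmax routing, not the report's actual routing scheme.

```python
# Minimal sketch of top-k MoE routing (illustrative only; not
# DeepSeek-V3's implementation or configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=16, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(4, 16))  # each token touches only 2 of the 8 experts
```

Because each token is processed by only k experts, the compute per token scales with the active parameter count rather than the total, which is how a 671B-parameter model can run with 37B active parameters per token.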
References
- arXiv: arxiv.org/abs/2412.19437
- Org page: DeepSeek
- Released: 2024-12
Credited authors (197)