Posts tagged Mixture-of-Experts
DeepSeek-V3: A Technical Overview of a Novel Mixture-of-Experts Model
- 27 December 2024
DeepSeek-V3 is a significant advance in large language models (LLMs). As detailed in the accompanying technical report, it uses a Mixture-of-Experts (MoE) architecture, a design that combines high performance with computational efficiency: the model is built from many expert sub-networks (671B total parameters), but only a small subset is activated for each input token (roughly 37B parameters), letting total capacity scale without a proportional increase in per-token compute. Training combined large-scale pre-training on text and code with supervised fine-tuning and reinforcement learning. DeepSeek-V3 performs strongly across a wide range of benchmarks while remaining efficient at inference. The model weights and technical documentation are available on GitHub.
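To make the conditional-activation idea concrete, here is a minimal top-k MoE layer sketch. This is an illustration of the general technique, not DeepSeek-V3's actual routing (which the technical report describes in detail); the class name, expert count, hidden sizes, and `top_k` value are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts per token are evaluated -- this is the
        # "conditional activation" that keeps per-token compute low.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 tokens through the layer.
layer = TinyMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512])
```

In a production MoE model the routing is also load-balanced across experts and experts are sharded across devices, which this sketch omits for brevity.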
DeepSeek-VL2: A Powerful Vision-Language Model for Multimodal Understanding
- 18 December 2024
This blog post dives into DeepSeek-VL2, a new series of open-source Vision-Language Models (VLMs). We’ll explore its architecture, training methodology, and strong performance across diverse multimodal tasks, including visual question answering, optical character recognition, and document understanding. Its dynamic tiling strategy for high-resolution images and its efficient Mixture-of-Experts (MoE) architecture make it a significant step forward for open multimodal models.
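As a rough sketch of the tiling idea (not DeepSeek-VL2's exact preprocessing, which the post and paper cover; the tile size, padding, and thumbnail choices here are assumptions for illustration):

```python
from PIL import Image

def split_into_tiles(image: Image.Image, tile_size: int = 384):
    """Split a high-resolution image into fixed-size tiles plus a downscaled
    global thumbnail, so a vision encoder with a fixed input resolution can
    still see fine detail. Sizes and padding are illustrative placeholders."""
    w, h = image.size
    cols = -(-w // tile_size)   # ceiling division
    rows = -(-h // tile_size)
    # Pad the image up to a whole number of tiles.
    padded = Image.new("RGB", (cols * tile_size, rows * tile_size))
    padded.paste(image, (0, 0))

    tiles = [
        padded.crop((c * tile_size, r * tile_size,
                     (c + 1) * tile_size, (r + 1) * tile_size))
        for r in range(rows) for c in range(cols)
    ]
    # A low-resolution global view preserves overall layout and context.
    thumbnail = image.resize((tile_size, tile_size))
    return tiles, thumbnail
```

Each tile and the thumbnail can then be encoded separately and their token sequences concatenated for the language model, which is the general way tiling schemes expose fine-grained detail without retraining the vision encoder at higher resolution.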