Posts tagged Mixture-of-Experts
DeepSeek-V3: A Technical Overview of a Novel Mixture-of-Experts Model
- 27 December 2024
DeepSeek-V3 is a significant advance in large language models (LLMs). As detailed in the accompanying technical report, it uses a Mixture-of-Experts (MoE) architecture, a design that combines high performance with computational efficiency: the model is built from many expert sub-networks (671B total parameters), but only a small subset is activated for each input token (roughly 37B parameters), letting total capacity scale without a proportional increase in per-token compute. Training combined large-scale pre-training on text and code with supervised fine-tuning and reinforcement learning. DeepSeek-V3 performs strongly across a wide range of benchmarks while remaining efficient at inference. The model weights and technical documentation are available on GitHub.
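To make the conditional-activation idea concrete, here is a minimal top-k MoE layer sketch. This is an illustration of the general technique, not DeepSeek-V3's actual routing (which the technical report describes in detail); the class name, expert count, hidden sizes, and `top_k` value are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the top-k experts per token are evaluated -- this is the
        # "conditional activation" that keeps per-token compute low.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 tokens through the layer.
layer = TinyMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)   # torch.Size([16, 512])
```

In a production MoE model the routing is also load-balanced across experts and experts are sharded across devices, which this sketch omits for brevity.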
DeepSeek-VL2: A Powerful Vision-Language Model for Multimodal Understanding
- 18 December 2024
This blog post dives into DeepSeek-VL2, a new series of open-source Vision-Language Models (VLMs). We’ll explore its architecture, training methodology, and strong performance across diverse multimodal tasks, including visual question answering, optical character recognition, and document understanding. Its dynamic tiling strategy for high-resolution images and its efficient Mixture-of-Experts (MoE) architecture make it a significant step forward for open multimodal models.
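As a rough sketch of the tiling idea (not DeepSeek-VL2's exact preprocessing, which the post and paper cover; the tile size, padding, and thumbnail choices here are assumptions for illustration):

```python
from PIL import Image

def split_into_tiles(image: Image.Image, tile_size: int = 384):
    """Split a high-resolution image into fixed-size tiles plus a downscaled
    global thumbnail, so a vision encoder with a fixed input resolution can
    still see fine detail. Sizes and padding are illustrative placeholders."""
    w, h = image.size
    cols = -(-w // tile_size)   # ceiling division
    rows = -(-h // tile_size)
    # Pad the image up to a whole number of tiles.
    padded = Image.new("RGB", (cols * tile_size, rows * tile_size))
    padded.paste(image, (0, 0))

    tiles = [
        padded.crop((c * tile_size, r * tile_size,
                     (c + 1) * tile_size, (r + 1) * tile_size))
        for r in range(rows) for c in range(cols)
    ]
    # A low-resolution global view preserves overall layout and context.
    thumbnail = image.resize((tile_size, tile_size))
    return tiles, thumbnail
```

Each tile and the thumbnail can then be encoded separately and their token sequences concatenated for the language model, which is the general way tiling schemes expose fine-grained detail without retraining the vision encoder at higher resolution.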