Posts tagged LLM

DeepSeek-V3: A Technical Overview of a Novel Mixture-of-Experts Model

DeepSeek-V3 is a significant advance in large language models (LLMs). As detailed in the accompanying technical report, it uses a Mixture-of-Experts (MoE) architecture, a design that combines strong performance with computational efficiency: the model comprises many expert sub-networks, and only a subset is activated for each input, which lets it scale effectively. Training combined a large corpus of text and code with both supervised learning and reinforcement learning. DeepSeek-V3 performs strongly across a range of benchmarks while keeping inference efficient, and both the model weights and the technical documentation are available on GitHub.
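To make the conditional-activation idea concrete, here is a toy top-k routing sketch in NumPy. It is not DeepSeek-V3's actual routing (the report describes additional techniques, e.g. for load balancing); all names, sizes, and the linear experts are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoELayer:
    """Toy MoE layer: each token is routed to k of n experts (illustrative only)."""

    def __init__(self, d_model=8, n_experts=4, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # The router maps a token vector to one score per expert.
        self.router = rng.standard_normal((d_model, n_experts))
        # Each "expert" here is just a linear map, for simplicity.
        self.experts = [rng.standard_normal((d_model, d_model))
                        for _ in range(n_experts)]

    def forward(self, x):
        scores = softmax(x @ self.router)          # one score per expert
        top = np.argsort(scores)[-self.k:]         # indices of the k best experts
        weights = scores[top] / scores[top].sum()  # renormalise over chosen experts
        # Only the selected experts run: this is the source of the compute savings.
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

layer = TinyMoELayer()
y = layer.forward(np.ones(8))
```

Even in this toy version, the key property holds: the parameter count grows with the number of experts, while the per-token compute depends only on `k`.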

Read more ...


Prompt Tuning

Training large pretrained language models is very time-consuming and compute-intensive. As these models continue to grow in size, there is increasing interest in more efficient training methods such as prompting.

Read more ...


LoRA (Low Rank Adaptation of Large Language Models)

Before diving into LoRA, it’s essential to grasp the concept of Matrix Decomposition.
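As a taste of why matrix decomposition matters here, the sketch below shows the low-rank factorization LoRA builds on: a large weight update can be represented as the product of two thin matrices. The dimensions and rank are illustrative assumptions, not values from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # model dimension and a much smaller rank (illustrative)

# Instead of learning a full d x d update, LoRA learns two thin factors.
B = rng.standard_normal((d, r)) * 0.01  # shape (d, r)
A = rng.standard_normal((r, d)) * 0.01  # shape (r, d)
delta_W = B @ A                         # a (d, d) update with rank at most r

full_params = d * d          # parameters in a dense update: 4096
lora_params = d * r + r * d  # parameters in the factored form: 512
```

At rank 4 this is an 8x reduction in trainable parameters for the update, and the gap widens as `d` grows.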

Read more ...