Posts tagged LLM

DeepSeek-V3: A Technical Overview of a Novel Mixture-of-Experts Model

DeepSeek-V3 is a significant advance in large language models (LLMs). As detailed in the accompanying technical report, it uses a Mixture-of-Experts (MoE) architecture, a design that combines strong performance with computational efficiency: the model comprises many expert sub-networks, and only a subset is activated for each input, which lets it scale effectively. Training combined a large corpus of text and code with both supervised learning and reinforcement learning. DeepSeek-V3 performs strongly across a range of benchmarks while keeping inference efficient, and both the model weights and the technical documentation are available on GitHub.
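To make the conditional-activation idea concrete, here is a toy top-k routing sketch in NumPy. It is not DeepSeek-V3's actual routing (the report describes additional techniques, e.g. for load balancing); all names, sizes, and the linear experts are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoELayer:
    """Toy MoE layer: each token is routed to k of n experts (illustrative only)."""

    def __init__(self, d_model=8, n_experts=4, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # The router maps a token vector to one score per expert.
        self.router = rng.standard_normal((d_model, n_experts))
        # Each "expert" here is just a linear map, for simplicity.
        self.experts = [rng.standard_normal((d_model, d_model))
                        for _ in range(n_experts)]

    def forward(self, x):
        scores = softmax(x @ self.router)          # one score per expert
        top = np.argsort(scores)[-self.k:]         # indices of the k best experts
        weights = scores[top] / scores[top].sum()  # renormalise over chosen experts
        # Only the selected experts run: this is the source of the compute savings.
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

layer = TinyMoELayer()
y = layer.forward(np.ones(8))
```

Even in this toy version, the key property holds: the parameter count grows with the number of experts, while the per-token compute depends only on `k`.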

Read more ...


Prompt Tuning

Training large pretrained language models is very time-consuming and compute-intensive. As these models continue to grow in size, there is increasing interest in more efficient training methods such as prompting.

Read more ...


LoRA (Low Rank Adaptation of Large Language Models)

Before diving into LoRA, it’s essential to grasp the concept of Matrix Decomposition.
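As a taste of why matrix decomposition matters here, the sketch below shows the low-rank factorization LoRA builds on: a large weight update can be represented as the product of two thin matrices. The dimensions and rank are illustrative assumptions, not values from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # model dimension and a much smaller rank (illustrative)

# Instead of learning a full d x d update, LoRA learns two thin factors.
B = rng.standard_normal((d, r)) * 0.01  # shape (d, r)
A = rng.standard_normal((r, d)) * 0.01  # shape (r, d)
delta_W = B @ A                         # a (d, d) update with rank at most r

full_params = d * d          # parameters in a dense update: 4096
lora_params = d * r + r * d  # parameters in the factored form: 512
```

At rank 4 this is an 8x reduction in trainable parameters for the update, and the gap widens as `d` grows.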

Read more ...