Blogs
A Journey Through Time - The Transformation of AI Development - March 01, 2023
I started studying Machine Learning in 2016, when I took Andrew Ng's famous Machine Learning course. After many breaks along the way, I completed it and learned the basics.
A society of Generative AI agents! - August 19, 2023
Imagine visiting a place where a group of bots has created a society of its own. They have their own shops, homes, and schools; they interact with friends and strangers, build relationships, hold get-togethers, plan parties, and in short create an entire ecosystem of their own, just as humans do. It may be hard to imagine, but it is fun to think about.
LoRA (Low Rank Adaptation of Large Language Models) - April 10, 2024
Before diving into LoRA, it’s essential to grasp the concept of Matrix Decomposition.
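As a taste of why matrix decomposition matters here, the sketch below (plain NumPy, with illustrative dimensions and a rank chosen arbitrarily) expresses a weight update as the product of two much smaller matrices and compares the parameter counts; it is a minimal illustration, not the post's implementation.

```python
import numpy as np

# Illustrative only: a weight update dW of shape (d, k) is approximated as the
# product of two much smaller factors, B (d, r) and A (r, k), with r << min(d, k).
d, k, r = 768, 768, 8

B = np.zeros((d, r))               # one factor starts at zero, so the update starts at zero
A = np.random.randn(r, k) * 0.01   # the other factor starts with small random values
delta_W = B @ A                    # low-rank approximation of the full update

full_params = d * k                # parameters in a full-rank update
lora_params = d * r + r * k        # parameters in the two low-rank factors
print(f"full update: {full_params:,} params, low-rank update: {lora_params:,} params")
# full update: 589,824 params, low-rank update: 12,288 params
```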
Prompt Tuning - April 10, 2024
Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as prompting.
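For a sense of what prompt tuning looks like in practice, here is a minimal sketch using the Hugging Face PEFT library; the base model ("gpt2") and the number of virtual tokens are arbitrary placeholders, not the settings discussed in the post.

```python
# Minimal prompt-tuning sketch with PEFT: only the embeddings of a few "virtual
# tokens" prepended to every input are trained, while the backbone stays frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

base_model = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

peft_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # trainable params are a tiny fraction of the total
```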
DeepSeek-VL2: A Powerful Vision-Language Model for Multimodal Understanding - December 18, 2024
This blog post dives into the exciting advancements of DeepSeek-VL2, a new series of open-source Vision-Language Models (VLMs). We’ll explore its architecture, training methodology, and impressive performance across diverse multimodal tasks, including visual question answering, optical character recognition, and document understanding. The model’s innovative approach to handling high-resolution images and its efficient Mixture-of-Experts (MoE) architecture make it a significant leap forward in the field.
Byte Latent Transformer: A New Approach to Language Processing - December 22, 2024
Let’s understand “Byte Latent Transformer: Patches Scale Better Than Tokens.” This paper introduces a novel byte-level alternative to tokenization in large language models, grouping raw bytes into variable-length patches and potentially offering significant improvements in efficiency and performance. We will explore the core concepts, methodology, and potential implications of this research, drawing insights from the original paper and related discussions.
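To make the patching idea concrete, the toy sketch below splits a byte stream into variable-length patches, starting a new patch wherever the next byte is hard to predict under a simple bigram frequency model. The bigram model and the threshold are stand-ins of my own; the paper derives boundaries from a small byte-level entropy model, so treat this purely as an illustration of the concept.

```python
import math
from collections import Counter, defaultdict

def patch_bytes(data: bytes, threshold: float = 4.0):
    """Toy illustration: split bytes into variable-length patches, starting a new
    patch where the next byte is 'surprising' under a bigram frequency model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        counts[prev][nxt] += 1

    patches, current = [], [data[0]]
    for prev, nxt in zip(data, data[1:]):
        prob = counts[prev][nxt] / sum(counts[prev].values())
        surprise = -math.log2(prob)      # higher = harder to predict
        if surprise > threshold:         # patch boundary where prediction is hard
            patches.append(bytes(current))
            current = []
        current.append(nxt)
    patches.append(bytes(current))
    return patches

text = b"the cat sat on the mat, the cat sat on the hat"
print(patch_bytes(text))
```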
Building Reliable Local Agents with LangGraph, LLaMA3, and Elasticsearch - December 25, 2024
This post details the construction of a robust local agent, leveraging the capabilities of LangGraph, LLaMA3, and Elasticsearch. We address the challenge of creating reliable local agents, which often suffer from inconsistencies and inaccuracies due to their reliance on limited context. This blog post is a companion piece to the original article published by Elasticsearch Labs, which can be found here: https://www.elastic.co/search-labs/blog/local-rag-agent-elasticsearch-langgraph-llama3. Our solution employs LangGraph to orchestrate the agent’s workflow, LLaMA3 for its reasoning and generation capabilities, and Elasticsearch for efficient and accurate information retrieval. This combination allows for a more dependable agent, capable of handling complex tasks with improved performance.
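The sketch below shows only the shape of that workflow: a LangGraph state graph with a retrieve node backed by Elasticsearch and a generate node where a locally served LLaMA3 (for example via Ollama) would be called. The index name, field names, and placeholder answer are assumptions for illustration, not the article's actual code.

```python
# Minimal retrieve -> generate sketch; index, fields, and the LLaMA3 call are placeholders.
from typing import List, TypedDict

from elasticsearch import Elasticsearch
from langgraph.graph import StateGraph, END

es = Elasticsearch("http://localhost:9200")  # assumes a local Elasticsearch instance

class AgentState(TypedDict):
    question: str
    documents: List[str]
    answer: str

def retrieve(state: AgentState) -> AgentState:
    # Simple full-text match; the article uses its own index and retrieval strategy.
    res = es.search(index="docs", query={"match": {"text": state["question"]}}, size=3)
    docs = [hit["_source"]["text"] for hit in res["hits"]["hits"]]
    return {**state, "documents": docs}

def generate(state: AgentState) -> AgentState:
    # Here the agent would call a locally served LLaMA3 (e.g. via Ollama) with the
    # retrieved context; a placeholder answer keeps the sketch self-contained.
    context = "\n".join(state["documents"])
    answer = f"[LLaMA3 answer grounded in {len(state['documents'])} documents]\n{context[:200]}"
    return {**state, "answer": answer}

graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
agent = graph.compile()

print(agent.invoke({"question": "How do I configure an index?", "documents": [], "answer": ""}))
```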
DeepSeek-V3: A Technical Overview of a Novel Mixture-of-Experts Model - December 27, 2024
DeepSeek-V3 emerges as a significant advancement in the field of large language models (LLMs). This model, detailed in the accompanying technical report, employs a Mixture-of-Experts (MoE) architecture, a design choice that allows for both high performance and computational efficiency. The architecture is composed of a number of expert sub-networks, which are activated conditionally based on the input, enabling the model to scale effectively. The training process involved a large corpus of text and code data, leveraging a combination of supervised learning and reinforcement learning techniques. DeepSeek-V3 demonstrates strong performance across a range of benchmarks, while maintaining a focus on efficient inference. The model is available on GitHub, providing access to both the model weights and the technical documentation.
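To illustrate what "experts activated conditionally based on the input" means, here is a toy PyTorch Mixture-of-Experts layer with top-k routing. The dimensions, expert count, and routing rule are illustrative only and do not reflect DeepSeek-V3's actual architecture or configuration.

```python
# Toy MoE layer: each token is routed to its top-k experts and their outputs are
# combined with the routing weights. Purely illustrative, not DeepSeek-V3's design.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)             # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)      # top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([10, 64])
```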
Qwen 2.5 Coder Models: Enhanced Code Generation and Reasoning - December 27, 2024
The landscape of code generation is constantly evolving, demanding models that can not only produce syntactically correct code but also understand complex logic and reasoning. The Qwen 2.5 Coder models address this challenge by providing advanced capabilities in code synthesis and comprehension. These models, available in various parameter sizes, offer a solution for developers seeking more robust and reliable code generation tools. This introduction highlights the key problem these models solve and the solution they provide. For those interested in exploring these models, a comprehensive collection is available on Hugging Face: Qwen 2.5 Coder All Versions. Additionally, a Google Colab notebook is provided for hands-on experimentation: Qwen 2.5 Coder Colab.
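As a quick way to try one of these checkpoints outside the Colab, here is a minimal Transformers sketch; the checkpoint name is an assumption taken from the Hugging Face collection (pick the size that fits your hardware), and the prompt is only an example.

```python
# Minimal sketch of running a Qwen 2.5 Coder instruct checkpoint with Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-1.5B-Instruct"  # assumed checkpoint name from the collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```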