Understanding Pre-Training in Large Language Models

Pre-training is the phase where we teach a model how language works. Before a model can answer questions, write code, or chat with us, it needs to learn the structure and patterns of language. This learning happens during pre-training.

From Text to Tokens

Everything begins with raw text. For example: “The cat sat on the […]

An Introduction to Vision Language Model

AI applications nowadays generate not only text but also images, audio, and video. Vision language models use a similar transformer-based approach. We will see the comparison first. Here, an image patch is just a small square chunk of an image. Now a small chunk here is having […]
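To make the idea of an image patch concrete, here is a minimal sketch that cuts a tiny image into square chunks. The 4x4 "image" of integers, the patch size of 2, and the helper name `split_into_patches` are all assumptions chosen for illustration; real vision models work on much larger pixel arrays.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D image (list of rows) into square patches, left to right, top to bottom."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows, patch_size):
        for c in range(0, cols, patch_size):
            # Take patch_size rows, and patch_size columns from each of them.
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append(patch)
    return patches


# Hypothetical 4x4 image whose "pixels" are just the numbers 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

print(len(patches))  # number of 2x2 patches
print(patches[0])    # top-left patch
```

Each patch here plays a role similar to a token in text: a small unit the model can process one at a time.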

Tokens and logits relation in LLM

A token is a piece of text the model understands. It may be:

Each token is assigned a specific numerical value.

Token      Token ID
I          40
love       3047
machine    7342
learning   7524

The values above are taken from the gptforworks website. Each LLM has its own tokenizer and token IDs.

Where Logits Come In

Logits = […]

An understanding of Attention Head in LLM

The attention heads live inside the Transformer layer. Each Transformer layer contains two main parts:

1. Multi-Head Self Attention
2. Feed Forward Neural Network

Attention heads operate inside each transformer layer to determine how tokens in the sentence relate to each other before the model predicts the next token.

Related post – How LLM is […]

How logits are formed in LLM?

The model scores (logits) come from the last neural network layer of the transformer. After many transformer layers, the model produces a vector that represents the context of the sentence.

“The sky is”
↓
Context vector [0.23, -1.2, 0.88, 2.1, …]

A linear layer converts the vector to scores. Think of it as a huge table of weights that […]
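The step from context vector to scores can be sketched as a dot product between the vector and one row of weights per vocabulary word. This is only an illustration: the three candidate words, their weight values, and the function name `linear_layer` are made up, the context vector is cut to the four values shown in the example, and the bias term that real layers often include is left out.

```python
# First four values of the example context vector; real vectors are far longer.
context = [0.23, -1.2, 0.88, 2.1]

# Hypothetical weight table: one row per vocabulary word,
# one column per dimension of the context vector.
weights = {
    "blue":  [0.5, -0.3, 0.8, 1.0],
    "green": [0.1, 0.2, 0.4, 0.3],
    "car":   [-0.6, 0.9, -0.2, -0.5],
}


def linear_layer(vector, weight_rows):
    """Score each word as the dot product of its weight row with the context vector."""
    return {
        word: sum(w * x for w, x in zip(row, vector))
        for word, row in weight_rows.items()
    }


logits = linear_layer(context, weights)
for word, score in logits.items():
    print(f"{word:>5}: {score:.3f}")
```

With these made-up weights, "blue" receives the highest score and "car" the lowest, which is the kind of ranking the final layer is trained to produce.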

Softmax and how LLM is predicting the next word

When an LLM predicts the next word, the transformer stage first produces a score for every possible word.

I like to drink __________

The model produces scores for words like:

Word     Score
tea      5.2
coffee   4.8
water    3.1
car      -1.5

These numbers are called logits. They are raw scores, not probabilities. The […]
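As a minimal sketch of how such scores become probabilities, the snippet below applies the standard softmax formula to the four example scores. The function name `softmax` and the dictionary layout are choices made for this illustration.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    largest = max(logits.values())
    exps = {word: math.exp(score - largest) for word, score in logits.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Scores from the example above.
logits = {"tea": 5.2, "coffee": 4.8, "water": 3.1, "car": -1.5}
probs = softmax(logits)

for word, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:>6}: {p:.3f}")
```

The ordering of the words is preserved, so "tea" gets the largest share (roughly 0.56) and "car" ends up close to zero.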

Epoch in model training

In the simplest terms, an epoch represents one complete pass of your entire training dataset through the model. Imagine you are training an AI to recognize animals, and you have a folder of 1,000 images:

Why Do We Need More Than One Pass?

You might think, “If the AI has seen the pictures once, […]
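The idea of one epoch as one full pass can be sketched as a nested loop. The batch size of 100 and the choice of 3 epochs are assumptions for illustration, the integers stand in for the 1,000 images, and the actual training work is reduced to a comment.

```python
dataset = list(range(1000))  # stand-ins for the 1,000 images
batch_size = 100             # assumed value
num_epochs = 3               # assumed value


def batches(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


seen_per_epoch = []
steps = 0
for epoch in range(num_epochs):
    seen = 0
    for batch in batches(dataset, batch_size):
        # A real training loop would run the forward pass, compute the loss,
        # and update the weights here.
        seen += len(batch)
        steps += 1
    seen_per_epoch.append(seen)

print(seen_per_epoch)  # images seen in each epoch
print(steps)           # total number of batch updates
```

Every epoch touches all 1,000 images once, so three epochs mean each image is seen three times across 30 batch steps.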

Loading a model in LM Studio and Running

Local LLMs (Large Language Models) are changing the game for developers, writers, and privacy advocates. Running a model on your own hardware means your data never leaves your computer, you can customize your experience, and you don’t need an active internet connection to chat. But sometimes, getting the model to load isn’t seamless. This post […]

Decoding how to select a language model for local setup

Let's start with Q4_K_M. An example is mistralai/mistral-7b-instruct-v0.3. Let's go through each part:

Q4: 4-bit quantization. Original models are usually stored in 16-bit or 32-bit precision. 4-bit compresses them heavily, which gives much smaller memory usage.

K: K-quantization (grouped quantization). This is an improved method used such that

M: Medium variant of […]
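A rough back-of-the-envelope calculation shows why the bit width matters. The sketch below assumes a round 7 billion parameters and counts only the weights at exactly the stated number of bits; actual quantized files are somewhat larger because they also store scaling data, so treat the numbers as estimates. The function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes), ignoring any overhead."""
    return num_params * bits_per_weight / 8 / 1e9


num_params = 7_000_000_000  # assumed round figure for a "7B" model

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(num_params, bits):.1f} GB")
```

Going from 16-bit to 4-bit cuts the estimated weight storage to a quarter, which is what makes a 7B model practical on a typical laptop.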

Income Tax calculator

Here you can get the details about the new regime income tax slab rates for FY 2025-26 (Assessment Year 2026-27) under the Finance Act, 2025 in India, and how you can use an online calculator to estimate your tax liability. An income tax calculator is a tool that estimates how much tax you need to […]