- Understanding Pre-Training in Large Language Models
Pre-training is the phase where we teach a model how language works. Before a model can answer questions, write code, or chat with us, it needs to learn the structure and patterns of language. This learning happens during pre-training. From Text to Tokens: everything begins with raw text. For example: “The cat sat on the… Read more: Understanding Pre-Training in Large Language Models
- An Introduction to Vision Language Model
AI applications nowadays generate not only text but also images, audio, and video. Vision language models reuse a similar transformer architecture. We will see the comparison first. Here an image patch is just a small square chunk of an image. Now a small chunk here is… Read more: An Introduction to Vision Language Model
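The "image patch" idea from the vision-language excerpt can be sketched in a few lines. This is a minimal illustration with made-up sizes (an 8×8 toy image and 4×4 patches): it splits an image into non-overlapping square chunks and flattens each into one vector, which is how ViT-style models turn an image into a sequence of patch tokens.

```python
import numpy as np

# Toy "image": 8x8 pixels, 3 channels, filled with a ramp so patches differ.
image = np.arange(8 * 8 * 3, dtype=np.float32).reshape(8, 8, 3)

def to_patches(img, patch=4):
    """Split an HxWxC image into non-overlapping patch x patch squares,
    each flattened into one vector (one 'image token' per patch)."""
    h, w, c = img.shape
    out = []
    for r in range(h // patch):
        for col in range(w // patch):
            chunk = img[r * patch:(r + 1) * patch,
                        col * patch:(col + 1) * patch, :]
            out.append(chunk.reshape(-1))
    return np.stack(out)  # shape: (num_patches, patch * patch * c)

patches = to_patches(image)
print(patches.shape)  # (4, 48): 4 patches, each a 48-dim vector
```

Each flattened patch is then linearly projected into the model's embedding space, after which the transformer treats it like any other token.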
- Tokens and logits relation in LLM
A token is a piece of text the model understands. It may be: Now each token will be assigned a specific numerical value.

| Token    | Token ID |
|----------|----------|
| I        | 40       |
| love     | 3047     |
| machine  | 7342     |
| learning | 7524     |

The table above is taken from the gptforworks website. Each LLM has its own tokenizer and token IDs. Where Logits Come In: Logits =… Read more: Tokens and logits relation in LLM
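The token-to-ID mapping can be sketched as a dictionary lookup. This is a toy tokenizer using the IDs from the table above; real LLM tokenizers (GPT-style byte-pair encoders, SentencePiece, etc.) learn subword vocabularies, so the same word maps to different IDs in different models, and the `<unk>`-style fallback here is a simplification of real subword splitting.

```python
# Toy vocabulary: the IDs match the table above for illustration only.
vocab = {"I": 40, "love": 3047, "machine": 7342, "learning": 7524}

def encode(text):
    """Map each whitespace-separated word to its token ID; unknown
    words fall back to a placeholder id of 0 (a stand-in for the
    subword splitting a real tokenizer would do instead)."""
    return [vocab.get(word, 0) for word in text.split()]

print(encode("I love machine learning"))  # [40, 3047, 7342, 7524]
```

The model never sees the raw text, only this sequence of integers, which is then mapped to embedding vectors.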
- An understanding of Attention Head in LLM
The attention heads live inside the Transformer layer. Each Transformer layer actually contains two main parts: (1) Multi-Head Self-Attention and (2) a Feed-Forward Neural Network. Attention heads operate inside each transformer layer to determine how tokens in the sentence relate to each other before the model predicts the next token. Related post – How LLM is… Read more: An understanding of Attention Head in LLM
- How logits are formed in LLM?
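The multi-head part described above can be sketched with NumPy. This is a minimal, assumption-laden version (random weights, no causal mask, no layer norm or residuals): each head gets its own slice of the projected queries/keys/values, computes attention weights that say how strongly each token attends to every other token, and the heads' outputs are concatenated and mixed by an output projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, n_heads, Wq, Wk, Wv, Wo):
    """x: (seq_len, d_model). Each head attends over all tokens using
    its own d_model/n_heads slice of Q, K, V; heads are then
    concatenated and mixed by the output projection Wo."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv                      # (seq, d_model) each
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)    # (seq, seq)
        weights = softmax(scores, axis=-1)                # token-to-token relevance
        heads.append(weights @ v[:, s])                   # (seq, d_head)
    return np.concatenate(heads, axis=-1) @ Wo            # (seq, d_model)

d_model, n_heads, seq = 8, 2, 4
W = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_self_attention(rng.standard_normal((seq, d_model)), n_heads, *W)
print(out.shape)  # (4, 8)
```

Because each head sees a different learned projection, different heads can specialize, e.g. one tracking syntax and another tracking long-range references.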
The model's scores (logits) come from the last neural network layer of the transformer. After many transformer layers, the model produces a vector that represents the context of the sentence: “The sky is” → context vector [0.23, -1.2, 0.88, 2.1, …]. A linear layer converts this vector into scores. Think of it as a huge table of weights that… Read more: How logits are formed in LLM?
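That final linear layer is just a matrix-vector product: one row of weights per vocabulary token. A minimal sketch, with an invented four-word vocabulary and random weights standing in for the learned table (a real LLM's vocabulary has tens of thousands of entries):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["blue", "cloudy", "falling", "bright"]  # tiny stand-in vocabulary
d_model = 5                                      # context-vector size (toy)

# The "huge table of weights": one row per vocabulary token.
W = rng.standard_normal((len(vocab), d_model))

# Context vector from the transformer for "The sky is" (values illustrative).
context = np.array([0.23, -1.2, 0.88, 2.1, 0.4])

logits = W @ context                 # one raw score (logit) per token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                 # softmax turns logits into probabilities

print(dict(zip(vocab, probs.round(3))))
```

The token with the highest logit (equivalently, the highest probability after softmax) is the model's top candidate for the next token.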