[Book] [Sebastian Raschka] Build a Large Language Model (From Scratch) [ENG, 2024]
GitHub:
https://github.com/rasbt/LLMs-from-scratch
[Videos] [Sebastian Raschka and Abhinav Kimothi] Master and Build Large Language Models [ENG, 54 Lessons (17h 15m) | 2.91 GB]
https://www.manning.com/livevideo/master-and-build-large-language-models
1.1. Python Environment Setup Video
1.2. Foundations to Build a Large Language Model (From Scratch)
2.1. Prerequisites to Chapter 2
2.2. Tokenizing text
2.3. Converting tokens into token IDs
2.4. Adding special context tokens
2.5. Byte pair encoding
2.6. Data sampling with a sliding window
2.7. Creating token embeddings
2.8. Encoding word positions
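
Below, a minimal Python sketch of the Chapter 2 data pipeline listed above (BPE tokenization with tiktoken, sliding-window input/target sampling, token plus positional embeddings). The class name and hyperparameters are illustrative, not necessarily the book's exact code.

# Sliding-window dataset over GPT-2 BPE token IDs
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    def __init__(self, txt, tokenizer, max_length, stride):
        self.input_ids = []
        self.target_ids = []
        token_ids = tokenizer.encode(txt, allowed_special={"<|endoftext|>"})
        # Slide a window of max_length tokens over the text; the target is
        # the same window shifted one token to the right.
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

tokenizer = tiktoken.get_encoding("gpt2")
raw_text = "In the heart of the city stood the old library, a relic from a bygone era."
dataset = GPTDatasetV1(raw_text, tokenizer, max_length=4, stride=4)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
inputs, targets = next(iter(loader))

# Token embeddings plus learned absolute position embeddings (GPT-2 style)
vocab_size, emb_dim = 50257, 256
tok_emb = torch.nn.Embedding(vocab_size, emb_dim)
pos_emb = torch.nn.Embedding(4, emb_dim)
x = tok_emb(inputs) + pos_emb(torch.arange(inputs.shape[1]))
print(x.shape)  # (batch, seq_len, emb_dim)
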
3.1. Prerequisites to Chapter 3
3.2. A simple self-attention mechanism without trainable weights | Part 1
3.3. A simple self-attention mechanism without trainable weights | Part 2
3.4. Computing the attention weights step by step
3.5. Implementing a compact self-attention Python class
3.6. Applying a causal attention mask
3.7. Masking additional attention weights with dropout
3.8. Implementing a compact causal self-attention class
3.9. Stacking multiple single-head attention layers
3.10. Implementing multi-head attention with weight splits
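
A minimal sketch of causal multi-head self-attention with weight splits in the spirit of Chapter 3: one Q/K/V projection each, split into heads, with a causal mask and dropout on the attention weights. Class name and hyperparameters are illustrative, not necessarily identical to the book's implementation.

import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future positions.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once, then split the last dimension into heads.
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        attn_scores = q @ k.transpose(2, 3)
        attn_scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], float("-inf"))
        attn_weights = torch.softmax(attn_scores / self.head_dim**0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        context = (attn_weights @ v).transpose(1, 2).reshape(b, num_tokens, self.d_out)
        return self.out_proj(context)

x = torch.randn(2, 6, 768)  # (batch, tokens, d_in)
mha = MultiHeadAttention(d_in=768, d_out=768, context_length=1024, dropout=0.1, num_heads=12)
print(mha(x).shape)  # torch.Size([2, 6, 768])
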
4.1. Prerequisites to Chapter 4
4.2. Coding an LLM architecture
4.3. Normalizing activations with layer normalization
4.4. Implementing a feed forward network with GELU activations
4.5. Adding shortcut connections
4.6. Connecting attention and linear layers in a transformer block
4.7. Coding the GPT model
4.8. Generating text
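
A minimal sketch of a pre-LayerNorm transformer block covering the Chapter 4 ingredients: layer normalization, a GELU feed-forward network, and shortcut connections. The book implements its own LayerNorm, GELU, and attention classes; this sketch substitutes PyTorch built-ins, and the sizes are illustrative GPT-2-small values.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, emb_dim=768, num_heads=12, context_length=1024, drop_rate=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.att = nn.MultiheadAttention(emb_dim, num_heads, dropout=drop_rate, batch_first=True)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ff = nn.Sequential(            # feed-forward with GELU and 4x expansion
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        self.drop = nn.Dropout(drop_rate)
        # Boolean causal mask: True marks positions a token may not attend to.
        mask = torch.triu(torch.ones(context_length, context_length, dtype=torch.bool), diagonal=1)
        self.register_buffer("causal_mask", mask)

    def forward(self, x):
        n = x.shape[1]
        shortcut = x                        # shortcut connection around attention
        x = self.norm1(x)
        x, _ = self.att(x, x, x, attn_mask=self.causal_mask[:n, :n], need_weights=False)
        x = self.drop(x) + shortcut
        shortcut = x                        # shortcut connection around the feed-forward net
        x = self.norm2(x)
        x = self.drop(self.ff(x)) + shortcut
        return x

block = TransformerBlock()
print(block(torch.randn(2, 6, 768)).shape)  # torch.Size([2, 6, 768])
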
5.1. Prerequisites to Chapter 5
5.2. Using GPT to generate text
5.3. Calculating the text generation loss: cross entropy and perplexity
5.4. Calculating the training and validation set losses
5.5. Training an LLM
5.6. Decoding strategies to control randomness
5.7. Temperature scaling
5.8. Top-k sampling
5.9. Modifying the text generation function
5.10. Loading and saving model weights in PyTorch
5.11. Loading pretrained weights from OpenAI
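
A minimal sketch of two Chapter 5 ideas: the next-token cross-entropy loss (perplexity is its exponential) and temperature-scaled top-k sampling. The helper names are illustrative, and the model is left out; any module that maps token IDs to logits would plug in here.

import torch

def calc_loss_batch(logits, targets):
    # logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)
    return torch.nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())

def generate_next_token(logits, temperature=1.0, top_k=None):
    # logits: (vocab_size,) for the last position of one sequence
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # Mask out everything below the k-th largest logit.
        logits = torch.where(logits < top_logits[-1], torch.tensor(float("-inf")), logits)
    if temperature > 0:
        probs = torch.softmax(logits / temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1)  # sample from the scaled distribution
    return torch.argmax(logits, dim=-1, keepdim=True)   # greedy decoding

vocab_size = 10
logits = torch.randn(2, 4, vocab_size)
targets = torch.randint(0, vocab_size, (2, 4))
print(calc_loss_batch(logits, targets))                 # scalar loss; perplexity = exp(loss)
print(generate_next_token(logits[0, -1], temperature=0.8, top_k=3))
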
6.1. Prerequisites to Chapter 6
6.2. Preparing the dataset
6.3. Creating data loaders
6.4. Initializing a model with pretrained weights
6.5. Adding a classification head
6.6. Calculating the classification loss and accuracy
6.7. Fine-tuning the model on supervised data
6.8. Using the LLM as a spam classifier
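
A minimal sketch of the Chapter 6 adaptation step: freeze the pretrained model, swap the language-modeling output head for a two-class (spam / not spam) classification head, and classify from the last token's logits. TinyLM is a hypothetical stand-in for the loaded GPT model, and the attribute names are illustrative.

import torch
import torch.nn as nn

def add_classification_head(model, emb_dim, num_classes=2):
    for param in model.parameters():
        param.requires_grad = False          # freeze the pretrained weights
    # The new head is trainable by default. (The book additionally leaves the
    # last transformer block and the final layer norm trainable.)
    model.out_head = nn.Linear(emb_dim, num_classes)
    return model

def classify_batch(model, input_ids):
    logits = model(input_ids)                # (batch, seq_len, num_classes)
    # Use the last token's logits: it has attended to the whole input.
    return torch.argmax(logits[:, -1, :], dim=-1)

class TinyLM(nn.Module):                     # stand-in for the pretrained GPT model
    def __init__(self, vocab_size=100, emb_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size)
    def forward(self, x):
        return self.out_head(self.emb(x))

model = add_classification_head(TinyLM(), emb_dim=32)
batch = torch.randint(0, 100, (4, 10))       # 4 "texts" of 10 token IDs each
print(classify_batch(model, batch))          # 4 predicted class indices
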
7.1. Preparing a dataset for supervised instruction fine-tuning
7.2. Organizing data into training batches
7.3. Creating data loaders for an instruction dataset
7.4. Loading a pretrained LLM
7.5. Fine-tuning the LLM on instruction data
7.6. Extracting and saving responses
7.7. Evaluating the fine-tuned LLM
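
A minimal sketch of the Chapter 7 data preparation step: formatting one instruction record into an Alpaca-style prompt before batching. The instruction/input/output field names follow the common convention and are illustrative.

def format_input(entry):
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # The optional input field is appended only when it is non-empty.
    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""
    return instruction_text + input_text

entry = {
    "instruction": "Rewrite the sentence in passive voice.",
    "input": "The chef cooked the meal.",
    "output": "The meal was cooked by the chef.",
}
# During training, the target response is appended after the prompt:
prompt = format_input(entry) + f"\n\n### Response:\n{entry['output']}"
print(prompt)
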
U01M01-Python-Environment-Setup-Video
You can preview it on the official site:
https://livevideo.manning.com/module/1820_1_1/master-and-build-large-language-models/chapter-1—understanding-large-language-models/python-environment-setup-video?
$ pip install uv                          # install the uv package/environment manager
$ uv venv --python=python3.10             # create a virtual environment with Python 3.10
$ source .venv/bin/activate               # activate it (on Windows: .venv\Scripts\activate)
$ uv pip install -r requirements.txt      # install the repo's pinned dependencies
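
Optionally, a quick sanity check that the environment is usable (assuming torch and tiktoken are among the packages listed in the repo's requirements.txt):

$ python -c "import torch, tiktoken; print(torch.__version__)"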