🧠 LLM Mini Project — Step-by-Step Checklist
📦 0. Setup Environment
- Create a new project folder
- Set up a virtual environment
- Install core dependencies:
- torch
- transformers
- datasets
- accelerate
- peft (for LoRA later)
- bitsandbytes (for quantization later)
- Confirm GPU is available (`torch.cuda.is_available()`)
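The environment check can be a short script (assuming `torch` installed correctly):

```python
import torch

# Quick sanity check before any training.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```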
🔍 1. Understand the Problem (don’t skip this)
- Write down in your own words:
- What is a language model?
- What does “predict next token” actually mean?
- Manually inspect:
- A sample sentence
- Its tokenized form
- Verify:
- Input tokens vs target tokens (shifted by 1)
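The input/target shift is easiest to see with a toy example in plain Python (the token ids below are made up for illustration):

```python
# A causal LM is trained so that position i predicts token i+1:
# inputs are tokens[:-1], targets are tokens[1:].
tokens = [464, 3290, 3332, 319, 262, 2603]  # hypothetical token ids

inputs = tokens[:-1]   # what the model sees
targets = tokens[1:]   # what it must predict at each position

for inp, tgt in zip(inputs, targets):
    print(f"given ...{inp} -> predict {tgt}")
```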
📚 2. Load Dataset
- Choose dataset:
- Start with WikiText-2
- Load dataset using the `datasets` library
- Print:
- A few raw samples
- Check:
- Dataset size
- Train/validation split
🔢 3. Tokenization
- Load GPT-2 tokenizer
- Tokenize dataset:
- Apply truncation
- Apply padding
- Verify:
- Shape of tokenized output
- Decode tokens back to text (sanity check)
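A tokenization round-trip sketch with the GPT-2 tokenizer; note GPT-2 defines no pad token, so a common workaround is to reuse the EOS token:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default

enc = tok("The dog sat on the mat.",
          truncation=True, max_length=16,
          padding="max_length", return_tensors="pt")

print(enc["input_ids"].shape)           # one sequence of 16 token ids
print(tok.decode(enc["input_ids"][0]))  # decode back as a sanity check
```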
🧱 4. Prepare Training Data
- Convert dataset to PyTorch format
- Create DataLoader:
- Set batch size (start small: 2–8)
- Confirm:
- Batches load correctly
- Tensor shapes are consistent
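Batch shapes can be checked with a DataLoader over dummy token ids before wiring in the real dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy token ids standing in for the tokenized dataset:
# 32 sequences of length 128, vocab size 50257 (GPT-2).
input_ids = torch.randint(0, 50257, (32, 128))
loader = DataLoader(TensorDataset(input_ids), batch_size=4, shuffle=True)

for (batch,) in loader:
    print(batch.shape)  # expect [4, 128] for every full batch
    break
```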
🤖 5. Load Model
- Load pretrained GPT-2 small
- Move model to GPU (if available)
- Print:
- Model size (parameters)
- Run a single forward pass to confirm:
- No errors
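Loading GPT-2 small, counting parameters, and running one forward pass; passing `labels` makes the model return its LM loss:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.1f}M")

tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok("Hello world", return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    out = model(ids, labels=ids)  # labels trigger loss computation
print("loss:", out.loss.item(), "logits:", tuple(out.logits.shape))
```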
🔁 6. Build Training Loop (core understanding)
- Write your own training loop (no Trainer API yet)
- Include:
- Forward pass
- Loss calculation
- Backpropagation
- Optimizer step
- Print:
- Loss every few steps
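The loop skeleton, shown here with a tiny stand-in LM (an embedding plus a linear head) so it runs instantly; swap in GPT-2 and the real DataLoader from the earlier steps:

```python
import torch
from torch import nn

vocab = 100  # toy vocabulary for the stand-in model
model = nn.Sequential(nn.Embedding(vocab, 32), nn.Linear(32, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

batch = torch.randint(0, vocab, (8, 17))  # fake token batch
for step in range(20):
    inputs, targets = batch[:, :-1], batch[:, 1:]  # shift by one
    logits = model(inputs)                         # forward pass
    loss = loss_fn(logits.reshape(-1, vocab), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                # backpropagation
    optimizer.step()                               # optimizer step
    if step % 5 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

Repeatedly fitting the same batch should make the loss fall fast; that is exactly the memorization behaviour to watch for in step 8.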
📉 7. Observe Training Behaviour
- Track:
- Training loss over time
- Answer:
- Is loss decreasing?
- Is it noisy or stable?
- (Optional)
- Plot loss curve
🧪 8. Evaluate Model
- Generate text from model:
- Before training
- After training
- Compare:
- Coherence
- Structure
- Note:
- Any overfitting signs (repetition, memorization)
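A before/after comparison only needs a short generation helper; the sampling settings here are one reasonable choice, not the only one:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=40, do_sample=True,
                             top_p=0.9, pad_token_id=tok.eos_token_id)
    return tok.decode(out[0], skip_special_tokens=True)

# Run with the same prompt before and after training and compare.
print(generate("The history of the city begins"))
```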
⚖️ 9. Try LoRA Fine-Tuning
- Add LoRA using `peft`
- Freeze base model weights
- Train only adapter layers
- Compare vs full fine-tuning:
- Speed
- Memory usage
- Output quality
🧠 10. Understand Convergence
- Identify:
- When loss plateaus
- Check validation loss:
- Does it increase? (overfitting)
- Write down:
- What “good training” looks like
⚙️ 11. Model Saving & Loading
- Save:
- Model weights
- Tokenizer
- Reload model
- Confirm:
- Outputs remain consistent
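Saving and reloading with `save_pretrained` / `from_pretrained`, then comparing logits to confirm nothing drifted (the output directory name is arbitrary):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

model.save_pretrained("my-gpt2")  # hypothetical output directory
tok.save_pretrained("my-gpt2")

reloaded = AutoModelForCausalLM.from_pretrained("my-gpt2")
ids = tok("Consistency check", return_tensors="pt").input_ids
with torch.no_grad():
    same = torch.allclose(model(ids).logits, reloaded(ids).logits)
print("outputs identical:", same)  # should be True
```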
🚀 PART 2 — Infrastructure & Serving
🧠 12. Understand Inference Flow
- Write down:
- Steps from input → output
- Measure:
- Time taken for a single generation
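A single-generation timing can be taken with `time.perf_counter`:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
ids = tok("Once upon a time", return_tensors="pt").input_ids

start = time.perf_counter()
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50,
                         pad_token_id=tok.eos_token_id)
elapsed = time.perf_counter() - start
new_tokens = out.shape[1] - ids.shape[1]
print(f"{elapsed:.2f}s total, {elapsed / new_tokens * 1000:.1f} ms/token")
```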
⚡ 13. Optimize Inference
- Test batching:
- Multiple inputs at once
- Compare:
- Latency vs throughput
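The latency/throughput trade-off can be felt even with a toy stand-in for the model (a couple of matmuls): per-call latency grows with batch size, but items processed per second usually grows faster.

```python
import time
import torch

W = torch.randn(1024, 1024)

def forward(x):  # stand-in for a model forward pass
    return torch.relu(x @ W) @ W

def bench(batch_size, reps=50):
    x = torch.randn(batch_size, 1024)
    forward(x)  # warm-up
    start = time.perf_counter()
    for _ in range(reps):
        forward(x)
    latency = (time.perf_counter() - start) / reps
    return latency, batch_size / latency  # per-call latency, items/sec

for b in (1, 8, 32):
    latency, throughput = bench(b)
    print(f"batch {b:2d}: {latency * 1e3:6.2f} ms/call,"
          f" {throughput:8.0f} items/s")
```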
🧮 14. Apply Quantization
- Load model in:
- 8-bit
- (Optional) 4-bit
- Compare:
- Memory usage
- Speed
- Output quality
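An 8-bit load goes through `transformers`' `BitsAndBytesConfig`; this sketch needs a CUDA GPU and the `bitsandbytes` package at runtime:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config, device_map="auto"
)
print(f"{model.get_memory_footprint() / 1e6:.0f} MB in memory")
```

Compare the footprint and generation speed against the full-precision load from step 5.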
🖥️ 15. Simulate Real-World Usage
- Pretend you have:
- Multiple users hitting your model
- Think through:
- How would you queue requests?
- When would you batch?
- When would you scale?
☁️ 16. Understand Infra Concepts
- Research:
- GPU provisioning
- Autoscaling
- Model warm starts
- Understand:
- Why loading time matters
- Why GPUs shouldn’t sit idle
🧬 17. (Bonus) DICOM Exploration
- Learn:
- What DICOM files are
- Think:
- How LLMs could be used with medical data
- Note:
- Privacy + domain challenges
✍️ 18. Write Your Blog
Structure
- Introduction:
- What is an LLM really?
- Training:
- Tokenization
- Training loop
- Loss behaviour
- Fine-tuning:
- Full vs LoRA
- Challenges:
- What went wrong
- Infrastructure:
- Serving challenges
- Batching
- Quantization
- Key Learnings:
- What surprised you
- What actually matters
✅ Final Deliverables
- Working training script
- LoRA vs full fine-tune comparison
- Basic inference script
- Blog post (clear + honest)
- Notes showing your understanding
⚠️ Keep Yourself Honest
- Can you explain the training loop without looking?
- Do you understand why loss decreases?
- Can you explain batching vs latency tradeoffs?
- Do you know what would break at scale?