Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
⏱ 48:46 | 👁 36k views | 🗓 2 years ago
19:39
What is RLHF?
18k • 1 year ago
22:13
The Elo Rating System
198k • 1 year ago
1:14:29
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
62k • 2 years ago
38:24
Proximal Policy Optimization (PPO) - How to train Large Language Models
85k • 2 years ago
2:15:13
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.
71k • 2 years ago
21:15
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
34k • 1 year ago
16:57
Direct Preference Optimization (DPO) | Paper Explained
2.3k • 5 months ago
1:00:15
ML Interpretability: feature visualization, adversarial example, interp. for language models
12k • 2 years ago
1:12:53
Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code
38k • 2 years ago
1:09:00
[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
171k • 1 year ago
1:18:44
Stanford CS234 I Guest Lecture on DPO: Rafael Rafailov, Archit Sharma, Eric Mitchell I Lecture 9
12k • 1 year ago
27:12
Variational Autoencoder - Model, ELBO, loss function and maths explained easily!
70k • 2 years ago