FirefliesAudio

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

⏱ 48:46 | 👁 36k views | 🗓 2 years ago

Related videos

What is RLHF?

18k • 1 year ago

The Elo Rating System

198k • 1 year ago

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

62k • 2 years ago

Proximal Policy Optimization (PPO) - How to train Large Language Models

85k • 2 years ago

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

71k • 2 years ago

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

34k • 1 year ago

Direct Preference Optimization (DPO) | Paper Explained

2.3k • 5 months ago

ML Interpretability: feature visualization, adversarial example, interp. for language models

12k • 2 years ago

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

38k • 2 years ago

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

171k • 1 year ago

Stanford CS234 I Guest Lecture on DPO: Rafael Rafailov, Archit Sharma, Eric Mitchell I Lecture 9

12k • 1 year ago

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

70k • 2 years ago