FirefliesAudio

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

⏱ 21:15 | 👁 34k views | 🗓 2 years ago

Related videos

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

34k • 1 year ago

A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

143k • 5 years ago

Proximal Policy Optimization (PPO) & Group Relative Policy Optimization (GRPO) | Paper Explained

6.2k • 7 months ago

Reinforcement Learning with Human Feedback (RLHF) - How to train and fine-tune Transformer Models

35k • 2 years ago

MIT 6.S191: Secrets of Massively Parallel Training

6.4k • 9 days ago

The FASTEST introduction to Reinforcement Learning on the internet

459k • 1 year ago

Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial

87k • 5 years ago

Stable Diffusion - How to build amazing images with AI

25k • 2 years ago

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

71k • 2 years ago

RLHF in 90 min

5.8k • 8 months ago

GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models

12k • 1 year ago

Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)

2.1m • 1 year ago