FirefliesAudio

LLM Inference Optimization #2: Tensor, Data, and Expert Parallelism (TP, DP, EP, MoE)

⏱ 20:18 | 👁 4.4k views | 🗓 7 months ago

Related videos

AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA

14k • 11 months ago

Why Inference is hard..

158k • 1 month ago

Mixture of Experts (MoE), Visually Explained

31k • 3 months ago

Lecture 48: The Ultra Scale Playbook

10k • 1 year ago

Ultra-scale playbook, ch.3.1 - "Tensor Parallelism"

361 • 6 months ago

What is Mixture of Experts?

58k • 1 year ago

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

46k • 1 year ago

Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral

28k • 2 years ago

AI Optimization Lecture 3: Distillation, Pruning, and Quantization

1.4k • 7 months ago

Fast LLM Serving with vLLM and PagedAttention

65k • 2 years ago

How the VLLM inference engine works?

21k • 8 months ago

How LLMs use multiple GPUs

11k • 9 months ago