FirefliesAudio

DistServe: Disaggregating Prefill and Decoding for LLM Inference Optimized...

⏱ 32:03 | 👁 5.5k views | 🗓 Streamed 1 year ago

Related videos

verl: Flexible and Scalable Reinforcement Learning Library for LLM Reasoning and Tool-Calling

5.4k • Streamed 9 months ago

How does batching work on modern GPUs?

3.6k • Streamed 1 year ago

PyTorch Expert Exchange: Efficient Generative Models: From Sparse to Distributed Inference

2.1k • Streamed 1 year ago

The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA

397 • 1 month ago

High Performance LLM Inference in Production

852 • 3 months ago

Lecture 58: Disaggregated LLM Inference

6.4k • 1 year ago

Helion 1.0: A High-Level DSL for Performance Portable Kernels - Oguz Ulgen, Meta

215 • 1 month ago

Efficient Streaming Language Models with Attention Sinks

1.9k • Streamed 1 year ago

Keynote: PyTorch 2.1 Technical Deep Dive - Mario, Mark, Mergen, Joe, Peng, Will, Yanan

6.2k • 2 years ago

Scaling Parallel Algorithms to Massive Datasets using Multi-SSD Machines

25 • 5 days ago

SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving - Liangsheng Yin

1.2k • 10 months ago

LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE)

4.4k • 7 months ago