DistServe: Disaggregating Prefill and Decoding for Optimized LLM Inference...

⏱ 32:03 | 👁 5.5K views | 🗓 Streamed 1 year ago

Related videos

verl: Flexible and Scalable Reinforcement Learning Library for LLM Reasoning and Tool-Calling (1:04:07)

5.4k • Streamed 9 months ago
How does batching work on modern GPUs? (33:29)

3.6k • Streamed 1 year ago
PyTorch Expert Exchange: Efficient Generative Models: From Sparse to Distributed Inference (39:23)

2.1k • Streamed 1 year ago
The Science and Practice of Open and Scalable LLM Evaluations - Grzegorz Chlebus, NVIDIA (26:53)

397 • 1 month ago
High Performance LLM Inference in Production (1:09:32)

852 • 3 months ago
Lecture 58: Disaggregated LLM Inference (1:15:19)

6.4k • 1 year ago
Helion 1.0: A High-Level DSL for Performance Portable Kernels - Oguz Ulgen, Meta (24:27)

215 • 1 month ago
Efficient Streaming Language Models with Attention Sinks (35:50)

1.9k • Streamed 1 year ago
Keynote: PyTorch 2.1 Technical Deep Dive - Mario, Mark, Mergen, Joe, Peng, Will, Yanan (53:46)

6.2k • 2 years ago
Scaling Parallel Algorithms to Massive Datasets using Multi-SSD Machines (1:04:00)

25 • 5 days ago
SGLang: An Efficient Open-Source Framework for Large-Scale LLM Serving - Liangsheng Yin (19:37)

1.2k • 10 months ago
LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE) (20:18)

4.4k • 7 months ago