FirefliesAudio

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA

⏱ 34:14 | 👁 27K views | 🗓 1 year ago

Related videos

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

46k • 1 year ago

DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference

5.5k • Streamed 1 year ago

How does batching work on modern GPUs?

3.6k • Streamed 1 year ago

Introducing NVIDIA Dynamo: Low-Latency Distributed Inference for Scaling Reasoning LLMs

12k • 1 year ago

High Performance LLM Inference in Production

852 • 3 months ago

Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works

25k • Streamed 2 years ago

vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon & Xiaoxuan Liu, UC Berkeley

12k • 1 year ago

Transformers, the tech behind LLMs | Deep Learning Chapter 5

10M • 2 years ago

Accelerating LLM Inference with vLLM

27k • 1 year ago

Why Inference is hard..

158k • 1 month ago

From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta

5.2k • 1 year ago

AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA

14k • 11 months ago