AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment (Production AI Engineering Series) - Softcover

Book 6 of 20: Production AI Engineering Series

Team, ChatVariety

ISBN 13: 9798199720021

Publisher: Independently published, 2026

All copies of this ISBN edition

0 Used

3 New

From EUR 14.14

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to shrink large neural networks with minimal loss of accuracy.
Advanced Acceleration Methods: Implement speculative decoding (draft models as well as Medusa- and EAGLE-style approaches), PagedAttention, and FlashAttention to boost throughput by 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

The synopsis may refer to a different edition of this title.

Publisher: Independently published
Publication date: 2026
Language: English
ISBN 13: 9798199720021
Binding: Paperback
Number of pages: 95
Manufacturer contact: Manufactured by Amazon on behalf of the author
https://www.amazon.de/hz/contact-us

c/o Amazon Media EU S.à r.l., 38 Avenue John F. Kennedy
Luxembourg
L-1855
Luxembourg

Search results for AI Inference Optimization Engineering: Quantization,...

AI Inference Optimization Engineering

Team, Chatvariety

Publisher: Independently published, 2026

ISBN 13: 9798199720021

New PAP

Seller: PBShop.store US, Wood Dale, IL, USA

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item no. L2-9798199720021

Buy new

EUR 14.14

Free shipping
Shipping within the USA

Quantity: More than 20 available

AI Inference Optimization Engineering

Team, Chatvariety

Publisher: Independently published, 2026

ISBN 13: 9798199720021

New PAP

Seller: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item no. L2-9798199720021

Buy new

EUR 13.42

EUR 3.85 shipping
Shipping from United Kingdom to USA

Quantity: More than 20 available

AI Inference Optimization Engineering : Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Chatvariety Team

Publisher: Independently Published, Jun 2026

ISBN 13: 9798199720021

New Paperback

Seller: AHA-BUCH GmbH, Einbeck, Germany

Seller rating: 5 out of 5 stars

Paperback. Condition: New. New stock. Item no. 9798199720021

Buy new

EUR 13.00

EUR 60.71 shipping
Shipping from Germany to USA

Quantity: 2 available
