
sglang/docs/advanced_features/quantization.md at main · sgl-project ...
SGLang supports various quantization methods, including offline quantization and online dynamic quantization. Offline quantization …
Confused about `--quantization FP8` and `--quantization w8a8
Mar 18, 2025 · According to my tests, static fp8 quantization provides almost 10% higher throughput than dynamic quantization, and …
Feature: Add TurboQuant KV Cache Quantization for Memory
Mar 28, 2026 · Feature Overview: Add support for TurboQuant KV cache quantization to enable memory-efficient long-context LLM …
sglang/python/sglang/srt/layers/quantization/compressed_tensors ...
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
sglang/python/sglang/srt/layers/quantization/base_config.py at
sglang/python/sglang/srt/layers/quantization/fp8_utils.py at main · sgl ...
sglang/benchmark/kernels/quantization/tuning_block_wise_kernel
Releases · sgl-project/sglang - GitHub
[RFC][Diffusion] Intel Auto-Round x SGL Diffusion Quantization
Mar 23, 2026 · This work integrates Intel AutoRound, a post-training quantization (PTQ) toolkit that supports large language models …
ValueError: The output_size of gate's and up's weight = 64 is ... - GitHub
Sep 18, 2025 · ValueError: The output_size of gate's and up's weight = 64 is not divisible by weight quantization block_n = 128. #10641