  1. sglang/docs/advanced_features/quantization.md at main · sgl-project ...

    SGLang supports various quantization methods, including offline quantization and online dynamic quantization. Offline quantization …

  2. Confused about `--quantization FP8` and `--quantization w8a8

    Mar 18, 2025 · According to my tests, static fp8 quantization provides almost 10% higher throughput than dynamic quantization, and …

  3. Feature: Add TurboQuant KV Cache Quantization for Memory

    Mar 28, 2026 · 🚀 Feature Overview: Add support for TurboQuant KV cache quantization to enable memory-efficient long-context LLM …

  4. sglang/python/sglang/srt/layers/quantization/compressed_tensors ...

    SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

  5. sglang/python/sglang/srt/layers/quantization/base_config.py at

    SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

  6. sglang/python/sglang/srt/layers/quantization/fp8_utils.py at main · sgl ...

    SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

  7. sglang/benchmark/kernels/quantization/tuning_block_wise_kernel

    SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

  8. Releases · sgl-project/sglang - GitHub

    SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

  9. [RFC][Diffusion] Intel Auto-Round x SGL Diffusion Quantization

    Mar 23, 2026 · This work integrates Intel AutoRound, a post-training quantization (PTQ) toolkit that supports large language models …

  10. ValueError: The output_size of gate's and up's weight = 64 is ... - GitHub

    Sep 18, 2025 · ValueError: The output_size of gate's and up's weight = 64 is not divisible by weight quantization block_n = 128. #10641