Quantization Machine Learning

github.com
https://github.com
sglang/docs/advanced_features/quantization.md at main · sgl-project ...
SGLang supports various quantization methods, including offline quantization and online dynamic quantization. Offline quantization …
Confused about `--quantization FP8` and `--quantization w8a8
Mar 18, 2025 · According to my tests, static fp8 quantization provides almost 10% higher throughput than dynamic quantization, and …
Feature: Add TurboQuant KV Cache Quantization for Memory
Mar 28, 2026 · 🚀 Feature Overview Add support for TurboQuant KV cache quantization to enable memory-efficient long-context LLM …
sglang/python/sglang/srt/layers/quantization/compressed_tensors ...
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
sglang/python/sglang/srt/layers/quantization/base_config.py at
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
sglang/python/sglang/srt/layers/quantization/fp8_utils.py at main · sgl ...
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
sglang/benchmark/kernels/quantization/tuning_block_wise_kernel
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
Releases · sgl-project/sglang - GitHub
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
[RFC][Diffusion] Intel Auto-Round x SGL Diffusion Quantization
Mar 23, 2026 · This work integrates Intel AutoRound, a post-training quantization (PTQ) toolkit that supports large language models …
ValueError: The output_size of gate's and up's weight = 64 is ... - GitHub
Sep 18, 2025 · ValueError: The output_size of gate's and up's weight = 64 is not divisible by weight quantization block_n = 128. #10641
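
The error quoted in this result comes down to a divisibility constraint: with block-wise weight quantization, the output dimension has to split into whole blocks of size block_n, and 64 cannot be tiled by 128. A minimal sketch of that check follows. The helper names check_block_divisible and num_scale_blocks are hypothetical and are not taken from the SGLang source; only the numbers 64 and 128 come from the issue title.

```python
def check_block_divisible(output_size: int, block_n: int) -> None:
    """Hypothetical helper: raise if output_size cannot be tiled by block_n."""
    if block_n <= 0:
        raise ValueError(f"block_n must be positive, got {block_n}")
    if output_size % block_n != 0:
        # Message wording is modeled on the issue title, not copied from any library.
        raise ValueError(
            f"output_size = {output_size} is not divisible by "
            f"weight quantization block_n = {block_n}"
        )


def num_scale_blocks(output_size: int, block_n: int) -> int:
    """Hypothetical helper: count of per-block scales along the output dimension."""
    check_block_divisible(output_size, block_n)
    return output_size // block_n
```

Under these assumptions, num_scale_blocks(256, 128) yields two blocks, while check_block_divisible(64, 128) raises a ValueError, which mirrors the situation the issue describes.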
