
how to Quant DeepseekV2 #1320


Open
liusy58 opened this issue Apr 3, 2025 · 0 comments
Labels
bug Something isn't working

Comments


liusy58 commented Apr 3, 2025

Hello, I want to quantize DeepSeek-V2 with the following script:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

from compressed_tensors.quantization.quant_args import (
    QuantizationArgs,
    QuantizationStrategy,
    QuantizationType,
)


MODEL_ID = "DeepSeek-V2-Lite-Chat"

# Load model.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

recipe = QuantizationModifier(
    targets=["Linear"], scheme="FP8_CUSTOME", ignore=["lm_head"], trust_remote_code_model=True
)

oneshot(model=model, recipe=recipe, trust_remote_code_model=True)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0]))
print("==========================================")

# Save to disk in compressed-tensors format.
SAVE_DIR = "./" + "DeepSeek-V2-Lite-quant" + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

I define FP8_CUSTOME as follows:

FP8_CUSTOME = dict(
    weights=QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,
        strategy=QuantizationStrategy.BLOCK,
        block_structure="128x128",
        symmetric=True,
        dynamic=False,
    ),
    input_activations=QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,
        strategy=QuantizationStrategy.TOKEN,
        symmetric=True,
        dynamic=True,
        observer=None,
    ),
)
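For what it's worth, the intent of this scheme can be sketched in plain NumPy: weights get one static, symmetric scale per 128x128 block, while activations get a dynamic scale per token computed at runtime. This is only an illustration of the math (round-to-nearest stands in for the real FP8 E4M3 cast), not llm-compressor's implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8 E4M3


def fake_quant_weight_blockwise(w, block=128):
    """Symmetric, static block quantization: one scale per 128x128 tile.

    Round-to-nearest is a simplified stand-in for the FP8 cast.
    """
    out = np.empty_like(w)
    for i in range(0, w.shape[0], block):
        for j in range(0, w.shape[1], block):
            tile = w[i:i + block, j:j + block]
            scale = max(np.abs(tile).max() / FP8_E4M3_MAX, 1e-12)
            out[i:i + block, j:j + block] = np.round(tile / scale) * scale
    return out


def fake_quant_act_per_token(x):
    """Dynamic per-token quantization: one scale per row, computed on the fly,
    which is why the activation QuantizationArgs need no calibrated observer."""
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX, 1e-12)
    return np.round(x / scale) * scale
```

The asymmetry matters for calibration: the per-token activation scales are recomputed on every forward pass (`dynamic=True`), while the per-block weight scales have to be fixed once up front during `oneshot`.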

I got this error:

compressed_tensors/utils/offload.py", line 202, in update_offload_parameter
    data = data.to(param.dtype)
AttributeError: 'NoneType' object has no attribute 'to'
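For context, the traceback means `update_offload_parameter` was handed `data=None` and then unconditionally called `.to(...)` on it. A minimal stand-alone reproduction of that Python-level failure (the names below merely mimic the compressed-tensors call site and are hypothetical):

```python
class FakeParam:
    dtype = "float32"  # stands in for a torch dtype


def update_offload_parameter(param, data):
    # mirrors the failing line: the dtype cast assumes data is never None
    return data.to(param.dtype)


try:
    update_offload_parameter(FakeParam(), None)  # e.g. a scale that was never set
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'to'
```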

Could you please give me some guidance?

@liusy58 liusy58 added the bug Something isn't working label Apr 3, 2025