
how to Quant DeepseekV2 #1320


Open
liusy58 opened this issue Apr 3, 2025 · 0 comments
Labels
bug Something isn't working

Comments


liusy58 commented Apr 3, 2025

Hello, I want to quantize DeepSeek-V2 with the following script:

from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

from compressed_tensors.quantization.quant_args import (
    QuantizationArgs,
    QuantizationStrategy,
    QuantizationType,
)


MODEL_ID = "DeepSeek-V2-Lite-Chat"

# Load model.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

recipe = QuantizationModifier(
    targets=["Linear"], scheme="FP8_CUSTOME", ignore=["lm_head"], trust_remote_code_model=True
)

oneshot(model=model, recipe=recipe, trust_remote_code_model=True)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0]))
print("==========================================")

# Save to disk in compressed-tensors format.
SAVE_DIR = "./" + "DeepSeek-V2-Lite-quant" + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)

I define FP8_CUSTOME as follows:

FP8_CUSTOME = dict(
    weights=QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,
        strategy=QuantizationStrategy.BLOCK,
        block_structure="128x128",
        symmetric=True,
        dynamic=False,
    ),
    input_activations=QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,
        strategy=QuantizationStrategy.TOKEN,
        symmetric=True,
        dynamic=True,
        observer=None,
    ),
)
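For what it's worth, the intent of this scheme can be sketched in plain NumPy: weights get one static, symmetric scale per 128x128 block, while activations get a dynamic scale per token computed at runtime. This is only an illustration of the math (round-to-nearest stands in for the real FP8 E4M3 cast), not llm-compressor's implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8 E4M3


def fake_quant_weight_blockwise(w, block=128):
    """Symmetric, static block quantization: one scale per 128x128 tile.

    Round-to-nearest is a simplified stand-in for the FP8 cast.
    """
    out = np.empty_like(w)
    for i in range(0, w.shape[0], block):
        for j in range(0, w.shape[1], block):
            tile = w[i:i + block, j:j + block]
            scale = max(np.abs(tile).max() / FP8_E4M3_MAX, 1e-12)
            out[i:i + block, j:j + block] = np.round(tile / scale) * scale
    return out


def fake_quant_act_per_token(x):
    """Dynamic per-token quantization: one scale per row, computed on the fly,
    which is why the activation QuantizationArgs need no calibrated observer."""
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX, 1e-12)
    return np.round(x / scale) * scale
```

The asymmetry matters for calibration: the per-token activation scales are recomputed on every forward pass (`dynamic=True`), while the per-block weight scales have to be fixed once up front during `oneshot`.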

I got this error:

compressed_tensors/utils/offload.py", line 202, in update_offload_parameter
    data = data.to(param.dtype)
AttributeError: 'NoneType' object has no attribute 'to'
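For context, the traceback means `update_offload_parameter` was handed `data=None` and then unconditionally called `.to(...)` on it. A minimal stand-alone reproduction of that Python-level failure (the names below merely mimic the compressed-tensors call site and are hypothetical):

```python
class FakeParam:
    dtype = "float32"  # stands in for a torch dtype


def update_offload_parameter(param, data):
    # mirrors the failing line: the dtype cast assumes data is never None
    return data.to(param.dtype)


try:
    update_offload_parameter(FakeParam(), None)  # e.g. a scale that was never set
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'to'
```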

Could you please give me some guidance?

@liusy58 liusy58 added the bug Something isn't working label Apr 3, 2025