Describe the bug
I am trying to quantize DeepSeek-V3-BF16. When I run the following code:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

# NOTE: transformers 4.49.0 has an attribute error with DeepSeek.
# Consider either downgrading transformers to a previous version or
# upgrading to a version where this bug is fixed.

# Select a Mixture of Experts model for quantization.
MODEL_ID = "DeepSeek-V3-bf16"

# Adjust based on the number of desired GPUs.
# If not enough GPU memory is available, some layers will automatically
# be offloaded to CPU.
device_map = calculate_offload_device_map(
    MODEL_ID,
    reserve_for_hessians=True,
    num_gpus=8,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
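
# Diagnostic sketch, assuming the helper returns a module-name -> device
# dict in the accelerate convention: counting its values shows whether any
# layer was assigned to a GPU at all.
#   from collections import Counter
#   print(Counter(map(str, device_map.values())))
# If nearly every value is "cpu", the from_pretrained call below will put
# the weights in host RAM, which would match the behavior reported here.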

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map=device_map, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Select the calibration dataset.
# It is recommended to use more calibration samples for MoE models so
# that each expert is hit.
DATASET_ID = "/kefu-nas/xyb/quantization/dataset/ultrachat_200k"
DATASET_SPLIT = "train_sft"
NUM_CALIBRATION_SAMPLES = 2048
MAX_SEQUENCE_LENGTH = 2048

# Load the dataset and preprocess.
ds = load_dataset(DATASET_ID, split=DATASET_SPLIT)
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
        )
    }

ds = ds.map(preprocess)

# Tokenize the inputs.
def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize, remove_columns=ds.column_names)

# Define an llmcompressor recipe for INT8 W8A8 quantization.
# Since the MoE gate layers are sensitive to quantization, we add them
# to the ignore list so they remain at full precision.
recipe = [
    GPTQModifier(
        targets="Linear",
        scheme="W8A8",
        ignore=["lm_head", "re:.*mlp.gate$"],
    ),
]
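
# Note on the ignore pattern (module names below are illustrative, assumed
# from the DeepSeek architecture): the trailing "$" in "re:.*mlp.gate$"
# matches the MoE router, e.g. "model.layers.3.mlp.gate", but not the
# expert projections such as "model.layers.3.mlp.gate_proj", so the
# experts themselves are still quantized.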

SAVE_DIR = "deepseek_v3-W8A8"

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    trust_remote_code_model=True,
    save_compressed=True,
    output_dir=SAVE_DIR,
)
The Python process used only about 300 MB on GPU 0 even though I have 8 GPUs, while it consumed about 1.4 TB of CPU memory.
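A quick way to confirm where the weights actually landed (a minimal diagnostic sketch, assuming the model was dispatched with a device map, in which case transformers exposes hf_device_map on the loaded model):

from collections import Counter

# Summarize how many modules ended up on each device after loading. If
# "cpu" dominates, the offload helper assigned almost everything to host
# memory, which would explain the 1.4 TB of CPU RAM and the idle GPUs.
print(Counter(map(str, model.hf_device_map.values())))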
Expected behavior
The computed device map should shard the model across all 8 GPUs, offloading to CPU only the layers that do not fit, so GPU memory should be heavily used during calibration.
Environment
Include all relevant environment information:
OS [e.g. Ubuntu 20.04]:
Python version [e.g. 3.7]:
LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]:
ML framework version(s) [e.g. torch 2.3.1]:
Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
Other relevant environment information [e.g. hardware, CUDA version]:
To Reproduce
Run the script above on a host with 8 GPUs.
Errors
If applicable, add a full print-out of any errors or exceptions that are raised or include screenshots to help explain your problem.
Additional context
Add any other context about the problem here. Also include any relevant files.