Error with shape #27

Open

manlenzzz opened this issue Apr 21, 2024 · 2 comments

@manlenzzz

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

I used the checkpoints saved from this training config:
python train_gsm8k.py \
    --model_name_or_path LoftQ/Llama-2-7b-hf-4bit-64rank \
    --learning_rate 3e-4 \
    --seed 11 \
    --expt_name gsm8k_llama2_7b_4bit_64rank_loftq \
    --output_dir exp_results/ \
    --num_train_epochs 6 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --weight_decay 0.1 \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 10 \
    --do_train \
    --report_to tensorboard

@yxli2123
Owner

Hi @manlenzzz, this script works for me. Could you please provide more details about the error?

It looks like it is caused by the smart_tokenizer_and_embedding_resize function. Because the LLaMA tokenizer doesn't have a PAD token, we add one special token during training, which changes the embedding shape from [32000, 4096] to [32001, 4096]. I would suggest running smart_tokenizer_and_embedding_resize after loading the backbone and before loading the adapters. For example:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained("LoftQ/Llama-2-7b-hf-4bit-64rank")
tokenizer = AutoTokenizer.from_pretrained("LoftQ/Llama-2-7b-hf-4bit-64rank")
# build special_tokens_dict as in train_gsm8k.py (it adds the missing PAD token)
smart_tokenizer_and_embedding_resize(special_tokens_dict=special_tokens_dict, tokenizer=tokenizer, model=model)
model = PeftModel.from_pretrained(model, "path/to/your/adapter")
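
For reference, here is a minimal sketch of what such a resize helper typically does (modeled on the common Alpaca-style implementation; the exact code in train_gsm8k.py may differ slightly):

def smart_tokenizer_and_embedding_resize(special_tokens_dict, tokenizer, model):
    """Add missing special tokens (e.g. PAD) and resize the embeddings to match."""
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))  # e.g. [32000, 4096] -> [32001, 4096]

    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data

        # Initialize the new rows with the average of the existing embeddings.
        input_embeddings[-num_new_tokens:] = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings[-num_new_tokens:] = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)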

@manlenzzz
Author

Yes! You are right.

After I modified the code, it worked fine:
def evaluation(model_args, data_args):
    if model_args.full_precision:
        model = transformers.AutoModelForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            low_cpu_mem_usage=True,
            torch_dtype=torch.bfloat16,
            token=model_args.token,
            device_map='cuda:1',
        )
    else:
        model = transformers.AutoModelForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            low_cpu_mem_usage=True,
            torch_dtype=torch.bfloat16,
            token=model_args.token,
            device_map='cuda:1',
            quantization_config=transformers.BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_compute_dtype=torch.bfloat16,
                bnb_4bit_use_double_quant=False,
                bnb_4bit_quant_type='nf4',
            ),
        )

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_args.model_name_or_path,
        token=model_args.token,
        model_max_length=model_args.model_max_length,
        padding_side="left",
        use_fast=False,
    )

    # Add any missing special tokens (the LLaMA tokenizer has no PAD token by default).
    special_tokens_dict = dict()
    if tokenizer.pad_token is None:
        special_tokens_dict["pad_token"] = DEFAULT_PAD_TOKEN
    if tokenizer.eos_token is None:
        special_tokens_dict["eos_token"] = DEFAULT_EOS_TOKEN
    if tokenizer.bos_token is None:
        special_tokens_dict["bos_token"] = DEFAULT_BOS_TOKEN
    if tokenizer.unk_token is None:
        special_tokens_dict["unk_token"] = DEFAULT_UNK_TOKEN

    # Resize the embeddings BEFORE loading the adapter so the shapes match the checkpoint.
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=special_tokens_dict,
        tokenizer=tokenizer,
        model=model,
    )
    ##########################
    #       Peft Model       #
    ##########################
    if model_args.adapter_name_or_path is not None:
        model = PeftModel.from_pretrained(
            model,
            model_args.adapter_name_or_path,
            is_trainable=False,
            token=model_args.token,
        )
    else:
        model = PeftModel.from_pretrained(
            model,
            model_args.model_name_or_path,
            subfolder='gsm8k',
            is_trainable=False,
            token=model_args.token,
        )
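
For completeness, a minimal usage sketch (not the repo's actual evaluation loop; the prompts and generation settings below are illustrative) showing how the resized tokenizer and the loaded PEFT model can then run batched generation, which is where the left padding and the added PAD token matter:

prompts = [
    "Question: <a GSM8K question>\nAnswer:",
    "Question: <another GSM8K question>\nAnswer:",
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))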

yxli2123 added a commit that referenced this issue Apr 21, 2024