Add modules_to_not_convert in quantized model #3053

Open. Wants to merge 5 commits into base: main.
@jiqing-feng commented Feb 24, 2025

Fix modules_to_not_convert so that unquantized linear layers are skipped. Some quantized models keep certain modules unquantized, so those modules must be skipped when the quantized weights are loaded.
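A minimal sketch of the skipping check, assuming module names are dotted prefixes and that an entry in modules_to_not_convert matches a module when it equals the prefix or names one of its parent modules. The helper is_unquantized and the example module names are hypothetical and are not taken from text-generation-inference; a plain substring match is another common choice.

```python
from typing import Iterable, Optional


def is_unquantized(prefix: str, modules_to_not_convert: Optional[Iterable[str]]) -> bool:
    """Return True if the module at `prefix` should be loaded as a plain
    (unquantized) linear layer instead of a quantized one.

    Hypothetical helper: an entry matches when it equals `prefix` or is a
    dotted ancestor of it, so "visual" covers "visual.blocks.0.attn.qkv".
    """
    if not modules_to_not_convert:
        return False
    for name in modules_to_not_convert:
        # Exact match, or `name` is a parent module of `prefix`.
        if prefix == name or prefix.startswith(name + "."):
            return True
    return False


# Illustrative skip list; real values come from the model's quantization config.
skip = ["visual", "lm_head"]
print(is_unquantized("visual.blocks.0.attn.qkv", skip))  # True
print(is_unquantized("model.layers.0.self_attn.q_proj", skip))  # False
print(is_unquantized("lm_head", skip))  # True
```

Matching on whole dotted components avoids accidentally skipping a sibling such as "visual_proj" when only "visual" is listed.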

This PR should enable support for the qwen2_vl-awq model.
Without this change, loading fails with the following error:

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 10, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/opt/conda/lib/python3.11/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
    server.serve(
  File "/usr/src/server/text_generation_server/server.py", line 316, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/opt/conda/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/server/text_generation_server/server.py", line 268, in serve_inner
    model = get_model_with_lora_adapters(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1592, in get_model_with_lora_adapters
    model = get_model(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1388, in get_model
    return VlmCausalLM(
  File "/usr/src/server/text_generation_server/models/vlm_causal_lm.py", line 354, in __init__
    super().__init__(
  File "/usr/src/server/text_generation_server/models/flash_causal_lm.py", line 1289, in __init__
    weights_loader = get_loader(quantize, model_id, revision)
  File "/usr/src/server/text_generation_server/utils/quantization.py", line 159, in get_loader
    return GPTQWeightsLoader(
TypeError: GPTQWeightsLoader.__init__() got an unexpected keyword argument 'modules_to_not_convert'
2025-02-25T13:01:44.894419Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
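The TypeError above is raised because get_loader passes modules_to_not_convert to a loader constructor that does not declare that parameter. A minimal sketch of the failure and of one backward-compatible fix follows; the classes LoaderWithoutSkip and LoaderWithSkip and their bits/groupsize parameters are hypothetical stand-ins, not the real GPTQWeightsLoader signature.

```python
from typing import List, Optional


class LoaderWithoutSkip:
    """Hypothetical loader whose constructor predates modules_to_not_convert."""

    def __init__(self, bits: int, groupsize: int):
        self.bits = bits
        self.groupsize = groupsize


class LoaderWithSkip:
    """Hypothetical loader that accepts the new keyword.

    Defaulting to None keeps existing call sites that never pass the
    argument working unchanged.
    """

    def __init__(
        self,
        bits: int,
        groupsize: int,
        modules_to_not_convert: Optional[List[str]] = None,
    ):
        self.bits = bits
        self.groupsize = groupsize
        # Normalize None to an empty list so later membership checks are simple.
        self.modules_to_not_convert = list(modules_to_not_convert or [])


kwargs = {"bits": 4, "groupsize": 128, "modules_to_not_convert": ["visual"]}

try:
    LoaderWithoutSkip(**kwargs)
except TypeError as exc:
    # Same kind of "unexpected keyword argument" error as in the traceback.
    print(exc)

loader = LoaderWithSkip(**kwargs)
print(loader.modules_to_not_convert)  # ['visual']
```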

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng (Author)

Hi @Narsil, could you please trigger the tests and review this PR? Thanks!

@danieldk (Member) left a comment

Thanks for adding support for this. I added some small comments.

@jiqing-feng (Author)

Hi @danieldk, I have addressed your comments; please review the new changes. Thanks!

3 participants