[Feature] Support Deepseek-VL2 #2798
base: main
Conversation
@@ -0,0 +1,127 @@
from typing import List, Optional, Tuple, Union
rename the file to deepseek_vl2?
rename done
        self.layers = modules

    def forward(self, x):
I have not yet implemented the forward part of the DeepseekV2ForCausalLM. I will finish all the implementations and add the unit test this weekend.
@ccw1996 Do you need our help?
Has support for deepseek vl2 been implemented?
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [nn.Linear(config.input_dim * config.downsample_ratio * config.downsample_ratio, config.n_embed * mlp_ratio)]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio))
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)
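For context on the shape arithmetic in the first nn.Linear: the vision features are spatially downsampled by concatenating each downsample_ratio x downsample_ratio block of tokens channel-wise, so the projector's input width is input_dim * downsample_ratio**2. A self-contained sketch with hypothetical config values (not taken from this PR):

import torch
import torch.nn as nn

# Hypothetical values for illustration only: vision width 1152, 2x2
# downsampling, language hidden size 2560, mlp_ratio 1, depth 2.
input_dim, downsample_ratio, n_embed, mlp_ratio, mlp_depth = 1152, 2, 2560, 1, 2

modules = [nn.Linear(input_dim * downsample_ratio**2, n_embed * mlp_ratio)]
for _ in range(1, mlp_depth - 1):
    modules.append(nn.GELU())
    modules.append(nn.Linear(n_embed * mlp_ratio, n_embed * mlp_ratio))
modules.append(nn.GELU())
modules.append(nn.Linear(n_embed * mlp_ratio, n_embed))
projector = nn.Sequential(*modules)

# 576 downsampled vision tokens, each a flattened 2x2 block of features.
x = torch.randn(1, 576, input_dim * downsample_ratio**2)
print(projector(x).shape)  # torch.Size([1, 576, 2560])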
@ccw1996 I'm happy to take the rest of the work to parallelize the remaining functions. Could you give me access to your branch?
@ccw1996 Apologies for the delay. Would you like me to help with the rest of it?
@ccw1996 I see, I think you can copy those layers from timm into python/sglang/srt/models/deepseekvl2.py and then replace the layers with sgl classes. I'm interested in helping if you can give me access.
@yizhang2077 @ispobock Looks like we'll have to copy lots of code from timm. Now it's mostly just the linear layers with variable depth to parallelize; will finish soon.
Sure, can you mark the problematic part?
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [
        nn.Linear(
            config.input_dim
            * config.downsample_ratio
            * config.downsample_ratio,
            config.n_embed * mlp_ratio,
        )
    ]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(
            nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio)
        )
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)
Need to parallelize this part with Column and Row linear
@yizhang2077 Actually with GELU we'll have to gather output for each TP linear. Should we use replicated linear instead?
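For reference on why the standard split is awkward here: a column-parallel linear followed by an elementwise GELU and then a row-parallel linear needs no gather, because GELU commutes with the column shard and the row-parallel matmul ends in a single all-reduce. A single-process sketch of that identity in plain PyTorch (hypothetical shapes, with the TP ranks simulated via chunk):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
tp = 2
x = torch.randn(4, 8)
w1 = torch.randn(16, 8)  # first linear: 8 -> 16
w2 = torch.randn(8, 16)  # second linear: 16 -> 8

ref = F.gelu(x @ w1.t()) @ w2.t()  # unsharded reference

parts = []
for r in range(tp):
    w1_shard = w1.chunk(tp, dim=0)[r]  # column-parallel: split output rows
    w2_shard = w2.chunk(tp, dim=1)[r]  # row-parallel: split matching input cols
    # GELU is elementwise, so it runs on each shard without communication.
    parts.append(F.gelu(x @ w1_shard.t()) @ w2_shard.t())
out = sum(parts)  # stands in for the final all-reduce

print(torch.allclose(ref, out, atol=1e-5))  # True

With a GELU after every linear at arbitrary depth, the column/row pairing no longer lines up, which is presumably why the projector later in this thread settles on ReplicatedLinear.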
Two problems: first, the radix cache corrupts the input, which I will try to fix; second, the output doesn't seem to use the image embeddings. Can you help me debug it?
Let me try tomorrow.
input_embeds[idx].masked_scatter_(
    image_seq_mask[idx].unsqueeze(-1), images_in_this_batch
)
@ccw1996 The image embedding (images_in_this_batch) is indeed applied to the text embedding here.
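For anyone unfamiliar with masked_scatter_: it fills the masked positions of the destination, in order, with rows of the source, which is how the projected image tokens land in the placeholder slots of the text embedding. A toy example with made-up shapes:

import torch

input_embeds = torch.zeros(6, 4)  # 6 positions, hidden size 4
image_seq_mask = torch.tensor([False, False, True, True, True, False])
images_in_this_batch = torch.ones(3, 4)  # 3 projected image tokens

# The (6, 1) mask broadcasts over the hidden dim; masked rows are filled in order.
input_embeds.masked_scatter_(image_seq_mask.unsqueeze(-1), images_in_this_batch)
print(input_embeds[2])  # tensor([1., 1., 1., 1.])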
@Edenzzzz Thanks a lot. Now it outputs the right answer. I will finish the CUDA graph support and clean up the code this weekend.
logger.info(
    "Automatically turn off --chunked-prefill-size and disable radix cache for deepseek-vl2."
)
server_args.chunked_prefill_size = -1
server_args.disable_radix_cache = True
The language part still supports radix cache.
The language part relies on the input embeddings. If the radix cache is used, the input embeddings are wrong. I will try to debug it.
I see, I think you're right. Llava and qwen_vl also don't use radix attention.
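A plausible explanation for why the radix cache breaks here: prefix matching is keyed on token IDs, and image placeholders expand to the same token IDs regardless of the image content, so two requests carrying different images can match each other's cached prefix and reuse KV entries computed from the wrong image embeddings, which is consistent with the wrong-input-embedding symptom described above.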
            ],
        )
        cls.base_url += "/v1"


if __name__ == "__main__":
    unittest.main()
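As a manual sanity check once the test server is up, the OpenAI-compatible endpoint that cls.base_url points at can be queried directly. A hypothetical client-side probe (the URL, model name, and image URL are placeholders, not from this PR):

import openai

client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)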
@ccw1996 This seems mostly ready. Did you encounter 400 Bad Request when running Qwen-VL?
I don't know whether qwen-vl works normally; I tested qwen2-vl and it passed.
The tests have not passed. We should test deepseek-vl2, not qwen-vl. There's a dim mismatch when capturing the CUDA graph. You can try to fix it, and then it should be ready.
Sorry, I fixed these errors in the latest commit. Now the tests pass.
@Edenzzzz Can you help me merge all the commits? Now it's ready. Thanks a lot!
    modules = ReplicatedLinear(
        config.input_dim,
        config.n_embed,
        quant_config=quant_config,
    )

elif config.projector_type == "mlp_gelu":
    mlp_depth = config.depth
    modules = [
        ReplicatedLinear(
            config.input_dim,
            config.n_embed,
            quant_config=quant_config,
        )
    ]
    for _ in range(1, mlp_depth):
        modules.append(nn.GELU())
        modules.append(
            ReplicatedLinear(
                config.n_embed,
                config.n_embed,
                quant_config=quant_config,
            )
        )
    modules = nn.Sequential(*modules)
There are still bugs when running the test. Since the linear layers were replaced, we need to take out the first element of the output tuple.
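Concretely: sgl-style parallel linear layers such as ReplicatedLinear return an (output, output_bias) tuple rather than a bare tensor, so a forward loop over the mixed Sequential has to unpack conditionally. A minimal sketch of what that fix could look like (an assumed shape of the fix, not the PR's exact code):

def forward(self, x):
    for layer in self.layers:
        out = layer(x)
        # ReplicatedLinear returns (output, output_bias); nn.GELU returns a tensor.
        x = out[0] if isinstance(out, tuple) else out
    return x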
Motivation
Add the Deepseek-VL2 model to SGLang, as requested in #2653.
Modifications
Checklist