
OpenGVLab

General Vision Team of Shanghai AI Laboratory


Welcome to OpenGVLab! 👋

We are a research group from Shanghai AI Laboratory focused on vision-centric AI research. The "GV" in our name, OpenGVLab, stands for general vision: a general understanding of vision, so that little effort is needed to adapt our models to new vision-based tasks.

We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 state-of-the-art results🚀. In 2022, our open-sourced foundation models achieved 65.5 mAP on the COCO object detection benchmark and 91.1% top-1 accuracy on Kinetics-400, landmark results for AI vision👀 tasks in image🖼️ and video📹 understanding. In 2023, we created VideoChat🦜, LLaMA-Adapter🦙, the 3D foundation model PonderV2🧊, and many more wonderful works! At CVPR 2023, our vision foundation model InternImage was listed as one of the most influential papers, and together with our partner OpenDriveLab we won the Best Paper Award🎉.

In 2024, we released InternVL, the best open-source VLM, and the video understanding foundation model InternVideo2, which won 7 championships in the EgoVis challenges 🥇. To date, our brilliant team has open-sourced more than 70 works; please find them here😃

Building on these solid vision foundations, we have expanded into multi-modality models. We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and by lessening the burden of building an AI model from scratch.

Branches: Alpha (exploring the latest advances in vision+language research), uni-medical (focusing on medical AI), Vchitect (generative AI)

Follow us: Twitter · 🤗 Hugging Face · Medium · WeChat · Zhihu
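
Most of our released models can be pulled directly from our Hugging Face page. Below is a minimal sketch of that workflow; the repo id `OpenGVLab/InternVL2-8B`, the dtype, and the loading flags are assumptions drawn from typical InternVL model cards rather than from this page, so always check the specific model card first:

```python
# Minimal sketch: loading an OpenGVLab model from the Hugging Face Hub.
# "OpenGVLab/InternVL2-8B" is an assumed example id; InternVL models ship
# custom modeling code on the Hub, hence trust_remote_code=True.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL2-8B"  # assumed; browse huggingface.co/OpenGVLab for others
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 8B weights on a single GPU
    trust_remote_code=True,
).eval()
```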

Pinned

  1. InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

    Python · 8.4k stars · 639 forks

  2. InternVideo

    [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding

    Python · 1.9k stars · 112 forks

  3. Ask-Anything

    [CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.

    Python · 3.3k stars · 262 forks

  4. VideoMamba

    [ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding

    Python · 962 stars · 74 forks

  5. OmniQuant

    [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Python · 817 stars · 64 forks

  6. LLaMA-Adapter

    [ICLR 2024] Fine-tuning LLaMA to follow instructions within 1 hour using only 1.2M parameters

    Python · 5.9k stars · 381 forks

Repositories

Showing 10 of 80 repositories
  • InternVideo

    [ECCV 2024] Video Foundation Models & Data for Multimodal Understanding

    Python · 1,916 stars · Apache-2.0 · 112 forks · 124 issues · 3 PRs · Updated Jun 16, 2025
  • ZeroGUI

    ZeroGUI: Automating Online GUI Learning at Zero Human Cost

    Python · 59 stars · Apache-2.0 · 3 forks · 1 issue · 0 PRs · Updated Jun 14, 2025
  • MUTR

    [AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

    Python · 79 stars · MIT · 6 forks · 3 issues · 0 PRs · Updated Jun 13, 2025
  • VideoChat-Flash

    VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

    Python · 434 stars · MIT · 11 forks · 9 issues · 0 PRs · Updated Jun 13, 2025
  • VRBench

    A Benchmark for Multi-Step Reasoning in Long Narrative Videos

    6 stars · Apache-2.0 · 0 forks · 0 issues · 0 PRs · Updated Jun 13, 2025
  • PVC

    [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

    Python · 42 stars · MIT · 0 forks · 3 issues · 0 PRs · Updated Jun 12, 2025
  • FluxViT

    Make Your Training Flexible: Towards Deployment-Efficient Video Models

    Python · 30 stars · MIT · 0 forks · 1 issue · 0 PRs · Updated Jun 11, 2025
  • VideoChat-R1

    VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

    Python · 150 stars · 4 forks · 14 issues · 0 PRs · Updated Jun 9, 2025
  • VeBrain

    Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

    62 stars · MIT · 5 forks · 2 issues · 0 PRs · Updated Jun 6, 2025
  • InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.

    Python · 8,360 stars · MIT · 639 forks · 211 issues · 5 PRs · Updated May 29, 2025