We are excited to announce the release of Intel® Extension for PyTorch* v2.7.10+xpu. This release supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors, and Intel® Data Center GPU Max Series) and is based on PyTorch* 2.7.0.
Highlights
- Intel® oneDNN v3.7.1 integration
- Large Language Model (LLM) optimization
  Intel® Extension for PyTorch* optimizes typical LLM models such as Llama 2, Llama 3, Phi-3-mini, Qwen2, and GLM-4 on the Intel® Arc™ Graphics family. In addition, compared to the previous release, new LLM inference models such as Llama 3.3, Phi-3.5-mini, Qwen2.5, and Mistral-7B are now optimized on Intel® Data Center GPU Max Series platforms. A full list of optimized models can be found in the LLM Optimizations Overview; the supported Transformers version is updated to 4.48.3. A usage sketch follows this list.
- Serving framework support
  Intel® Extension for PyTorch* offers extensive support for serving ecosystems such as vLLM and TGI, with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). vLLM/TGI features such as chunked prefill and MoE (Mixture of Experts) are supported by the backend kernels provided in Intel® Extension for PyTorch*. This release adds sliding-window support in `ipex.llm.modules.PagedAttention.flash_attn_varlen_func` to meet the needs of models such as Phi-3 and Mistral, which enable sliding-window attention by default (see the semantics sketch after this list).
- [Prototype] QLoRA/LoRA finetuning using BitsAndBytes
  Intel® Extension for PyTorch* supports QLoRA/LoRA finetuning with BitsAndBytes on Intel® GPU platforms. This release includes several enhancements for better performance and functionality (setup sketches follow this list):
  - The performance of the NF4 dequantize kernel has been improved by approximately 4.4× to 5.6× across different shapes compared to the previous release.
  - `_int_mm` support in INT8 has been added to enable INT8 LoRA finetuning in PEFT (with float optimizers like `adamw_torch`).
- Codegen support removal
  Removes codegen support from Intel® Extension for PyTorch* and reuses the codegen capability from Torch XPU Operators, so that codegen changes stay interoperable with their usages in Intel® Extension for PyTorch*.
- [Prototype] Python 3.13t support
  Adds prototype support for Python 3.13t (the free-threaded CPython build) and provides prebuilt binaries on the download server.
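
The sketches below expand on the items above. First, for the LLM optimization item: a minimal sketch of applying the extension's optimizations through `ipex.llm.optimize`; the model id, prompt, and generation settings are illustrative, and any model from the optimized list should work similarly.

```python
# Minimal sketch (model id and prompt are illustrative): apply the extension's
# LLM optimizations to a Hugging Face causal-LM before running inference on XPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import intel_extension_for_pytorch as ipex

model_id = "meta-llama/Meta-Llama-3-8B"  # any model from the optimized list
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# ipex.llm.optimize applies the extension's LLM-specific optimizations.
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

inputs = tokenizer("What does oneDNN provide?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```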
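
For the serving framework item: sliding-window attention restricts each query token to the most recent W key positions, which is how models like Mistral and Phi-3 bound attention cost on long contexts. The pure-PyTorch sketch below illustrates only these masking semantics, which the backend kernel implements natively; it is not the `flash_attn_varlen_func` API itself, whose exact window parameters should be taken from the module documentation.

```python
# Illustration of sliding-window attention semantics (not the IPEX kernel API):
# query position i may attend only to key positions in (i - window, i].
import torch
import torch.nn.functional as F

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: query i may attend to key j iff j <= i and j > i - window."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (seq, 1)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, shape (1, seq)
    return (j <= i) & (j > i - window)

# Toy tensors: (batch, heads, seq, head_dim); Mistral's real window is 4096.
q = k = v = torch.randn(1, 8, 16, 64)
mask = sliding_window_mask(seq_len=16, window=4)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```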
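
For the QLoRA/LoRA item: a minimal NF4 setup sketch, assuming the standard bitsandbytes + PEFT flow from the Hugging Face ecosystem; the model id, LoRA hyperparameters, and XPU device mapping are illustrative and may need adjustment for your environment.

```python
# Sketch of an NF4 QLoRA setup (model id and hyperparameters are illustrative).
# The NF4 dequantize kernel exercised here is the one sped up in this release.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="xpu",  # assumption: XPU placement; adjust per your accelerate version
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train as usual, e.g. with transformers.Trainer and a float optimizer (adamw_torch).
```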
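
Finally, the `_int_mm` addition: `torch._int_mm` performs an int8 × int8 → int32 matrix multiply, the primitive behind PEFT's INT8 LoRA path. A minimal sketch (running on "xpu" assumes an available Intel GPU with this release installed):

```python
# Minimal sketch: torch._int_mm does an int8 x int8 -> int32 matmul on XPU.
import torch

a = torch.randint(-128, 128, (64, 128), dtype=torch.int8, device="xpu")
b = torch.randint(-128, 128, (128, 64), dtype=torch.int8, device="xpu")
c = torch._int_mm(a, b)   # result dtype is torch.int32
print(c.dtype, c.shape)   # torch.int32 torch.Size([64, 64])
```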
Known Issues
Please refer to the Known Issues webpage.