Alibaba Cloud's Qwen team has launched Qwen3.5-397B-A17B, an open-weight model aimed at significantly advancing multimodal AI performance. It offers strong capabilities in reasoning, coding, and agent tasks, marking a notable step toward general-purpose AI agents. The model holds 397 billion parameters in total but activates only 17 billion per token (the "A17B" in its name), a design intended to balance capability with efficiency across applications.
Innovative Hybrid Architecture Enhances Performance
The Qwen3.5-397B-A17B model features a cutting-edge hybrid architecture that incorporates linear attention through Gated Delta Networks along with a sparse mixture-of-experts (MoE) design. This innovative structure enables the model to activate only 17 billion of its total parameters during each forward pass, ensuring that it maintains high performance while minimizing computational costs. Such efficiency is essential for today's AI applications, where rapid response times and resource management are critical.
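To make the sparse-activation idea concrete, here is a minimal, illustrative top-k MoE routing sketch in NumPy. This is not Qwen's actual implementation: the dimensions, the softmax router, and the toy linear "experts" standing in for full feed-forward networks are all assumptions chosen for clarity.

```python
import numpy as np

def moe_forward(x, gate_W, experts, top_k=2):
    """Illustrative sparse MoE layer: route one token to its top-k experts.

    x:       (d_model,) a single token's hidden state
    gate_W:  (d_model, n_experts) router weights
    experts: list of callables, one per expert
    """
    logits = x @ gate_W                        # router score for each expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over selected experts only
    # Only the chosen experts run, so compute scales with top_k, not n_experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.standard_normal(d)
gate_W = rng.standard_normal((d, n_experts))
# Toy experts: tiny linear maps in place of full expert FFNs.
experts = [lambda v, W=rng.standard_normal((d, d)): v @ W for _ in range(n_experts)]
y = moe_forward(x, gate_W, experts, top_k=2)
print(y.shape)  # (8,)
```

The key point the sketch shows is that the parameters of all 16 experts exist in memory, but each token's forward pass touches only 2 of them, which is how a 397B-parameter model can run with 17B-parameter compute per token.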
Moreover, the model expands its language and dialect coverage from 119 to 201, broadening accessibility for developers and users worldwide. The Qwen team's continued investment in language coverage reflects its strategy of serving a diverse global audience and strengthens its position in a competitive AI landscape.
Reinforcement Learning Drives Generalization and Efficiency
Qwen3.5 also represents a substantial leap in reinforcement learning, a core aspect that supports the model's adaptability across various tasks. Unlike previous models that focused on optimizing for specific benchmarks, the Qwen team has prioritized increasing task complexity and generalizability. This strategic shift has led to enhanced performance across several critical evaluations, including BFCL-V4, VITA-Bench, DeepPlanning, Tool-Decathlon, and MCP-Mark.
Pretraining for Qwen3.5 has also been substantially reworked for capability, efficiency, and versatility. The model was trained on a large corpus of interleaved visual-text data, supplemented with multilingual, STEM, and reasoning content; as a result, it matches the performance of earlier trillion-parameter models at lower cost. Architectural improvements, such as a higher-sparsity MoE design and stability refinements, contribute to strong throughput, particularly at extended context lengths of 32k and 256k tokens.
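The efficiency claim can be sanity-checked with back-of-envelope arithmetic. Using the common rough estimate of about 2 FLOPs per active parameter per token for a forward pass (an approximation, not a figure from the Qwen release), activating 17B of 397B parameters cuts per-token compute by roughly 23x versus running the model densely:

```python
# Back-of-envelope compute per token. The ~2 FLOPs per active parameter
# per forward pass is a standard rough estimate, not an official figure.
TOTAL_PARAMS = 397e9    # all parameters stored across experts
ACTIVE_PARAMS = 17e9    # parameters actually used per forward pass

flops_dense = 2 * TOTAL_PARAMS   # if every parameter ran for every token
flops_moe = 2 * ACTIVE_PARAMS    # sparse MoE: only routed experts run

print(f"dense: {flops_dense:.2e} FLOPs/token")
print(f"MoE:   {flops_moe:.2e} FLOPs/token")
print(f"ratio: {flops_dense / flops_moe:.1f}x less compute per token")
```

This ratio is also why throughput holds up at 32k and 256k contexts: the per-token cost of the feed-forward layers stays pinned to the 17B active parameters regardless of how large the expert pool grows.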
Advanced Multimodal Training Infrastructure
The infrastructure supporting Qwen3.5 is tailored for efficient multimodal training, employing a heterogeneous parallelism strategy that separates vision and language components. This design prevents bottlenecks during processing and allows for near-full throughput even with mixed workloads involving text, images, and video. A native FP8 pipeline further optimizes memory usage, reducing activation memory by nearly 50% and boosting training speed by over 10%. These innovations maintain stability even at massive token scales.
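The roughly 50% activation-memory saving follows directly from storing activations in 8 bits instead of 16. The sketch below uses hypothetical layer dimensions (not Qwen's actual configuration) and a deliberately simplified footprint model, one hidden vector per token per layer, just to show the scaling:

```python
def activation_bytes(batch_tokens, d_model, n_layers, bytes_per_value):
    """Simplified activation footprint: one d_model vector per token per layer.
    Real training stores many more intermediates; this only shows the scaling."""
    return batch_tokens * d_model * n_layers * bytes_per_value

# Hypothetical dimensions, chosen only for illustration.
tokens, d_model, layers = 32_768, 8_192, 60
bf16 = activation_bytes(tokens, d_model, layers, 2)  # 16-bit activations
fp8 = activation_bytes(tokens, d_model, layers, 1)   # 8-bit activations

print(f"bf16: {bf16 / 2**30:.1f} GiB")
print(f"fp8:  {fp8 / 2**30:.1f} GiB ({1 - fp8 / bf16:.0%} saved)")
```

In practice the saving lands near, rather than exactly at, 50%, since some tensors (optimizer state, scaling factors, selected layers) remain in higher precision for stability, which is consistent with the "nearly 50%" figure reported above.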
Furthermore, the model's multimodal capabilities are strengthened by early text-vision fusion and a broader mix of training data spanning images, STEM materials, and video, improving both its applicability and its learning efficiency across formats. An expanded vocabulary of 250k tokens also improves encoding and decoding, helping the model perform well in real-world applications.
Looking Ahead: The Future of Qwen AI Models
As the Qwen team continues to push the boundaries of artificial intelligence, the release of Qwen3.5-397B-A17B underscores its focus on innovation and performance. With its emphasis on general-purpose AI agents, the model addresses current market demands while anticipating future needs in sectors such as technology and education.
Future technical reports will provide additional insights into the model's performance metrics and benchmarks, further demonstrating the advancements made with Qwen3.5. As AI technology evolves, the Qwen team is poised to play a pivotal role in shaping the future landscape of multimodal artificial intelligence.
