AI News Analysis

Meituan Open-Sources LongCat-2.0, a 1.6-Trillion-Parameter MoE Model; Full-Stack Execution on Domestic Computing Hardware Successfully Achieved

Meituan has officially open-sourced LongCat-2.0, a Mixture-of-Experts (MoE) model with 1.6 trillion parameters. The model was trained and deployed for inference end to end on a domestic computing cluster of 50,000 accelerator cards, without relying on overseas computing power. The announcement details LongCat-2.0's technical highlights, performance advantages, and industry value.

On June 30, Meituan Officially Released and Open-Sourced LongCat-2.0, Its New-Generation Trillion-Parameter MoE Large Model

The core breakthrough of this launch lies not in its trillion-level parameter scale, but in running the entire training and inference pipeline on 100% domestic computing power. It is the industry’s first trillion-parameter large model to complete end-to-end pre-training and online inference on a 50,000-card domestic computing cluster, overturning the long-standing stereotype that “domestic chips are only fit for inference, while large-scale training must rely on high-end overseas GPUs”.

I. Core Hardware & Model Specifications: 1.6-Trillion-Parameter MoE Architecture, Native 1M-Token Ultra-Long Context Window

LongCat-2.0 adopts the now-mainstream Mixture-of-Experts (MoE) architecture, with a total nominal parameter count of 1.6 trillion. Unlike dense large models, which activate all parameters for every calculation, it uses dynamic routing to activate only part of the network. On average, about 48 billion parameters are activated per token, within a floating range of 33B to 56B, which balances ultra-large model capacity against controllable inference costs.
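To put the activation range in perspective, the short calculation below works out what share of the 1.6 trillion parameters is active at the two ends of the stated 33B to 56B range. It is a back-of-the-envelope sketch only: the "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, not a number from Meituan, and the function names are illustrative.

```python
# Figures taken from the article.
TOTAL_PARAMS = 1.6e12   # nominal parameter count
ACTIVE_MIN = 33e9       # lower end of the activated range
ACTIVE_MAX = 56e9       # upper end of the activated range


def active_fraction(active_params: float, total_params: float = TOTAL_PARAMS) -> float:
    """Share of the model's parameters that take part in one computation."""
    return active_params / total_params


def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate (an assumption): ~2 FLOPs per active parameter."""
    return 2.0 * active_params


low = active_fraction(ACTIVE_MIN)
high = active_fraction(ACTIVE_MAX)
print(f"active share: {low:.2%} to {high:.2%}")

# Compare with a hypothetical dense model of the same total size.
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
moe_flops_max = forward_flops_per_token(ACTIVE_MAX)
print(f"dense / MoE (worst case) compute ratio: {dense_flops / moe_flops_max:.1f}x")
```

Under these assumptions, only a few percent of the parameters do work for any given input, which is where the inference cost savings over a dense model of equal size come from.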

It is trained on an unprecedented corpus of over 30 trillion tokens, covering general Chinese text, English materials, massive open-source code, and vertical industry business documents, laying a solid foundation for long-text comprehension, code generation, and AI agent tasks.

Two flagship capabilities push its practical performance to new heights:

Native Support for 1M Token Ultra-Long Context

It processes million-token input in a single pass, parsing entire code repositories, lengthy industry whitepapers, and tens of thousands of lines of business scripts while keeping its comprehension coherent. This makes it well suited to code development, document summarization, and enterprise knowledge base workflows.

Deep Compatibility With Mainstream AI Development Frameworks

It integrates natively with mainstream coding and agent toolchains, including Claude Code, OpenClaw, and Hermes. Individual developers using vibe coding for rapid program generation and enterprises building automated agent workflows can both access the model directly through API calls.

Prior to its official release, the model underwent gray-release (canary) testing on OpenRouter under the anonymous codename Owl Alpha. It quickly ranked among the world’s top three models by total API call volume, held a leading monthly share of calls in the code generation category, and earned widespread recognition from overseas developers after hands-on testing.

II. Industry Milestone: Full Lifecycle Operation on a 50,000-Card Domestic Computing Cluster, Zero Overseas GPUs Deployed

The defining feature of LongCat-2.0 is that its end-to-end training and inference pipeline runs entirely on domestic AI accelerator chips. At peak scheduling, the cluster uses more than 50,000 domestic NPU cards, with not a single NVIDIA GPU deployed. It completes the industry’s first full closed-loop verification of trillion-parameter model development built purely on domestic computing infrastructure.

Meituan launched its joint “Model-Chip Co-Optimization” research initiative back in 2023. Over three years, the team collaborated with domestic computing hardware vendors to resolve countless engineering pain points for ten-thousand-card cluster training, rolling out multiple self-developed underlying optimization solutions:

  1. 10,000-Card Automatic Fault Tolerance & Recovery: Proprietary communication error handling and cluster elastic scaling mechanisms cut the average daily failure rate of large-scale training workloads by over 70%. Even if dozens of computing cards malfunction in a single day, training tasks will not be interrupted or rolled back.
  2. Deterministic Computation Optimization for NPUs: It addresses the critical pain point of numerical inconsistency in distributed training on domestic chips, ensuring zero gradient deviation across multi-card synchronization. No irreversible loss anomalies emerged throughout the full 35-trillion-token training cycle.
  3. Dramatic Improvement in Computing Utilization: By restructuring operator scheduling and memory read-write logic, the model’s floating-point compute utilization rate is lifted 1.5x against the initial baseline. Under steady operation, its daily data throughput exceeds 1 trillion tokens.
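The throughput figure above can be sanity-checked with simple arithmetic. The sketch below combines the article's numbers (35 trillion training tokens, more than 1 trillion tokens per day, 50,000 cards) into an estimated steady-state training time and a per-card rate. It assumes, without confirmation from Meituan, that all 50,000 cards train concurrently and that throughput sits exactly at the 1-trillion-token floor; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the article.
TRAINING_TOKENS = 35e12   # full training cycle
TOKENS_PER_DAY = 1e12     # stated lower bound on steady-state daily throughput
CARDS = 50_000            # assumption: every card is used for training at once


def days_to_train(total_tokens: float, tokens_per_day: float) -> float:
    """Days of uninterrupted steady-state training needed for the corpus."""
    return total_tokens / tokens_per_day


def tokens_per_card_per_second(tokens_per_day: float, cards: int) -> float:
    """Average training tokens handled by one card each second."""
    return tokens_per_day / cards / SECONDS_PER_DAY


steady_state_days = days_to_train(TRAINING_TOKENS, TOKENS_PER_DAY)
per_card_rate = tokens_per_card_per_second(TOKENS_PER_DAY, CARDS)

print(f"steady-state training time: about {steady_state_days:.0f} days")
print(f"per-card rate: about {per_card_rate:.0f} tokens/s")
```

Because the article gives the daily throughput as a floor, the day count is an upper bound for steady-state operation only; it excludes ramp-up, restarts, and evaluation runs.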

Before LongCat-2.0, most domestic large model vendors relied on overseas GPUs for pre-training and reserved domestic chips solely for online inference. LongCat-2.0’s successful deployment shows that existing domestic computing clusters can support industrial-grade, full-cycle development of trillion-parameter large models, validating the viability of an independent, controllable domestic computing roadmap.

III. Three Original Architectural Innovations Address Core MoE & Long-Text Pain Points

To adapt to domestic hardware and unlock full performance of its ultra-large model, Meituan’s team implemented three proprietary core technologies in LongCat-2.0, all open-sourced alongside the model package:

1. LSA Sparse Attention Mechanism

In traditional large models, attention computation cost grows quadratically with context length, which imposes massive hardware pressure at million-token scale. The self-developed LSA architecture reduces computational complexity to linear order and pairs this with layered index reuse to drastically cut the memory footprint in long-text scenarios. It is the core underlying enabler of stable 1M-token context support.
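The gap between quadratic and linear scaling is easy to quantify. The sketch below counts query-key pairs for dense attention and for a generic sparse scheme in which each query attends to a fixed number of keys. It does not describe how LSA itself selects keys, which the article does not specify, and the budget of 2,048 keys per query is a hypothetical value chosen for illustration.

```python
def dense_attention_pairs(context_len: int) -> int:
    """Dense attention scores every query against every key: n * n pairs."""
    return context_len * context_len


def sparse_attention_pairs(context_len: int, keys_per_query: int) -> int:
    """Generic sparse attention: each query scores at most a fixed key budget."""
    return context_len * min(context_len, keys_per_query)


CONTEXT_LEN = 1_000_000   # the 1M-token window from the article
KEYS_PER_QUERY = 2_048    # hypothetical budget, not a LongCat-2.0 setting

dense = dense_attention_pairs(CONTEXT_LEN)
sparse = sparse_attention_pairs(CONTEXT_LEN, KEYS_PER_QUERY)

print(f"dense pairs:  {dense:.3e}")
print(f"sparse pairs: {sparse:.3e}")
print(f"reduction:    about {dense / sparse:.0f}x")
```

With a fixed key budget, doubling the context doubles the sparse cost but quadruples the dense cost, which is the practical meaning of "linear order" here.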

2. Industry-First Zero-Computation Expert Dynamic Routing

It allocates computing resources at fine-grained token level. Lightweight input such as short sentences and simple prompts activates only a small subset of expert networks, which saves computing overhead. Complex code, multi-step reasoning, and ultra-long documents automatically schedule more expert modules for in-depth computation. This eliminates the resource waste of uniform full-expert activation and drastically lowers large-scale inference costs.
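One way to picture token-level allocation is a top-k router whose expert pool includes identity experts that cost nothing to run. The toy sketch below follows that idea; it is an assumption about how zero-computation experts can work in general, not a description of Meituan's implementation. The router scores, the scaling "experts", and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_ffn_expert(scale: float) -> Expert:
    """Hypothetical stand-in for a feed-forward expert: it only scales the input."""
    def expert(x: Vector) -> Vector:
        return [scale * v for v in x]
    return expert


def zero_compute_expert(x: Vector) -> Vector:
    """Identity expert: passes the token through without any matrix multiply."""
    return list(x)


def route_token(
    x: Vector,
    scores: Sequence[float],
    experts: Sequence[Expert],
    is_zero_compute: Sequence[bool],
    top_k: int,
) -> Tuple[Vector, int]:
    """Mix the top_k highest-scoring experts; also count the real experts run."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    output = [0.0] * len(x)
    real_experts_run = 0
    for i in chosen:
        weight = scores[i] / total          # renormalize over the chosen experts
        expert_out = experts[i](x)
        output = [acc + weight * v for acc, v in zip(output, expert_out)]
        if not is_zero_compute[i]:
            real_experts_run += 1
    return output, real_experts_run


experts = [make_ffn_expert(2.0), make_ffn_expert(3.0), zero_compute_expert, zero_compute_expert]
is_zero_compute = [False, False, True, True]
token = [1.0, 2.0]

# Hypothetical router scores: an "easy" token favors the identity experts,
# a "hard" token favors the real feed-forward experts.
easy_out, easy_cost = route_token(token, [0.125, 0.125, 0.5, 0.25], experts, is_zero_compute, top_k=2)
hard_out, hard_cost = route_token(token, [0.5, 0.25, 0.125, 0.125], experts, is_zero_compute, top_k=2)
print("real experts run (easy, hard):", easy_cost, hard_cost)
```

Every token still selects the same number of experts, but the compute actually spent varies with how many of them are real feed-forward networks.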

3. MOPD Multi-Group Expert Routing

The expert networks are divided into three specialized groups by scenario: agent tool invocation, mathematical and logical reasoning, and multi-turn dialogue. Input is automatically routed to the corresponding expert group, delivering marked accuracy gains on specialized vertical tasks. Paired with a 135-billion-parameter Ngram embedding module, it strengthens the semantic representation of word combinations and mitigates breaks in comprehension across long texts.
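The article does not explain how the Ngram embedding module is built. A common general technique for embedding word combinations is to hash each n-gram of token ids into a fixed-size embedding table, and the sketch below illustrates only that lookup step. The hash multiplier, bucket count, and function name are hypothetical and not taken from LongCat-2.0.

```python
from typing import List


def ngram_bucket_ids(token_ids: List[int], n: int, num_buckets: int) -> List[int]:
    """For each position, hash the n-gram ending there into a table row index.

    Positions near the start use the shorter n-gram that is available.
    """
    base = 1_000_003  # hypothetical multiplier for the polynomial hash
    bucket_ids = []
    for end in range(len(token_ids)):
        start = max(0, end - n + 1)
        h = 0
        for tok in token_ids[start:end + 1]:
            # +1 keeps token id 0 from hashing the same as an empty prefix
            h = (h * base + tok + 1) % num_buckets
        bucket_ids.append(h)
    return bucket_ids


# Toy sequence in which the bigram (5, 7) appears twice.
tokens = [5, 7, 9, 5, 7]
ids = ngram_bucket_ids(tokens, n=2, num_buckets=1 << 20)
print(ids)
```

The same token pair always maps to the same row, so the model can learn a dedicated vector for frequent combinations; the row ids would then index an embedding table whose vectors are added to the ordinary token embeddings.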

IV. Benchmark Performance: Code Capability Rivals Top Closed-Source Global Models

On SWE-bench Pro, an authoritative code evaluation benchmark, LongCat-2.0 scores 59.5, ahead of GPT-5.5 (58.6) and Claude Opus 4.6 (57.3). This places it in the top tier of open-source large models for coding performance.

For real-world developer workflows, its core advantages fall into three categories:

  1. Repository-Level Code Refactoring: Upload full source code packages to batch-fix vulnerabilities, refactor legacy logic, and auto-generate interface documentation. This fits the increasingly popular vibe coding style of development.
  2. Complex Multi-Step Agent Execution: It supports chained tool invocation, data retrieval, and autonomous error correction, lowering the barrier to building automated office and data processing bots.
  3. Deep Optimization for China’s Local Life Service Scenarios: Built on Meituan’s years of accumulated proprietary business data, it delivers higher comprehension accuracy than most general large models in vertical use cases including local merchant management, order data analysis, and service copywriting generation.

V. Full Open-Source Release Lowers Barriers to Domestic AI Industrial Adoption

Meituan officially confirmed that all supporting assets for LongCat-2.0 will be open-sourced simultaneously, including model weights, distributed training infrastructure, domestic computing-adapted inference engines, and full operator optimization code. Individual developers, SMEs, and research institutions may obtain commercial-use licenses free of charge, eliminating the need to build custom underlying training frameworks for trillion-parameter models from scratch.

The open-source release delivers two tangible industry-wide values:

  1. SMEs no longer need to purchase expensive high-end overseas computing hardware. They can reuse this domestic training and inference pipeline to develop vertical-domain large models independently and at low cost.
  2. It accelerates a positive feedback loop for the domestic computing ecosystem. Chip manufacturers, AI framework developers, and enterprise clients can iterate and optimize based on a unified engineering reference, cutting redundant trial-and-error costs across the industry.

VI. Profound Industry Significance: Domestic AI Shifts From “Model Catch-Up” to “Computing Self-Reliance”

The launch of LongCat-2.0 represents far more than another open-source trillion-parameter large model — it marks a critical turning point for China’s domestic AI industry.

In the past, competition among domestic large model players mostly centered on parameter scale and general benchmark scores, with underlying computing power heavily reliant on overseas hardware. Large-scale training projects risked stagnation once supply chains faced restrictions. The successful full operation of a trillion-parameter model on a 50,000-card domestic cluster delivers a proven, viable alternative roadmap:

  • It breaks development limitations imposed by monopolies on overseas computing hardware, granting China’s AI industry a fully independent, controllable end-to-end technical stack.
  • The competitive logic of the large model track is reshaped. Instead of solely competing on the volume of overseas GPU purchases, domestic computing engineering optimization and model-chip co-optimization become core new competitive advantages.
  • It reduces long-term large model operating costs for domestic enterprises, eliminating persistent premium pricing for high-end overseas GPUs and enabling predictable long-term computing expenditure planning.

Within the global open-source AI landscape, as closed-source models like GPT and Claude continuously tighten commercial licensing terms and raise API pricing, this fully self-developed domestic trillion-parameter open model provides global developers with a low-cost, unrestricted alternative solution.

VII. Objective Analysis of Current Limitations for Rational Deployment

As the world’s first trillion-parameter large model trained and served entirely on domestic computing hardware, LongCat-2.0 still carries limitations that developers must account for before deployment:

  1. Its ultra-large-scale multimodal image and video generation capabilities remain immature; its core strengths lie exclusively in text processing, code generation, and logical reasoning.
  2. For ultra-low-latency real-time inference workloads, it delivers slightly higher latency than top-tier overseas GPU clusters under equivalent cost, making it best suited for offline batch processing and non-extreme high-concurrency business scenarios.
  3. The underlying domestic operator ecosystem is still under development; minor secondary adaptation and debugging are required for niche development framework compatibility.

Closing Remarks

From small-scale domestic computing verification three years ago to stable training of the 1.6-trillion-parameter MoE model on a 50,000-card cluster and full open-source release, Meituan’s LongCat-2.0 completes a landmark practical validation for China’s domestic AI computing industry.

Parameter scale is merely superficial. The truly groundbreaking takeaway is its proof that the full lifecycle development of world-class ultra-large models, from training to online deployment, can be completed independently with local computing hardware and domestic engineering teams. As this open-source solution spreads, more enterprises will build proprietary large models on domestic computing infrastructure, further accelerating self-reliance in computing hardware and core AI technology across China’s AI industry.