Have You Ever Had This Experience?

You want to build AI-powered tools, but a few rounds of API calls rack up a bill big enough to cover a hot pot dinner. Your manager asks, “Can we deploy an open-source model internally?” You scour the internet, only to hit two roadblocks: the available open-source models either perform poorly or cost a fortune to deploy. Most frustrating of all, state-of-the-art closed-source models are complete black boxes: you get no visibility into their training logic, datasets, or underlying architectures.

I know this predicament all too well. Then I came across the technical paper for Alibaba’s M6 multimodal model, a 10-trillion-parameter system once billed as the world’s largest pre-trained model, with training energy consumption reported at just 1% of GPT-3’s at the same parameter scale. My first thought was that the numbers must contain a typo, but they match what Alibaba DAMO Academy published.

The Full Development Timeline of M6

In June 2020, Alibaba DAMO Academy kicked off the M6 project, launching its base version with 300 million parameters.

Only 7 months later (January 2021), it reached the 10-billion-parameter scale, becoming the world’s largest Chinese multimodal model at that time.
Four months afterward (May 2021), the trillion-parameter iteration of M6 launched and entered real-world commercial operation.
Its landmark breakthrough came in October 2021, when M6 was scaled up to what Alibaba called the world’s first 10-trillion-parameter multimodal foundation model. For context, GPT-3 has 175 billion parameters, so M6’s parameter count is roughly 57 times GPT-3’s.
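To make the scale comparison concrete, here is a quick back-of-the-envelope check in Python. It uses only parameter counts quoted in this article; the names `MILESTONES` and `ratio_to_gpt3` are my own, chosen for illustration.

```python
# Parameter counts as quoted in this article (not measured independently).
GPT3_PARAMS = 175 * 10**9  # 175 billion

# Selected M6 milestones from the timeline.
MILESTONES = {
    "2020-06 base": 300 * 10**6,     # 300 million
    "2021-05 trillion": 10**12,      # 1 trillion
    "2021-10 M6-10T": 10 * 10**12,   # 10 trillion
}


def ratio_to_gpt3(params: int) -> float:
    """Return a model's parameter count as a multiple of GPT-3's."""
    return params / GPT3_PARAMS


for name, params in MILESTONES.items():
    print(f"{name}: {ratio_to_gpt3(params):.3f}x GPT-3")
```

The last line works out to a little over 57, which is where the “57 times” figure comes from. Keep in mind that this compares raw parameter counts only and says nothing about model quality.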

What Value Do Massive Parameter Sizes Bring?

Parameters are the learned weights that connect a model’s artificial neurons. More of them give the model more capacity to absorb large volumes of knowledge and, its builders argue, to show more human-like inductive reasoning.

M6 boasts native multimodal, multi-task capabilities: it is not limited to text processing alone, but simultaneously interprets images, web pages, audio, video and other heterogeneous data formats. It is far more than a basic chatbot built purely for conversation; it is an all-round AI capable of reading, visual analysis, graphic design and original content creation.

What truly impressed me, however, was not its record-breaking parameter scale — but how the team pulled off such ultra-low energy consumption.

Training a 10-trillion-parameter model would conventionally demand enormous computing power and electricity. By one widely cited estimate, training GPT-3 had a carbon footprint comparable to driving a car to the Moon and back. By contrast, the M6 team used only 512 NVIDIA V100 GPUs (32 GB each) and completed a usable 10-trillion-parameter model in just 10 days. According to Alibaba, at the same parameter scale M6’s energy consumption is only 1% of GPT-3’s.
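A rough way to size that training run is in GPU-hours. The sketch below simply multiplies out the figures quoted here. The energy helper is illustrative only: the 300 W per-GPU value is my assumption (the rated maximum board power of an SXM2 V100), not a number reported by the M6 team, and real consumption also depends on utilization, CPUs, networking, and cooling.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for a run that keeps every GPU busy for `days` days."""
    return num_gpus * days * 24


def energy_kwh(hours: float, watts_per_gpu: float) -> float:
    """GPU-only energy in kWh, given an assumed average draw per GPU."""
    return hours * watts_per_gpu / 1000


# Figures quoted in the article: 512 GPUs for 10 days.
hours = gpu_hours(num_gpus=512, days=10)

# ASSUMPTION: 300 W average draw per GPU (rated maximum, not measured).
ASSUMED_WATTS = 300
estimate = energy_kwh(hours, ASSUMED_WATTS)

print(f"GPU-hours: {hours:,.0f}")
print(f"GPU-only energy at {ASSUMED_WATTS} W: {estimate:,.0f} kWh")
```

The run comes to 122,880 GPU-hours. Treat the kWh figure as an order-of-magnitude guide under the stated assumption, not as a replacement for the published 1% comparison.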

To put that 1% figure into perspective: a training job that would need 100 units of electricity at GPT-3’s efficiency needs only a single unit with M6’s approach.

This efficiency came from DAMO Academy’s self-developed Whale distributed training framework and a set of core optimizations: expert parallelism, granular CPU offloading, and a sharing-delinking (“Pseudo-to-Real”) training strategy. These advances not only streamlined M6’s own training pipeline but also made it feasible to train a hundred-billion-parameter model on a single machine.
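For readers new to expert parallelism, here is a toy sketch of the general idea in plain Python: a router picks one expert per token, and each expert lives on one device, so every token touches only a small slice of the total parameters. This is not Whale’s API or M6’s actual routing code. The function name, the top-1 rule, and the round-robin expert placement are all assumptions made for illustration.

```python
from collections import defaultdict


def route_tokens(gate_scores, num_devices):
    """Group tokens by the device that hosts their top-1 expert.

    gate_scores[t][e] is the router's score for token t and expert e.
    ASSUMPTION: expert e is placed on device e % num_devices.
    Returns {device: [(token_index, expert_index), ...]}.
    """
    per_device = defaultdict(list)
    for token, scores in enumerate(gate_scores):
        # Top-1 gating: pick the expert with the highest score.
        expert = max(range(len(scores)), key=lambda e: scores[e])
        per_device[expert % num_devices].append((token, expert))
    return dict(per_device)


# Three tokens, four experts, two devices (made-up scores).
scores = [
    [0.1, 0.9, 0.0, 0.0],  # token 0 prefers expert 1
    [0.7, 0.1, 0.1, 0.1],  # token 1 prefers expert 0
    [0.0, 0.0, 0.2, 0.8],  # token 2 prefers expert 3
]
assignment = route_tokens(scores, num_devices=2)
print(assignment)
```

Because each token activates only one expert, adding experts grows the parameter count without growing per-token compute by the same factor. That is the usual reason mixture-of-experts models can reach very large sizes on a fixed hardware budget.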

M6 is more than a standalone model; it is tangible proof of a complete homegrown technical stack. It shows that Chinese research teams can not only build ultra-large foundation models but also train them with exceptional cost and energy efficiency, which is a real competitive moat.

Real-World Industrial Deployment Beyond Lab Research

M6 was the first multimodal large model in China to reach commercial deployment, and its capabilities are integrated across Alibaba’s core businesses:

Tmall: Generates full scripts for virtual live stream hosts
Rhino Intelligent Manufacturing: Designs apparel brand collections that launch directly on Taobao
Taobao & Alipay: Boosts precision for search intent recognition and content understanding
Alipay search vector retrieval: Significantly lifts click-through rates on search results

By the end of 2021, M6 had been deployed across more than 40 business scenarios, handling hundreds of millions of API calls every single day.

In September 2022, DAMO Academy unveiled the full Tongyi large model series, with M6-OFA serving as the unified foundation for all Tongyi products. M6 is the direct technical predecessor of today’s Tongyi Qianwen. As of 2025, Alibaba’s Tongyi ecosystem has open-sourced over 300 models, passed 600 million global downloads, and spawned more than 170,000 derivative fine-tuned models.

From a tiny research team of fewer than ten people to building the world’s largest multimodal model, then evolving into the core backbone of Tongyi Qianwen — M6’s development journey encapsulates the full trajectory of China’s large model industry, shifting from technological catch-up to global technological leadership.

Sincere, Practical Recommendations for Different Readers

For Developers & Academic Researchers Studying Ultra-Large Multimodal Models

Read M6’s official research paper (arXiv:2103.00823). It details how the M6-Corpus training dataset was built, the two-stage image generation framework, and the distributed training optimizations. This is the kind of granular detail that closed-source model vendors rarely disclose.

For Enterprise Technical Leaders Evaluating Large Model Deployment Costs & Business Value

M6’s low-carbon, high-efficiency development roadmap offers critical strategic insight: not every business scenario requires unlimited raw computing power. The true technological edge lies in building larger, more capable models with far fewer hardware and energy resources.

For Ordinary End Users

The Taobao search algorithms and Tmall virtual streamers you interact with daily are all powered by underlying M6-derived technology. Many transformative AI technologies operate quietly in the background, invisible to users yet continuously optimizing your daily digital experience.

M6 may not be the most widely discussed AI model in mainstream media, yet it is drastically underrated by the general public.

After all, few technological breakthroughs match this quiet achievement: training a model at the same parameter scale with roughly 1% of the energy. That kind of understated engineering mastery deserves recognition and respect.
