MiniMax-M1: China’s Open-Source Powerhouse Redefines Large Language Models
Summary
- MiniMax, founded in 2021, is a Chinese AI startup backed by Alibaba and Tencent that is reportedly planning a Hong Kong IPO at a roughly $3 billion valuation.
- MiniMax M1, launched in June 2025, is an open-weight reasoning model with a 1M token context window, second only to DeepSeek R1 among open-weight models on the Artificial Analysis Intelligence Index.
- M1’s key differentiators are a hybrid-attention architecture and low API pricing, positioning it against closed-source models like OpenAI o3.
- The 1M token capability enhances document analysis, conversational AI, and code generation, with applications in legal and tech sectors.
- Training cost for M1’s RL step was $0.53M, far below the $5.6M DeepSeek reported for DeepSeek V3’s full pre-training (not a like-for-like figure), suggesting efficient RL scaling.
- Research suggests M1 benchmarks well, scoring 63 on the Artificial Analysis Intelligence Index, close to top models like Gemini 2.5 Pro.
Overview of Minimax and M1
MiniMax, a Shanghai-based AI startup founded in 2021, has made waves with its MiniMax M1 model, launched in June 2025. Backed by major investors including Alibaba and Tencent, the company is reportedly eyeing a Hong Kong IPO that could value it at $3 billion. M1 is an open-weight reasoning model whose 1M token context window makes it highly capable on long-context tasks.
Key Features and Differentiation
M1, built on the Text-01 model with 456B total parameters (45.9B active), pairs a mixture-of-experts (MoE) design with a hybrid-attention architecture that combines Lightning Attention and softmax attention. Released under Apache 2.0, it offers far cheaper API pricing than closed-source models like GPT-4o and scores 63 on the Artificial Analysis Intelligence Index, second among open-weight models behind DeepSeek R1.
Advantages of 1M Token Capability
The 1M token context window allows M1 to handle extensive texts, ideal for document summarization, long conversations, and large codebases, with applications in legal analysis, customer service, and software development.
Training Costs Comparison
M1’s RL training cost $0.53M using 512 H800 GPUs for three weeks, well below the $5.6M DeepSeek reported for DeepSeek V3’s full pre-training, indicating that reinforcement learning can be scaled efficiently.
Benchmarking Insights
Research suggests M1 performs competitively, with a 63 score on the Artificial Analysis Intelligence Index, close to top models like OpenAI o3 and Gemini 2.5 Pro, though exact comparisons vary by benchmark.
Detailed Analysis of MiniMax M1 LLM
Introduction to Minimax: History, Investors, and IPO Plans
MiniMax, established in 2021 in Shanghai, China, by former SenseTime Group Inc. employees including Yan Junjie, has emerged as a significant player in the AI landscape. The company specializes in large language models (LLMs) and multimodal AI solutions, with products like Hailuo AI for text-to-video generation and Talkie, an AI companion app competing with Character.AI (SiliconANGLE). Its funding history includes a $250 million round in 2023 at a $1.2 billion valuation, with participation from a Tencent-backed entity, followed by a $600 million round in 2024 led by Alibaba that pushed its valuation to $2.5 billion. Other investors include Hillhouse Investment, HongShan, and IDG Capital (PitchBook, Reuters).
MiniMax is reportedly planning an IPO in Hong Kong, potentially as early as 2025, aiming for a $3 billion valuation. This move, supported by financial advisers, would make it one of the first major public offerings from China’s AI startup sector, leveraging Hong Kong’s role as a bridge between Chinese and global markets (Bloomberg, China Daily HK).
Description of the M1 Model and Its Differentiating Features
Launched on June 18, 2025, as part of MiniMax Week, MiniMax M1 is the company’s first reasoning model, based on Text-01, a mixture-of-experts (MoE) model with 456B total parameters and 45.9B active parameters (Artificial Analysis on X). M1 supports text-only input/output and is available in two variants, M1 40K and M1 80K, with thinking budgets of 40,000 and 80,000 tokens, respectively. Its standout feature is a 1M token context window during training, extending to 4M during inference, enabled by a hybrid architecture that combines Lightning Attention, softmax attention, and MoE (GitHub - MiniMax-AI/MiniMax-M1, arXiv).
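To make the hybrid design concrete, here is a minimal sketch of how such a layer stack might interleave linear-attention blocks with periodic softmax-attention blocks. The block internals, dimensions, and the exact interleaving ratio are illustrative assumptions based on the MiniMax-01 paper’s description of the architecture, not the released implementation:

```python
import torch
import torch.nn as nn

class LightningAttentionBlock(nn.Module):
    """Stand-in for a linear-attention block (O(n) in sequence length)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # The real Lightning Attention uses a kernelized linear-attention
        # recurrence; this stub only preserves the layer interface.
        return x + self.proj(x)

class SoftmaxAttentionBlock(nn.Module):
    """Standard multi-head softmax attention (O(n^2) in sequence length)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

def build_hybrid_stack(dim, n_layers, softmax_every=8):
    # Hypothetical interleaving: one softmax block per `softmax_every`
    # layers, the rest linear attention (a 7:1 ratio in this sketch).
    layers = []
    for i in range(n_layers):
        if (i + 1) % softmax_every == 0:
            layers.append(SoftmaxAttentionBlock(dim))
        else:
            layers.append(LightningAttentionBlock(dim))
    return nn.Sequential(*layers)

stack = build_hybrid_stack(dim=1024, n_layers=16)
x = torch.randn(1, 4096, 1024)  # (batch, sequence, hidden)
print(stack(x).shape)           # torch.Size([1, 4096, 1024])
```

The design intent is that the quadratic-cost softmax blocks appear only sparsely in depth, which is what keeps attention tractable at million-token context lengths.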
M1 is open-sourced under Apache 2.0, unlike Meta’s Llama family (community license) or DeepSeek (partially open), enhancing accessibility (The Register). It scores 63 on the Artificial Analysis Intelligence Index, second among open-weight models behind DeepSeek R1, and is competitive with closed-source models like OpenAI o3 and Gemini 2.5 Pro on benchmarks like AIME 2024 and LiveCodeBench (VentureBeat). Its API pricing of $1.2 per 1M input tokens and $2.1 per 1M output tokens is reportedly about 10x cheaper than GPT-4o (Maginative).
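At those rates, the cost of a single long-context request is easy to estimate; a minimal sketch, with illustrative token counts:

```python
# Back-of-envelope cost for one request at the quoted MiniMax M1 API rates
# ($1.2 per 1M input tokens, $2.1 per 1M output tokens).
INPUT_RATE = 1.2 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.1 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a full 1M-token context plus a 10K-token generated answer.
print(f"${request_cost(1_000_000, 10_000):.2f}")  # -> $1.22
```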
Advantages and Key Applications of the 1M Input Token Capability
The 1M token context window allows M1 to process extensive texts, offering advantages in:
- Document Analysis: Summarizing long legal texts, research papers, or reports, maintaining context across thousands of pages.
- Conversational AI: Supporting extended interactions in chatbots, retaining context for coherent responses over long dialogues.
- Code Generation: Handling large codebases for accurate suggestions and debugging, considering the entire project context.
- Data Analysis: Analyzing large datasets or logs, extracting insights from historical data, useful in finance and business intelligence.
Applications span legal, finance, software development, and customer service, where long-context reasoning is critical (Medium).
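As a concrete example of a document-analysis workflow, here is a minimal sketch using the OpenAI Python client against an OpenAI-compatible server (for instance, a self-hosted vLLM instance serving the open M1 weights). The base URL, model identifier, and file path are illustrative assumptions, not MiniMax’s documented hosted endpoint:

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint; substitute your provider's real
# base URL and model name (e.g., a local vLLM server hosting M1 weights).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("contract.txt") as f:  # illustrative path: a long legal document
    document = f.read()          # may run to hundreds of thousands of tokens

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M1-80k",  # Hugging Face repo id as served by vLLM
    messages=[
        {"role": "system", "content": "You are a contract-analysis assistant."},
        {"role": "user", "content": f"Summarize the key obligations in:\n\n{document}"},
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```

Because the whole document fits in a single context window, this kind of task needs no chunking or retrieval pipeline.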
Training Costs and Comparison with Deepseek
M1’s RL training used 512 H800 GPUs for three weeks, costing $0.53M in rental fees for the RL step alone ([Artificial Analysis on X](https://x.com/ArtificialAnlys/status/1935311012137402678)). In contrast, DeepSeek’s widely cited $5.6M figure covers DeepSeek V3’s full pre-training, so the two numbers are not directly comparable. Still, M1’s low RL cost suggests that reinforcement learning can be scaled efficiently, a notable trend in AI development.
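The reported figure is straightforward to sanity-check: 512 GPUs running for three weeks is roughly 258K GPU-hours, so $0.53M implies a rental rate of about $2 per H800 GPU-hour (a plausible 2025 cloud rate; the actual rate is not disclosed):

```python
# Sanity check on the reported $0.53M RL training cost.
gpus = 512
weeks = 3
gpu_hours = gpus * weeks * 7 * 24      # 258,048 GPU-hours
implied_rate = 530_000 / gpu_hours     # dollars per H800 GPU-hour
print(f"{gpu_hours:,} GPU-hours, ~${implied_rate:.2f}/GPU-hour")
# -> 258,048 GPU-hours, ~$2.05/GPU-hour
```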
Benchmarking Against Leading Open-Weight and Closed-Source Models
M1’s performance is robust, with a 63 score on the Artificial Analysis Intelligence Index, second to DeepSeek R1 (68), ahead of Qwen3 235B-A22B (60) and Llama 3.1 Nemotron Ultra (53) (Artificial Analysis on X). On other benchmarks like AIME 2024, LiveCodeBench, SWE-bench Verified, Tau-bench, and MRCR, M1 competes with OpenAI o3, Gemini 2.5 Pro, and Claude 4 Opus, often matching or exceeding them, though vendor-supplied results should be verified independently (The Register).
Below is a table summarizing key benchmark comparisons:
| Model | Artificial Analysis Intelligence Index Score | Context Window (Tokens) | Open/Closed Weights |
|---|---|---|---|
| DeepSeek R1 | 68 | ~128K | Partially Open |
| MiniMax M1 80K | 63 | 1M (Training), 4M (Inference) | Open (Apache 2.0) |
| Qwen3 235B-A22B | 60 | ~128K | Open (Apache 2.0) |
| Llama 3.1 Nemotron Ultra | 53 | ~128K | Open |
| OpenAI o3 | Competitive (Exact Score Not Published) | ~200K | Closed |
| Gemini 2.5 Pro | Competitive (Exact Score Not Published) | ~1M | Closed |
This table highlights M1’s strong position, particularly in context window and open-source accessibility.
Key Citations
- Chinese AI startup MiniMax raises $600M at $2.5B valuation led by Alibaba
- MiniMax AI 2025 Company Profile: Valuation, Funding & Investors
- China AI startup MiniMax raising over $250 million from Tencent-backed entity, others
- Alibaba-backed Chinese AI startup MiniMax reportedly eyeing Hong Kong IPO
- Alibaba-backed MiniMax ‘plans Hong Kong IPO’
- MiniMax-M1 is a new open source model with 1 MILLION TOKEN context and new, hyper efficient reinforcement learning
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model
- MiniMax-Text-01: The LLM with the largest Context window
- MiniMax M1 model claims Chinese LLM crown from DeepSeek
- MiniMax-VL-01 multimodal large language model
- MiniMax-01: Scaling Foundation Models with Lightning Attention
- LLM Leaderboard - Compare GPT-4o, Llama 3, Mistral, Gemini & other models
- MiniMax Releases Open-Source Model with Massive 4M Context Window
- MiniMax launches their first reasoning model: MiniMax M1