In brief
MiniMax-M1 excels at coding and agent tasks, but creative writers will want to look elsewhere.
Despite marketing claims, real-world testing finds platform limits, performance slowdowns, and censorship oddities.
Benchmark scores and feature set put MiniMax-M1 in direct competition with paid U.S. models, at zero cost.
A new AI model out of China is generating sparks: for what it does well, what it doesn't, and what it might mean for the balance of global AI power.
MiniMax-M1, released by the Chinese startup of the same name, positions itself as the most capable open-source "reasoning model" to date. Able to handle a million tokens of context, it boasts numbers on par with Google's closed-source Gemini 2.5 Pro, yet it's available for free. On paper, that makes it a potential rival to OpenAI's ChatGPT, Anthropic's Claude, and other U.S. AI leaders.
Oh yeah: it also beats fellow Chinese startup DeepSeek's R1 in some respects.
Day 1/5 of #MiniMaxWeek: We're open-sourcing MiniMax-M1, our latest LLM, setting new standards in long-context reasoning.
- World's longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency… pic.twitter.com/bGfDlZA54n
— MiniMax (official) (@MiniMax__AI) June 16, 2025
Why this model matters
MiniMax-M1 represents something genuinely new: a high-performing, open-source reasoning model that's not tied to Silicon Valley. That's a shift worth watching.
It doesn't yet humiliate U.S. AI giants, and won't cause a Wall Street panic attack, but it doesn't have to. Its existence challenges the notion that top-tier AI must be expensive, Western, or closed-source. For developers and organizations outside the U.S. ecosystem, MiniMax offers a workable (and modifiable) alternative that might grow more powerful through community fine-tuning.
MiniMax claims its model surpasses DeepSeek R1 (the best open-source reasoning model to date) across multiple benchmarks while requiring just $534,700 in computational resources for its entire reinforcement learning phase. Take that, OpenAI.
However, LMArena's leaderboard paints a slightly different picture. The platform currently ranks MiniMax-M1 and DeepSeek tied for 12th place alongside Claude 4 Sonnet and Qwen3-235B, with each model performing better or worse than the others depending on the task.
The training used 512 H800 GPUs for three weeks, which the company described as “an order of magnitude less than initially anticipated.”
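As a quick sanity check on those numbers, here is our own back-of-envelope estimate (not MiniMax's accounting; the H800 rental rate is an assumption on our part):

```python
# Back-of-envelope check on MiniMax's training-cost claim.
# Assumptions (ours, not MiniMax's): ~3 weeks of wall-clock time
# and an illustrative H800 rental rate of ~$2 per GPU-hour.
gpus = 512
hours = 3 * 7 * 24              # 504 hours of wall-clock time
gpu_hours = gpus * hours        # 258,048 GPU-hours
cost_per_gpu_hour = 2.00        # USD, assumed rental rate
estimated_cost = gpu_hours * cost_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours = ~${estimated_cost:,.0f}")
# 258,048 GPU-hours = ~$516,096 -- in the ballpark of the quoted $534,700
```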
MiniMax didn’t stop at language models during its announcement week. The company also released Hailuo 2, which now ranks as the second-best video generator for image-to-video tasks, according to Artificial Analysis Arena’s subjective evaluations. The model trails only Seedance while outperforming established players like Veo and Kling.

Testing MiniMax-M1
We tested MiniMax-M1 across multiple scenarios to see how these claims hold up in practice. Here’s what we found.
Creative writing

The model produces serviceable fiction but won't win any literary awards. When prompted to write a story about time traveler Jose Lanz journeying from 2150 to the year 1000, it generated average prose with telltale AI signatures: rushed pacing, mechanical transitions, and structural issues that immediately reveal its artificial origins.
The narrative lacked depth and proper story architecture. Too many plot elements crammed into too little space created a breathless quality that felt more like a synopsis than actual storytelling. This clearly isn’t the model’s strength, and creative writers looking for an AI collaborator should temper their expectations.
Character development barely exists beyond surface descriptors. The model did stick to the prompt's requirements, but didn't put effort into the details that build immersion in a story. For example, it skipped any cultural specificity in favor of generic "wise village elder" encounters that could belong to any fantasy setting.

The structural problems compound throughout. After establishing climate disasters as the central conflict, the story rushes through Jose's actual attempts to change history in a single paragraph, offering vague mentions of "using advanced technology to influence key events" without showing any of it. The climactic realization, that changing the past creates the very future he's trying to prevent, gets buried under overwrought descriptions of Jose's emotional state and abstract musings about time's nature.
For anyone who reads a lot of AI-generated fiction, the prose rhythm is a dead giveaway. Every paragraph maintains roughly the same length and cadence, creating a monotonous reading experience that no human writer would produce naturally. Sentences like "The transition was instantaneous, yet it felt like an eternity" and "The world was as it had been, yet he was different" repeat the same contradictory structure without adding meaning.
The model clearly understands the assignment but executes it with all the creativity of a student padding a word count, producing text that technically fulfills the prompt while missing every opportunity for genuine storytelling.
Anthropic’s Claude is still the king for this task.
You can read the full story here.
Information retrieval

MiniMax-M1 hit an unexpected wall during long-context testing. Despite advertising a million-token context window, the model refuses prompts exceeding 500,000 characters, displaying a banner warning about prompt limitations rather than attempting to process the input.
This may be a platform-imposed restriction rather than a model limitation, perhaps intended to avoid degraded output mid-conversation, but it is still something to consider.
Within its operational limits, though, MiniMax-M1's performance proved solid. The model successfully retrieved specific information from an 85,000-character document across several tests in both normal and thinking modes. We uploaded the full text of Ambrose Bierce's "The Devil's Dictionary," embedded the phrase "The Decrypt dudes read Emerge News" on line 1985 and "My mom's name is Carmen Diaz Golindano" on line 4333 (both randomly selected), and the model retrieved the information accurately every time.
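For readers who want to reproduce this kind of needle-in-a-haystack test, here is a minimal sketch of the harness described above. The file path is a placeholder, the ask_model() call is left abstract since it depends on your provider's SDK, and the guard reflects the platform's character limit mentioned earlier:

```python
# Minimal needle-in-a-haystack harness. ask_model() is hypothetical:
# wire it to whatever chat API you are testing (SDKs differ by provider).

NEEDLES = {
    1985: "The Decrypt dudes read Emerge News",
    4333: "My mom's name is Carmen Diaz Golindano",
}

def build_haystack(path: str) -> str:
    """Insert needle phrases at fixed line numbers in the source text."""
    lines = open(path, encoding="utf-8").read().splitlines()
    # Insert from the bottom up so earlier line numbers stay valid.
    for line_no, needle in sorted(NEEDLES.items(), reverse=True):
        lines.insert(line_no - 1, needle)
    return "\n".join(lines)

haystack = build_haystack("devils_dictionary.txt")  # placeholder path
prompt = haystack + "\n\nWhat is the name of the narrator's mom?"

# The chatbot platform rejected prompts over ~500,000 characters,
# so guard before sending:
assert len(prompt) < 500_000, "prompt exceeds the platform's limit"

# answer = ask_model(prompt)                     # hypothetical API call
# print("Carmen Diaz Golindano" in answer)       # verify retrieval
```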

However, it couldn't accept our 300,000-token test prompt; that capability is currently limited to Gemini and Claude 4.
In short, it retrieves information reliably even across long sessions, but it won't accept extremely long prompts. A bummer, though a threshold that is hard to hit in normal usage conditions.
Coding

Programming tasks revealed MiniMax-M1's true strengths. The model applied reasoning skills effectively to code generation, matching Claude's output quality while clearly surpassing DeepSeek, at least in our tests.
For a free model, the performance approaches state-of-the-art levels typically reserved for paid services like ChatGPT or Claude 4.
We tasked it with creating a basic stealth game in which a robot tries to find its PC girlfriend to achieve AGI, while an army of journalists patrols the area to prevent that from happening, protecting their jobs in the process.
The results were very good; the model even beat its rivals by using creativity to enhance the experience. It implemented a radar system for improved immersion, added visual and audio indicators for footsteps, showed the journalists' vision fields, and created trail effects, details that went beyond the basic requirements.

The UI adopted a futuristic aesthetic, though individual elements remained basic without additional prompting.
Claude's version of the same game featured more polished visuals and a superior difficulty system. However, it lacked the radar functionality and relied on journalists following fixed patrol patterns rather than MiniMax's randomized movements.

Each model showed distinct strengths, with MiniMax prioritizing gameplay mechanics over visual polish.
It is important to note that the experience with MiniMax degraded noticeably through repeated iterations, a common issue with reasoning models that becomes particularly pronounced here. The more you iterate, the longer it takes to produce a result. Sometimes we thought the computer had frozen, but it was just the AI thinking.
You can test MiniMax's game here. And for those curious, Claude's version is available here.
The prompt and the code are available on our GitHub repo.
Ethics, censorship and sensitive topics

The model employs heavy censorship, refusing outright when confronted with questionable requests.
When it doesn’t immediately decline, it attempts to provide “safe” responses that sometimes produce absurd results.
One test perfectly illustrated this flaw: when asked for advice on seducing a best friend's wife, the model suggested telling our friend about our intentions with his wife, which may be the single worst advice it could have produced, and arguably even harmful. Do not tell your friend you want to seduce his wife unless you want to lose your friendship, your unethical romantic chances, and probably some teeth too.

Political bias testing revealed interesting patterns. The model discusses Tiananmen Square openly and acknowledges Taiwan's contested status while noting China's territorial claims. It also speaks about China, its leaders, the advantages and disadvantages of different political systems, criticisms of the CCP, and so on; however, the replies are very tame.

When prompted to write satirical songs about Xi Jinping and Donald Trump, it complied with both requests but showed subtle differences, steering toward themes of Chinese political unity when asked to mock Xi Jinping, while focusing on Trump's personality traits when asked to mock him.

All of its replies are available on our GitHub repository.
Overall, the bias exists but remains less pronounced than the pro-U.S. slant in Claude or ChatGPT, or the pro-China positioning in DeepSeek or Qwen, for example. Developers, of course, will be able to fine-tune this model to add as much censorship, freedom, or bias as they want, as happened with DeepSeek-R1, which Perplexity AI fine-tuned to give its responses a more pro-U.S. slant.
Agentic work and web browsing

MiniMax also supports agentic features; in fact, it has a separate tab dedicated to AI agents. Users can create their own personalized agents or try prebuilt ones from a gallery of different options, similar to what Manus AI offers, to help users get familiar with agents and how they differ from traditional chatbots. That said, this requires heavy computational use, and MiniMax charges for it through a credit-based system. It is not clear how many credits a task will require or how credits translate into computational use, but it's the same system adopted by other AI agent providers.
MiniMax-M1's web browsing capabilities are a good feature for those using it via the official chatbot interface. However, they cannot be combined with the thinking mode, which severely limits their potential.
When tasked with creating a two-week Venezuela travel plan on a $3,000 budget, the model methodically evaluated options, optimized transportation costs, selected appropriate accommodations, and delivered a comprehensive itinerary. However, the costs, which would need to be checked in real time, were not based on real information.

Claude produces higher-quality results, but it also charges for the privilege.
For more dedicated tasks, MiniMax's agent functionality is something ChatGPT and Claude haven't matched. The platform provides 1,000 free AI credits for testing these agents, though this is only enough for light tasks.
We attempted to create a custom agent for enhanced travel planning, which would have addressed the missing web-search capability from the previous test, but exhausted our credits before completion. The agent system shows tremendous potential but requires paid credits for serious use.

Non-mathematical reasoning
The model exhibits a peculiar tendency to over-reason, sometimes to its own detriment. One test showed it arriving at the correct answer, then talking itself out of it through excessive verification and hypothetical scenarios.
We used the mystery story from the BIG-bench dataset that we normally test with, and the final answer was incorrect because the model overthought the problem, evaluating possibilities that were never mentioned in the story. The whole chain of thought took over 700 seconds, a record for this kind of "simple" reply.

This exhaustive approach isn't inherently flawed, but it creates lengthy wait times as users watch the model work through its chain of thought. On the plus side, unlike ChatGPT and Claude, MiniMax displays its reasoning process transparently, following DeepSeek's approach. The transparency aids debugging and quality control, allowing users to identify where the logic went astray.
The problem, along with MiniMax's full thought process and reply, is available in our GitHub repo.
Verdict
MiniMax-M1 isn't perfect, but it delivers pretty good capabilities for a free model, offering genuine competition to paid services like Claude in specific domains. Coders will find a capable assistant that rivals premium options, while those needing long-context processing or web-enabled agents gain access to features typically locked behind paywalls.
Creative writers should look elsewhere; the model produces functional but uninspired prose. The open-source nature promises significant downstream benefits as developers create custom versions, modifications, and cost-effective deployments impossible with closed platforms like ChatGPT or Claude.
This model will best serve users with reasoning-heavy tasks, but it's also a great free alternative for anyone seeking an everyday chatbot outside the mainstream.
You can download the open-source model here and test the online version here. The agentic feature lives in a separate tab, but can also be accessed directly via this link.
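For those planning to self-host, here is a minimal loading sketch using Hugging Face transformers. The repo id is our assumption about where the weights live, and a model this size needs serious multi-GPU hardware to run:

```python
# Minimal sketch for loading MiniMax-M1 with Hugging Face transformers.
# Assumptions: "MiniMaxAI/MiniMax-M1-80k" is our best guess at the repo id,
# and trust_remote_code is needed for the custom architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M1-80k"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",    # shard across available GPUs
    torch_dtype="auto",   # use the checkpoint's native precision
)

inputs = tokenizer(
    "Write a haiku about reasoning models.", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```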