It's not even open weights as generally understood; the non-commercial restriction is pretty severe. The earlier M2.5 model will still be preferred for many purposes.
I've flagged the post: the title is editorialized. The title on the blog post is "MiniMax M2.7: The Agentic Model That Helped Build Itself" (at least at the time of writing this).
I've yet to see a convincing explanation of what makes such a "license" legally binding in the first place.
There's no copyright on model weights themselves (because they are produced purely mechanically, without involving human creativity, the same way there's no copyright on the compiled artifacts of a piece of software or on an H.264-encoded movie file).
For software and movies, the copyright covers the source material, not the resulting binary, and for LLMs the source material can also be protected by copyright. The problem is that LLM makers don't own most of the copyright on the source material, and worse, they claim the training process is transformative enough to erase the copyright of the source material, so even the parts of the training data for which they do own the copyright couldn't extend copyright protection to the weights.
It's very likely that these licenses are entirely devoid of legal value (and I don't think Meta has taken any legal action, not even a DMCA takedown, against any of the bazillion llama finetunes violating the llama license on huggingface).
Even the MIT-licensed weights are just that: open weights. Let's not call the weights "source", because they're emphatically not. I can't retrain Qwen from the ground up with different pre-training algorithms, for example.
Model weights are source because they are "the preferred form for modification", e.g. you can use them for fine-tuning. Training a new model from raw data (1) gets you something very different from the original and (2) is computationally infeasible for most people, compared to simpler fine-tuning.
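For concreteness, "the preferred form for modification" in practice usually means something like attaching LoRA adapters. A rough sketch with the Hugging Face peft library; the model id below is a placeholder I made up, not any specific release:

    # Sketch: "modifying" open weights via LoRA fine-tuning (peft).
    # The model id is a placeholder assumption, not a real release.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base_id = "some-org/some-open-weights-model"  # placeholder
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id)

    # Freeze the base weights and train only small low-rank adapters.
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        task_type="CAUSAL_LM")
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% trainable

    # ... run your training loop or transformers.Trainer here, then:
    model.save_pretrained("my-finetune")  # saves only the adapter weights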
I disagree. Fine-tuning, while useful, feels more like patching executables than editing source code. Besides, just because most people don't compile e.g. Android for themselves doesn't mean that Android should only be distributed in binary form.
I've been using M2.7 through the Alibaba coding plan for a bit now, and I'm quite impressed with its coding ability, and even more impressed when I see how small it is. Fascinating, really; it makes me wonder how big the frontier models are.
What's people's experience of using MiniMax for coding?
I had a really bad time with it. I use (real) Claude Code for work, so I know what a good model feels like. MiniMax's token plan is nice, but the quality is really far from the Claude models.
I needed to constantly "remind" it to get things done. Even for a four-sentence prompt in a session well below the context window, MiniMax would ignore half of it. This happens all the time. (This is Claude Code + the MiniMax API, set up using the official instructions; see the sketch at the end of this comment.)
Basically, if I say get A, B, and C done, it will only do A and B. When I say it still needs to do C, it does C but reverts the code for A.
Things that Claude can usually one-shot take 5 iterations with MiniMax.
I ended up switching to Claude to get one of my personal projects done.
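For anyone trying to reproduce this, the sketch I promised above: a quick way to sanity-check the MiniMax endpoint outside of Claude Code, using the anthropic Python SDK. The base URL and model id here are assumptions on my part; take the real values from the official instructions.

    # Sanity-check a MiniMax Anthropic-compatible endpoint directly.
    # Base URL and model id are assumptions; verify against the docs.
    import os
    from anthropic import Anthropic

    client = Anthropic(
        base_url="https://api.minimax.io/anthropic",  # assumed endpoint
        api_key=os.environ["MINIMAX_API_KEY"],
    )
    msg = client.messages.create(
        model="MiniMax-M2.7",  # assumed model id
        max_tokens=512,
        messages=[{"role": "user", "content": "Do A, then B, then C: ..."}],
    )
    print(msg.content[0].text)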
"Helped build itself" is a bit of a stretch here, it makes it sound as if the model was doing lasting self-improvements.
What the article describes is that the model was able to tweak its own deployment harness (memory, skills, experimental loop, etc.) to improve performance on benchmarks. While impressive, it's not making any modifications to its own weights by, e.g., modifying the training code.
In addition to this conversation having already been started at https://news.ycombinator.com/item?id=47735348 yesterday: MiniMax M2.7 is not open source. The weights have been released, which is definitely good and follows some of the spirit of open source, but that isn't the same thing.
In my experience, even MiniMax M2.5 is a very capable model: with some hand-holding it can do a good investigation of an issue deep down multiple layers of a software stack, provided you keep asking the right questions.
I am pretty sure MiniMax M2.7 would be much better.
> Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorization.
And calling the non-commercial terms "MIT-style" is a stretch: they come with a bunch of extra restrictions about prohibited uses.
It's open weights, not open source.
https://huggingface.co/unsloth/MiniMax-M2.7-GGUF
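If you want to try the GGUF quants locally, something like this with llama-cpp-python should work; the quant filename pattern is a guess on my part, so check the repo's file listing first.

    # Sketch: running a GGUF quant locally with llama-cpp-python.
    # The filename glob is an assumption; pick a real quant from the repo.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="unsloth/MiniMax-M2.7-GGUF",
        filename="*Q4_K_M*",  # assumed quant name pattern
        n_ctx=8192,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(out["choices"][0]["message"]["content"])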
How does it compare to z.ai GLM?
> That is not a benchmark result. That is a different way of thinking about how AI models get built.
tiresome