Mistral AI Releases Forge (mistral.ai)

by pember 193 comments 733 points


[−] kioleanu 59d ago
I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.

I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, devstral 2 is definitely called devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.

[−] IanCal 59d ago
I got really lost on their site too, but to help a bit, according to their model page:

devstral-2512, devstral-latest, and devstral-medium-latest are all Devstral 2: https://docs.mistral.ai/models/devstral-2-25-12

labs-devstral-small-2512 and devstral-small-latest are Devstral Small 2

devstral-medium-2507 is Devstral 1.0

and devstral-small-2507 is Devstral Small 1.1
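In code form, that's just a lookup table (purely illustrative; the identifiers are copied from the list above, and `marketing_name` is a hypothetical helper, not anything in Mistral's SDK):

```python
# Maps API model identifiers to the marketing names they correspond to.
# Derived from the docs page linked above; illustrative only.
DEVSTRAL_ALIASES = {
    "devstral-2512": "Devstral 2",
    "devstral-latest": "Devstral 2",
    "devstral-medium-latest": "Devstral 2",
    "labs-devstral-small-2512": "Devstral Small 2",
    "devstral-small-latest": "Devstral Small 2",
    "devstral-medium-2507": "Devstral 1.0",
    "devstral-small-2507": "Devstral Small 1.1",
}

def marketing_name(api_id: str) -> str:
    """Resolve an API model identifier to its marketing name."""
    return DEVSTRAL_ALIASES.get(api_id, "unknown")
```

So devstral-latest and devstral-medium-2507 resolve to two different product generations, which is exactly the confusion above.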

[−] kioleanu 59d ago
wow, thank you, this is great. I was thinking they should have a page like this, but I couldn't find it myself.
[−] newswasboring 59d ago
I have a general impression that they are not too interested in individual devs and making their product suit dev workflows. They want to be a B2B company and deliver a custom workflow per company.

Or it could just be a Google-like problem, where one part of a big company doesn't talk to the other.

[−] soco 59d ago
But wouldn't winning devs be a neat helping point in winning B2B contracts? Or do they think golf courses are enough for success? Okay, they might be right here, but still, they make it so confusing for no obvious reason.
[−] MidnightRider39 59d ago
In my experience devs rarely have any say in B2B contracts. At best they can recommend a solution to the decision maker, but in almost all deals I was a part of they didn't have any influence on the final decision. I wish it were otherwise, but alas.
[−] yunwal 56d ago
In my experience, this is only true at large companies (say, >200 employees). Which means the large companies of the future will all be taking their business elsewhere.
[−] lelanthran 59d ago

> But wouldn't winning devs be a neat helping point in winning B2B contracts?

How? The largest providers that are trying to win devs are locked in a competition to get the devs to continue using the models for free!

The best way to win B2B contracts is to solve the problems that plague businesses, not those that plague devs. Devs are fickle, have no stickiness, and will jump from provider to the next free provider, to self-hosted, etc.

Selling to business using Mistral's approach is, I feel, just a good business plan.

"Giving away some credits for free, then making a loss on subscribers" is an absolutely terrible business plan.

[−] newswasboring 59d ago
To me it's obvious because of the size of the companies they are targeting (ASML being an obvious one). I think golf-course marketing works well in the EU context, where decisions are made not purely on tech reasons.
[−] sofixa 59d ago

> I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.

It's not like B2B sales is more technical-merit-based, or individual-contributor-led, elsewhere.

It's always the same: depending on the field, individual contributors can have some flexibility in picking tools (a developer in a mid-sized company can pick whatever, an accountant would probably be more constrained, and a developer at a big bank would have no choice at all). But for strategic software choices that impact the whole company, where standardisation makes sense or is even mandatory to get actual value, you need to sell to high-level decision makers, not individual contributors. A CTO or a VP of X can decide to buy and mandate the implementation of something as impactful, workflow-changing, and potentially time- and money-saving as a company-wide AI platform. A dev can't.

[−] wqaatwt 59d ago

> being made not purely on tech reasons.

As if that’s not true in the US (not just government contracts but VC in general as well)…

[−] R0m41nJosh 59d ago
As far as I understood, the French president is pushing France's most valuable companies to use Mistral. There can't be a more top-down strategy :)
[−] philipallstar 59d ago
Also EU protectionism itself might be enough.
[−] hermanzegerman 59d ago
Where is the EU protectionist?

I feel we are way less protectionist than most other economic regions, including the USA, which is very protectionist but always claims otherwise.

[−] brabel 59d ago
Well, different discussion, but look at the Mercosur agreement and all the opposition from farmers in the EU. The EU is extremely protectionist when it comes to agriculture, at least.
[−] hermanzegerman 59d ago
Yes the farmers are a very vocal and powerful minority.

They get more than 50% of their income from subsidies and are quite well off, but always find a reason to complain.

I was thinking more about stuff like "Buy American" regulations for public tenders. Stuff like that doesn't exist here.

[−] philipallstar 59d ago
Well, if every big company gets a giant EU fine for, say, preinstalling a web browser in an OS, except for EU companies, that could make it easier for the EU companies.
[−] pembrook 59d ago
Apparently you aren't aware of the EU's deep regulatory protectionism and subsidies at both the EU and country level. A small portion is legitimately about protecting consumers, but ultimately this stuff is all designed by and for EU industry.

Basically all economic regions get highly protectionist when it comes to key areas like agriculture, banking, steel production, energy, automotive manufacturing, etc.

On tariffs, the US is now higher, but tariffs are a tax that passes through overwhelmingly to the consumer (by like 95%+). Given that there are essentially no fully domestic US manufacturing supply chains and the US imports everything, it's a de facto VAT from the consumer's perspective. The EU has VAT levels that are still much higher than the average US tariff level, which is essentially a dampener on consumption.

[−] victorbjorklund 59d ago
Like American protectionism? Heck, America even prohibits its own companies from selling to the government if the president doesn't like them enough.
[−] kioleanu 59d ago
You might be correct. For example, they have an IntelliJ plugin that allows integration without the AI Assistant, but it is only available to Enterprise customers.
[−] Manfred 59d ago
I had the same experience. It's even more confusing when you want to create an API key because they are separated by product, maybe?
[−] butILoveLife 59d ago

>data staying in the EU

This is really why Mistral has any support.

The models are bottom of the barrel, but it's the best Europe has...

Although you could use Chinese models on European servers.

[−] ogou 59d ago
Don't sleep on Mistral. Highly underrated as a general service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted access stores that can benefit from their approach. Especially in highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

[−] mark_l_watson 60d ago
I am rooting for Mistral and their different approach: not really competing on the largest and most advanced models, but instead doing custom engineering for customers and generally serving the needs of EU customers.
[−] upghost 60d ago

> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining", but I'm certain that the amount of clean data available in proper dataset format is nowhere near enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT, and then "post-training" is ... more SFT?

There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SotA models, which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
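That synthetic-data guess can be sketched in a few lines, assuming you have a strong teacher model to query. Everything here is hypothetical: `ask_teacher` is a stand-in for a real LLM call, not any actual API.

```python
def ask_teacher(prompt: str) -> str:
    """Stub standing in for a call to a strong teacher model."""
    return f"[teacher answer for: {prompt[:40]}...]"

def make_sft_pairs(documents: list[str], questions_per_doc: int = 2) -> list[dict]:
    """Expand each internal document into synthetic (prompt, response) pairs
    suitable for supervised fine-tuning."""
    pairs = []
    for doc in documents:
        for i in range(questions_per_doc):
            # Ask the teacher to invent a question grounded in the document,
            # then answer it using only that document.
            q = ask_teacher(f"Write question #{i + 1} a domain expert would ask about:\n{doc}")
            a = ask_teacher(f"Using only this document, answer: {q}\n\n{doc}")
            pairs.append({"prompt": q, "response": a})
    return pairs

pairs = make_sft_pairs(["Internal wiki page about wafer alignment tolerances."])
print(len(pairs))  # 2 pairs from a single document
```

Since the student only ever sees the teacher's question/answer text, not its logits, this is indeed a kind of low-resolution distillation.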

[−] roxolotl 60d ago
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. I was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more in reach.
[−] jcmartinezdev 59d ago
Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they are taking some interesting angles and designing their product in unique ways.

I like what they are doing a lot and I'll be watching them much more closely. I'd love to work for them, btw!

[−] ryeguy_24 60d ago
How many proprietary use cases truly need pre-training or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.
[−] dmix 60d ago
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
[−] csunoser 60d ago
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
[−] dash2 59d ago
I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
[−] losvedir 59d ago

> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.

I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
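The RAG alternative can be sketched with a naive keyword retriever (purely illustrative: a real system would use a vector store and embeddings, not substring matching):

```python
def score(query: str, doc: str) -> int:
    """Naive relevance score: count query words appearing in the document."""
    words = set(query.lower().split())
    return sum(1 for w in words if w in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "Vacation policy: employees accrue 25 days per year.",
    "Deployment runbook: run the canary stage before full rollout.",
]
# Knowledge stays in the documents; only the relevant one enters the prompt.
context = retrieve("how many vacation days do we get", docs)[0]
prompt = f"Answer using this context:\n{context}\n\nQ: How many vacation days?"
```

The point of the comment stands either way: here the facts live in the retrieved context, so they can be updated without touching the weights, while fine-tuning would have to bake them into the model.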

[−] jbverschoor 59d ago
ASML and ESA as clients means something. I don't expect to see the first name anywhere else on a logo list.
[−] andai 60d ago
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models weren't reliable for factual information?

[−] rorylawless 60d ago
The fine tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

[−] tho23i42342397 59d ago
Interesting. Does this actually scale, though? I've never seen enterprises that have "internal knowledge" in proper readable form; it's often in code, and more importantly in the people who wrote it.

I recall that even at Google, with its own search engine and so on, the best way to understand anything was to read the code or reach out to the people who wrote it. I don't know how it works in places that work with the "real world", like ASML.

Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).

Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.

[−] hermit_dev 60d ago
The future of AI is specialization, not just accumulating generalized knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
[−] alansaber 59d ago
I find the Mistral "middle" between small LMs and 1T-parameter LMs compelling: models that are sufficiently big to be performant but specialised for domains and tasks. This is what I assumed we'd always head towards.
[−] zby 59d ago
My bet is that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treat context as the main bottleneck and build a system around that. Even "context engineering" is arguably the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and passing the whole chat history back and forth is not the most efficient use of the limited prompt budget.
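A toy sketch of that external-storage idea (the note store and word-overlap selection are purely illustrative, not any existing framework): instead of replaying the full chat history, keep compact notes outside the model and inject only the relevant ones each turn.

```python
class MemoryStore:
    """External note store living outside the model; only selected notes
    are spent from the prompt budget each turn."""

    def __init__(self):
        self.notes: list[str] = []

    def write(self, note: str) -> None:
        self.notes.append(note)

    def read(self, query: str, k: int = 2) -> list[str]:
        """Select the k notes sharing the most words with the query."""
        qwords = set(query.lower().split())
        ranked = sorted(
            self.notes,
            key=lambda n: len(qwords & set(n.lower().split())),
            reverse=True,
        )
        return ranked[:k]

mem = MemoryStore()
mem.write("user prefers answers in French")
mem.write("project uses PostgreSQL 16")

# Build the next prompt from selected notes rather than full history.
relevant = mem.read("which database does the project use")
prompt = "Notes:\n" + "\n".join(relevant) + "\n\nUser: which database?"
```

The model never "learns" anything; the system around it decides what enters the prompt, which is exactly the bottleneck-first framing above.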
[−] speedgoose 59d ago
I was enthusiastic, but it's "contact us"-priced for now. I was expecting a classic cloud LLM forge with public pricing.
[−] Aldipower 59d ago
I cannot keep up with their products, model names and releases. What is what for? Their marketing texts do not make sense to me. Is there a nice overview somewhere?

I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)

[−] thecopy 59d ago
Looks interesting. But how do I explore, test, or use it? The product page (https://mistral.ai/products/forge) also does not contain anything useful, just "Contact us".

Disappointing.

[−] whatever1 59d ago
I thought that for pretraining to work and reasoning to emerge you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?
[−] apexalpha 59d ago
This looks good, but how much money are we talking here? Are we 'retraining' an entire model but adding enterprise data to the public data set?
[−] krinne 59d ago
I wasn't able to find a way to access this. Is it something accessible only to enterprises?

Would love to take it for a spin, if that is even possible.

[−] spacesh1psoda 59d ago
Go EU!
[−] aavci 60d ago
How does this compare to fine tuning?
[−] bsjshshsb 60d ago
Is training or FT > context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?

[−] Havoc 59d ago
Good for them. Really hope they find product-market fit.
[−] supernes 59d ago

> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not

... for humans.

[−] burgerquizz 59d ago
Can I use Mistral to read my source code and teach it, so I don't need to inject the whole doc and consume tokens every single time?
[−] dragochat 59d ago
where sample notebook/script? where github? where signup?

...learn a thing or two from NVIDIA or gtfo
