I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.
I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, devstral 2 is definitely called devstral 2, and then a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it is referring to don't exist and never did.
I have the general impression they are not too interested in individual devs and making it suit their workflow. They want to be a B2B company and deliver a custom workflow per company.
Or it could just be a Google-like problem, where one part of a big company doesn't talk to the other.
But wouldn't winning devs be a neat selling point in winning B2B contracts? Or do they think golf courses are enough for success? Okay, they might be right here, but still, they make it so confusing for no obvious reason.
In my experience devs rarely have anything to say in B2B contracts. At best they can recommend a solution to the decision maker, but in almost all deals I was a part of they didn't have any influence on the final decision.
I wish it were otherwise but alas
In my experience, this is only true at large companies (say, >200 employees). Which means the large companies of the future will all be taking their business elsewhere.
> But wouldn't winning devs be a neat selling point in winning B2B contracts?
How? The largest providers that are trying to win devs are locked in a competition to get the devs to continue using the models for free!
The best way to win B2B contracts is to solve the problems that plague business, not those that plague devs. The devs are fickle, have no stickiness and will jump providers to the next free provider, to self-hosted, etc.
Selling to business using Mistral's approach is, I feel, just a good business plan.
"Giving away some credits for free, then making a loss on subscribers" is an absolutely terrible business plan.
To me it's obvious given the size of the companies they are targeting (ASML being an obvious one). I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
> I think golf course marketing works well in the EU context when decisions are being made not purely on tech reasons.
It's not like B2B sales elsewhere is any more technical-merit-based or individual-contributor-led.
It's always the same. Depending on the field, individual contributors can have some flexibility in picking tools (a developer in a mid-sized company would be able to pick whatever, an accountant would probably be more constrained, while a developer at a big bank would have no choice at all). But for strategic software choices that impact the whole company, where standardisation makes sense or is even mandatory to get actual value out of it, you need to sell to high-level decision makers, not individual contributors. A CTO or a VP of X can decide to buy and mandate the implementation of something as impactful, workflow-changing, and potentially time- and money-saving as a company-wide AI platform. A dev can't.
Well different discussion, but look at the Mercosur agreement and all the opposition from farmers in the EU. They are extremely protectionist when it comes to agriculture, at least.
Well, if every big company gets a giant EU fine for, say, preinstalling a web browser in an OS, except for EU companies, that could make it easier for the EU companies.
Apparently you aren't aware of the EU's deep regulatory protectionism and subsidies at both EU and country level. A small portion is legitimately about protecting consumers, but ultimately this stuff is all designed by and for EU industry.
Basically all economic regions get highly protectionist when it comes to key areas like agriculture, banking, steel production, energy, automotive manufacturing, etc.
On tariffs, the US is now higher, but tariffs are a tax that passes through overwhelmingly onto the consumer (by like 95%+). Given there are essentially no fully domestic US manufacturing supply chains and the US imports everything, it's a de facto VAT from the perspective of the consumer. The EU has VAT levels that are still much higher than the average US tariff level, which is essentially a dampener on consumption.
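To make the arithmetic concrete, here's a toy comparison using the ~95% pass-through figure above; the 15% tariff and 20% VAT rates are illustrative, not actual policy figures:

```python
# Consumer price impact of a tariff vs a VAT on a 100-unit good,
# assuming ~95% of the tariff passes through to the consumer.
def price_after_tariff(base: float, tariff: float, pass_through: float = 0.95) -> float:
    return base * (1 + tariff * pass_through)

def price_after_vat(base: float, vat: float) -> float:
    return base * (1 + vat)

base = 100.0
print(price_after_tariff(base, 0.15))  # 15% tariff -> ~114.25
print(price_after_vat(base, 0.20))     # 20% VAT   -> 120.0
```

Under these (made-up) rates the VAT bites harder, which is the parent's point.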
You might be correct. For example, they have an IntelliJ plugin that allows integration without the AI Assistant, but it is only available to Enterprise customers.
Don't sleep on Mistral. Highly underrated as a general-purpose LLM service. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted-access stores that can benefit from their approach. Especially in the highly regulated EU.
Not everyone is obsessed with code generation. There is a whole world out there.
I am rooting for Mistral with their different approach: not really competing on the largest and most advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining", but I'm certain the amount of clean data available in proper dataset format is nowhere near enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT, and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
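The "seed synthetic data from company docs" idea sketched above would look something like this; `generate` is a stand-in for a call to a strong teacher model, and none of this is Mistral's actual Forge pipeline, just my guess at the shape of it:

```python
from typing import Callable

def synthesize_corpus(seed_docs: list[str],
                      generate: Callable[[str], str],
                      variants_per_doc: int = 3) -> list[str]:
    """Expand a small internal corpus into many training examples
    by prompting a teacher model for rewrites/expansions of each doc."""
    corpus = []
    for doc in seed_docs:
        for i in range(variants_per_doc):
            prompt = f"Rewrite/expand (variant {i}): {doc}"
            corpus.append(generate(prompt))
    return corpus

# Toy stand-in "teacher" so the sketch runs end to end.
toy_teacher = lambda prompt: prompt.upper()
out = synthesize_corpus(["internal spec A", "runbook B"], toy_teacher)
print(len(out))  # 2 docs x 3 variants = 6
```

Which, as noted, is basically distillation with the company data as the seed distribution.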
Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more within reach.
Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they are taking some interesting angles and designing their product in unique ways.
I like what they are doing a lot and I'll be watching them much more closely. I'd love to work for them, btw!
How many proprietary use cases truly need pre-training, or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train or fine-tune? Curious.
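For comparison, the RAG alternative needs no training at all: just pull the most relevant internal snippet into the prompt at query time. A minimal sketch, with bag-of-words cosine standing in for a real embedding model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word-count vector (a real system would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the doc most similar to the query, to be pasted into the prompt."""
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

docs = ["vacation policy: 25 days per year",
        "deploy procedure: run the release pipeline"]
print(retrieve("how many vacation days do I get", docs))
```

The usual rule of thumb is that this covers "look up a fact" use cases, while pre-training/fine-tuning only pays off when the model needs to internalize domain vocabulary and reasoning patterns.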
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.
> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
Interesting. Does this actually scale, though? I've never seen enterprises with "internal knowledge" in proper readable form - it's often in code, and more importantly in the people who wrote it.
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read the code or reach out to those who wrote it. I don't know how it works in places that work with the "real world", like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
The future of AI is specialization, not just chasing universal knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
I find Mistral's "middle" between small LMs and 1T-parameter LMs compelling. Models that are sufficiently big to be performant but specialised for domains and tasks - this is where I always assumed we'd head.
My bet is that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treating context as the main bottleneck and building a system around that.
That would show that even "context engineering" is kind of the wrong term - context does not enter the LLM in some mysterious way, it goes through the prompt, and the whole model of passing chat history back and forth is not the most efficient use of the prompt's limits.
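A sketch of the "external storage instead of replaying chat history" idea: keep notes in a store and pack only the most relevant ones into the limited prompt budget. The scoring and budget here are made up for illustration:

```python
def pack_context(notes: list[str], query: str, budget_words: int) -> str:
    """Greedily fill the prompt with the notes sharing most words with the query."""
    qwords = set(query.lower().split())
    scored = sorted(notes,
                    key=lambda n: len(qwords & set(n.lower().split())),
                    reverse=True)
    picked, used = [], 0
    for note in scored:
        n_words = len(note.split())
        if used + n_words <= budget_words:
            picked.append(note)
            used += n_words
    return "\n".join(picked)

notes = ["user prefers metric units",
         "user timezone is CET",
         "long irrelevant log dump " + "x " * 50]
print(pack_context(notes, "what units does the user prefer", 20))
```

The point being: the model only ever sees the packed string, so the interesting engineering lives entirely outside the LLM.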
I cannot keep up with their products, model names and releases.
What is what for? Their marketing texts do not make sense for me.
Is there a nice overview somewhere?
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
Looks interesting. But how do you explore, test, or use it? The product page (https://mistral.ai/products/forge) also doesn't contain anything useful. Just "Contact us".
I thought that for pretraining to work and reasoning to emerge you need internet-scale data. How can Forge achieve that with just internal company data (unless the company in question is AT&T or something)?
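A back-of-envelope check on this, assuming the Chinchilla rule of thumb (~20 training tokens per parameter) and ~0.75 words per token; the corpus size is a made-up "big enterprise" figure, not anything from Forge:

```python
def tokens_needed(params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training tokens for a from-scratch model (Chinchilla heuristic)."""
    return params * tokens_per_param

def corpus_tokens(words: float) -> float:
    """Rough token count of a text corpus (1 token ~ 0.75 words)."""
    return words / 0.75

need = tokens_needed(7e9)           # 7B params -> 1.4e11 tokens
have = corpus_tokens(10e6 * 1_000)  # 10M docs x 1,000 words each
print(f"shortfall: {need / have:.1f}x")  # ~10.5x short of compute-optimal
```

So even a very large internal corpus falls an order of magnitude short for a modest 7B model, which suggests "pretraining" here more plausibly means continued pretraining from one of their base models, not from scratch.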
> I think devstral-latest should be it, no?
devstral-2512, devstral-latest, and devstral-medium-latest are all Devstral 2: https://docs.mistral.ai/models/devstral-2-25-12
labs-devstral-small-2512 and devstral-small-latest are Devstral Small 2.
devstral-medium-2507 is Devstral 1.0, and devstral-small-2507 is Devstral Small 1.1.
> being made not purely on tech reasons.
As if that’s not true in the US (not just government contracts but VC in general as well)…
I feel we are way less protectionist than most other economic regions, including the USA, which is very protectionist but always claims otherwise.
They get more than 50% of their income from subsidies, are quite well off, but always find a reason to complain.
I was thinking more about stuff like "Buy American" regulations for public tenders. Stuff like that doesn't exist here.
>data staying in the EU
This is really why Mistral has any support.
The models are bottom of the barrel, but it's the best Europe has...
Although you could use Chinese models on European servers.
It's feasible for small models, but I thought small models weren't reliable for factual information?
https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning
Disappointing.
Would love to take it for a spin, if that is even possible.
Is it possible to retrain daily or hourly as info changes?
> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not
... for humans.
...learn a thing or two from NVIDIA or gtfo