Cloudflare's AI Platform: an inference layer designed for agents

[−] mips_avatar 28d ago

So it's basically just openrouter with cloudflare argo networking? I feel like they could do some much more interesting stuff with their replicate acquisition. Application specific RL is getting so good but there's no good way to deploy these models in a scalable way. Even the providers like fireworks which claim to let you deploy LORAs in a scalable way can't do it. For now I literally have to host base load on my application on a rack of 3090s in my garage which seems silly but it saves me $1k a month.

[−] bryden_cruz 28d ago

Running a rack of 3090s in your garage to avoid provider lock-in/costs is the most Hacker News thing. Out of curiosity, what are you doing for uptime/failover? If you are running production traffic to that garage rack, does your app just degrade gracefully if your home internet drops, or do you have a cloud fallback?

[−] mips_avatar 27d ago

Yeah the model i'm running locally is just one of several models the app supports and it falls back to others if not available.

[−] handfuloflight 28d ago

[flagged]

[−] jonfromsf 28d ago

Gilfoyle? Is that you?

[−] mips_avatar 28d ago

I think these gpus were actually used for bitcoin mining before I bought them

[−] menno-sh 27d ago

It's Anton's grandson!

[−] vladgur 28d ago

Curious which models are you able to run and how many 3090s do they require at scale?

[−] mips_avatar 28d ago

4 3090s with nvlinks on each pair. Super fast inference on Moe models around 20-36b

[−] embedding-shape 27d ago

> Super fast inference

How fast is "super fast" exactly, and with what runtime+model+quant specifically? Curious to see how how 4x 3090s compare to 1x Pro 6000, could probably put together 4x 3090s for a fraction of the cost compared to the Pro 6000, but the times I've seen the tok/s in/out for multiple GPUs my heart always drops a little.

[−] ascorbic 28d ago

The interesting part is that you can use the same API with Workers AI models (hosted at the edge) and proxied models (OpenRouter-style).

Disclaimer: I work at Cloudflare, but not on this.

[−] whereistejas 29d ago

This actually looks very useful. Cloudflare seems to be brining together a great set of tools. Not to mention, D2 is literally the only sqlite-as-a-service solution out there whose reliability is great and free tier limits are generous.

[−] james2doyle 28d ago

I find it really confusing that the worker AI models on here: https://developers.cloudflare.com/workers-ai/models/ do not have full overlap with the ones on here: https://developers.cloudflare.com/ai/models/

Yes, you can see the same "hosted" ones on there, but when you look at the models endpoint, there are much less options at the "workers-ai/*" namespace. Is that intentional?

[−] sf_tristanb 28d ago

Sexy, but I wouldn't trust it. Why ? because Cloudflare AI Gateway is reporting inaccurate/wrong price for flagship models such as Nano Banana 2 and Nano Banana pro (I run production app using those). Been reporting it on discord and twitter, and they don't care. Entreprise client here :)

[−] wahnfrieden 29d ago

No spending limit / no ability to set a budget, unlike Google or OpenAI. Be prepared for an eye-watering invoice if you have a bug or get hacked.

edit: Why downvote? It's correct, and it's a risk that competitors handle better, including for their CDN products (compared to Bunny CDN). Maybe you are just used to the risk and haven't felt the burn yourself yet. Or you have the mistaken notion that there is no price at which temporary downtime is worthwhile to avoid paying.

[−] throwpoaster 29d ago

Anthropic gonna acquire Cloudflare for stock. Solves their infrastructure problems in one shot.

[−] ramesh31 29d ago

Big, could be a viable Bedrock alternative. Probably better uptime than Anthropic or AWS, too.

[−] bm-rf 29d ago

Not seeing any pricing info on the models[1] page. Wonder how much of a lift this is over paying providers directly. Perhaps Cloudflare is doing this at cost? Also interesting that zero data retention is not on by default, and is not supported with all providers[2]. Finally, would be great if this could return OpenAI AND Anthropic style completions.

[1] https://developers.cloudflare.com/ai/models/

[2] https://developers.cloudflare.com/ai-gateway/features/unifie...

[−] datadrivenangel 28d ago

Good to see their purchase of Replicate paying off!

[−] erans 27d ago

It's great to see more such platform popping up. It's good for the ecosystem. We need more hosting options that are clear, secure and have the ability to help people run as many models as possible.

[−] strimoza 28d ago

Interesting timing — I've been using Bunny CDN for video delivery and considering moving parts to Cloudflare. Anyone have experience comparing the two for media streaming specifically?

[−] hemangjoshi37a 28d ago

The interesting question isn't "can CF run agent inference" — it's what the routing layer needs to look like for multi-turn workflows. Shipping agent systems to enterprise clients the last year, the bottleneck is never raw tokens/sec. It's (a) state checkpointing betweentool calls, (b) cold-start latency on embedding/rerank models, (c) rate-limit coordination across concurrent agent loops. Does CF expose per-session state, or still stateless-per-request? Without that, you end up building the interesting part yourself.

[−] Invictus0 28d ago

I've been using AI gateway for months already, is this any different or is it just moving out of beta?

[−] VikRubenfeld 28d ago

Is there something free like Codes or AntiGravity that can run open-source LLM models?

[−] messh 28d ago

So, is this similar to openrouter?

[−] pprotas 29d ago

Can't wait for the free tier!

[−] kol3x 28d ago

Would be nice to filter out "proxied" models in the Workers AI page.

[−] TheServitor 28d ago

That's so brilliant that it's already a thing called openrouter!

[−] Jack5500 29d ago

Sadly no mention on regions.

[−] 6thbit 29d ago

don’t attach to a single AI provider when you can attach to cloudflare as your single AI gateway provider!

rant aside, they are greatly positioned network wise to offer this service, i wonder about their princing and potential markup on top of token usage?

i presume they wont let you “manage all your AI spend in one place” for free.

[−] kinnth 28d ago

openrouter works perfectly well for me called by cloudflare workers. open router also has superior cascading and waterfalling if models are offline. Not sure they have that working from V1.

I love everything about openrouter. So kinda a fan boy.

[−] mbtrucks 29d ago

Can I set a hard cost limit ? Else I'm not interested, don't be like googles mess of billing.

[−] ernsheong 29d ago

What is Cloudflare trying to be? Everything everywhere all at once?

[−] reconnecting 28d ago

Unified inference layer is a polite way to say: "proxy that knows every prompt and every response".

[−] mbtrucks 29d ago

Can I set a hard cost limit per day ? With no drift, else I'm not interested.

[−] stult 29d ago

A few weeks ago, I ran into a bug with Cloudflare's DNS server not detecting when I updated the records with the registrar. The bug was 100% on their end, entirely unsolvable by me, yet they have made it literally impossible to contact them to file a bug report. Their standard user help workflow dead-ended by forcing me to talk to their absolutely useless AI help chatbot, which proceeded to regurgitate their FAQ (inaccurately, uselessly), then referred me to a phone number that was disconnected/not in service, then gave me an email address that auto-replied it was no longer in use, then just looped back to the FAQ. There was no way for me to even send them an email to let them know they have a major bug.

I immediately pulled all my sites off of Cloudflare and I will never use that godawful nightmare of a company for anything ever again. If they can't even host a generic help bot without screwing it up that badly, why would I ever use them for anything at all, never mind an AI platform?

[−] RITESH1985 28d ago

[flagged]

[−] ZihangZ 28d ago

[dead]

[−] kantaro 28d ago

[flagged]

[−] redoh 28d ago

[flagged]

Cloudflare's AI Platform: an inference layer designed for agents (blog.cloudflare.com)

95 comments