Pool spare GPU capacity to run LLMs at larger scale (github.com) by i386 • 3 comments • 11 points
> MoE models via expert sharding with zero cross-node inference traffic
This claim makes the whole project questionable. With expert sharding, each token has to be routed to whichever node holds its gated experts, so unless every expert is replicated on every node (which defeats the point of sharding), routing is inherently cross-node traffic.
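To make the objection concrete, here's a toy sketch (all names and numbers are illustrative, not from the project) of top-k MoE routing with experts sharded round-robin across nodes. Even with uniform random gating, most expert hits land on a remote node:

```python
import random

NUM_NODES = 4
EXPERTS_PER_NODE = 2
NUM_EXPERTS = NUM_NODES * EXPERTS_PER_NODE
TOP_K = 2  # top-k gating: each token activates 2 experts

# Experts sharded round-robin: experts 0-1 on node 0, 2-3 on node 1, etc.
expert_to_node = {e: e // EXPERTS_PER_NODE for e in range(NUM_EXPERTS)}

def remote_hits(token_node: int, gated_experts: list[int]) -> int:
    """Count how many of a token's gated experts live on a different node."""
    return sum(1 for e in gated_experts if expert_to_node[e] != token_node)

random.seed(0)
cross = 0
total = 0
for _ in range(1000):  # 1000 tokens, each resident on a random node
    token_node = random.randrange(NUM_NODES)
    # Stand-in for a learned gating network: pick TOP_K distinct experts.
    gated = random.sample(range(NUM_EXPERTS), TOP_K)
    cross += remote_hits(token_node, gated)
    total += TOP_K

print(f"cross-node expert hits: {cross}/{total}")
```

With 4 nodes, 6 of the 8 experts are always remote from any given token, so roughly 75% of expert activations require shipping activations to another node. A real gating network is not uniform, but "zero cross-node inference traffic" would require the router to only ever pick local experts.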