Pool spare GPU capacity to run LLMs at larger scale (github.com)

by i386 3 comments 11 points


[−] lostmsu 53d ago

> MoE models via expert sharding with zero cross-node inference traffic

This claim makes the whole project questionable.
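A small sketch of why this claim is surprising (the numbers and layout below are hypothetical, not from the project): in a standard MoE layer a router picks the top-k experts per token, and if the experts are sharded across nodes, any token routed to a remotely hosted expert must cross the network.

```python
# Hypothetical illustration: unconstrained top-k MoE routing with
# experts sharded across nodes produces cross-node expert calls.
import random

NUM_EXPERTS = 8
NUM_NODES = 4   # assume experts sharded evenly: expert e lives on node e % NUM_NODES
TOP_K = 2
TOKENS = 1000

random.seed(0)
cross_node = 0
for _ in range(TOKENS):
    home_node = random.randrange(NUM_NODES)  # node holding this token's activations
    # Standard (unconstrained) routing: the gate may pick any expert.
    experts = random.sample(range(NUM_EXPERTS), TOP_K)
    cross_node += sum(1 for e in experts if e % NUM_NODES != home_node)

print(f"{cross_node}/{TOKENS * TOP_K} expert calls crossed a node boundary")
```

With 4 nodes, roughly three quarters of expert calls land on a remote node under uniform routing, so "zero cross-node inference traffic" would require either replicating experts on every node or constraining the router to local experts.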

[−] vagrantJin 53d ago

This is very promising, and it definitely looks more user-friendly than exo. Can't wait to try it out.
[−] iwinux 53d ago

You lost me at "spare GPU". I don't have any capable GPUs, let alone spare ones :)