Pool spare GPU capacity to run LLMs at larger scale (github.com) by i386 • 3 comments • 11 points
> MoE models via expert sharding with zero cross-node inference traffic
This claim makes the whole project questionable. With expert sharding, each token has to be routed to whichever node holds its gated experts, so unless every expert is replicated on every node (which defeats the point of sharding), routing is inherently cross-node traffic.
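To make the objection concrete, here's a toy sketch (all names and numbers are illustrative, not from the project) of top-k MoE routing with experts sharded round-robin across nodes. Even with uniform random gating, most expert hits land on a remote node:

```python
import random

NUM_NODES = 4
EXPERTS_PER_NODE = 2
NUM_EXPERTS = NUM_NODES * EXPERTS_PER_NODE
TOP_K = 2  # top-k gating: each token activates 2 experts

# Experts sharded round-robin: experts 0-1 on node 0, 2-3 on node 1, etc.
expert_to_node = {e: e // EXPERTS_PER_NODE for e in range(NUM_EXPERTS)}

def remote_hits(token_node: int, gated_experts: list[int]) -> int:
    """Count how many of a token's gated experts live on a different node."""
    return sum(1 for e in gated_experts if expert_to_node[e] != token_node)

random.seed(0)
cross = 0
total = 0
for _ in range(1000):  # 1000 tokens, each resident on a random node
    token_node = random.randrange(NUM_NODES)
    # Stand-in for a learned gating network: pick TOP_K distinct experts.
    gated = random.sample(range(NUM_EXPERTS), TOP_K)
    cross += remote_hits(token_node, gated)
    total += TOP_K

print(f"cross-node expert hits: {cross}/{total}")
```

With 4 nodes, 6 of the 8 experts are always remote from any given token, so roughly 75% of expert activations require shipping activations to another node. A real gating network is not uniform, but "zero cross-node inference traffic" would require the router to only ever pick local experts.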