Zml-smi: universal monitoring tool for GPUs, TPUs and NPUs (zml.ai)

by steeve 11 comments 77 points


[−] rdyro 40d ago
[−] serialx 40d ago
Look into all-smi: https://github.com/lablup/all-smi. It supports every GPU imaginable, including Apple Silicon, as well as many AI accelerator cards.
[−] mrflop 45d ago
Renaming fopen64 to intercept library calls feels like a brittle hack masquerading as "sandboxing." Why not just upstream this hardware support to nvtop instead of fragmenting the ecosystem?
[−] steeve 45d ago
Sadly, sandboxing is something that can't be upstreamed. This way, sandboxing is kept in zml instead of patching Mesa.

As for nvtop, it's a great program, but we were missing a few features (such as sandboxing).

[−] pstuart 40d ago
It looks cool, and I was excited to get monitoring for the NPU on my Ryzen AI 395+. Unfortunately, it does not show up. NPU support in Linux really seems to be an afterthought.
[−] steeve 40d ago
Weird, because we tried it. It doesn’t show anything?

We use amdsmi to get metrics. I’ll investigate.

[−] marwanet 40d ago
If this logic were pushed into nvtop, wouldn't the codebase become unmaintainable? Each vendor's interception method is going to be different.
[−] nareyko 40d ago
[dead]
[−] imcritic 40d ago
Is it capable of exposing metrics in Prometheus format?
[−] steeve 40d ago
consider it done
[−] 152334H 40d ago
"NPU" seems to refer to trainium only?
[−] synergy20 40d ago
It would be nice to have CPU usage added so I have everything in one place.

Currently I use btop, which shows basic GPU load along with CPU, network, etc.