CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering

[−] intoXbox 49d ago

They used a custom neural net with autoencoders, which contain convolutional layers. They trained it on previous experiment data.

https://arxiv.org/html/2411.19506v1

Why is it so hard to elaborate what AI algorithm / technique they integrate? Would have made this article much better

[−] moffkalast 48d ago

Ah anomaly detection, that makes a lot more sense.

[−] jgalt212 49d ago

Because it does not align with LLM Uber Alles.

[−] dcanelhas 49d ago

I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

[−] ninjagoo 49d ago

> I'm half expecting to see "AI model" appearing as stand-in for "linear regression" at this point in the cycle.

Already the case with consulting companies, have seen it myself

[−] idiotsecant 49d ago

Some career do-nothing-but-make-noise in my organization hired a firm to 'Do AI' on some shitty data and the outcome was basically linear regression. It turns out that you can impress executives with linear regression if you deliver it enthusiastically enough.

[−] tasuki 49d ago

Tbh, often enough, linear regression is exactly what is needed.

[−] idiotsecant 48d ago

Yes, and we do it every day and call it 'linear regression' and don't need a data center full of expensive toys to do it

[−] mpierini 44d ago

You do unsupervised learning without labels with a linear regression. Interesting. What would you regress in this case? The problem is the following: you have a point cloud of data (electronic signal from arrays arranged into an irregular pattern). You know the physics that was discovered. You are looking for rare events (one in a billion or less) and you don’t know what they look like.

[−] mpierini 45d ago

And you think we did not try linear regressions? This is what we used to do 20 years ago. Then we gained two orders of magnitude in signal-to-background discrimination. And since our data are not even images, off-the-shelf solutions mostly don’t apply. Try to process 40 MHz of incoming collisions (1 MB each) within 100 nsec with a linear regression of point-cloud data. When you are done trying, try to think that maybe (maybe…) life is not as easy as bread&butter. If you succeed, come and knock at CERN’s door. Maybe we will let you in…

[−] ozim 48d ago

Not everyone knows everything so knowledge is the new oil.

I do know about linear regression; I even had quite a bit of it at university.

But I still wouldn’t be able to just implement it on some data without a good couple of days to weeks of figuring things out and working out which tools to use so I don’t implement it from scratch.

[−] idiotsecant 48d ago

Implement it...from scratch? It's literally least squares regression. It's a few lines of code. What are you trying to say here?
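
For reference, the "few lines of code" claim holds for the fit itself. A minimal sketch of closed-form ordinary least squares for a single feature, in plain Python; the function name `fit_line` and the toy data are made up for illustration, and this says nothing about the data pipeline discussed in the replies:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (illustrative only).
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept)
```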

[−] ozim 48d ago

You have to get the data first, then build all the data processing pipelines to get your parameters for linear regression.

[−] mpierini 44d ago

A lot of data. More than google + netflix + you name it. And you have 100 nsec. And the data are not on disk. Good luck with your linear regression

[−] blitzar 49d ago

I'm half expecting to see "AI model" appearing as stand-in for "if > 0" at this point in the cycle.

[−] Foobar8568 49d ago

This is why I am programming now in Ocaml, files themselves are AI ( ml ).

[−] srean 49d ago

I am sure you did not forget that pattern matching.

[−] Vetch 49d ago

This is essentially what any relu based neural network approximately looks like (smoother variants have replaced the original ramp function). AI, even LLMs, essentially reduce to a bunch of code like

    let v0 = 0
    let v1 = 0.40978399*(0.616*u + 0.291*v)
    let v2 = if 0 > v1 then 0 else v1

    let v3 = 0
    let v4 = 0.377928*(0.261*u + 0.468*v)
    let v5 = if 0 > v4 then 0 else v4...

[−] samrus 49d ago

That's a bit far. ReLU does check x>0, but that's just one non-linearity in the linear/non-linear sandwich behind the universal approximation theorem. It's more complex than just x>0.

[−] greenavocado 49d ago

Multiply-accumulate, then clamp negative values to zero. Every even-numbered variable is a weighted sum plus a bias (an affine transformation), and every odd-numbered variable is the ReLU gate (max(0, x)). Layer 2 feeds on the ReLU outputs of layer 1, and the final output is a plain linear combination of the last ReLU outputs

    // inputs: u, v
    // --- hidden layer 1 (3 neurons) ---
    let v0  = 0.616*u + 0.291*v - 0.135
    let v1  = if 0 > v0 then 0 else v0
    let v2  = -0.482*u + 0.735*v + 0.044
    let v3  = if 0 > v2 then 0 else v2
    let v4  = 0.261*u - 0.553*v + 0.310
    let v5  = if 0 > v4 then 0 else v4
    // --- hidden layer 2 (2 neurons) ---
    let v6  = 0.410*v1 - 0.378*v3 + 0.528*v5 + 0.091
    let v7  = if 0 > v6 then 0 else v6
    let v8  = -0.194*v1 + 0.617*v3 - 0.291*v5 - 0.058
    let v9  = if 0 > v8 then 0 else v8
    // --- output layer (binary classification) ---
    let v10 = 0.739*v7 - 0.415*v9 + 0.022
    // sigmoid squashing v10 into the range (0, 1)
    let out = 1 / (1 + exp(-v10))
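
The same forward pass as a runnable Python sketch, using the weights from the listing above. The names `relu` and `tiny_mlp` are invented here for illustration; `max(0.0, x)` does the same job as each `if 0 > x then 0 else x` line:

```python
import math


def relu(x):
    """Clamp negative values to zero."""
    return max(0.0, x)


def tiny_mlp(u, v):
    """Two hidden ReLU layers followed by a sigmoid output."""
    # Hidden layer 1 (3 neurons): affine transform, then ReLU.
    h1 = relu(0.616 * u + 0.291 * v - 0.135)
    h2 = relu(-0.482 * u + 0.735 * v + 0.044)
    h3 = relu(0.261 * u - 0.553 * v + 0.310)
    # Hidden layer 2 (2 neurons): fed by the ReLU outputs of layer 1.
    g1 = relu(0.410 * h1 - 0.378 * h2 + 0.528 * h3 + 0.091)
    g2 = relu(-0.194 * h1 + 0.617 * h2 - 0.291 * h3 - 0.058)
    # Output layer: linear combination, squashed into (0, 1) by a sigmoid.
    logit = 0.739 * g1 - 0.415 * g2 + 0.022
    return 1.0 / (1.0 + math.exp(-logit))


print(tiny_mlp(0.0, 0.0))
```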

[−] GeorgeTirebiter 48d ago

    let v0  = 0.616*u + 0.291*v - 0.135
    let v1  = if 0 > v0 then 0 else v0

is there something 'less good' about:

    let v1  = if v0 < 0 then 0 else v0

Am I the only one who stutter-parses "0 > value" vs my counterexample?

Is Yoda condition somehow better?

Shouldn't we write: Let v1 = max 0 v0

[−] Vetch 48d ago

The relu/if-then-else is in fact centrally important as it enables computations with complex control flow (or more exactly, conditional signal flow or gating) schemes (particularly as you add more layers).

[−] phire 49d ago

I'm sure I've seen basic hill climbing (and other optimisation algorithms) described as AI, and then used as evidence of AI solving real-world science/engineering problems.

[−] LiamPowell 49d ago

Historically this was very much in the field of AI, which is such a massive field that saying something uses AI is about as useful as saying it uses mathematics. Since the term was first coined it's been constantly misused to refer to much more specific things.

From around when the term was first coined: "artificial intelligence research is concerned with constructing machines (usually programs for general-purpose computers) which exhibit behavior such that, if it were observed in human activity, we would deign to label the behavior 'intelligent.'" [1]

[1]: https://doi.org/10.1109/TIT.1963.1057864

[−] zingar 49d ago

That definition moves the goalposts almost by definition: people only stopped thinking that chess demonstrated intelligence when computers started doing it.

[−] Eufrat 49d ago

The term artificial intelligence has always been just a buzzword designed to sell whatever it needed to. IMHO, it has no meaningful value outside of a good marketing term. John McCarthy is usually the person who is given credit for coming up with the name and he has admitted in interviews that it was just to get eyeballs for funding.

[−] coherentpony 49d ago

I am somewhat cynically waiting for the AI community to rediscover the last half a century of linear algebra and optimisation techniques.

At some point someone will realise that backpropagation and adjoint solves are the same thing.

[−] bonoboTP 49d ago

There are plenty of smart people in the "AI community" already who know it. Smugly commenting does not replace actual work. If you have real insight and can make something perform better, I guarantee you that many people will listen (I don't mean twitter influencers but the actual field). If you don't know any serious researcher in AI, I have my doubts that you have any insight to offer.

[−] whattheheckheck 49d ago

I am sure they are aware...

[−] thesz 48d ago

There is a HIGGS dataset [1]. As the name suggests, it is designed for applying machine learning to recognize the Higgs boson.

[1] https://archive.ics.uci.edu/ml/datasets/HIGGS

In my experiments, linear regression with extended (addition of squared values) attributes is very much competitive in accuracy terms with reported MLP accuracy.
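
A minimal sketch of what "extended with squared values" can look like, assuming it means appending the square of each attribute to the feature vector. The helper names and the hand-picked weights are hypothetical and are not taken from the HIGGS experiments; the point is only that a model linear in the extended features can draw a curved boundary in the original ones:

```python
def add_squares(row):
    """Extend a feature vector with the square of each feature."""
    return list(row) + [x * x for x in row]


def linear_score(weights, bias, features):
    """Plain linear model: weighted sum of the features plus a bias."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Hand-picked (not fitted) weights encoding x^2 + y^2 - 1:
# negative inside the unit circle, positive outside it.
weights, bias = [0.0, 0.0, 1.0, 1.0], -1.0

inside = linear_score(weights, bias, add_squares([0.5, 0.5]))
outside = linear_score(weights, bias, add_squares([1.0, 1.0]))
print(inside, outside)
```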

[−] dguest 48d ago

The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:

https://opendata-qa.cern.ch/record/93940

if you can beat it with linear regression we'd be happy to know.

[−] thesz 48d ago

Thanks.

The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset, and does not operate with quantities like accuracy and/or perplexity. The HIGGS dataset paper provided area under ROC, from which one had to approximate accuracy. I used accuracy from the ADMM paper [2] to compare my results with. As I checked later, area under ROC in [1] mostly agrees with [2] SGD training results on HIGGS.

  [1] https://arxiv.org/pdf/2505.19689
  [2] https://proceedings.mlr.press/v48/taylor16.pdf

I think that perplexity measure is appropriate there in [1] because we need to discern between three outcomes. This calls for softmax and for perplexity as a standard measure.

So, my questions are: 1) what perplexity should I target when dealing with "mc-flavtag-ttbar-small" dataset? And 2) what is the split of train/validate/test ratio there?

[−] dguest 47d ago

For better or worse the people working on this don't really use perplexity or accuracy to evaluate models. The target is whatever you'd get for those metrics if you used the discriminants that were provided in the dataset (i.e. the GN2v01 values).

As for why accuracy and perplexity aren't reported: the experiments generally choose a threshold to consider something a "b-hadron" (basically picking a point along the ROC curve) and quantify the TPR and FPR at that point. There are reasons for this, mostly that picking a standard point lets them verify that the simulation actually reflects data. See, for example, the FPR [1] and TPR [2] "calibrations".

It's a good point, though, the physicists should probably try harder to report standard metrics that the rest of the ML community uses.

[1]: https://arxiv.org/pdf/2301.06319

[2]: https://arxiv.org/abs/1907.05120
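
The working-point evaluation described above can be sketched as follows. This assumes binary labels (1 for the signal class, 0 for background) and a score where higher means more signal-like; the function name and toy numbers are illustrative and are not the experiments' calibration code:

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (TPR, FPR) when everything scoring >= threshold is tagged."""
    tagged = [s >= threshold for s in scores]
    true_pos = sum(1 for t, y in zip(tagged, labels) if t and y == 1)
    false_pos = sum(1 for t, y in zip(tagged, labels) if t and y == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return true_pos / n_pos, false_pos / n_neg


# Toy scores: three signal jets followed by three background jets.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
tpr, fpr = rates_at_threshold(scores, labels, threshold=0.5)
print(tpr, fpr)
```

Sweeping `threshold` traces out the ROC curve; fixing it picks a single point on that curve.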

[−] mpierini 45d ago

Perplexity, aka measuring how sure a network is about its answer. Which might be wrong. It would not pass the peer review of any particle physics journal. (Real) science is about being right, not about being sure about itself.

[−] mpierini 45d ago

And this problem is a joke compared to a real problem. We are talking about going from 40 MHz to 100 kHz incoming data stream, after which a second layer of real-time selection reduces the data to 1 kHz which is processed, cleaned, elaborated into high level features that you have in that dataset. But if you think you can do better, apply for a CERN job, come here and enlighten us!
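
To put those rates in perspective, a back-of-the-envelope calculation using only the numbers quoted in this thread (40 MHz collisions, roughly 1 MB each, 100 kHz after the first selection, 1 kHz after the second). Taking 1 MB as 10^6 bytes is an assumption:

```python
collision_rate_hz = 40e6     # incoming collisions per second
event_size_bytes = 1e6       # ~1 MB per collision (assumed 10^6 bytes)
first_stage_rate_hz = 100e3  # rate kept by the first real-time selection
second_stage_rate_hz = 1e3   # rate kept by the second selection

raw_bandwidth_tb_per_s = collision_rate_hz * event_size_bytes / 1e12
first_stage_reduction = collision_rate_hz / first_stage_rate_hz
second_stage_reduction = first_stage_rate_hz / second_stage_rate_hz
total_reduction = collision_rate_hz / second_stage_rate_hz

print(raw_bandwidth_tb_per_s)   # terabytes per second before any filtering
print(first_stage_reduction, second_stage_reduction, total_reduction)
```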

[−] yread 49d ago

And why not, when linear regression works, it works so well it's basically magic, better than intelligence, artificial or otherwise

[−] plasino 49d ago

Having worked with people who do that, I can guarantee that’s not the case. See https://ssummers.web.cern.ch/conifer/ and hls4ml; these run BDTs and CNNs.

[−] Staross 49d ago

That works well to get around patents btw :)

[−] etrautmann 49d ago

It seems like most of the implementation is FPGA, which I wouldn’t call “physically burned into silicon.” That’s quite a stretch of language

[−] vultour 49d ago

Because if it’s not an LLM it’s not good for the current hype cycle. Calling everything AI makes the line go up.

[−] danielbln 49d ago

LLMs also make the cynicism go up among the HN crowd.

[−] okamiueru 48d ago

Hm. Is HN starting to become more skeptical of LLMs? For the past couple of years, HN has seemed worryingly enthusiastic about LLMs.

[−] andersonpico 48d ago

How so? Half the people here have LLM delusion in every thread posted here; more than half of the things going to the frontpage are AI. Just look at hours where Americans are awake.

[−] irishcoffee 48d ago

Fucking Americans. Only 4% of the world population, with the magic of disproportionately afflicting the global news headlines which make their way here.

It’s impressive, honestly.

[−] fnord77 49d ago

Thanks for tracking this down. I too am annoyed when so-called technical articles omit the actual techniques.

[−] chsun 48d ago

One of the authors (of one of the two models, not this particular paper) here. Just a clarification: these models are *not* burned into silicon. They are trained with brutal QAT but are put onto FPGAs. For axol1tl, the weights are burned in the sense that the weights are hard-wired in the fabric (i.e., shift-add instead of the conventional read-mul-add cycle), but not on the raw silicon, so the chip can be reprogrammed. Though, for projects like smartpixel or HG-Cal readout, there are similar ones targeting silicon (google something like "smartpixel cern", "HGCAL autoencoder" and you will find them), and I thought it was one of them when viewing the title.

Some slides with more info: https://indico.cern.ch/event/1496673/contributions/6637931/a... The approval process for a full paper is quite lengthy in the collaboration, but a more comprehensive one is coming in the following months, if everything goes smoothly.

Regarding the exact algorithm: there are a few versions of the models deployed. Before v4 (when this article was written), they are on slides 9-10. The model was trained as a plain VAE that is essentially a small MLP. At inference time, the decoder was stripped and the mu^2 term from the KL div was used as the loss (contributions from terms containing sigma were found to have negligible impact on signal efficiency). In v5 we added a VICREG block before that and used the reconstruction loss instead. Everything runs in =2 clock cycles at a 40 MHz clock. Since v5, the hls4ml-da4ml flow (https://arxiv.org/abs/2512.01463, https://arxiv.org/abs/2507.04535) has been used for putting the model on FPGAs.
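
As an illustration of the mu^2 score described above (a sketch, not the deployed firmware or the collaboration's code): for a VAE with a diagonal Gaussian posterior and a standard normal prior, the KL divergence is 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1). Dropping the sigma terms leaves a score proportional to sum(mu^2). The function names below are hypothetical, and `mu` and `log_var` stand in for the outputs of some encoder:

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def mu2_score(mu):
    """Anomaly score keeping only the mu^2 part of the KL term."""
    return sum(m * m for m in mu)


# Illustrative latent means; log_var = 0 corresponds to sigma = 1.
mu = [3.0, 4.0]
log_var = [0.0, 0.0]
print(kl_to_standard_normal(mu, log_var), mu2_score(mu))
```

With sigma fixed at 1 the full KL term is exactly half the mu^2 score, so thresholding one is equivalent to thresholding the other.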

For CICADA, the model was again trained as a VAE, but this time distilled with a supervised loss on the anomaly score on a calibration dataset. Some slides: https://indico.global/event/8004/contributions/72149/attachm... (not up-to-date, but I don't know if there are other newer open ones). Both student and teacher were conventional conv-dense models, which can be found on slides 14-15.

Just to plug some of my work on running QAT (high-granularity quantization) and deploying NNs (distributed arithmetic) in the context of such applications (i.e., FPGA deployment for <1us latency), if you are interested: https://arxiv.org/abs/2405.00645 https://arxiv.org/abs/2507.04535

Happy to take any questions.

[−] jurschreuder 49d ago

I've got news for you: everybody with a modern CPU uses this, since those use a perceptron for branch prediction.
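
For readers who have not seen one, a toy sketch of a perceptron-style branch predictor in Python. It assumes the textbook scheme (signed weights over a global history of taken/not-taken outcomes, trained on a misprediction or when the output magnitude is at or below a threshold); the class name, history length, and threshold are illustrative, and real hardware uses tables of perceptrons indexed by branch address:

```python
class PerceptronPredictor:
    """Single perceptron over a global branch history (toy model)."""

    def __init__(self, history_len=8, threshold=16):
        self.weights = [0] * (history_len + 1)  # weights[0] is the bias
        self.history = [-1] * history_len       # +1 = taken, -1 = not taken
        self.threshold = threshold

    def _output(self):
        dot = sum(w * h for w, h in zip(self.weights[1:], self.history))
        return self.weights[0] + dot

    def predict(self):
        """Predict taken when the weighted sum is non-negative."""
        return self._output() >= 0

    def update(self, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output()
        t = 1 if taken else -1
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        self.history = self.history[1:] + [t]


# A branch that is never taken: the predictor should learn to say "not taken".
predictor = PerceptronPredictor()
for _ in range(10):
    predictor.update(False)
print(predictor.predict())
```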

[−] serendipty01 49d ago

Might be related: https://www.youtube.com/watch?v=T8HT_XBGQUI (Big Data and AI at the CERN LHC by Dr. Thea Klaeboe Aarrestad)

https://www.youtube.com/watch?v=8IZwhbsjhvE (From Zettabytes to a Few Precious Events: Nanosecond AI at the Large Hadron Collider by Thea Aarrestad)

Page: https://www.scylladb.com/tech-talk/from-zettabytes-to-a-few-...

[−] quijoteuniv 49d ago

A bit of hype in the AI wording here. This could be called a chip with hardcoded logic obtained with machine learning

[−] konradha 49d ago

How are FPGAs "burned into silicon"? Would be news to me that there are ASICs being taped out at CERN

[−] Surac 49d ago

Very important! This is not an LLM like the ones so often called AI these days. It's a neural network in an FPGA.
