I somehow find the concept of a general time-series model strange. How can the same model reliably predict both egg prices in Italy and global inflation?
And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…
That's what traditional time-series modelling gives you. This is a foundation model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")
Because traditional time-series modelling (ARIMA, GARCH, ...) is too "simple" and "strict". Just like "simple" computer vision (OpenCV, edge detection, ...) was crushed by neural networks once it had to deal with real-world images.
This seemed like a good answer at first. But on further thought, images on the whole really do seem to have quite a bit more standard structure / "grammar" to exploit than arbitrary time series. Many images are of the world, where gravity gives you a preponderance of blobs at the bottom, plus repeated subjects like people, animals, faces, eyes. Even wildly abstract images still have some continuity: pixels in a neighborhood are likely to be similar.
Time series in general have none of this strictly guaranteed structure. I'm sure many real-world sensors have roughly Gaussian noise plus smoothness and locality properties that are pretty safe to assume, but presumably that simple stuff is exactly what traditional time-series modelling was already exploiting.
Maybe the real question is just what kinds of time series are in the training data, and why we think whatever implicit structure is there actually generalizes. I can see how training that mixes pictures of dogs and cats with pictures of people could improve drawing hair, detecting hair, or let you draw people AND dogs. It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.
> It's less clear to me how mixing sensor data / financial data / anything else together could be helpful.
Because many of these have the same underlying causal structures - humans doing things, weather correlations, holidays.
Well-studied behavioral stuff like "the stock market takes the stairs up and the elevator down", which is not really captured by "traditional" modelling tools.
I'm sure people will be doing mechanistic interpretability on these models to extract what they pattern-match on for prediction.
I am not familiar with time-series models, but judging from your answer, it would be necessary to feed long time series into this model for it to detect trends. What is a token here? Can it, for lack of a better example, take in all intraday movements of a stock for a day, a week, a month, etc.?
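For what it's worth: if this works like TimesFM as its paper describes (and I'm remembering the paper right), a "token" is a patch of consecutive raw values rather than a vocabulary item; each fixed-length window of points gets embedded and fed to the transformer. A minimal sketch of that patching step, with the 32-point patch length being my assumption:

    import numpy as np

    def patch_series(values: np.ndarray, patch_len: int = 32) -> np.ndarray:
        """Split a 1-D series into fixed-length patches ("tokens").

        Each patch of raw values gets embedded and fed to the transformer,
        so one token covers patch_len consecutive points.
        """
        pad = (-len(values)) % patch_len          # left-pad to a multiple of patch_len
        padded = np.concatenate([np.zeros(pad), values])
        return padded.reshape(-1, patch_len)

    # One trading day of minute bars -> 390 points -> 13 patches of 32 after padding.
    minute_prices = 100 + np.random.default_rng(0).normal(0, 0.1, 390).cumsum()
    print(patch_series(minute_prices).shape)      # (13, 32)

So yes, a day/week/month of intraday data is just a longer context, up to whatever the model's maximum context length is.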
Do these models predict on just a single time series then?
It is far more useful for prediction to look for correlations between time series. This is trickier than correlation-hunting in general, because most time series trend up or down and therefore correlate spuriously.
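The classic demo of that trap: two completely independent random walks usually show a large level correlation that vanishes once you difference them.

    import numpy as np

    rng = np.random.default_rng(42)
    a = rng.normal(size=1000).cumsum()   # two completely independent
    b = rng.normal(size=1000).cumsum()   # random walks

    # The levels often correlate strongly just because both wander/trend...
    print(np.corrcoef(a, b)[0, 1])                     # frequently far from 0
    # ...while the differenced (stationary) series do not.
    print(np.corrcoef(np.diff(a), np.diff(b))[0, 1])   # ~ 0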
AR(k) stuff, sure, that's old news. I would expect the newfangled stuff to be good at zero-shot learning of pre-event signatures spread across multiple series, at a minimum.
My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.
As they say in appendix 8:
> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple time-series patterns:
> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.
> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.
> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.
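A rough sketch of what a generator along those lines could look like (the stability normalization and the equal-weight mix at the end are my guesses, not the paper's):

    import numpy as np

    def piecewise_linear(n, rng):
        # 2-8 linear pieces with random breakpoints and slopes.
        k = int(rng.integers(2, 9))
        knots = np.sort(rng.choice(np.arange(1, n), size=k - 1, replace=False))
        out, level, prev = [], 0.0, 0
        for knot, slope in zip(list(knots) + [n], rng.normal(size=k)):
            seg = level + slope * np.arange(1, knot - prev + 1)
            out.append(seg)
            level, prev = seg[-1], knot
        return np.concatenate(out)

    def arma(n, rng):
        # ARMA(p, q), 1 <= p, q <= 8; coefficients scaled down for stationarity.
        p, q = rng.integers(1, 9, size=2)
        ar = rng.normal(size=p); ar /= 1.1 * np.abs(ar).sum()
        ma = rng.normal(size=q); ma /= 1.1 * np.abs(ma).sum()
        eps = rng.normal(size=n + q)
        x = np.zeros(n + p)
        for t in range(n):
            x[t + p] = ar @ x[t:t + p][::-1] + eps[t + q] + ma @ eps[t:t + q][::-1]
        return x[p:]

    def seasonal(n, rng, max_context=512):
        # Sine or cosine with a random period in [4, max_context / 2] and a phase delay.
        period = rng.integers(4, max_context // 2 + 1)
        phase = rng.uniform(0, 2 * np.pi)
        wave = np.sin if rng.random() < 0.5 else np.cos
        return wave(2 * np.pi * np.arange(n) / period + phase)

    rng = np.random.default_rng(0)
    series = piecewise_linear(512, rng) + arma(512, rng) + seasonal(512, rng)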
If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.
And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.
When I worked on Google Ads, we used time series forecasting to compute the odds of an ad campaign reaching its goal (and to tell users how likely they were to hit them).
A ton of (unsophisticated) advertisers would just draw a line from zero to the number they are at today and project that line to the end of the month to forecast the amount of conversions/spend they were going to hit. This of course doesn't take into account various seasonalities (day-of-week, time-of-year, etc.) and gives you a pretty poor forecast. Compared to those, time-series forecasting is much more accurate.
Is it perfectly accurate? No, that's impossible. But when you can train a model on all advertising campaigns, you can give good 95% confidence intervals.
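A toy version of that failure mode, with made-up conversion numbers and a weekday/weekend split:

    import numpy as np

    days = np.arange(30)
    daily = np.where(days % 7 < 5, 100, 10)   # weekday vs. weekend conversions
    cum = daily.cumsum()
    today = 11                                 # mid-month

    # "Line from zero through today" projected to end of month:
    naive = cum[today] / (today + 1) * 30

    # Day-of-week-aware projection: average per weekday so far,
    # summed over the remaining calendar.
    dow_mean = np.array([daily[:today + 1][days[:today + 1] % 7 == d].mean()
                         for d in range(7)])
    seasonal = cum[today] + dow_mean[days[today + 1:] % 7].sum()

    print(naive, seasonal, cum[-1])   # naive overshoots (2550); seasonal lands on 2280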
So, predict sign (branch predictors in modern CPUs also use neural networks of sorts), exponent (most probably it changes slowly) and then predict mantissa using Benford's law.
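The split itself is trivial to compute; whether three separate predictors beat one is the real question. A sketch:

    import math

    def decompose(x: float):
        # sign, exponent, mantissa, with abs(x) == mantissa * 2**exponent
        sign = -1.0 if x < 0 else 1.0
        mantissa, exponent = math.frexp(abs(x))   # 0.5 <= mantissa < 1 for x != 0
        return sign, exponent, mantissa

    s, e, m = decompose(-1234.5)
    assert math.ldexp(m, e) * s == -1234.5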
So the time series are provided with no context? It's just trained on lots of sets of numbers? Then you give it a new set of numbers and it guesses the rest, again with no context?
My guess as to how this would work: the machine will first guess from the data alone whether this is one of the categories it has already seen/inferred (share prices, Google Trends cat searches, etc.). Then it'll output a plausible completion for that category.
That doesn't seem as if it will work well for any categories outside the training data. I would rather just use either a simple model (ARIMA or whatever) or a theoretically-informed model. But what do I know.
The Cisco Time Series model is inspired by this model from Google. It is targeted at observability data, and I can confirm it works great in that context: https://github.com/splunk/cisco-time-series-model
Let's say I have long time series of past solar irradiation and long time series of past weather forecasts. Can this model make use of weather forecasts for time X in the future to predict electricity prices in the future?
That is, can it use one time series at time X to predict another time series at time X?
Or is this strictly about finding patterns WITHIN a time series?
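Whether a given foundation model accepts covariates depends on the release, but what you're describing is classical forecasting with exogenous regressors (ARIMAX-style): known future values of one series inform the forecast of another. A sketch with made-up data:

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    n, rng = 200, np.random.default_rng(1)
    irradiation = (np.clip(np.sin(np.arange(n) * 2 * np.pi / 24), 0, None)
                   + rng.normal(0, 0.05, n))
    price = 50 - 20 * irradiation + rng.normal(0, 2, n)   # toy dependence

    X = irradiation[:, None]                # exogenous regressor, shape (n, 1)
    model = SARIMAX(price[:-24], exog=X[:-24], order=(1, 0, 0)).fit(disp=False)
    # Predict tomorrow's prices *given* tomorrow's weather forecast:
    forecast = model.forecast(steps=24, exog=X[-24:])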
Same as with all tech scams. Even if you magically assume they could solve that problem with this tech, why on earth would they give it to the public, for free or for a price? Alphabet would just become the best quantitative hedge fund in the world.
I'm willing to bet an intelligent LLM with a dataset and a pandas stats package could outperform this model by running its own experiments and making predictions.
> And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…
They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.
They cannot predict wars in the Middle East influencing inflation unless there's a seasonal pattern.
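For the unfamiliar, the classical version of that decomposition on toy data:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.seasonal import seasonal_decompose

    idx = pd.date_range("2015-01-01", periods=120, freq="MS")     # 10 years, monthly
    y = pd.Series(np.linspace(100, 160, 120)                      # trend
                  + 10 * np.sin(2 * np.pi * np.arange(120) / 12)  # yearly seasonality
                  + np.random.default_rng(0).normal(0, 2, 120),   # residual noise
                  index=idx)

    parts = seasonal_decompose(y, model="additive", period=12)
    # parts.trend, parts.seasonal, parts.resid are the three components above.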
> They cannot predict wars in the Middle East influencing inflation unless there's a seasonal pattern.
well...
(for those who are lost: https://x.com/onionweigher/status/1936630237208469898)
Born too late to deploy to the Middle East.
Born just in time to deploy to the Middle East.
Or other low-dimensional time domain signals?
- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors (toy sketch after this list)
- memorization: some patterns recur across many domains, such as power laws
- multitask: exploit cross-domain connections, such as weather vs. electricity
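The toy sketch mentioned in the first bullet: recovering planted periodic factors of a series with a plain FFT.

    import numpy as np

    rng = np.random.default_rng(7)
    t = np.arange(1024)
    y = (np.sin(2 * np.pi * t / 32)             # fast cycle, period 32
         + 0.5 * np.sin(2 * np.pi * t / 128)    # slow cycle, period 128
         + rng.normal(0, 0.3, 1024))            # noise

    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(1024)
    top = freqs[np.argsort(spectrum)[-2:]]
    print(1 / top)                              # [128. 32.] -- the planted periods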
> How can the same model predict egg prices in Italy, and global inflation in a reliable way?
How can the same lossy compression algorithm (e.g. JPEG) compress pictures of everything in a reliable way?
Or just search for the James-Stein paradox.
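The short version: when you estimate three or more means at once, shrinking all of them toward a common point has lower total squared error than estimating each one separately, even if the quantities are unrelated. A quick demo assuming unit-variance noise:

    import numpy as np

    rng = np.random.default_rng(0)
    d, trials = 10, 10_000
    theta = rng.normal(0, 3, d)                 # d unrelated true means
    x = theta + rng.normal(size=(trials, d))    # one noisy estimate of each

    shrink = 1 - (d - 2) / (x ** 2).sum(axis=1, keepdims=True)
    js = shrink * x                             # James-Stein shrinkage (sigma = 1)

    print(((x - theta) ** 2).sum(axis=1).mean())    # per-series MLE: risk ~= d = 10
    print(((js - theta) ** 2).sum(axis=1).mean())   # lower, for any theta, once d >= 3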
> predict egg prices in Italy, and global inflation in a reliable way?
Easy, both go up.
The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.
People ask how one model can understand everything, but that assumes there’s any understanding involved at all.
At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?
I've always had difficulties with ML and time series; I'll need to try this out.
There is infinitely more entropy in the real world out there than any model can even remotely capture.
The world is not Minecraft.