Regularization and Bagging: First Techniques in HFT Modelling
Stabilizing linear alphas in low signal-to-noise microstructure data
Building intuition for regularization and bagging
High-frequency trading operates in one of the noisiest data regimes in finance. Order-book alphas built from microsecond-level imbalances, weighted mid-price changes, and queue dynamics carry an extremely low signal-to-noise ratio. A single linear model fitted on historical data tends to overfit noise rather than capture persistent patterns. The result is unstable coefficients that look great in-sample and fall apart out-of-sample.
Tier 1 HFT desks deal with this using two practical and complementary techniques: regularization and bagging (bootstrap aggregating). Both target the dominant source of error in live trading, variance, while preserving the underlying signal. This article explains how each method works, why it succeeds where plain OLS falls short, and how the two approaches complement each other.
The Core Problem: Low Signal-to-Noise and Model Instability
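Say I have three noisy alphas extracted from the order book: x_1, x_2, x_3. The standard thing to do is to fit a linear model of the form:

y = b_1 x_1 + b_2 x_2 + b_3 x_3 + e

where y is the quantity we want to predict, the b_i are the coefficients, and e is the error term.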
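Coefficients are usually found via Ordinary Least Squares (OLS), by minimizing the sum of squared errors:

\sum_t (y_t - b_1 x_{1,t} - b_2 x_{2,t} - b_3 x_{3,t})^2

where t indexes the observations in the training window.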
In the idealized Gauss-Markov world (zero-mean errors, constant variance, uncorrelated errors), OLS gives the best linear unbiased estimator. This result is called the Gauss-Markov Theorem. In live HFT, those assumptions break quickly, and even where they hold, the best unbiased estimator can still be very noisy. We get non-stationarity, fat tails, and highly correlated signals. Small changes in the training window can swing the b_i dramatically, producing large out-of-sample variance.
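To make the instability concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the three alphas are drawn as correlated Gaussians, the true coefficients, correlation, and noise level are made-up values, and `simulate_alphas` and `fit_ols` are hypothetical helper names. It refits OLS on ten consecutive training windows and reports how much the coefficients move between windows.

```python
import numpy as np

def simulate_alphas(n, rng, true_b=(0.02, 0.01, 0.0), rho=0.9, noise=1.0):
    """Draw n samples of three correlated alphas and a very noisy target.

    All parameter values are illustrative assumptions, not market estimates.
    """
    # Unit variances on the diagonal, correlation rho off the diagonal.
    cov = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
    x = rng.multivariate_normal(np.zeros(3), cov, size=n)
    y = x @ np.asarray(true_b) + noise * rng.standard_normal(n)
    return x, y

def fit_ols(x, y):
    """Least-squares coefficients for a linear model without intercept."""
    b, *_ = np.linalg.lstsq(x, y, rcond=None)
    return b

rng = np.random.default_rng(0)
x, y = simulate_alphas(5000, rng)

# Refit on ten consecutive, non-overlapping training windows of 500 rows.
windows = np.array_split(np.arange(len(y)), 10)
coefs = np.array([fit_ols(x[idx], y[idx]) for idx in windows])

print("true coefficients      :", (0.02, 0.01, 0.0))
print("per-window coefficients:")
print(coefs)
print("std across windows     :", coefs.std(axis=0))
```

In a setup like this, the spread of the fitted coefficients across windows tends to be much larger than the true coefficients themselves, which is the instability described above.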
Regularization: Trading A Bit Of Bias For Stability
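The first practical fix is regularization. Instead of minimizing only the sum of squared errors, we add a “penalty” for large values of the b_i. With a squared (ridge-style) penalty, for example, the objective becomes:

\sum_t (y_t - b_1 x_{1,t} - b_2 x_{2,t} - b_3 x_{3,t})^2 + a \sum_i b_i^2

Where “a” controls how much we penalize large b_i.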
We couldn’t have used this estimator under the original Gauss-Markov framing, because it isn’t unbiased. But that’s the trade we want to make. A small increase in bias buys a large reduction in variance, which is exactly what low signal-to-noise regimes demand. This is called regularization, and it matters even more once we scale beyond toy linear models.
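The bias-for-variance trade can be sketched in a few lines. The example below assumes a squared (ridge) penalty, which is one common choice and not necessarily the one the author has in mind. The simulated data, the penalty strength `a=200.0`, and the helper name `fit_ridge` are all illustrative assumptions. It fits plain OLS and the penalized model on the same ten training windows and compares how far the coefficients move.

```python
import numpy as np

def fit_ridge(x, y, a):
    """Minimize sum of squared errors + a * sum(b_i^2). With a = 0 this is OLS."""
    p = x.shape[1]
    return np.linalg.solve(x.T @ x + a * np.eye(p), x.T @ y)

rng = np.random.default_rng(1)
n, rho = 5000, 0.9

# Three highly correlated alphas with tiny true coefficients (assumed values).
cov = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
x = rng.multivariate_normal(np.zeros(3), cov, size=n)
y = x @ np.array([0.02, 0.01, 0.0]) + rng.standard_normal(n)

# Fit both models on the same ten consecutive training windows.
windows = np.array_split(np.arange(n), 10)
ols = np.array([fit_ridge(x[i], y[i], a=0.0) for i in windows])
ridge = np.array([fit_ridge(x[i], y[i], a=200.0) for i in windows])  # a not tuned

print("OLS   std across windows:", ols.std(axis=0))
print("ridge std across windows:", ridge.std(axis=0))
print("OLS   mean coefficients :", ols.mean(axis=0))
print("ridge mean coefficients :", ridge.mean(axis=0))
```

The penalized coefficients are pulled toward zero in every window, so they are biased, but they should move far less from window to window. In practice `a` would be chosen by out-of-sample validation instead of being fixed by hand.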
Bagging: Bootstrap Aggregation for Further Variance Reduction
Regularization handles variance by constraining the model. Bagging comes at it from a different angle: ensemble averaging.
A particular problem in HFT is the low signal-to-noise ratio. Models are prone to fitting noise rather than patterns. What if we want to focus on a stable, consistent prediction in live trading, rather than just maximizing in-sample fit?
One idea would be to partition the historical data and fit a separate model on each partition, then combine. The problem is that each model will likely be much worse than the original, since it doesn’t have enough points to produce a good fit. We need more data.
The ‘trick’ here is to realize that we can effectively create more by randomly sampling (with replacement) from the historical data we have. This approximates what we were doing when we collected the data in the first place: randomly sampling the ‘real world’. This has caveats (naive resampling of individual rows, for instance, ignores the serial dependence in tick data), but you basically end up with a load of fitted models.
….
The final prediction is the average of the individual models’ predictions. The key thing to understand is that each model has roughly the same average error (bias), but by running the fit many times and averaging, we improve the stability of the process. In other words, we reduce the variance of the errors, just like regularization.
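The resample, fit, and average loop can be sketched as follows. This is a minimal illustration under stated assumptions: the data is simulated, the base learner is plain least squares, rows are resampled independently, and `bagged_fit` and `bagged_predict` are hypothetical helper names.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares coefficients for a linear model without intercept."""
    b, *_ = np.linalg.lstsq(x, y, rcond=None)
    return b

def bagged_fit(x, y, n_models, rng, fit=fit_ols):
    """Fit one model per bootstrap resample (rows drawn with replacement)."""
    n = len(y)
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # row indices, sampled with replacement
        coefs.append(fit(x[idx], y[idx]))
    return np.array(coefs)                # shape (n_models, n_features)

def bagged_predict(coefs, x_new):
    """Average the predictions of all fitted models."""
    per_model = x_new @ coefs.T           # shape (n_samples, n_models)
    return per_model.mean(axis=1)

rng = np.random.default_rng(2)
x = rng.standard_normal((500, 3))
y = x @ np.array([0.02, 0.01, 0.0]) + rng.standard_normal(500)  # assumed values

coefs = bagged_fit(x, y, n_models=100, rng=rng)
x_new = rng.standard_normal((5, 3))
pred = bagged_predict(coefs, x_new)
print("bagged predictions:", pred)
```

For a linear base learner, averaging the predictions is the same as predicting once with the averaged coefficients, so the ensemble costs a single dot product at prediction time. A block bootstrap, which resamples contiguous chunks of rows, is one way to respect serial dependence; it is not shown here.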
Two Routes To The Same Goal
Regularization and bagging both reduce variance, through different mechanisms:
Regularization constrains the model directly in the optimisation objective.
Bagging averages over many perturbed versions of the data.
They’re complementary. A regularized linear model is an excellent base learner for bagging, and the resulting ensemble is stable and robust to the correlated, non-stationary nature of order-book signals.
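Combining the two is a small change: use the penalized fit as the base learner inside the bootstrap loop. The sketch below again assumes a squared (ridge) penalty, simulated data, an untuned penalty strength, and hypothetical helper names (`fit_ridge`, `bagged_ridge`).

```python
import numpy as np

def fit_ridge(x, y, a):
    """Minimize sum of squared errors + a * sum(b_i^2)."""
    return np.linalg.solve(x.T @ x + a * np.eye(x.shape[1]), x.T @ y)

def bagged_ridge(x, y, a, n_models, rng):
    """Average ridge coefficients over bootstrap resamples of the rows.

    For a linear model, averaging coefficients is equivalent to
    averaging the individual models' predictions.
    """
    n = len(y)
    fits = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        fits.append(fit_ridge(x[idx], y[idx], a))
    return np.mean(fits, axis=0)

rng = np.random.default_rng(3)

# Three correlated alphas and a noisy target (all values are assumptions).
cov = np.full((3, 3), 0.9) + 0.1 * np.eye(3)
x = rng.multivariate_normal(np.zeros(3), cov, size=1000)
y = x @ np.array([0.02, 0.01, 0.0]) + rng.standard_normal(1000)

b = bagged_ridge(x, y, a=200.0, n_models=50, rng=rng)
print("bagged ridge coefficients:", b)
```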
More to come.


