Work in progress. This primer is still being written.
β ITSJUSTBETA.COM

Part 04 / 16

Types of Factor Model

Every factor model must produce the same four ingredients: exposures $X$, factor returns $f$, a factor covariance $F$, and specific risk $\Delta$. $F$ and $\Delta$ are estimated from data in all three families, so what separates the families is which of $X$ and $f$ is observed and which is estimated. Chapter 1 previewed that split; this chapter works through each family in turn, then the hybrids (§4.4) that combine them. The choice determines the rest: the data each family needs, its reaction speed, and where it’s useful.

4.1 Time-series/Macroeconomic models

The idea: Pick factors that are directly observable as a time series: the market’s excess return, surprises in industrial production and inflation, shifts and twists of the yield curve, credit spread changes, oil prices. The unknowns are each stock’s sensitivities, estimated by a time-series regression per stock over some period of time:

$$r_{it} = \alpha_i + \beta_i^\top f_t + \epsilon_{it}, \qquad t = 1, \dots, T,$$

so stock $i$’s exposure vector is $\hat{\beta}_i = \left(\sum_t \tilde f_t \tilde f_t^\top\right)^{-1} \sum_t \tilde f_t \tilde r_{it}$ (OLS on demeaned variables $\tilde f, \tilde r$). With factor returns observed, the factor covariance $F$ is just the sample covariance of the $f_t$ series, and $\Delta$ comes from the regression residuals $\epsilon_{it}$.
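The per-stock regression above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the primer's own implementation: the function name `estimate_time_series_model`, the simulated factor series, and every numeric value are assumptions made for the example.

```python
import numpy as np

def estimate_time_series_model(R, F_obs):
    """Per-stock OLS of returns on observed factor returns.

    R: (T, N) stock excess returns; F_obs: (T, K) observed factor returns.
    Returns alpha (N,), exposures B (N, K), factor covariance (K, K),
    and specific variances (N,) taken from the regression residuals.
    """
    K = F_obs.shape[1]
    F_dm = F_obs - F_obs.mean(axis=0)          # demeaned factors (f tilde)
    R_dm = R - R.mean(axis=0)                  # demeaned returns (r tilde)
    # Solve (sum_t f f') beta_i = sum_t f r_it for all stocks at once.
    B = np.linalg.solve(F_dm.T @ F_dm, F_dm.T @ R_dm).T
    alpha = R.mean(axis=0) - B @ F_obs.mean(axis=0)
    resid = R_dm - F_dm @ B.T
    F_cov = np.atleast_2d(np.cov(F_obs, rowvar=False))   # sample covariance of f_t
    spec_var = resid.var(axis=0, ddof=K + 1)             # K slopes plus an intercept
    return alpha, B, F_cov, spec_var

# Synthetic example: 60 months, 5 stocks, 2 observed factors (all values made up).
rng = np.random.default_rng(0)
T, N, K = 60, 5, 2
true_B = rng.normal(1.0, 0.3, size=(N, K))
F_obs = rng.normal(0.0, 0.04, size=(T, K))
R = 0.001 + F_obs @ true_B.T + rng.normal(0.0, 0.02, size=(T, N))

alpha, B, F_cov, spec_var = estimate_time_series_model(R, F_obs)
print(np.round(B - true_B, 2))   # estimation error in each exposure
```

Even in this clean simulation the estimated exposures differ visibly from the true ones, which is the noise problem the estimation details below address.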

Estimation details that matter.

  • Window choice: a rolling window (commonly 60 months) keeps betas current but noisy. An expanding window (all history to date) is the most stable but the slowest to react. Either way, longer windows trade noise for staleness. Exponential weighting (recent observations count more, with some chosen half-life) is the standard compromise.
  • Standard errors: with 60 monthly observations and one regressor, the standard error of a beta near 1 is typically 0.15–0.25, which is wide. To combat this noise, beta estimates are routinely shrunk toward 1 (e.g., Blume or Vasicek adjustment: $\hat\beta^{\text{adj}} = \lambda \hat\beta + (1-\lambda)\cdot 1$, with $\lambda$ stock-specific in Vasicek’s precision-weighted version).
  • Macro surprises, not levels: for macro factors only the unexpected component should be priced in the period’s return, so inputs are innovations from a forecasting model (e.g., changes vs. consensus), not raw levels. Estimating that expected component is itself a modeling problem, and errors in it feed straight into the factor returns.
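The window and shrinkage choices above can be sketched as follows. This is an illustration under stated assumptions: the helper names, the simulated market and stock series, the 12-month half-life, and the prior standard deviation of 0.3 are all hypothetical, and the precision-weighted $\lambda$ is one common way to write a Vasicek-style adjustment.

```python
import numpy as np

def ew_weights(T, half_life):
    """Exponential weights ordered oldest to newest, normalized to sum to 1."""
    age = np.arange(T - 1, -1, -1)            # newest observation has age 0
    w = 0.5 ** (age / half_life)
    return w / w.sum()

def weighted_beta(r, f, w):
    """Weighted single-factor regression slope (weights must sum to 1)."""
    fd, rd = f - w @ f, r - w @ r             # remove weighted means
    return (w @ (fd * rd)) / (w @ (fd * fd))

def ols_beta_with_se(r, f):
    """Equal-weighted single-factor OLS slope and its classical standard error."""
    fd, rd = f - f.mean(), r - r.mean()
    sxx = fd @ fd
    beta = (fd @ rd) / sxx
    resid = rd - beta * fd
    sigma2 = (resid @ resid) / (len(r) - 2)   # intercept and slope were estimated
    return beta, np.sqrt(sigma2 / sxx)

def shrink_to_one(beta_hat, se, prior_sd):
    """Precision-weighted shrinkage toward 1: noisier estimates are pulled harder."""
    lam = prior_sd**2 / (prior_sd**2 + se**2)
    return lam * beta_hat + (1.0 - lam) * 1.0, lam

rng = np.random.default_rng(1)
T = 60
f = rng.normal(0.005, 0.045, T)               # hypothetical monthly market excess return
r = 1.3 * f + rng.normal(0.0, 0.08, T)        # hypothetical stock with true beta 1.3

beta_hat, se = ols_beta_with_se(r, f)
beta_ew = weighted_beta(r, f, ew_weights(T, half_life=12))   # assumed half-life
beta_adj, lam = shrink_to_one(beta_hat, se, prior_sd=0.3)    # assumed prior spread
print(f"beta_hat={beta_hat:.2f} se={se:.2f} beta_ew={beta_ew:.2f} "
      f"lambda={lam:.2f} beta_adj={beta_adj:.2f}")
```

A fixed $\lambda$ applied to every stock gives a Blume-style adjustment instead. The precision-weighted version differs only in letting each stock's standard error set its own $\lambda$.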

Strengths: Factors are economically meaningful by construction. They directly answer questions like “what is my portfolio’s sensitivity to rates?”. Minimal data is required beyond returns and the factor series.

Weaknesses, and why this family lost the risk-model market:

  • Stale exposures. A company that doubles its leverage today still carries the beta of its old self for years until the regression window catches up. Measured characteristics (the fundamental model family) update immediately.
  • No history, no model. IPOs and recent listings cannot be regressed. They need proxies.
  • Low explanatory power. Macro factors explain single-stock returns poorly: the per-stock time-series regression typically yields an $R^2$ well under 20%. Common variation that the factors miss ends up in the residuals, which are then correlated across stocks, violating the assumption that $\Delta$ is diagonal.
  • Errors-in-variables. Estimated $\hat\beta$’s carry estimation noise into every downstream use.

Macro models survive in some areas: asset allocation, macro scenario analysis, and as overlays answering rate/oil/inflation sensitivity questions that characteristic-based factors do not address directly.

4.2 Cross-sectional/Fundamental models

The idea: Flip what is known. Exposures are measured from observable characteristics, fresh every period (see Chapter 3). The unknowns are the factor returns, recovered each period by a cross-sectional regression across stocks:

$$r_t = X_{t-1} f_t + \epsilon_t \quad \xrightarrow{\;\text{WLS across } i\;} \quad \hat f_t.$$

One regression per period, giving a time series $\{\hat f_t\}$ from which $F$ is estimated, with residuals feeding $\Delta$. Chapter 6 is devoted to this regression. Chapter 7 focuses on the fact that each $\hat f_{kt}$ is the return of an investable long–short portfolio.
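The period-by-period regression can be sketched as below. It is a toy version under explicit assumptions: exposures are held fixed over time for brevity (a real model would use $X_{t-1}$ refreshed each period), the square-root-of-cap regression weights are one common convention rather than something this chapter specifies, and all data are simulated.

```python
import numpy as np

def cross_sectional_wls(r_t, X_prev, w):
    """One period of WLS: f_hat = (X' W X)^{-1} X' W r, plus residuals."""
    XtW = X_prev.T * w                         # (K, N): column j scaled by stock j's weight
    f_hat = np.linalg.solve(XtW @ X_prev, XtW @ r_t)
    return f_hat, r_t - X_prev @ f_hat

rng = np.random.default_rng(2)
T, N, K = 120, 500, 3
# Hypothetical exposures: an intercept column plus two standardized style scores.
X = np.column_stack([np.ones(N), rng.standard_normal((N, K - 1))])
w = np.sqrt(rng.lognormal(0.0, 1.0, N))        # assumed weights: sqrt of simulated market cap
true_f = rng.normal(0.0, 0.02, size=(T, K))

F_hat = np.empty((T, K))
E = np.empty((T, N))
for t in range(T):
    r_t = X @ true_f[t] + rng.normal(0.0, 0.05, N)   # simulated returns for period t
    F_hat[t], E[t] = cross_sectional_wls(r_t, X, w)

F_cov = np.cov(F_hat, rowvar=False)            # factor covariance from the f_hat series
spec_var = E.var(axis=0, ddof=1)               # specific variance from the residuals
print(np.round(np.sqrt(np.diag(F_cov)), 4))
```

Each row of `F_hat` comes from a single cross-section, so a new listing with measured exposures enters the regression in its first period without needing any return history.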

Exposures-from-characteristics react faster: The leverage example again: the company that doubles its debt sees its leverage descriptor jump at the next data update, so its risk profile updates in days. The momentum exposure of a stock with a price jump updates mechanically with the price. No regression window to wait out. Measured exposures also bring high explanatory power (cross-sectional $R^2$ of 20–40% per month for single stocks, far higher for portfolios) and cover new listings from day one (an IPO has characteristics immediately). That $R^2$ is measured across stocks within one period, so it is not comparable to the per-stock time-series $R^2$ in §4.1. The two answer different questions.

That combination is why every major commercial risk model, MSCI Barra and SimCorp Axioma among them, is built this way, and why this primer’s construction chapters (5-8) follow the architecture.

Costs: Heavy data infrastructure (point-in-time fundamentals, classifications, corporate actions, see Chapter 16). A judgment-laden factor definition process (which characteristics, which descriptor recipes, see Chapters 3 and 15). And the model only knows about the characteristics it was given. A common driver not represented in $X$ leaks into residuals (the detection-and-repair loop of Chapter 15).

The MiniModel is this type. Exposures were measured in Chapter 3. Factor returns get estimated in Chapter 6.

4.3 Statistical models

The idea: Let the returns data choose the factors. No characteristics, no chosen series. Extract the directions of greatest common variation directly from the $T \times N$ panel of returns.

PCA mechanics in brief: Form the sample covariance $\hat\Sigma$ of returns. Eigendecompose $\hat\Sigma = Q \Lambda Q^\top$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ and orthonormal eigenvectors $q_1, q_2, \dots$. Take the top $K$ eigenvectors as the exposure matrix $X = [q_1, \dots, q_K]$. Factor returns are the projections $f_t = X^\top r_t$ (the eigenvectors are orthonormal, $X^\top X = I_K$, so the projection needs no $(X^\top X)^{-1}$ term). The truncated reconstruction $X \Lambda_K X^\top + \text{diag(residual variances)}$, where $\Lambda_K$ holds the top $K$ eigenvalues and plays the role of $F$, is the factor risk model. The first principal component of an equity universe is almost always recognizably “the market” (weights all of one sign). The next few often resemble size, rate-sensitivity, or large industry blocks, but nothing guarantees it. The eigendecomposition is derived in the appendix.
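The mechanics can be sketched directly with `numpy.linalg.eigh`. This is an illustrative version on simulated returns: the function name `pca_factor_model`, the two-factor data-generating process, and all parameter values are assumptions of the example.

```python
import numpy as np

def pca_factor_model(R, K):
    """Statistical factor model from a (T, N) returns panel using the top K components."""
    R_dm = R - R.mean(axis=0)
    S = np.cov(R_dm, rowvar=False)                 # (N, N) sample covariance
    evals, evecs = np.linalg.eigh(S)               # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:K]            # indices of the K largest
    lam, X = evals[order], evecs[:, order]         # Lambda_K and exposures X = [q_1..q_K]
    f = R_dm @ X                                   # row t is f_t = X' r_t
    resid = R_dm - f @ X.T
    spec_var = resid.var(axis=0, ddof=1)
    Sigma_model = X @ np.diag(lam) @ X.T + np.diag(spec_var)
    return X, lam, f, Sigma_model

# Simulated panel: a market-like factor, a weaker second factor, and noise.
rng = np.random.default_rng(3)
T, N, K = 500, 40, 2
beta = rng.uniform(0.7, 1.3, N)
gamma = rng.normal(0.0, 0.5, N)
R = (np.outer(rng.normal(0.0, 0.04, T), beta)
     + np.outer(rng.normal(0.0, 0.02, T), gamma)
     + rng.normal(0.0, 0.02, size=(T, N)))

X, lam, f, Sigma_model = pca_factor_model(R, K)
print(np.round(lam / np.trace(np.cov(R, rowvar=False)), 3))   # variance share per component
```

Because the factor returns are projections onto orthonormal eigenvectors, their sample covariance is exactly $\Lambda_K$, and the model reproduces the sample variances on the diagonal while replacing the off-diagonal entries with the rank-$K$ structure.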

How many factors? The central choice in a statistical model. There are three standard tools:

  • Scree plot: keep components before the eigenvalue spectrum flattens.
  • Random matrix theory: for a panel of pure noise with $N$ assets and $T$ observations, the Marchenko–Pastur law says sample eigenvalues fall (asymptotically) inside $\left[(1-\sqrt{N/T})^2, (1+\sqrt{N/T})^2\right] \times \sigma^2$. Eigenvalues above the upper edge are evidence of genuine common structure. Keep those. With $N = 3000, T = 500$, the ratio $N/T = 6$ puts the nonzero noise bulk in roughly $[(1-\sqrt 6)^2, (1+\sqrt 6)^2]\,\sigma^2 \approx [2.1, 11.9]\,\sigma^2$, so a sample eigenvalue has to clear about $11.9\,\sigma^2$ to count as signal, and most apparent “factors” in a sample covariance are artifacts. (Worse, with $N > T$ the sample covariance is singular: at most $T = 500$ of its eigenvalues are nonzero and at least $N - T = 2500$ are exactly zero, the same rank deficiency Chapter 1 flagged. The law describes the nonzero bulk.) The upper edge thus gives a principled cutoff for how many components to keep.
  • Asymptotic PCA (Connor–Korajczyk): when $N \gg T$, eigendecompose the $T \times T$ cross-product matrix instead: same factors, vastly cheaper, and statistically valid in the large-$N$ limit.
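The Marchenko–Pastur cutoff and the $T \times T$ shortcut can be combined in a short sketch. The assumptions are loud ones: the noise variance $\sigma^2$ is taken as known (in practice it must be estimated), the panel is simulated with one planted factor, and `mp_edges` is a hypothetical helper name.

```python
import numpy as np

def mp_edges(N, T, sigma2=1.0):
    """Edges of the Marchenko-Pastur noise bulk for N assets and T observations."""
    q = N / T
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2

# The worked numbers from the text: N = 3000, T = 500.
lower_text, upper_text = mp_edges(3000, 500)
print(round(lower_text, 2), round(upper_text, 2))

# Smaller simulated panel with the same ratio N/T = 6 and one planted factor.
rng = np.random.default_rng(4)
N, T = 600, 100
b = np.full(N, 0.5)                                   # assumed loadings on the planted factor
R = rng.standard_normal((T, N)) + np.outer(rng.standard_normal(T), b)

# With N > T, use the T x T cross-product: it has the same nonzero eigenvalues
# as the N x N matrix R'R / T but is far cheaper to decompose.
evals = np.linalg.eigvalsh(R @ R.T / T)[::-1]         # descending order
lower, upper = mp_edges(N, T, sigma2=1.0)             # sigma2 assumed known here
n_signal = int((evals > upper).sum())
print(n_signal, np.round(evals[:3], 1))
```

In finite samples the largest pure-noise eigenvalue fluctuates around the upper edge, so a count that sits barely above the cutoff deserves some skepticism. The planted factor here clears it by a wide margin.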

The interpretability problem: Statistical factors are only identified up to rotation. Any orthogonal rotation of the $K$ retained factors spans the same subspace and fits the data identically. So statistical factors have no stable names: “factor 7” this month may be a blend of last month’s factors 6 and 9. Practitioners regress statistical factors on named factors to label them, but the labels are approximations. This is the family’s defining trade-off: it captures whatever is in the data (including drivers nobody has named yet, and fast-moving crisis structure) at the price of explaining nothing to a portfolio manager.
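The rotation argument is easy to check numerically. The sketch below uses made-up exposures and factor returns and a random orthogonal matrix `G`. It shows that rotated factors give the same fitted returns and the same common covariance, which is why the data cannot pin down one labeling over another.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K, T = 30, 3, 200

# Hypothetical orthonormal exposures and factor returns with distinct volatilities.
X = np.linalg.qr(rng.standard_normal((N, K)))[0]          # (N, K), X'X = I
f = rng.standard_normal((T, K)) * np.array([3.0, 2.0, 1.0])

# A random K x K orthogonal matrix (G'G = I) used as the rotation.
G = np.linalg.qr(rng.standard_normal((K, K)))[0]
X_rot, f_rot = X @ G, f @ G

# Fitted returns: f G (X G)' = f G G' X' = f X'.
fit, fit_rot = f @ X.T, f_rot @ X_rot.T

# Common covariance: X F X' is unchanged when F is rotated along with X.
F, F_rot = np.cov(f, rowvar=False), np.cov(f_rot, rowvar=False)
common, common_rot = X @ F @ X.T, X_rot @ F_rot @ X_rot.T

print(np.allclose(fit, fit_rot), np.allclose(common, common_rot))
```

The exposures themselves differ between the two versions, so any name attached to an individual column of `X` does not survive the rotation.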

Where they are used: Short-horizon risk for statistical arbitrage, where reacting to current correlation structure beats interpreting it. The bigger use is as a diagnostic: if PCA on a fundamental model’s residuals finds a large common component, the fundamental model is missing a factor (Chapter 15).

4.4 Hybrid models

The most common hybrid: a fundamental core plus a small number of statistical factors extracted from the fundamental model’s residuals, sweeping up systematic risk the named factors miss. Vendors sell exactly this as “fundamental + statistical” variants. Other combinations: macro factors regression-mapped onto a fundamental model’s factor space, giving a macro lens on a fundamental engine, or characteristic-based exposures with PCA-cleaned covariance. The price is always interpretability at the margin. The benefit is robustness to the unknown-missing-factor problem.
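The fundamental-plus-statistical recipe, which doubles as the residual diagnostic of §4.3, can be sketched as follows. Everything here is simulated and hypothetical: the panel has two named factors plus one driver deliberately left out of $X$, the cross-sectional regression is unweighted for brevity, and the noise variance used for the cutoff is assumed known.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 250, 200
X = np.column_stack([np.ones(N), rng.standard_normal(N)])  # named exposures: market, one style
h = rng.standard_normal(N)                                 # loadings on a driver missing from X
f = rng.normal(0.0, 0.02, size=(T, 2))                     # named factor returns
g = rng.normal(0.0, 0.03, size=T)                          # hidden factor return
noise_sd = 0.05
R = f @ X.T + np.outer(g, h) + rng.normal(0.0, noise_sd, size=(T, N))

# Step 1: fundamental core, one cross-sectional OLS per period.
coef = np.linalg.lstsq(X, R.T, rcond=None)[0]              # (2, T): column t is f_hat_t
E = R - coef.T @ X.T                                       # (T, N) residuals

# Step 2: PCA on the residuals, compared with a noise-only upper edge.
evals, evecs = np.linalg.eigh(np.cov(E, rowvar=False))
evals, evecs = evals[::-1], evecs[:, ::-1]                 # descending order
upper = noise_sd**2 * (1 + np.sqrt(N / T)) ** 2            # true noise variance assumed known
print(np.round(evals[:3] / upper, 2))                      # ratio above 1 suggests a missed factor

# Step 3: hybrid, appending the leading residual eigenvector as a statistical factor.
q1 = evecs[:, 0]
X_hybrid = np.column_stack([X, q1])
g_hat = E @ q1                                             # statistical factor return series
print(round(abs(np.corrcoef(g_hat, g)[0, 1]), 3))
```

The recovered series tracks the hidden driver up to sign and scale, but the new column of `X_hybrid` carries no name, which is the interpretability cost at the margin.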

4.5 Comparison

So far the families have been split by which ingredient is observed. Here they are side by side on the properties that actually drive the choice:

| Property | Time-series (macro) | Cross-sectional (fundamental) | Statistical |
|---|---|---|---|
| What’s known | factor series | exposures | nothing |
| What’s estimated | exposures, $F$, $\Delta$ | factor returns, $F$, $\Delta$ | everything (PCA) |
| Main data need | factor series + returns | point-in-time fundamentals + returns | returns panel only |
| Reaction speed | slow (stale betas) | fast (tracks data and price) | fast (re-estimated each window) |
| Explanatory power | low (per-stock TS $R^2$ < 20%) | high (cross-sectional $R^2$ 20–40%/mo) | high in-sample by construction |
| New listings | need history, must proxy | covered on day one | need some history |
| Interpretability | high (named macro factors) | high (named style/industry factors) | low (rotation, unstable names) |
| Typical use | macro scenario, rate/oil sensitivity | risk, attribution, construction | stat-arb, residual diagnostics |

The data-need row is the quiet decider in practice: a fundamental model is only as good as the point-in-time fundamentals and classifications behind it (Chapter 16), while a statistical model needs nothing but a returns panel, which is why it is the fallback when fundamentals are thin or absent.

4.6 Choosing a model type

| Use case | Best fit | Why |
|---|---|---|
| Institutional risk reporting, client communication | Fundamental | Interpretable factors. Responsive exposures. Broad coverage |
| Performance attribution | Fundamental | Contributions must be explainable in investment-process language |
| Optimized portfolio construction | Fundamental (often + statistical) | Need interpretable constraints and no missed common risk |
| Stat-arb / short-horizon risk | Statistical | Reacts fastest and interpretability is irrelevant |
| Macro scenario analysis, multi-asset | Macro / time-series | Factors are the very objects being stressed |
| Checking a fundamental model for blind spots | Statistical on residuals | Finds structure without needing to name it |

The pragmatic summary: fundamental for anything a human must read, statistical for anything only a machine must use, macro when the question is itself macroeconomic, and hybrids when the stakes justify the complexity.

4.7 Summary

  • Every factor model produces the same four ingredients, $X$, $f$, $F$, and $\Delta$. The families differ only in which are observed and which are estimated.
  • Time-series (macro): observable factor series, exposures regressed per stock. Interpretable, but slow to react, weak on single stocks, and silent on names without return history.
  • Cross-sectional (fundamental): exposures measured from characteristics, factor returns regressed across stocks each period. Fast, high explanatory power, broad coverage, paid for with point-in-time data infrastructure. This primer’s subject.
  • Statistical (PCA): exposures and factor returns both estimated from the returns panel alone. Captures unnamed structure at the cost of interpretability. Most useful as a diagnostic on a fundamental model’s residuals.
  • Hybrids: a fundamental core plus statistical factors on its residuals (the common variant), trading a little interpretability for robustness to factors no one has named.