Types of Factor Model · It's Just Beta

Every factor model must produce the same four ingredients: exposures $X$, factor returns $f$, a factor covariance $F$, and specific risk $\Delta$. $F$ and $\Delta$ are estimated from data in all three families, so what separates the families is which of $X$ and $f$, if either, is observed and which is estimated. Chapter 1 previewed that split; this chapter works through each family in turn, then the hybrids (§4.4) that combine them. The choice determines the rest: the data each family needs, how fast it reacts, and where it is useful.

4.1 Time-series/Macroeconomic models

The idea: Pick factors that are directly observable as time series, such as the market’s excess return, surprises in industrial production and inflation, shifts and twists of the yield curve, credit spread changes, and oil prices. The unknowns are each stock’s sensitivities, estimated by one time-series regression per stock over an estimation window of $T$ periods:

$r_{it} = \alpha_i + \beta_i^\top f_t + \epsilon_{it}, \qquad t = 1, \dots, T,$

so stock $i$’s exposure vector is $\hat{\beta}_i = \left(\textstyle\sum_t \tilde f_t \tilde f_t^\top\right)^{-1} \textstyle\sum_t \tilde f_t \tilde r_{it}$ (OLS on demeaned variables $\tilde f, \tilde r$). With factor returns observed, the factor covariance $F$ is just the sample covariance of the $f_t$ series, and $\Delta$ comes from the regression residuals $\epsilon_{it}$.
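The regression above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not a production estimator: the function name `time_series_model`, the simulated volatilities, and the degrees-of-freedom correction `T - K - 1` are assumptions made for the example.

```python
import numpy as np

def time_series_model(returns, factors):
    """Per-stock OLS of returns on observed factor returns.

    returns: (T, N) stock returns; factors: (T, K) observed factor returns.
    Gives betas (N, K), factor covariance F (K, K), specific variances (N,).
    """
    T, K = factors.shape
    f_dm = factors - factors.mean(axis=0)   # demeaned factors (f tilde)
    r_dm = returns - returns.mean(axis=0)   # demeaned returns (r tilde)
    # Solve (sum_t f f') B = sum_t f r for every stock at once; B is (K, N).
    B = np.linalg.solve(f_dm.T @ f_dm, f_dm.T @ r_dm)
    resid = r_dm - f_dm @ B                 # (T, N) regression residuals
    F = np.cov(factors, rowvar=False).reshape(K, K)    # sample factor covariance
    spec_var = (resid ** 2).sum(axis=0) / (T - K - 1)  # residual variance per stock
    return B.T, F, spec_var

# Simulated example: 60 months, 5 stocks, 2 observed factors (illustrative values).
rng = np.random.default_rng(0)
T, N, K = 60, 5, 2
true_beta = rng.normal(1.0, 0.3, size=(N, K))
factors = rng.normal(0.0, 0.04, size=(T, K))
returns = factors @ true_beta.T + rng.normal(0.0, 0.02, size=(T, N))

betas, F, spec_var = time_series_model(returns, factors)
```

Demeaning both sides is equivalent to including the intercept $\alpha_i$, so the slopes match an OLS fit with a constant column.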

Estimation details that matter.

Window choice: a rolling window (commonly 60 months) keeps betas current but noisy. An expanding window (all history to date) is the most stable but the slowest to react. Either way, longer windows trade noise for staleness. Exponential weighting (recent observations count more, with some chosen half-life) is the standard compromise.
Standard errors: with 60 monthly observations and one regressor, the standard error of a beta near 1 is typically 0.15–0.25. That is wide: at the upper end, a two-standard-error band around an estimate of 1 runs from 0.5 to 1.5. Beta estimates are routinely shrunk toward 1 (e.g., Blume or Vasicek adjustment: $\hat\beta^{\text{adj}} = \lambda \hat\beta + (1-\lambda)\cdot 1$, with $\lambda$ stock-specific in Vasicek’s precision-weighted version) to combat this noise.
Macro surprises, not levels: the expected part of a macro release is already in prices, so only the unexpected component should move the period’s return. Inputs are therefore innovations from a forecasting model (e.g., changes vs. consensus), not raw levels. Estimating that expected component is itself a modeling problem, and errors in it feed straight into the factor returns.
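Two of the estimation details above, exponential weighting and precision-weighted shrinkage, are easy to sketch. The helper names, the 24-month half-life, the prior standard deviation of 0.3, and the placeholder standard error of 0.2 are all assumptions chosen for illustration, not recommended settings.

```python
import numpy as np

def exp_weights(T, half_life):
    """Weights for T observations ordered oldest to newest, summing to 1."""
    age = np.arange(T - 1, -1, -1)          # newest observation has age 0
    w = 0.5 ** (age / half_life)            # weight halves every half_life periods
    return w / w.sum()

def weighted_beta(r, f, w):
    """Single-factor weighted least-squares beta of r on f."""
    f_dm = f - np.sum(w * f)                # weighted demeaning
    r_dm = r - np.sum(w * r)
    return np.sum(w * f_dm * r_dm) / np.sum(w * f_dm ** 2)

def vasicek_shrink(beta_hat, se, prior_mean=1.0, prior_sd=0.3):
    """Precision-weighted shrinkage: noisier betas move further toward the prior."""
    lam = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)   # weight on the estimate
    return lam * beta_hat + (1.0 - lam) * prior_mean

# Simulated stock with true beta 1.4 (illustrative volatilities).
rng = np.random.default_rng(0)
T = 60
f = rng.normal(0.0, 0.045, size=T)
r = 1.4 * f + rng.normal(0.0, 0.08, size=T)

w = exp_weights(T, half_life=24)
beta_hat = weighted_beta(r, f, w)
beta_adj = vasicek_shrink(beta_hat, se=0.2)   # se is a placeholder, not estimated here
```

With `se` equal to `prior_sd` the weight $\lambda$ is exactly one half, so an estimate of 1.6 would be pulled to 1.3. A fixed $\lambda$ for every stock corresponds to the Blume-style variant.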

Strengths: Factors are economically meaningful by construction. They directly answer questions like “what is my portfolio’s sensitivity to rates?” Minimal data is required beyond returns and the factor series.

Weaknesses, and why this family lost the risk-model market:

Stale exposures. A company that doubles its leverage today still carries the beta of its old self for years until the regression window catches up. Measured characteristics (the fundamental model family) update immediately.
No history, no model. IPOs and recent listings cannot be regressed. They need proxies.
Low explanatory power. Macro factors explain single-stock returns poorly: the per-stock time-series regression typically yields an $R^2$ well under 20%. The common variation the factors miss (industry and style co-movement, for instance) lands in the residuals, which are then correlated across stocks, violating the assumption that $\Delta$ is diagonal.
Errors-in-variables. Estimated $\hat\beta$’s carry estimation noise into every downstream use.

Macro models survive in some areas: asset allocation, macro scenario analysis, and as overlays answering rate/oil/inflation sensitivity questions that characteristic-based factors do not address directly.

4.2 Cross-sectional/Fundamental models

The idea: Flip what is known. Exposures are measured from observable characteristics, fresh every period (see Chapter 3). The unknowns are the factor returns, recovered each period by a cross-sectional regression across stocks:

$r_t = X_{t-1} f_t + \epsilon_t \quad \xrightarrow{\;\text{WLS across } i\;} \quad \hat f_t.$

One regression per period, giving a time series $\{\hat f_t\}$ from which $F$ is estimated, with residuals feeding $\Delta$ . Chapter 6 is devoted to this regression. Chapter 7 focuses on the fact that each $\hat f_{kt}$ is the return of an investable long–short portfolio.
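A single period of this regression can be sketched as follows. The simulated exposures, the square-root-of-size weights, and the function name `cross_sectional_wls` are assumptions for illustration; the weighting scheme the primer actually uses is the subject of Chapter 6.

```python
import numpy as np

def cross_sectional_wls(r, X, w):
    """One period's factor returns from a weighted regression across stocks.

    r: (N,) stock returns for period t
    X: (N, K) exposures known at t-1
    w: (N,) positive regression weights
    """
    Xw = X * w[:, None]                           # W X without forming the N x N matrix W
    f_hat = np.linalg.solve(X.T @ Xw, Xw.T @ r)   # (X' W X)^{-1} X' W r
    resid = r - X @ f_hat
    return f_hat, resid

# Simulated cross-section: an intercept column plus two style exposures.
rng = np.random.default_rng(0)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
w = np.sqrt(rng.lognormal(mean=0.0, sigma=1.0, size=N))   # stand-in for sqrt(market cap)
f_true = np.array([0.01, 0.02, -0.01])
r = X @ f_true + rng.normal(0.0, 0.05, size=N)

f_hat, resid = cross_sectional_wls(r, X, w)

# Each row of P is a set of portfolio weights whose return is one factor return.
P = np.linalg.solve(X.T @ (X * w[:, None]), (X * w[:, None]).T)   # (K, N)
```

In this setup `P @ X` is the identity, so each row of `P` has unit exposure to its own factor and zero exposure to the others. Because the first column of `X` is a constant, the two style rows also sum to zero, which is the long–short structure mentioned above.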

Exposures from characteristics react faster. Take the leverage example again: the company that doubles its debt sees its leverage descriptor jump at the next data update, so its risk profile updates in days. The momentum exposure of a stock with a price jump updates mechanically with the price. There is no regression window to wait out. Measured exposures also bring high explanatory power (cross-sectional $R^2$ of 20–40% per month for single stocks, far higher for portfolios) and cover new listings from day one (an IPO has characteristics immediately). That $R^2$ is measured across stocks within one period, so it is not comparable to the per-stock time-series $R^2$ in §4.1. The two answer different questions.

That combination is why every major commercial risk model, MSCI Barra and SimCorp Axioma among them, is built this way, and why this primer’s construction chapters (5-8) follow the architecture.

Costs: Heavy data infrastructure (point-in-time fundamentals, classifications, corporate actions, see Chapter 16). A judgment-laden factor definition process (which characteristics, which descriptor recipes, see Chapters 3 and 15). And the model only knows about the characteristics it was given. A common driver not represented in $X$ leaks into residuals (the detection-and-repair loop of Chapter 15).

The MiniModel is this type. Exposures were measured in Chapter 3. Factor returns get estimated in Chapter 6.

4.3 Statistical models

The idea: Let the returns data choose the factors. No characteristics, no chosen series. Extract the directions of greatest common variation directly from the $T \times N$ panel of returns.

PCA mechanics in brief: Form the sample covariance $\hat\Sigma$ of returns. Eigendecompose $\hat\Sigma = Q \Lambda Q^\top$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots$ and orthonormal eigenvectors $q_1, q_2, \dots$. Take the top $K$ eigenvectors as the exposure matrix $X = [q_1, \dots, q_K]$. Factor returns are the projections $f_t = X^\top r_t$ (the eigenvectors are orthonormal, $X^\top X = I_K$, so the projection needs no $(X^\top X)^{-1}$ term). The truncated reconstruction $X \Lambda_K X^\top + \text{diag(residual variances)}$, where $\Lambda_K$ holds the top $K$ eigenvalues, is the factor risk model. The first principal component of an equity universe is almost always recognizably “the market” (weights nearly all of one sign, conventionally taken as positive since an eigenvector’s sign is arbitrary). The next few often resemble size, rate-sensitivity, or large industry blocks, but nothing guarantees it. The eigendecomposition is derived in the appendix.
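The mechanics above translate almost line for line into NumPy. The sketch below uses a simulated panel with $T > N$ to keep things simple; the function name `pca_factor_model` and all simulated parameters are assumptions for the example.

```python
import numpy as np

def pca_factor_model(returns, K):
    """Top-K PCA factor model from a (T, N) returns panel."""
    T, N = returns.shape
    r_dm = returns - returns.mean(axis=0)
    sigma_hat = r_dm.T @ r_dm / (T - 1)            # sample covariance (N, N)
    eigval, eigvec = np.linalg.eigh(sigma_hat)     # eigh returns ascending order
    order = np.argsort(eigval)[::-1][:K]           # indices of the K largest
    lam_K = eigval[order]
    X = eigvec[:, order]                           # exposures: top-K eigenvectors
    f = r_dm @ X                                   # row t is f_t' = r_t' X
    common = X @ np.diag(lam_K) @ X.T              # truncated reconstruction
    spec_var = np.diag(sigma_hat) - np.diag(common)   # residual variances
    sigma_model = common + np.diag(spec_var)
    return X, f, lam_K, sigma_model

# Simulated panel: one market-like factor plus noise (illustrative values).
rng = np.random.default_rng(0)
T, N = 250, 20
market = rng.normal(0.0, 0.04, size=(T, 1))
loadings = rng.uniform(0.5, 1.5, size=(1, N))
returns = market @ loadings + rng.normal(0.0, 0.02, size=(T, N))

X, f, lam_K, sigma_model = pca_factor_model(returns, K=3)
```

By construction the sample covariance of the factor returns `f` is diagonal with entries `lam_K`, and the model reproduces each stock's sample variance exactly; only the off-diagonal covariances are approximated.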

How many factors? The central choice in a statistical model. There are three standard tools:

Scree plot: keep components before the eigenvalue spectrum flattens.
Random matrix theory: for a panel with $N$ assets and $T$ observations of pure noise, the Marchenko–Pastur law says sample eigenvalues fall (asymptotically) inside $\left[(1-\sqrt{N/T})^2, (1+\sqrt{N/T})^2\right] \times \sigma^2$, where $\sigma^2$ is the noise variance. Eigenvalues above the upper edge are evidence of genuine common structure. Keep those. With $N = 3000, T = 500$, the ratio $N/T = 6$ puts the nonzero noise bulk in roughly $[(1-\sqrt 6)^2, (1+\sqrt 6)^2]\,\sigma^2 \approx [2.1, 11.9]\,\sigma^2$, so a sample eigenvalue has to clear about $11.9\,\sigma^2$ to count as signal, and most apparent “factors” in a sample covariance are artifacts. (Worse, with $N > T$ the sample covariance is singular: at most $T = 500$ of its eigenvalues are nonzero and at least $N - T = 2500$ are exactly zero, the same rank deficiency Chapter 1 flagged. The law describes the nonzero bulk.) The upper edge therefore gives a principled cutoff for how many components to keep.
Asymptotic PCA (Connor–Korajczyk): when $N \gg T$ , eigendecompose the $T \times T$ cross-product matrix instead: same factors, vastly cheaper, and statistically valid in the large- $N$ limit.
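The Marchenko–Pastur cutoff and the $T \times T$ shortcut can be tried together on a small simulated panel. This is a sketch under strong assumptions: the noise variance $\sigma^2$ is taken as known (in practice it has to be estimated), the panel is not demeaned, and the function name `mp_edges` and all simulated parameters are illustrative.

```python
import numpy as np

def mp_edges(N, T, sigma2=1.0):
    """Marchenko-Pastur bounds for the nonzero eigenvalues of a pure-noise sample covariance."""
    q = N / T
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2

lo, hi = mp_edges(3000, 500)     # the text's example: roughly 2.1 and 11.9

# Simulated panel with N > T: unit-variance noise plus one common factor.
rng = np.random.default_rng(1)
T, N = 100, 300
market = rng.normal(0.0, 0.5, size=(T, 1))
R = market @ np.ones((1, N)) + rng.normal(0.0, 1.0, size=(T, N))

# Work with the T x T cross-product; it shares its nonzero eigenvalues
# with the much larger N x N matrix R'R / T.
small = R @ R.T / T
eig_small = np.sort(np.linalg.eigvalsh(small))[::-1]

_, upper = mp_edges(N, T, sigma2=1.0)        # noise variance assumed known here
n_signal = int(np.sum(eig_small > upper))    # components above the noise edge
```

In finite samples the largest noise eigenvalue fluctuates around the upper edge, so a count taken this way is a guide rather than an exact answer.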

The interpretability problem: Statistical factors are only identified up to rotation. Within the span of the top $K$ components, any orthogonal rotation of the factors (with the matching rotation of the exposures) fits identically, and the particular basis PCA picks depends on the ordering of sample eigenvalues, which shifts from one estimation window to the next. So statistical factors have no stable names: “factor 7” this month may be a blend of last month’s factors 6 and 9. Practitioners regress statistical factors on named factors to label them, but the labels are approximations. This is the family’s defining trade-off: it captures whatever is in the data (including drivers nobody has named yet, and fast-moving crisis structure) at the price of explaining nothing to a portfolio manager.
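The rotation argument can be checked numerically. In the sketch below, all inputs are simulated and the variable names are assumptions for illustration: a random orthogonal matrix `Q` rotates exposures and factor returns together, and both the fitted returns and the common-risk matrix are unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, T = 50, 3, 200

X = np.linalg.qr(rng.normal(size=(N, K)))[0]               # orthonormal exposures (N, K)
f = rng.normal(size=(T, K)) * np.array([3.0, 2.0, 1.0])    # factor returns (T, K)
F = np.cov(f, rowvar=False)                                # factor covariance

Q = np.linalg.qr(rng.normal(size=(K, K)))[0]               # random orthogonal K x K matrix
X_rot = X @ Q                                              # rotated exposures
f_rot = f @ Q                                              # rotated factor returns
F_rot = np.cov(f_rot, rowvar=False)

# Fitted common returns and common risk are identical under the rotation.
same_fit = np.allclose(f @ X.T, f_rot @ X_rot.T)
same_risk = np.allclose(X @ F @ X.T, X_rot @ F_rot @ X_rot.T)
```

The individual columns of `X_rot` look nothing like those of `X`, which is why a label attached to "factor 2" does not survive the rotation even though the risk forecast does.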

Where they are used: Short-horizon risk for statistical arbitrage, where reacting to current correlation structure beats interpreting it. The bigger use is as a diagnostic: if PCA on a fundamental model’s residuals finds a large common component, the fundamental model is missing a factor (Chapter 15).

4.4 Hybrid models

The most common hybrid is a fundamental core plus a small number of statistical factors extracted from the fundamental model’s residuals, which sweep up systematic risk the named factors miss. Vendors sell exactly this as “fundamental + statistical” variants. Other combinations: macro factors regression-mapped onto a fundamental model’s factor space, giving a macro lens on a fundamental engine, or characteristic-based exposures with PCA-cleaned covariance. The price is always interpretability at the margin. The benefit is robustness to the unknown-missing-factor problem.
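The fundamental-plus-statistical hybrid can be sketched on simulated data in which one common driver is deliberately left out of $X$. Everything here is an assumption for illustration: the simulated exposures, the unweighted cross-sectional regression used for brevity, and the choice of a single residual component.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, K = 250, 100, 3

X = rng.normal(size=(N, K))              # named exposures given to the model
h = rng.normal(size=N)                   # hidden exposure, NOT in X
f = rng.normal(0.0, 0.02, size=(T, K))   # named factor returns
g = rng.normal(0.0, 0.02, size=T)        # hidden factor return
R = f @ X.T + np.outer(g, h) + rng.normal(0.0, 0.02, size=(T, N))

# Fundamental core: one cross-sectional OLS per period (all periods solved at once).
f_hat = np.linalg.lstsq(X, R.T, rcond=None)[0].T    # (T, K) estimated factor returns
resid = R - f_hat @ X.T                             # (T, N) residual returns

# Statistical layer: top principal component of the residual covariance.
resid_dm = resid - resid.mean(axis=0)
eigval, eigvec = np.linalg.eigh(resid_dm.T @ resid_dm / (T - 1))
x_stat = eigvec[:, -1]                   # unit-norm statistical exposure vector
share = eigval[-1] / eigval.sum()        # share of residual variance it explains

# Compare with the part of the hidden exposure that X cannot span.
h_perp = h - X @ np.linalg.lstsq(X, h, rcond=None)[0]
cosine = abs(x_stat @ h_perp) / np.linalg.norm(h_perp)
```

A large `share` signals that the residuals still contain common structure, and a `cosine` near 1 shows that the statistical factor has recovered the omitted driver up to sign, without anyone having to name it.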

4.5 Comparison

So far the families have been split by which ingredient is observed. Here they are side by side on the properties that actually drive the choice:

Property	Time-series (macro)	Cross-sectional (fundamental)	Statistical
What’s known	factor series	exposures	nothing
What’s estimated	exposures, $F$ , $\Delta$	factor returns, $F$ , $\Delta$	everything (PCA)
Main data need	factor series + returns	point-in-time fundamentals + returns	returns panel only
Reaction speed	slow (stale betas)	fast (tracks data and price)	fast (re-estimated each window)
Explanatory power	low (per-stock TS $R^2$ < 20%)	high (cross-sectional $R^2$ 20–40%/mo)	high in-sample by construction
New listings	need history, must proxy	covered on day one	need some history
Interpretability	high (named macro factors)	high (named style/industry factors)	low (rotation, unstable names)
Typical use	macro scenario, rate/oil sensitivity	risk, attribution, construction	stat-arb, residual diagnostics

The data-need row is the quiet decider in practice: a fundamental model is only as good as the point-in-time fundamentals and classifications behind it (Chapter 16), while a statistical model needs nothing but a returns panel, which is why it is the fallback when fundamentals are thin or absent.

4.6 Choosing a model type

Use case	Best fit	Why
Institutional risk reporting, client communication	Fundamental	Interpretable factors. Responsive exposures. Broad coverage
Performance attribution	Fundamental	Contributions must be explainable in investment-process language
Optimized portfolio construction	Fundamental (often + statistical)	Need interpretable constraints and no missed common risk
Stat-arb / short-horizon risk	Statistical	Reacts fastest and interpretability is irrelevant
Macro scenario analysis, multi-asset	Macro / time-series	Factors are the very objects being stressed
Checking a fundamental model for blind spots	Statistical on residuals	Finds structure without needing to name it

The pragmatic summary: fundamental for anything a human must read, statistical for anything only a machine must use, macro when the question is itself macroeconomic, and hybrids when the stakes justify the complexity.

4.7 Summary

Every factor model produces the same four ingredients, $X$ , $f$ , $F$ , and $\Delta$ . The families differ only in which are observed and which are estimated.
Time-series (macro): observable factor series, exposures regressed per stock. Interpretable, but slow to react, weak on single stocks, and silent on names without return history.
Cross-sectional (fundamental): exposures measured from characteristics, factor returns regressed across stocks each period. Fast, high explanatory power, broad coverage, paid for with point-in-time data infrastructure. This primer’s subject.
Statistical (PCA): exposures and factor returns both estimated from the returns panel alone. Captures unnamed structure at the cost of interpretability. Most useful as a diagnostic on a fundamental model’s residuals.
Hybrids: a fundamental core plus statistical factors on its residuals (the common variant), trading a little interpretability for robustness to factors no one has named.