discreteNPIV long-term causal inference

Problem Setup

Problem setting

The target setting pairs a novel experiment, in which long-term outcomes have not yet been observed, with historical experiments in which both surrogates and long-term outcomes were recorded.

The wrapper supports both single-arm means and treated-versus-control contrasts.

Novel experiment

The novel experiment contributes only short-term surrogates, such as clicks, sessions, conversion proxies, or early retention markers.

Historical experiments

Historical experiments provide surrogate outcomes, long-term outcomes, and randomized arm assignments that shift surrogates exogenously.

Why IV instead of naive surrogate regression

A naive surrogate-index approach regresses long-term outcomes on surrogates in historical data and transports that fit to the novel experiment. That approach is valid only if there is no unmeasured confounding between surrogates and long-term outcomes. Here, historical experiment arms are instead used as instruments to identify the surrogate-to-long-term map under the paper’s assumptions.
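
The contrast between the two approaches can be seen in a small simulation. The sketch below is not the package's estimator: it assumes a single scalar surrogate, a linear surrogate-to-long-term map with slope theta, and equal-sized arms, and it uses a plain regression on arm-level means as a stand-in for the instrumental-variable step. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_per_arm, theta = 6, 1000, 0.8        # hypothetical design
z = np.repeat(np.arange(n_arms), n_per_arm)    # randomized arm index (instrument)
shift = np.linspace(-1.0, 1.0, n_arms)         # arm-specific surrogate shifts

u = rng.normal(scale=0.5, size=z.size)         # unmeasured confounder
x = shift[z] + u + rng.normal(scale=0.3, size=z.size)   # surrogate
y = theta * x + u + rng.normal(scale=0.2, size=z.size)  # long-term outcome

# Naive approach: regress y on x directly; the confounder biases the slope.
xc, yc = x - x.mean(), y - y.mean()
ols_slope = (xc @ yc) / (xc @ xc)

# IV approach: only between-arm variation is used. With equal arm sizes,
# regressing arm means of y on arm means of x equals 2SLS with arm dummies.
x_bar = np.array([x[z == k].mean() for k in range(n_arms)])
y_bar = np.array([y[z == k].mean() for k in range(n_arms)])
xb, yb = x_bar - x_bar.mean(), y_bar - y_bar.mean()
iv_slope = (xb @ yb) / (xb @ xb)

print(f"true slope {theta:.2f}, naive {ols_slope:.2f}, IV {iv_slope:.2f}")
```

Because the confounder averages out within each randomized arm, the between-arm slope stays close to theta while the naive slope absorbs the confounding.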

Surrogate-To-NPIV Map

How the application is translated into the core NPIV problem

The wrapper translates the surrogate problem into the core NPIV problem by using historical experiment arms as the discrete instrument.

In the core notation, historical surrogate vectors are X_hist, historical long-term outcomes are Y_hist, and the novel surrogate sample is the target sample X_new.

Figure: Schematic of the surrogate application mapping to the core NPIV problem. Historical experiments provide the instrument and identify the surrogate-to-long-term map; the novel experiment provides the target surrogate distribution on which that map is evaluated.

Application language

Historical experiment arms -> discrete instrument Z
Historical surrogate vector -> X_hist
Historical long-term outcome -> Y_hist
Novel surrogate sample -> X_new
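
In symbols, the mapping above can be read as a standard NPIV moment restriction followed by an average over the novel sample. The function h and the parameters mu_new and tau are notation assumed here for illustration; the page does not spell them out, and the paper's exact conditions may differ.

```latex
\begin{aligned}
&\mathbb{E}\left[\, Y_{\mathrm{hist}} - h(X_{\mathrm{hist}}) \mid Z \,\right] = 0, \\
&\mu_{\mathrm{new}} = \mathbb{E}\left[\, h(X_{\mathrm{new}}) \,\right], \\
&\tau = \mathbb{E}\left[\, h(X_{\mathrm{new}}) \mid \text{treated} \,\right]
      - \mathbb{E}\left[\, h(X_{\mathrm{new}}) \mid \text{control} \,\right].
\end{aligned}
```

The first line is identified from historical data; the second and third correspond to the single-arm mean and the treated-versus-control contrast.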

Wrapper interface

encode_experiment_arms
estimate_long_term_mean_from_surrogates
estimate_long_term_effect_from_surrogates

Encoding And Diagnostics

Past experiments become the instrument through explicit encoding

Encoding determines the discrete instrument used in the NPIV step. The main distinction is between one historical arm per unit and overlapping historical experiments.

Sparse overlap cells are reported explicitly rather than merged automatically.

Single active arm per row

Use mode="single" when each historical unit belongs to exactly one experiment arm.
Either pass globally unique arm keys like pricing_test:treatment or pass local arm labels together with experiment_ids.
Do not pass bare labels such as control or treatment without telling the package which experiment they belong to.
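
A short illustration of why bare labels are ambiguous. This sketch uses only NumPy and does not call the package; the experiment_id:arm_label key format follows the pricing_test:treatment example above, and the variable names are hypothetical.

```python
import numpy as np

# Three historical experiments, each with its own control and treatment arm.
experiment_ids = np.repeat(["exp_a", "exp_b", "exp_c"], 4)
arm_labels = np.tile(["control", "control", "treatment", "treatment"], 3)

# Bare labels collapse arms from different experiments into the same level.
n_bare_levels = np.unique(arm_labels).size

# Globally unique keys keep every experiment arm as its own instrument level.
arm_keys = np.array([f"{e}:{a}" for e, a in zip(experiment_ids, arm_labels)])
n_key_levels = np.unique(arm_keys).size

print(n_bare_levels, n_key_levels)  # 2 levels versus 6 levels
```

With bare labels the instrument would have two levels instead of six, discarding most of the between-arm variation that identification relies on.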

Overlapping historical experiments

Use mode="overlap" when a historical unit can be exposed to multiple concurrent experiments.
The full active set of experiment-arm keys is encoded as one categorical instrument level.
The package canonicalizes overlap encodings and always reports support diagnostics through counts, singleton levels, and low-support levels.
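
The idea behind overlap encoding and its support diagnostics can be sketched in plain Python. This is not the package's implementation: the canonical form (sorted keys joined by "|"), the min_support threshold, and all names below are assumptions made for illustration.

```python
from collections import Counter

# Each row lists the experiment-arm keys active for one historical unit.
active_sets = [
    ["pricing_test:treatment", "ui_test:control"],
    ["ui_test:control", "pricing_test:treatment"],  # same set, different order
    ["pricing_test:control", "ui_test:control"],
    ["pricing_test:control", "ui_test:treatment"],
    ["pricing_test:control", "ui_test:control"],
    ["pricing_test:control", "ui_test:control"],
]

def canonical_level(keys):
    """Map an active set to one order-independent categorical level."""
    return "|".join(sorted(set(keys)))

levels = [canonical_level(keys) for keys in active_sets]
counts = Counter(levels)

min_support = 3  # hypothetical threshold for flagging sparse cells
singleton_levels = sorted(lv for lv, c in counts.items() if c == 1)
low_support_levels = sorted(lv for lv, c in counts.items() if c < min_support)

print(dict(counts))
print("singletons:", singleton_levels)
print("low support:", low_support_levels)
```

Reporting sparse cells rather than merging them leaves the decision about pooling or dropping levels to the analyst.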

Illustrative Example

Estimate a long-term treatment effect from surrogate-only novel data

The example below shows the wrapper interface for estimating a long-term treatment effect from historical experiments and a surrogate-only novel experiment.

The full simulated workflow lives in scripts/reproduce_surrogate_case_study.py.

import numpy as np

from discreteNPIV import estimate_long_term_effect_from_surrogates

rng = np.random.default_rng(12)

historical_experiment_ids = np.repeat(["exp_a", "exp_b", "exp_c"], 80)
historical_arm_labels = np.tile(np.repeat(["control", "treatment"], 40), 3)

arm_shift = {
    "exp_a:control": np.array([0.0, 0.1, -0.1]),
    "exp_a:treatment": np.array([0.4, 0.2, 0.0]),
    "exp_b:control": np.array([-0.1, 0.0, 0.1]),
    "exp_b:treatment": np.array([0.2, 0.3, 0.2]),
    "exp_c:control": np.array([0.1, -0.2, 0.0]),
    "exp_c:treatment": np.array([0.3, 0.1, 0.3]),
}
theta = np.array([0.8, -0.5, 0.4])

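# Simulate historical rows. A shared unobserved confounder enters both the
# surrogates and the long-term outcome, so a naive regression of Y on X is biased.
surrogates = []
outcomes = []
# zip(..., strict=True) requires Python 3.10+ and guards against length mismatches.
for experiment_id, arm_label in zip(historical_experiment_ids, historical_arm_labels, strict=True):
    key = f"{experiment_id}:{arm_label}"
    confounder = rng.normal(scale=0.35)
    # Surrogates: arm-specific shift plus confounder plus idiosyncratic noise.
    x = arm_shift[key] + confounder + rng.normal(scale=0.2, size=3)
    # Long-term outcome: linear in the surrogates, plus the same confounder.
    y = x @ theta + 0.5 * confounder + rng.normal(scale=0.2)
    surrogates.append(x)
    outcomes.append(y)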

X_hist = np.asarray(surrogates)
Y_hist = np.asarray(outcomes)

X_new_control = rng.normal(loc=[0.05, 0.0, 0.0], scale=0.2, size=(500, 3))
X_new_treated = rng.normal(loc=[0.25, 0.15, 0.1], scale=0.2, size=(500, 3))

effect = estimate_long_term_effect_from_surrogates(
    X_hist=X_hist,
    Y_hist=Y_hist,
    historical_arms=historical_arm_labels,
    historical_experiment_ids=historical_experiment_ids,
    X_new_treated=X_new_treated,
    X_new_control=X_new_control,
    n_splits=2,
    random_state=12,
)

print(effect.selected.estimate)
print(effect.selected.ci_lower, effect.selected.ci_upper)
print(effect.selected.method_name)

Mean output

For a single novel arm, effect.treated_mean.selected or effect.control_mean.selected gives the estimated long-term mean.

Effect output

effect.selected is the estimated long-term treatment effect with a standard error and confidence interval.

Encoding output

effect.encoding records how the historical experiments were encoded and whether some categories are sparse.

Assumptions

Conditions for a credible causal interpretation

A causal interpretation requires the historical experiments to generate relevant exogenous variation in surrogates and the novel target to be supported by that historical variation.

The estimator can be computed outside this regime, but the causal interpretation then becomes less credible.

Key assumptions

Historical experiment arms provide exogenous shifts in surrogate outcomes.
Those shifts are informative enough to identify the surrogate-to-long-term map.
The novel experiment contributes the target surrogate distribution.
The target surrogate distribution is supported by the historical experiment-induced variation.
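
For intuition about the second assumption, consider the special case of a linear surrogate-to-long-term map. There, identification reduces to a rank condition: the centered matrix of arm-level surrogate means must have rank equal to the number of surrogates. The sketch below checks this for the arm shifts used in the illustrative example, treated as arm-level means with equal weights. It is a simplified diagnostic under that linearity assumption, not a check the package is known to perform.

```python
import numpy as np

# Arm-level surrogate means (rows: experiment arms, columns: surrogates),
# taken from the arm_shift values in the illustrative example.
arm_means = np.array([
    [0.0, 0.1, -0.1],   # exp_a:control
    [0.4, 0.2, 0.0],    # exp_a:treatment
    [-0.1, 0.0, 0.1],   # exp_b:control
    [0.2, 0.3, 0.2],    # exp_b:treatment
    [0.1, -0.2, 0.0],   # exp_c:control
    [0.3, 0.1, 0.3],    # exp_c:treatment
])

# Center across arms so that only between-arm variation remains.
centered = arm_means - arm_means.mean(axis=0)

n_surrogates = arm_means.shape[1]
rank = np.linalg.matrix_rank(centered)

# Full column rank means the arms move every surrogate direction.
print(f"rank {rank} of {n_surrogates} surrogate dimensions")
```

If the rank fell below the number of surrogates, some direction in surrogate space would never be moved by the historical arms, and the map along that direction would not be identified even in the linear case.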

Practical interpretation

The method does not assume that surrogates are unconfounded with long-term outcomes in historical data.
Instead it uses historical experiment arms as instruments to identify the surrogate-to-long-term relationship.
Credibility depends on historical experiments generating enough relevant surrogate variation for the novel target.

References And Navigation

Related resources

This page covers the surrogate application layer. The low-level NPIV interface remains on the main package page.

Package overview
Paper
GitHub repository
Encoding diagnostics

The content here is drawn from the current repository sources, especially README.md, docs/long_term_surrogate_case_study.md, docs/experiment_encoding.md, and docs/loo_jackknife.md.
For the low-level NPIV notation map, see the package overview page and the notation note in the repository.