Research Package

Discrete NPIV inference with many weak instruments.

discreteNPIV is a research-oriented Python package for discrete nonparametric instrumental variable (NPIV) inference in the many-weak-instruments regime. It accompanies the paper "Nonparametric Instrumental Variable Inference with Many Weak Instruments" and is organized around two entry paths: core discrete NPIV and long-term causal inference with surrogates.

  • Many weak discrete instruments
  • npJIVE and 2SLS side by side
  • Surrogate-based long-term causal inference

Interface Selection

Choosing an entry path

The package separates a low-level NPIV interface from a surrogate application layer. Choose the route that matches your data.

The homepage stays package-first, while the surrogate application lives on its own page.

Use the core API when the NPIV problem is already explicit.

Use this route when you already have observed covariates or basis features X, a discrete instrument Z, outcomes Y, and a target covariate sample X_new defining the linear functional of interest.

  • fit_structural_nuisance: structural nuisance
  • fit_dual_nuisance: dual / Riesz nuisance
  • estimate_average_functional: debiased functional inference

Install

Editable install and repo entry points

The repository documents local development and reproduction from the checkout. The install command below matches the README.

The repository layout is already documentation-friendly: src/discreteNPIV for the supported package code, docs for notes, scripts for runnable examples, and tests for validation.

Install

python3 -m pip install -e .

This is the documented path for working from the current source tree.

Core reproduction

The repository includes paper-style and validation scripts such as scripts/reproduce_small_paper_experiment.py and scripts/evaluate_npjive_validation.py.

Surrogate case study

The long-term application workflow is illustrated in scripts/reproduce_surrogate_case_study.py and expanded on in the separate long-term page.

Core NPIV

Fit the structural nuisance, fit the dual nuisance, estimate the functional.

At the core level, the package solves three tasks: fit the structural nuisance for the minimum-norm NPIV solution, fit the dual / Riesz nuisance for a target covariate law, and estimate a debiased average linear functional with uncertainty.

The notation in the docs is consistent throughout: X for observed features or basis representation, Z for the discrete instrument, Y for the observed outcome, and X_new for the target covariate sample.

Example

from discreteNPIV import (
    estimate_average_functional,
    fit_dual_nuisance,
    fit_structural_nuisance,
    generate_synthetic_data,
)

data = generate_synthetic_data(
    n_per_instrument=20,
    n_instruments=8,
    n_features=5,
    n_target_samples=4000,
    random_state=7,
)

structural = fit_structural_nuisance(
    data["X"],
    data["Z"],
    data["Y"],
    n_splits=2,
    random_state=7,
)

dual = fit_dual_nuisance(
    data["X"],
    data["Z"],
    data["X_new"],
    n_splits=2,
    random_state=7,
)

result = estimate_average_functional(
    data["X"],
    data["Z"],
    data["Y"],
    data["X_new"],
    n_splits=2,
    random_state=7,
)

print(structural.selected_method)
print(dual.selected_method)
print(result.selected.estimate, result.selected.se)

Main Entry Points

fit_structural_nuisance

Estimates the structural nuisance, that is, the minimum-norm NPIV solution for the structural map represented through X.

fit_dual_nuisance

Estimates the dual or Riesz representer nuisance for the target functional defined by X_new. This is the nuisance needed to debias the plug-in functional.

estimate_average_functional

Combines the structural and dual nuisances to estimate the debiased average functional E[h(X_new)], together with a standard error and confidence interval.

Workflow

The package is organized around nuisance fitting plus a debiased functional estimator.

The core estimator has three pieces: estimate the structural function h, estimate the dual or Riesz nuisance for the target functional, and combine them in a debiased estimator of E[h(X_new)].

Here X_new is the target covariate sample. It defines the average functional of interest, not a new outcome sample.
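To make the role of X_new concrete, the sketch below averages a candidate function over a target covariate sample and checks that this average is linear in the function. It does not call the package: the helper average_functional and the two candidate functions h1 and h2 are hypothetical and exist only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X_new = rng.normal(size=(4000, 3))  # target covariates only, no outcomes


def average_functional(h, X_target):
    """Sample analogue of E[h(X_new)]: average h over the target sample."""
    return float(np.mean(h(X_target)))


# Two hypothetical candidate structural functions (illustration only).
def h1(x):
    return x @ np.array([1.0, -0.5, 2.0])


def h2(x):
    return np.sin(x[:, 0]) + x[:, 1] ** 2


# Linearity in h: the functional of a combination equals the same
# combination of the individual functionals.
combined = average_functional(lambda x: 2.0 * h1(x) - 3.0 * h2(x), X_new)
separate = 2.0 * average_functional(h1, X_new) - 3.0 * average_functional(h2, X_new)
```

Only the covariate rows of X_new enter the calculation, which is why no outcome column is needed for the target sample.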

Figure: Workflow diagram for the core NPIV estimator stack. Observed data (X, Z, Y) identify the structural function and the dual nuisance, while X_new defines the target average E[h(X_new)]. These pieces are combined in estimate_average_functional.

Structural fit

The structural fit estimates the minimum-norm NPIV solution for the structural map represented through X. The returned object includes the selected estimator, the npJIVE fit, the 2SLS baseline, and tuning details.

Dual fit

The dual fit estimates the dual or Riesz representer nuisance for the target average defined by X_new. This is the nuisance required for the debiasing step, and it is returned with the same selected / npJIVE / 2SLS structure.

Inference result

The final NPIVInferenceResult reports the debiased estimate of E[h(X_new)] together with standard errors, confidence intervals, and the fitted structural and dual nuisance objects.
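For readers who want to see how an estimate and a standard error relate to an interval, here is a minimal sketch of a two-sided normal-approximation (Wald) interval. It is an assumption that this textbook form is a useful reference point; the interval reported by the package may be constructed differently. The helper wald_interval and the numbers passed to it are hypothetical.

```python
from statistics import NormalDist


def wald_interval(estimate, se, level=0.95):
    """Two-sided normal-approximation interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


# Placeholder numbers, not output from the package.
lower, upper = wald_interval(estimate=1.20, se=0.10)
```

In practice the inputs would be the reported estimate and standard error, for example result.selected.estimate and result.selected.se from the example above.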

Theoretical Framing

Many weak instruments

The package is built for settings where the instrument is discrete and the number of instrument levels can grow with the sample size, while the support available within each instrument level may remain limited.
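A quick way to see whether a dataset resembles this regime is to count how many instrument levels there are and how many units fall in each. The sketch below does this with NumPy on hand-built labels whose sizes mirror the synthetic example on this page; it makes no assumption about how the package itself encodes Z.

```python
import numpy as np

# Hypothetical instrument labels: 8 levels with 20 units each, mirroring
# the sizes used in the synthetic example on this page.
Z = np.repeat(np.arange(8), 20)

levels, counts = np.unique(Z, return_counts=True)
n_levels = levels.size            # number of instrument levels
min_count = int(counts.min())     # smallest number of units in any level
median_count = float(np.median(counts))
```

Many levels combined with small per-level counts is the pattern described above.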

The motivating example in the README is long-term causal inference: many past experiments, but only a limited number of units in each.

Target functional and interpretation

The core estimand is a linear functional of the minimum-norm NPIV solution, represented in the API by the target average E[h(X_new)]. This keeps the package focused on functionals of the structural map rather than full point identification of that map.

npJIVE is the many-instrument correction

npJIVE uses jackknife (leave-one-out) estimation to remove the bias that arises with many weak instruments. The package reports this npJIVE estimate alongside a grouped 2SLS baseline.
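The intuition behind the jackknife correction can be sketched with a small simulation. When groups are small, an in-sample group mean contains each unit's own noise, so it is mechanically associated with that unit; a leave-one-out group mean is not. This is a generic illustration with made-up group sizes, not the package's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per_group = 2000, 5
Z = np.repeat(np.arange(n_groups), n_per_group)

# Pure-noise variable: the instrument carries no signal, so any association
# between a unit's value and its fitted group mean is an artifact.
e = rng.normal(size=Z.size)

group_sum = np.bincount(Z, weights=e)
group_n = np.bincount(Z)

# In-sample group mean: includes the unit's own value.
in_sample = group_sum[Z] / group_n[Z]
# Leave-one-out group mean: drops the unit's own value before averaging.
loo = (group_sum[Z] - e) / (group_n[Z] - 1)

own_overlap_in_sample = float(np.mean(in_sample * e))  # near 1 / n_per_group
own_overlap_loo = float(np.mean(loo * e))              # near 0
```

The in-sample overlap does not vanish as the number of groups grows with group size held fixed, which is the source of the many-weak-instruments bias that leave-one-out averaging avoids.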

Bias correction and tuning

Group-level leave-one-out means are used inside the npJIVE nuisance construction, while regularization parameters are chosen by a stratified K-fold cross-validation routine.
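One way to build folds that are stratified by instrument level is to shuffle the units within each level and deal them round-robin into folds, so every fold sees every level. The helper stratified_fold_ids below is a hypothetical sketch under that assumption; it is not the package's make_stratified_folds, whose signature and behavior may differ.

```python
import numpy as np


def stratified_fold_ids(Z, n_splits, seed=0):
    """Assign fold ids so every instrument level is spread across folds.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z)
    folds = np.empty(Z.size, dtype=int)
    for level in np.unique(Z):
        # Shuffle the units in this level, then deal them round-robin.
        idx = rng.permutation(np.flatnonzero(Z == level))
        folds[idx] = np.arange(idx.size) % n_splits
    return folds


Z = np.repeat(np.arange(8), 20)
folds = stratified_fold_ids(Z, n_splits=2, seed=7)
```

Stratifying this way keeps group-level quantities such as group means well defined in every training split, which plain random folds cannot guarantee when levels are small.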

API

Public package surface

The current public exports are intentionally compact: core NPIV estimators, surrogate wrappers, grouped utilities, simulation helpers, and typed result objects.

This site adds documentation, not new package interfaces.

Core estimators

  • fit_structural_nuisance
  • fit_dual_nuisance
  • estimate_average_functional

Surrogate wrappers

  • encode_experiment_arms
  • estimate_long_term_mean_from_surrogates
  • estimate_long_term_effect_from_surrogates

Utilities and results

  • group_means, leave_one_out_group_means, make_stratified_folds
  • generate_synthetic_data
  • NPIVInferenceResult, StructuralFitResult, DualFitResult

References

Related resources

The long-term application has its own page because it is a distinct entry path built on the same core NPIV estimand.

  • The website text is drawn from the current repository sources: README.md, docs/api_reference.md, docs/paper_notation_map.md, docs/experiment_encoding.md, and docs/loo_jackknife.md.
  • For the application layer and design-support discussion, continue to the dedicated long-term page.