DiD regression with imputation method

didImputation(
  y0,
  data,
  cohort,
  nevertreated.value = Inf,
  unit = NULL,
  time = NULL,
  het = NULL,
  w = NULL,
  coef = -Inf:Inf,
  OATT = FALSE,
  with.se = TRUE,
  tol = 1e-06,
  maxit = 100,
  mutatedata = FALSE,
  verbose = 0,
  effective.sample = 30
)

Arguments

y0	Formula. Model for Y(0): the full model, without leads and lags, used to predict the counterfactual outcome.
data	A data frame or data.table.
cohort	Character. Name of the time-of-treatment variable. By default, it expects `Inf` for never-treated units; you can override this with `nevertreated.value`.
nevertreated.value	Any, default is `Inf`. Value encoding the cohort of never-treated units.
unit	Character, default is `NULL`. Unit identifier. If `NULL`, it defaults to the first fixed effect.
time	Character, default is `NULL`. Time period identifier. If `NULL`, it defaults to the second fixed effect.
het	Character, default is `NULL`. Heterogeneity estimation: column name(s) of the additional group(s). An estimate is produced for each group.
w	Numeric vector or variable name, default is `NULL`. Sample weights.
coef	R expression of leads and lags (e.g. `coef = -5:8`). By default, the whole dynamic effect is estimated. See Details.
OATT	Logical, default is `FALSE`. If `TRUE`, the overall average treatment effect on the treated is computed instead of the ATT at each horizon.
with.se	Logical, default is `TRUE`. Should standard errors be reported?
tol	Numeric, default is `1e-6`. Tolerance level for the convergence of the weights.
maxit	Numeric, default is `100`. Maximum number of iterations for computing the weights.
mutatedata	Logical, default is `FALSE`. USE AT YOUR OWN RISK. If `TRUE`, the data are modified in place instead of copied, which can be useful with large datasets when RAM is limited. It may result in lost observations and bugs.
verbose	Numeric, default is `0`. Level of verbosity.
effective.sample	Numeric, default is `30`. Effective sample size under which the function will throw a warning. See details.
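To make the roles of `y0`, `coef` and `OATT` concrete, here is a minimal base-R sketch of the three steps of the imputation estimator of Borusyak, Jaravel and Spiess (2021): fit the Y(0) model on untreated observations, impute Y(0) for treated observations, then average the individual effects. It uses hypothetical simulated data (not `did_simulated`), equal weights and no standard errors; it illustrates the idea and is not the package's implementation.

```r
# Hypothetical staggered-adoption panel: 60 units, 10 periods,
# cohorts first treated at t = 4, 6, 8, or never (Inf).
set.seed(1)
n_units <- 60
n_periods <- 10
panel <- expand.grid(i = 1:n_units, t = 1:n_periods)
cohort <- rep(c(4, 6, 8, Inf), length.out = n_units)
panel$g <- cohort[panel$i]
panel$treated <- panel$t >= panel$g
panel$k <- panel$t - panel$g                     # horizon (-Inf if never treated)
alpha <- rnorm(n_units)                          # unit effects
beta <- rnorm(n_periods)                         # time effects
effect <- ifelse(panel$treated, 1 + panel$k, 0)  # true effect grows with k
panel$y <- alpha[panel$i] + beta[panel$t] + effect + rnorm(nrow(panel), sd = 0.5)

# Step 1: fit the Y(0) model (unit + time fixed effects) on untreated rows only.
untreated <- panel[!panel$treated, ]
fit0 <- lm(y ~ 0 + factor(i) + factor(t), data = untreated)

# Step 2: impute Y(0) for treated rows and compute individual effects.
treated <- panel[panel$treated, ]
treated$tau <- treated$y - predict(fit0, newdata = treated)

# Step 3a: average effect at each horizon k (what the default output reports).
att_k <- tapply(treated$tau, treated$k, mean)
# Step 3b: one average over all treated rows (the idea behind OATT = TRUE).
oatt <- mean(treated$tau)
```

Restricting step 1 to untreated observations is what keeps the imputed Y(0) free of treatment effects.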

Value

A didImputation object with the results of the imputation estimation.

data

Data used for estimation.

convergence_time

User and CPU time for weights convergence.

pvalue

p-value for the positive horizon estimates.

coeftable

Table of regression coefficients.

wald

Wald statistic of the pre-trend regression.

coefs

Average treatment effects from the imputation procedure.

niterations

Number of iterations it took to compute the weights.

pre_trends

fixest regression object for the pre-trends estimation.
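The `wald` component is described only as the Wald statistic of the pre-trend regression. As a sketch, assuming it is the usual joint test that all pre-trend coefficients are zero, the statistic and its chi-squared p-value can be computed from a coefficient vector and its covariance matrix. The helper `wald_test()` and the numbers below are hypothetical.

```r
# Joint Wald test of H0: b = 0, given estimates b and covariance matrix V.
wald_test <- function(b, V) {
  W <- as.numeric(crossprod(b, solve(V, b)))  # b' V^{-1} b
  df <- length(b)
  c(statistic = W, df = df,
    p.value = pchisq(W, df = df, lower.tail = FALSE))
}

# Hypothetical pre-trend estimates with independent errors.
b <- c(0.1, -0.2)
V <- diag(c(0.04, 0.09))
res <- wald_test(b, V)
res  # statistic ~ 0.694, df = 2, p.value ~ 0.707
```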

Details

See below for additional details on some arguments.

Estimated coefficients

By default (coef = -Inf:Inf), the function estimates all available coefficients. You can customize this behavior with the coef option, which takes an R expression of the form -{leads}:{lags}.

For pre-trend coefficients, when the lower bound is -Inf, the function by default uses the most distant lead as the reference group.
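Because `-Inf:Inf` cannot be evaluated as an ordinary R sequence (`:` refuses infinite bounds), the `coef` expression is presumably captured unevaluated. A minimal sketch of turning such an expression into the horizons present in the data, using the hypothetical helper `parse_horizons()` (not part of the package); it ignores the reference-group rule above:

```r
# Hypothetical helper: keep the horizons in `available` that fall inside
# the bounds of an unevaluated `lower:upper` expression.
parse_horizons <- function(expr, available) {
  stopifnot(is.call(expr), identical(expr[[1]], as.name(":")))
  lower <- eval(expr[[2]])  # e.g. -5, 0 or -Inf
  upper <- eval(expr[[3]])  # e.g. 8, 2 or Inf
  available <- sort(unique(available))
  available[available >= lower & available <= upper]
}

available <- -4:4                           # horizons observed in some panel
parse_horizons(quote(-Inf:Inf), available)  # -4 -3 -2 -1 0 1 2 3 4
parse_horizons(quote(0:Inf), available)     # 0 1 2 3 4
parse_horizons(quote(0:2), available)       # 0 1 2
```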

Effective sample a.k.a. Herfindahl condition

Borusyak, Jaravel and Spiess (2021) give a condition on the weights under which the estimator is consistent: the Herfindahl index of the weights must converge to zero. Equivalently, the inverse of the index can be read as an 'effective sample size'. The authors recommend an effective sample size of at least 30; if it is lower (threshold set by effective.sample), a warning is thrown.
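A minimal sketch of the effective-sample computation, assuming the index is taken over the absolute values of the weights normalised to shares summing to one. The helper names are hypothetical and the weights are toy vectors, not the estimator's actual weights.

```r
# Herfindahl index of a weight vector, using shares of the absolute weights.
herfindahl <- function(w) {
  s <- abs(w) / sum(abs(w))
  sum(s^2)
}
# Effective sample size: inverse of the Herfindahl index.
effective_sample <- function(w) 1 / herfindahl(w)

# Warn when the effective sample size falls below a threshold (30 by default).
check_effective_sample <- function(w, threshold = 30) {
  n_eff <- effective_sample(w)
  if (n_eff < threshold)
    warning(sprintf("Effective sample size %.1f is below %g", n_eff, threshold))
  invisible(n_eff)
}

effective_sample(rep(1, 100))        # 100: equal weights
effective_sample(c(10, rep(1, 99)))  # about 59.7: one dominant weight
```

A single dominant weight shrinks the effective sample well below the raw count, which is exactly the situation the warning is meant to flag.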

Cohort

The cohort argument gives the date of treatment of each observation. By default, never-treated individuals are expected to have cohort Inf; you can override this by passing another value to nevertreated.value.
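For instance, if never-treated units are coded with cohort 0 in your data (a hypothetical coding), you can either pass `nevertreated.value = 0` or recode them to `Inf` beforehand so the default applies. A base-R sketch of the recoding:

```r
# Hypothetical panel: units 1 and 3 are first treated in period 2,
# unit 2 is never treated and coded with cohort 0.
panel <- data.frame(
  i = rep(1:3, each = 2),
  t = rep(1:2, times = 3),
  g = rep(c(2, 0, 2), each = 2)
)

# Recode the never-treated cohort to Inf, the default expected by didImputation().
panel$g[panel$g == 0] <- Inf
panel
```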

Standard-errors

The standard errors are computed using an alternating projection method. You can tweak its meta-parameters with tol (convergence tolerance) and maxit (maximum number of iterations).
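The sketch below illustrates the general technique of alternating projections, here used to residualise a variable on unit and time fixed effects by repeatedly subtracting group means until the change falls below a tolerance. It is not the package's code: the helper `demean_2fe()` is hypothetical, and its `tol` and `maxit` arguments only mirror the roles of the options above.

```r
# Residualise x on two sets of fixed effects by alternating projections.
demean_2fe <- function(x, unit, time, tol = 1e-6, maxit = 100) {
  r <- x
  for (iter in seq_len(maxit)) {
    r_old <- r
    r <- r - ave(r, unit)  # project out unit means
    r <- r - ave(r, time)  # project out time means
    if (max(abs(r - r_old)) < tol) break
  }
  list(resid = r, iterations = iter)
}

# Usage on a hypothetical unbalanced panel, checked against lm().
set.seed(42)
d <- expand.grid(unit = 1:20, time = 1:8)
d <- d[-sample(nrow(d), 30), ]  # drop rows so the panel is unbalanced
d$x <- rnorm(nrow(d))
out <- demean_2fe(d$x, d$unit, d$time, tol = 1e-10, maxit = 10000)
ref <- resid(lm(x ~ factor(unit) + factor(time), data = d))
max(abs(out$resid - ref))  # close to zero
```

On a balanced panel the loop converges after one sweep; unbalanced panels need more iterations, which is where a tolerance and an iteration cap matter.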

References

Borusyak, K., Jaravel, X., & Spiess, J. (2021). Revisiting event study designs: Robust and efficient estimation. Working paper.

Stata implementation from the authors:

did_imputation (https://github.com/borusyak/did_imputation)

Another R implementation using sparse matrix inversion: didimputation (https://github.com/kylebutts/didimputation)

Author

Antoine Mayerowitz

Maxime Gravoueille

Examples

# Load example data
data(did_simulated)

# Estimate the overall average treatment effect on the treated and all available pre-trends
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              OATT = TRUE,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>         Estimate Std. Error    t value      Pr(>|t|)
#> k::-4 0.05837981  0.2279555  0.2561018  7.980837e-01
#> k::-3 0.12351493  0.2384032  0.5180927  6.048537e-01
#> k::-2 0.20413885  0.2568542  0.7947654  4.275069e-01
#> k::-1 0.05923100  0.2842246  0.2083950  8.350909e-01
#> k::0  2.32785342  0.1031787 22.5613832 1.038251e-112
#>  ...... 0  rows not shown.

# Estimate the full dynamic model
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#> k::3 3.980677 0.18913808 21.04641 2.466738e-98
#> k::4 4.774795 0.25861218 18.46315 4.087941e-76
#>  ...... 4  rows not shown.

# Estimate only the post-treatment (lag) coefficients
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              coef = 0:Inf,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#> k::3 3.980677 0.18913808 21.04641 2.466738e-98
#> k::4 4.774795 0.25861218 18.46315 4.087941e-76
#>  ...... 0  rows not shown.

# Estimate the first 3 post-treatment coefficients
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              coef = 0:2,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#>  ...... -2  rows not shown.

# Return only point estimates
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              with.se = FALSE,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error t value Pr(>|t|)
#> k::0 0.973520         NA      NA       NA
#> k::1 2.085544         NA      NA       NA
#> k::2 2.990625         NA      NA       NA
#> k::3 3.980677         NA      NA       NA
#> k::4 4.774795         NA      NA       NA
#>  ...... 4  rows not shown.