DiD regression with imputation method

didImputation(
  y0,
  data,
  cohort,
  nevertreated.value = Inf,
  unit = NULL,
  time = NULL,
  het = NULL,
  w = NULL,
  coef = -Inf:Inf,
  OATT = FALSE,
  with.se = TRUE,
  tol = 1e-06,
  maxit = 100,
  mutatedata = FALSE,
  verbose = 0,
  effective.sample = 30
)

Arguments

y0

Formula. Model for Y(0), the untreated potential outcome. This is the full model, without leads and lags, used to predict the counterfactual outcome.

data

A data frame or data.table.

cohort

Character. Time of treatment identifier. By default, it expects Inf for never-treated units. You can override this with nevertreated.value.

nevertreated.value

Any, default is Inf. Value encoding the cohort of never treated units.

unit

Character, default is NULL. Unit identifier. If NULL, it defaults to the first fixed effect.

time

Character, default is NULL. Time period identifier. If NULL, it defaults to the second fixed effect.

het

Character, default is NULL. Column name(s) of the additional group(s) used for heterogeneity estimation. An estimate is produced for each group.

w

Numeric vector or variable name. Default is NULL. Sample weights.

coef

R expression of leads and lags (e.g. coef = -5:8). By default, the whole dynamic effect is estimated. See Details.

OATT

Logical, default is FALSE. If TRUE, the overall average treatment effect on the treated is computed instead of the ATT at each horizon.

with.se

Logical, default is TRUE. Should standard errors be reported?

tol

Numeric, default is 1e-6. Tolerance level for weights convergence.

maxit

Numeric, default is 100. Maximum number of iterations for computing weights.

mutatedata

Logical, default is FALSE. USE AT YOUR OWN RISK. This option modifies your data in place instead of making a copy, which can be useful with large datasets when RAM is limited. It may result in lost observations and bugs.

verbose

Numeric, default is 0. Level of verbosity.

effective.sample

Numeric, default is 30. Effective sample size below which the function will throw a warning. See Details.

Value

A didImputation object with the results of the imputation estimation.

data

Data used for estimation.

convergence_time

User and CPU time for weights convergence.

pvalue

p-value for the positive horizon estimates.

coeftable

Table of regression coefficients.

wald

Wald statistic of the pre-trend regression.

coefs

Average treatment effects from the imputation procedure.

niterations

Number of iterations it took to compute the weights.

pre_trends

fixest regression object for the pre-trends estimation.

Details

See below for additional details on some arguments.

Estimated coefficients

By default, the function estimates all available coefficients, which corresponds to coef = -Inf:Inf. You can customize this behavior with the coef option, which takes an R expression of the form -{leads}:{lags}.

For pre-trend coefficients, the function sets the greatest lead as the reference group when the lower bound is -Inf.
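To illustrate why coef is described as an R expression rather than a vector, here is a minimal base R sketch. Evaluating -Inf:Inf directly in R raises an error, so a bound like this has to be captured unevaluated and then clipped to the horizons present in the data. The helper horizon_window() and the toy vector available are hypothetical names made up for this illustration; this is an assumption about how such an expression can be handled, not the package's actual code.

```r
# Hypothetical helper (not part of the package): capture `-{leads}:{lags}`
# unevaluated and keep only the horizons that exist in the data.
horizon_window <- function(expr, available) {
  e <- substitute(expr)                  # capture the expression without evaluating it
  stopifnot(is.call(e), identical(e[[1]], as.name(":")))
  lower <- eval(e[[2]], parent.frame())  # lower bound, e.g. -Inf or -5
  upper <- eval(e[[3]], parent.frame())  # upper bound, e.g. Inf or 8
  available[available >= lower & available <= upper]
}

# Toy set of event times k = t - g observed in some data
available <- -4:4

horizon_window(-Inf:Inf, available)  # every available horizon
horizon_window(0:Inf, available)     # post-treatment horizons only
horizon_window(0:2, available)       # first three post-treatment horizons
```

Under this reading, an infinite bound simply means "as far as the data allow" on that side of the treatment date.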

Effective sample a.k.a. Herfindahl condition

Borusyak, Jaravel, and Spiess (2021) give a condition on the weights under which the estimators are consistent. This condition states that the Herfindahl index of the weights must converge to zero. Another interpretation is that the inverse of the index is a measure of 'effective sample size'. The authors recommend an effective sample size of at least 30. If the effective sample size is lower, a warning will be thrown.
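The link between the Herfindahl index and the effective sample size can be sketched in a few lines of base R. This is a simplified illustration that assumes one weight per observation, normalized to sum to one. The function effective_sample_size() is a hypothetical name, and the weights the package actually uses may be constructed and aggregated differently.

```r
# Hypothetical helper (not part of the package).
# Assumes one weight per observation; weights are normalized to sum to one.
effective_sample_size <- function(w) {
  w <- abs(w) / sum(abs(w))  # normalize the weights
  herfindahl <- sum(w^2)     # Herfindahl index of the normalized weights
  1 / herfindahl             # inverse index = effective sample size
}

# Equal weights on 50 observations: the effective sample size is 50
n_equal <- effective_sample_size(rep(1, 50))

# One observation carries half of the total weight: the effective
# sample size falls well below the recommended minimum of 30
n_concentrated <- effective_sample_size(c(0.5, rep(0.5 / 49, 49)))

n_equal
n_concentrated
```

The more the weights concentrate on a few observations, the larger the index and the smaller the effective sample size, which is the situation the effective.sample warning is meant to flag.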

Cohort

The cohort argument is the date of treatment of the observation. By default, it expects Inf for never-treated individuals. You can override this behavior by passing another value to nevertreated.value.
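As a small illustration of the default encoding, the base R sketch below recodes never-treated units to Inf before estimation. The toy data frame panel and the choice of NA as the original never-treated code are assumptions made up for this example.

```r
# Toy panel (hypothetical data): unit 2 is never treated and coded as NA
panel <- data.frame(
  i = rep(1:3, each = 2),          # unit identifier
  t = rep(1:2, times = 3),         # time period
  g = rep(c(2, NA, 2), each = 2)   # treatment date (cohort)
)

# Recode never-treated units to Inf, the default value expected for cohort
panel$g[is.na(panel$g)] <- Inf

panel
```

If the never-treated units are already identified by some other value, the alternative described above is to leave the data unchanged and pass that value to nevertreated.value.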

Standard-errors

The standard errors are computed using an alternating projection method. You can tweak its parameters by changing tol and maxit.
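To give an idea of what tol and maxit control, here is a generic sketch of the method of alternating projections, applied to removing unit and time means from a variable in an unbalanced panel. It is a textbook illustration in base R, not the package's implementation. The function demean_two_way() and the toy data are hypothetical.

```r
# Generic alternating projections (not the package's code): alternately
# remove unit means and time means until the update is smaller than `tol`
# or `maxit` iterations have been run.
demean_two_way <- function(x, unit, time, tol = 1e-6, maxit = 100) {
  for (iter in seq_len(maxit)) {
    x_old <- x
    x <- x - ave(x, unit)  # project out unit means
    x <- x - ave(x, time)  # project out time means
    if (max(abs(x - x_old)) < tol) {
      return(list(x = x, niterations = iter, converged = TRUE))
    }
  }
  list(x = x, niterations = maxit, converged = FALSE)
}

# Toy unbalanced panel: 5 units, 4 periods, 3 observations dropped
set.seed(1)
unit <- rep(1:5, each = 4)[-c(3, 10, 17)]
time <- rep(1:4, times = 5)[-c(3, 10, 17)]
x <- rnorm(length(unit))

res <- demean_two_way(x, unit, time)
res$niterations
res$converged
```

A smaller tol asks for a more precise solution at the cost of more iterations, while maxit caps the running time when convergence is slow.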

References

Borusyak, K., Jaravel, X., & Spiess, J. (2021). Revisiting event study designs: Robust and efficient estimation. Working paper.

Stata implementation from the authors:

did_imputation (https://github.com/borusyak/did_imputation)

Another R implementation using sparse matrix inversion: didimputation (https://github.com/kylebutts/didimputation)

Author

Antoine Mayerowitz

Maxime Gravoueille

Examples

# Load example data
data(did_simulated)

# Estimate the overall average treatment effect on treated and all available pre-trends
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              OATT = TRUE,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>         Estimate Std. Error    t value      Pr(>|t|)
#> k::-4 0.05837981  0.2279555  0.2561018  7.980837e-01
#> k::-3 0.12351493  0.2384032  0.5180927  6.048537e-01
#> k::-2 0.20413885  0.2568542  0.7947654  4.275069e-01
#> k::-1 0.05923100  0.2842246  0.2083950  8.350909e-01
#> k::0  2.32785342  0.1031787 22.5613832 1.038251e-112
#>  ...... 0  rows not shown.

# Estimate the full dynamic model
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#> k::3 3.980677 0.18913808 21.04641 2.466738e-98
#> k::4 4.774795 0.25861218 18.46315 4.087941e-76
#>  ...... 4  rows not shown.

# Estimate positive (lags) coefficients
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              coef = 0:Inf,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#> k::3 3.980677 0.18913808 21.04641 2.466738e-98
#> k::4 4.774795 0.25861218 18.46315 4.087941e-76
#>  ...... 0  rows not shown.

# Estimate first 3 post treatment coefficients
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              coef = 0:2,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error  t value     Pr(>|t|)
#> k::0 0.973520 0.09719356 10.01630 1.292486e-23
#> k::1 2.085544 0.10992597 18.97226 2.892152e-80
#> k::2 2.990625 0.14310200 20.89856 5.518751e-97
#>  ...... -2  rows not shown.

# Return only point estimates
didImputation(y ~ 0 | i + t,
              cohort = 'g',
              with.se = FALSE,
              data = did_simulated)
#> Event Study: imputation method. Dep. Var.:  y 
#> Counterfactual model:  y ~ 0 | i + t 
#> Observations: 1500 
#>      Estimate Std. Error t value Pr(>|t|)
#> k::0 0.973520         NA      NA       NA
#> k::1 2.085544         NA      NA       NA
#> k::2 2.990625         NA      NA       NA
#> k::3 3.980677         NA      NA       NA
#> k::4 4.774795         NA      NA       NA
#>  ...... 4  rows not shown.