## Difference in Difference Estimation [Notes]

Difference in Difference estimation is a linear regression method used to analyse the effect of an event at a point in time (the treatment) by comparing outcomes before and after it. If the treatment had an effect, the target variable, Y, should differ materially between the two periods.

Before and After. This is a basic comparison of means between the time periods before and after treatment. The specification assumes that, in the absence of treatment, the target (Y) in the post-treatment period would equal the target (Y) in the pre-treatment period. Thus, any change is attributed to the treatment.
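As a rough illustration of the before and after comparison, here is a minimal sketch in Python. The outcome values and the function name `before_after_estimate` are made up for the example and do not come from any real dataset.

```python
from statistics import mean

# Hypothetical outcomes (Y) for the same treated units,
# measured before and after the treatment.
y_pre = [10.0, 12.0, 11.0, 13.0]
y_post = [14.0, 15.0, 13.0, 16.0]


def before_after_estimate(pre, post):
    """Return the change in mean Y from the pre period to the post period."""
    return mean(post) - mean(pre)


effect = before_after_estimate(y_pre, y_post)
print(f"Before and after estimate: {effect:.2f}")
```

The whole change in the mean is credited to the treatment here, which is only reasonable if nothing else moved Y between the two periods.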

Difference in Difference. The natural extension of the before and after analysis is to include a control group for comparison, and the difference in difference specification allows us to do this. We can compare Y over time, as we did for the before and after analysis, and we can also compare Y between treated and untreated groups. To complete a difference in difference specification we use two dummy variables that partition the sample into four groups. The first dummy variable, Treatment, splits the sample in two by treatment status. The second dummy variable, Post, splits each of those groups in two by time period. We then interact the two (Post*Treatment); the coefficient on Post*Treatment estimates how much more Y changed in the treated group than in the control group, which is the treatment effect if both groups would otherwise have followed the same trend.
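The four-group partition can be sketched directly with group means. The data, the tuple layout `(treatment, post, y)`, and the helper names below are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical observations as (treatment, post, y) tuples.
observations = [
    (0, 0, 10.0), (0, 0, 12.0),  # control group, pre period
    (0, 1, 12.0), (0, 1, 14.0),  # control group, post period
    (1, 0, 11.0), (1, 0, 13.0),  # treated group, pre period
    (1, 1, 17.0), (1, 1, 19.0),  # treated group, post period
]


def group_mean(rows, treatment, post):
    """Mean of Y for one of the four treatment/period groups."""
    return mean(y for t, p, y in rows if t == treatment and p == post)


def did_from_means(rows):
    """Change in the treated group minus change in the control group."""
    treated_change = group_mean(rows, 1, 1) - group_mean(rows, 1, 0)
    control_change = group_mean(rows, 0, 1) - group_mean(rows, 0, 0)
    return treated_change - control_change


did = did_from_means(observations)
print(f"Difference in difference: {did:.2f}")
```

In this made-up sample the treated group rises by 6 and the control group by 2, so the estimate is 4. A plain before and after comparison on the treated group alone would have reported 6.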

The Model:

Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment + e

• Y is the target variable we are interested in estimating.
• Post is a binary dummy variable indicating whether an observation is in the post-treatment period.
• Treatment is a binary dummy variable indicating whether an observation received treatment or not.
• B3, the coefficient on Post*Treatment, is the difference in difference estimate.
• e is the error term.
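To show how the regression recovers the estimate, the sketch below fits the model by ordinary least squares using the normal equations in plain Python. The data and the function names (`solve_linear_system`, `fit_did`) are hypothetical; in practice a statistics library would normally do the fitting and also report standard errors.

```python
# Hypothetical observations as (treatment, post, y) tuples.
observations = [
    (0, 0, 10.0), (0, 0, 12.0),
    (0, 1, 12.0), (0, 1, 14.0),
    (1, 0, 11.0), (1, 0, 13.0),
    (1, 1, 17.0), (1, 1, 19.0),
]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left unmodified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_did(rows):
    """OLS fit of Y = B0 + B1*Post + B2*Treatment + B3*Post*Treatment."""
    # Design matrix columns: intercept, Post, Treatment, Post*Treatment.
    X = [[1.0, p, t, p * t] for t, p, _ in rows]
    y = [row[2] for row in rows]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


b0, b1, b2, b3 = fit_did(observations)
print(f"B0={b0:.2f} B1={b1:.2f} B2={b2:.2f} B3={b3:.2f}")
```

Because the two dummies and their interaction describe all four groups exactly, the fitted coefficients line up with the group means: B0 is the control group's pre-period mean, B1 is the control group's change over time, B2 is the pre-period gap between the groups, and B3 is the difference in difference.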