Comparison Between Treatment and Control Group: Regression Analysis

RCT
Author

Satish Bajracharya

Published

March 18, 2024

Making comparisons between treatment and control group

When measuring the impact of an intervention or a program, we randomly assign individuals to treatment and control groups and compare the average outcome in each group. In short, we ask whether the outcomes differ between the groups.

Note

Researchers typically use regression analysis to report the results of an RCT in scientific journals and reports.

Regression equation: \(y_i= \alpha + \beta T_i + \epsilon_i\). Here, the outcome variable is denoted by \(y_i\), the error term by \(\epsilon_i\), and the treatment variable by \(T_i\). If individual \(i\) is assigned to the treatment group, \(T_i = 1\); otherwise, \(T_i = 0\).
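
To make the equation concrete, here is a minimal simulation sketch of this model. The function name `simulate_rct` and the values chosen for \(\alpha\), \(\beta\), and the noise level are assumptions made purely for illustration; they do not come from any real study.

```python
import random

def simulate_rct(n, alpha, beta, noise_sd, seed=0):
    """Simulate y_i = alpha + beta * T_i + eps_i with random assignment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.randint(0, 1)            # coin-flip assignment: 1 = treatment, 0 = control
        eps = rng.gauss(0.0, noise_sd)   # error term with mean zero
        rows.append((t, alpha + beta * t + eps))
    return rows

# Hypothetical parameter values, chosen only for illustration.
data = simulate_rct(n=1000, alpha=10.0, beta=2.0, noise_sd=1.0)
treated = [y for t, y in data if t == 1]
control = [y for t, y in data if t == 0]
print(len(treated), len(control))
```

Because assignment is random, the two groups should be of roughly equal size and should differ, on average, only through the treatment.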

A regression allows us to quantify not just the relationship between the treatment and the outcome variable but also the associations between the outcome and other variables. When we introduce other variables, we say that we are controlling for other factors.

\[y_i = \alpha + \beta T_i + \gamma_1 \text{Income}_i + \gamma_2 \text{Education}_i + \dots + \epsilon_i\]

Here, we are controlling for income and education in the regression equation. Introducing such controls may or may not be useful (a separate blog on control variables will be uploaded later).
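
As a rough sketch of what a regression with a control looks like in practice, the code below fits an outcome on a treatment dummy and one control (income) by solving the normal equations by hand. The helper names `solve_linear_system` and `ols` are hypothetical, the data are made up and noise-free, and in real work one would normally use a statistics package instead of hand-written elimination.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, columns):
    """OLS coefficients for y on an intercept plus the given regressor columns."""
    x = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(x[0])
    xtx = [[sum(row[p] * row[q] for row in x) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(x, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up, noise-free data generated from y = 1 + 2*T + 0.5*Income.
treatment = [0, 0, 0, 1, 1, 1]
income = [1.0, 2.0, 3.0, 1.0, 2.0, 4.0]
outcome = [1.0 + 2.0 * t + 0.5 * inc for t, inc in zip(treatment, income)]

alpha_hat, beta_hat, gamma_hat = ols(outcome, [treatment, income])
print(round(alpha_hat, 6), round(beta_hat, 6), round(gamma_hat, 6))
```

Since the outcome here contains no noise, the fit should recover the coefficients 1, 2, and 0.5 up to floating-point error. With real data the estimates would only approximate the underlying values.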

Measuring the average impact of an intervention

To measure the average impact of an intervention, we estimate the average outcome in the treatment group, \(\bar y (T=1)\), and compare it with the average outcome in the control group, \(\bar y (T=0)\).

Computing average outcome for the control group and treatment group

Let's compute the average outcome for the treatment and control groups. For the treatment group, the treatment variable takes the value one.

\[ \bar y(1) = \alpha + \beta \cdot 1 = \alpha + \beta \]

The average outcome of the treatment group is the sum of the coefficients \(\alpha\) and \(\beta\), because the error term is zero on average.

For the control group, the treatment variable will take on the value of zero.

\[\bar y(0) = \alpha + \beta \cdot 0 = \alpha\]

The average outcome of the control group is \(\alpha\), which is also the intercept term.

We obtain the average impact of the intervention by taking the difference in average outcomes between the treatment and control groups.

\[\bar y(1) - \bar y(0) = \alpha + \beta -\alpha = \beta\]
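
The identity above can be checked numerically. The sketch below uses a small made-up dataset and a hypothetical helper, `ols_slope_intercept`, that applies the textbook closed-form formulas for a one-regressor OLS fit. It then compares the fitted coefficients with the two group means.

```python
from statistics import mean

def ols_slope_intercept(t, y):
    """Closed-form OLS estimates for y = alpha + beta * t."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * t_bar
    return alpha_hat, beta_hat

# Made-up outcomes: four control units followed by four treated units.
t = [0, 0, 0, 0, 1, 1, 1, 1]
y = [4.0, 6.0, 5.0, 5.0, 7.0, 9.0, 8.0, 8.0]

control_mean = mean(yi for ti, yi in zip(t, y) if ti == 0)
treated_mean = mean(yi for ti, yi in zip(t, y) if ti == 1)
alpha_hat, beta_hat = ols_slope_intercept(t, y)

print(control_mean, treated_mean)  # 5.0 8.0
print(alpha_hat, beta_hat)         # 5.0 3.0
```

The intercept matches the control-group mean and the slope matches the difference between the two group means, as the algebra predicts.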

Population parameters and their estimates

In the regression equation, the coefficient \(\beta\) is the true average impact of the intervention in the population. We cannot observe \(\beta\) directly, so we estimate it from a sample and write the estimate as \(\hat{\beta}\) to distinguish it from the population parameter. We do the same for \(\alpha\), whose estimate is \(\hat{\alpha}\). Since these coefficients are estimates, the regression output also reports a standard error for each of them.

Standard error

The standard error measures how uncertain we are about the true underlying value of a coefficient: it estimates how much the coefficient estimate would vary from one random sample to another.
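
As a rough illustration, the sketch below computes the standard error of the treatment coefficient with the classical OLS formula. That formula assumes the errors have the same variance in both groups, which is an assumption added here and not something stated in this post. The function name `slope_and_standard_error` and the data are made up.

```python
from math import sqrt
from statistics import mean

def slope_and_standard_error(t, y):
    """Return (beta_hat, classical standard error) for y = alpha + beta * t."""
    n = len(y)
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    beta_hat = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    alpha_hat = y_bar - beta_hat * t_bar
    residuals = [yi - (alpha_hat + beta_hat * ti) for ti, yi in zip(t, y)]
    # Residual variance; n - 2 because two coefficients were estimated.
    sigma2_hat = sum(r ** 2 for r in residuals) / (n - 2)
    return beta_hat, sqrt(sigma2_hat / sxx)

# Made-up outcomes: four control units followed by four treated units.
t = [0, 0, 0, 0, 1, 1, 1, 1]
y = [4.0, 6.0, 5.0, 5.0, 7.0, 9.0, 8.0, 8.0]

beta_hat, std_error = slope_and_standard_error(t, y)
print(beta_hat, round(std_error, 4))
```

A smaller standard error relative to the size of the coefficient means the estimate is more precise.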

The estimated coefficient \(\hat{\beta}\) is the difference between the sample averages of the treatment and control groups. This difference is only an estimate of the difference between the true underlying means.

\[\text{True parameter: } \beta = \mu_1 - \mu_0\]

\[\text{Estimated parameter: } \hat{\beta} = \bar{y}(1) - \bar{y}(0) \approx \beta\]

Since the coefficient \(\hat{\beta}\) is an estimate of the true difference in means, we can form a hypothesis about that difference: \(H_0: \mu_1 = \mu_0\) (equivalently, \(\beta = 0\)) against \(H_A: \mu_1 \neq \mu_0\). We test this hypothesis by forming a confidence interval around \(\hat{\beta}\) and by calculating the corresponding p-value.

Decision rules

The null hypothesis \(H_0\) of no effect is the benchmark against which virtually all interventions are measured. The test yields a confidence interval and a p-value.

Note

The confidence interval gives a range of values that is likely to include the true population parameter at a given confidence level (for example, 95%).

The p-value is a probability that measures how compatible the data are with the null hypothesis. It is the probability of observing data at least as extreme as what we observed, assuming that \(H_0\) is true.

Confidence interval

  • If the null value (here, \(\beta = 0\)) falls within the confidence interval, we fail to reject \(H_0\).
  • If the null value falls outside the confidence interval, we reject \(H_0\).
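
The rule above can be sketched in a few lines. This example uses a normal (large-sample) approximation for the critical value, which is a simplification: regression software usually relies on the t distribution, which matters in small samples. The function names and the input numbers are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(beta_hat, std_error, level=0.95):
    """Normal-approximation confidence interval around beta_hat."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return beta_hat - z * std_error, beta_hat + z * std_error

def rejects_null(interval, null_value=0.0):
    """Reject H0 when the null value lies outside the interval."""
    lower, upper = interval
    return not (lower <= null_value <= upper)

# Hypothetical estimates, chosen only for illustration.
large_effect = confidence_interval(beta_hat=3.0, std_error=0.58)
small_effect = confidence_interval(beta_hat=0.5, std_error=0.58)

print(large_effect, rejects_null(large_effect))
print(small_effect, rejects_null(small_effect))
```

In the first case the interval excludes zero, so we reject \(H_0\). In the second it contains zero, so we fail to reject \(H_0\).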

Sample size and confidence interval

Increasing the sample size narrows the confidence interval, which improves the precision of our estimates. Greater precision increases the likelihood of detecting the impact of an intervention when one exists.
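
To see how quickly the interval narrows, the sketch below computes the width of a 95% confidence interval for a difference in means at several sample sizes. It assumes equal-sized groups, the same outcome standard deviation in both groups, and a normal approximation. All of these, along with the function name `ci_width`, are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(n, outcome_sd=1.0, level=0.95):
    """Width of the CI for a difference in means with n/2 units per group."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    per_group = n / 2
    std_error = sqrt(outcome_sd ** 2 / per_group + outcome_sd ** 2 / per_group)
    return 2 * z * std_error

for n in (100, 400, 1600):
    print(n, round(ci_width(n), 3))
```

Under these assumptions the width is proportional to \(1/\sqrt{n}\), so quadrupling the sample size halves the width of the interval.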

P-value

  • If the p-value is less than the significance level, we reject \(H_0\).
  • If the p-value is greater than the significance level, we fail to reject \(H_0\).
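
Here is a minimal sketch of the p-value rule, again using a normal approximation in place of the t distribution that regression software typically uses. The function name `two_sided_p_value`, the 5% significance level, and the input numbers are assumptions for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(beta_hat, std_error, null_value=0.0):
    """Two-sided p-value for H0: beta = null_value (normal approximation)."""
    z = (beta_hat - null_value) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

significance_level = 0.05  # a conventional choice, not a requirement

# Hypothetical estimates, chosen only for illustration.
for beta_hat, std_error in [(3.0, 0.58), (0.5, 0.58)]:
    p = two_sided_p_value(beta_hat, std_error)
    decision = "reject H0" if p < significance_level else "fail to reject H0"
    print(beta_hat, round(p, 4), decision)
```

The confidence-interval rule and the p-value rule lead to the same decision when both use the same significance level and the same reference distribution.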

Things to remember while conducting an RCT

  • Controlling for additional variables
  • Heterogeneous treatment effects
  • Spillover effects
  • Imperfect compliance and attrition

Note

More on these topics will be uploaded later.