Satish Bajracharya - Analyzing the Impact of Receiving Monetary Incentive on Decision to Learn the Result of HIV Test

Comparing outcomes between treatment and control group

Here, we use regression analysis to estimate the impact of receiving a monetary incentive on the decision to learn the result of an HIV test. See my previous blog post for the theoretical background to this section.

We use data from the “The Demand for, and Impact of, Learning HIV Status” study in Malawi. The study is a randomized controlled trial (RCT) in which individuals were offered monetary incentives of varying amounts to learn their HIV status after taking an HIV test.

Note

Study: Thornton, Rebecca L. 2008. “The Demand for, and Impact of, Learning HIV Status.” American Economic Review, 98 (5): 1829-63.

Data file: Click here

Detailed description of the intervention: Click here

We use the “Thornton HIV Testing Data.dta” for the analysis.

Importing and describing the data

Execution in R

The data file is a Stata (.dta) file. To import it into R, we need the haven package and its read_dta() function. Run the following code to install haven:

install.packages("haven")

Now, import the dataset and check the list of variables and number of observations. When you download the data file, it comes with a readme file. Please read the readme file to learn more about the variables.

The str function in R

The str() function in R shows the structure of the dataset. To keep this analysis short, we only use the names() and dim() functions here. See the Stata execution section for a detailed description of the variables.

library(haven)
# import the .dta file
data <- read_dta("C:/Data analysis/Thornton data/Data/Thornton HIV Testing Data.dta")
# List of variables
names(data)

 [1] "site"                "rumphi"              "balaka"             
 [4] "villnum"             "m1out"               "m2out"              
 [7] "survey2004"          "got"                 "zone"               
[10] "distvct"             "tinc"                "Ti"                 
[13] "any"                 "under"               "over"               
[16] "simaverage"          "age"                 "age2"               
[19] "male"                "mar"                 "educ2004"           
[22] "timeshadsex_s"       "hadsex12"            "eversex"            
[25] "usecondom04"         "tb"                  "thinktreat"         
[28] "a8"                  "land2004"            "T_consentsti"       
[31] "T_consenthiv"        "T_final_trichresult" "T_final_result_ct"  
[34] "T_final_result_gc"   "hiv2004"             "test2004"           
[37] "followup_tested"     "followupsurvey"      "havesex_fo"         
[40] "numsex_fo"           "likelihoodhiv_fo"    "numcond"            
[43] "anycond"             "bought"

# dimensions of the dataset
dim(data)

[1] 4820   44

There are 44 variables and 4,820 observations.
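
The variable labels are also available in R. haven stores each Stata variable label in the column's "label" attribute, so a listing similar to Stata's describe can be pulled out with base R. The sketch below uses a small made-up data frame (toy, carrying three of the dataset's variable names with labels attached by hand) so that it runs without the data file. With the imported dataset, the same vapply() call could be applied to data, assuming every column carries a label.

```r
# Toy stand-in for the imported data (values are made up for illustration)
toy <- data.frame(
  got  = c(1, 0, 1),
  tinc = c(0, 100, 50),
  any  = c(0, 1, 1)
)

# Attach variable labels the way haven::read_dta() stores them:
# as a "label" attribute on each column
attr(toy$got,  "label") <- "Got HIV results"
attr(toy$tinc, "label") <- "Total value of the incentive (kwacha)"
attr(toy$any,  "label") <- "Received any incentive"

# Collect the labels into a named character vector
var_labels <- vapply(toy, function(x) attr(x, "label", exact = TRUE), character(1))
print(var_labels)
```

If some columns had no label, attr() would return NULL for them and vapply() would stop with an error, so a real dataset may need a fallback value for unlabeled columns.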

Execution in Stata

Use the cd command to set the working directory and the use command to load the dataset. The describe command provides a list of variables with their types and labels.

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear  
describe



. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear  

. describe

Contains data from Thornton data/Data/Thornton HIV Testing Data.dta
  obs:         4,820                          
 vars:            44                          12 Mar 2008 11:10
 size:       785,660                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
site            float   %9.0g                 1=Mchinji 2=Balaka 3=Rumphi
rumphi          float   %9.0g                 Rumphi
balaka          float   %9.0g                 Balaka
villnum         double  %9.0g                 VILLNUM
m1out           float   %9.0g                 Survey outcome in 1998
m2out           float   %9.0g                 Survey outcome in 2001
survey2004      float   %9.0g                 completed baseline survey
got             float   %9.0g                 Got HIV results
zone            float   %9.0g                 VCT zone
distvct         float   %9.0g                 Distance in km
tinc            float   %9.0g                 Total value of the incentive
                                                (kwacha)
Ti              float   %9.0g                 Value of incentive (kwacha)
                                                discrete
any             float   %9.0g                 Received any incentive
under           float   %9.0g                 under 1.5 km
over            float   %9.0g                 over 1.5 km
simaverage      float   %9.0g                 (mean) simaverage
age             float   %10.0g                Age
age2            float   %9.0g                 Age squared
male            float   %9.0g                 Gender
mar             float   %9.0g                 Married at baseline
educ2004        float   %9.0g                 Yrs of completed education
timeshadsex_s   byte    %8.0g                 Times per month had sex
                                                (subsample)
hadsex12        float   %9.0g                 Had sex in past 12 months
                                                (baseline)
eversex         float   %9.0g                 Ever had sex at baseline
usecondom04     float   %9.0g                 Used a condom during last year at
                                                baseline
tb              float   %9.0g                 HIV Test before baseline
thinktreat      float   %9.0g                 Think there will be ARV treatment
                                                in the future
a8              byte    %8.0g      A8         Likelihood of HIV infection
land2004        float   %9.0g                 Owned any land at baseline
T_consentsti    long    %8.0g      yesno      consent to sti test
T_consenthiv    long    %8.0g      yesno      consent to hiv test
T_final_trich~t float   %9.0g      res        final trich results
T_final_resul~t float   %23.0g     otherres   final CT results
T_final_resul~c float   %23.0g     otherres   final GC results
hiv2004         float   %9.0g                 HIV results
test2004        float   %9.0g                 HIV test in 2004
followup_tested byte    %8.0g                 Different HIV testing sample.
                                                Drop from analysis
followupsurvey  float   %9.0g                 Was interviewed at follow-up
havesex_fo      byte    %10.0g                Had sex between baseline and
                                                follow-up
numsex_fo       byte    %10.0g                Num partners between baseline and
                                                follow-up
likelihoodhiv~o int     %10.0g                Likelihood of infection at
                                                follow-up
numcond         float   %9.0g                 Number of condoms purchased at
                                                follow-up
anycond         float   %9.0g                 Any condoms purchased at the
                                                follow-up
bought          float   %9.0g                 Bought condoms on own at
                                                follow-up
-------------------------------------------------------------------------------
Sorted by:

Regression Analysis

Here, we analyze the impact of receiving any monetary incentive on a study participant’s decision to obtain the result of their HIV test. The tinc variable records the total value of the monetary incentive (in kwacha) received by each participant. We tabulate tinc to see the range of incentives offered.

Execution in R

library(dplyr)
# ensure that all rows are displayed when printing tibbles
options(tibble.print_max = Inf)
# tabulate tinc
data |> filter(!is.na(tinc)) |> # remove NA
  select(tinc) |> # select tinc from dataset
  group_by(tinc) |> # group by tinc
  summarize(count = n()) |> # create table with frequency
  mutate(percent = count / sum(count) * 100) |> # create percent variable
  round(digits = 2) # round to 2 decimal places

# A tibble: 27 × 3
    tinc count percent
   <dbl> <dbl>   <dbl>
 1     0   679   23.4 
 2    10    58    2   
 3    20   154    5.31
 4    30    81    2.79
 5    40    64    2.21
 6    50   205    7.07
 7    60    37    1.28
 8    70    40    1.38
 9    80     7    0.24
10    90     8    0.28
11   100   492   17.0 
12   110    14    0.48
13   120    82    2.83
14   130     9    0.31
15   140    42    1.45
16   150    43    1.48
17   160    28    0.97
18   170     8    0.28
19   180     9    0.31
20   200   431   14.9 
21   210    36    1.24
22   220    48    1.65
23   230    30    1.03
24   240     2    0.07
25   250    68    2.34
26   260     3    0.1 
27   300   223    7.69
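
Since the regression later in this post only distinguishes a zero incentive from a positive one, it can help to collapse this table into two groups. The following sketch re-enters the frequencies from the table above by hand (it does not read the data file) and computes how many respondents fall into each group. The object names are illustrative.

```r
# Incentive amounts (kwacha) and frequencies copied from the table above
tinc  <- c(seq(0, 180, by = 10), seq(200, 260, by = 10), 300)
count <- c(679, 58, 154, 81, 64, 205, 37, 40, 7, 8, 492, 14, 82, 9, 42,
           43, 28, 8, 9, 431, 36, 48, 30, 2, 68, 3, 223)

# Split into zero incentive vs. any positive incentive
group   <- ifelse(tinc > 0, "Any incentive", "No incentive")
n_group <- tapply(count, group, sum)
share   <- round(100 * n_group / sum(count), 2)

print(n_group)
print(share)
```

Of the 2,901 respondents with a recorded incentive, 2,222 (about 77 percent) received a positive amount and 679 (about 23 percent) received none.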

Execution in Stata

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 
tabulate tinc



. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 

. tabulate tinc

Total value |
     of the |
  incentive |
   (kwacha) |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        679       23.41       23.41
         10 |         58        2.00       25.41
         20 |        154        5.31       30.71
         30 |         81        2.79       33.51
         40 |         64        2.21       35.71
         50 |        205        7.07       42.78
         60 |         37        1.28       44.05
         70 |         40        1.38       45.43
         80 |          7        0.24       45.67
         90 |          8        0.28       45.95
        100 |        492       16.96       62.91
        110 |         14        0.48       63.39
        120 |         82        2.83       66.22
        130 |          9        0.31       66.53
        140 |         42        1.45       67.98
        150 |         43        1.48       69.46
        160 |         28        0.97       70.42
        170 |          8        0.28       70.70
        180 |          9        0.31       71.01
        200 |        431       14.86       85.87
        210 |         36        1.24       87.11
        220 |         48        1.65       88.76
        230 |         30        1.03       89.80
        240 |          2        0.07       89.87
        250 |         68        2.34       92.21
        260 |          3        0.10       92.31
        300 |        223        7.69      100.00
------------+-----------------------------------
      Total |      2,901      100.00

Running the Regression

Here, we focus only on the effect of receiving any financial incentive. We therefore create a factor variable indicating whether the respondent received an incentive. We then regress got, which indicates whether the respondent obtained the HIV result, on this treatment variable to estimate the impact of receiving a financial incentive. In R, we use the lm() function to run the regression; in Stata, we use the regress command.

Execution in R

data_1 <- data |>
  filter(!is.na(tinc)) |> #remove na in tinc
  mutate(treatment = ifelse(tinc > 0, 1, 0)) # create treatment variable
data_1$treatment <- factor(data_1$treatment, 
                       levels = c(0, 1),
                       labels = c("Control", "Treatment"))
reg <- lm(got ~ treatment, data = data_1) #run the regression
summary(reg)


Call:
lm(formula = got ~ treatment, data = data_1)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7892 -0.3387  0.2108  0.2108  0.6613 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.33868    0.01696   19.97   <2e-16 ***
treatmentTreatment  0.45055    0.01920   23.47   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4232 on 2832 degrees of freedom
  (67 observations deleted due to missingness)
Multiple R-squared:  0.1628,    Adjusted R-squared:  0.1625 
F-statistic: 550.8 on 1 and 2832 DF,  p-value: < 2.2e-16

Execution in Stata

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 
drop if missing(tinc) | missing(got)
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
regress got treatment



. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 

. drop if missing(tinc) | missing(got)
(1,986 observations deleted)

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. regress got treatment

      Source |       SS           df       MS      Number of obs   =     2,834
-------------+----------------------------------   F(1, 2832)      =    550.78
       Model |  98.6657682         1  98.6657682   Prob > F        =    0.0000
    Residual |  507.321529     2,832  .179138958   R-squared       =    0.1628
-------------+----------------------------------   Adj R-squared   =    0.1625
       Total |  605.987297     2,833  .213903035   Root MSE        =    .42325

------------------------------------------------------------------------------
         got |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   .4505519    .019198    23.47   0.000     .4129083    .4881954
       _cons |   .3386838   .0169571    19.97   0.000     .3054343    .3719333
------------------------------------------------------------------------------

The estimated treatment effect of receiving a financial incentive is 0.4506, or about 45 percentage points, relative to a control group mean of about 34 percent. In other words, about 79 percent of those who received an incentive obtained their results. The treatment effect is statistically significant (p < 0.001).
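
With a single binary regressor, the OLS intercept equals the control group mean and the slope equals the difference between the treatment and control group means. The sketch below checks this on simulated data. The sample size, the random seed, and the probabilities (chosen to be close to the estimates above) are assumptions for illustration and are not the study data.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Simulated stand-in for the study data
n         <- 2000
treatment <- rbinom(n, size = 1, prob = 0.75)   # 1 = any incentive
got       <- rbinom(n, size = 1,
                    prob = ifelse(treatment == 1, 0.79, 0.34))  # 1 = learned result

fit <- lm(got ~ treatment)

# Group means computed directly
mean_control   <- mean(got[treatment == 0])
mean_treatment <- mean(got[treatment == 1])

# The intercept matches the control mean; the slope matches the difference
print(coef(fit))
print(c(control = mean_control, difference = mean_treatment - mean_control))
```

The two printed vectors should agree up to floating-point error, which is why the regression coefficient can be read directly as a difference in the share of respondents who obtained their results.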

Robust Standard Errors

When comparing the distribution of outcomes between two groups, we assume that the groups have the same variance even if their means differ. This is called the homoskedasticity assumption. When the variances in the treatment and control groups differ, the assumption is violated, i.e., the error terms are heteroskedastic. In such cases, we use robust standard errors to account for heteroskedasticity. Robust standard errors do not change the coefficient estimates, but they are often larger than the unadjusted standard errors, which in turn makes the confidence intervals wider.
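
Because got is a 0/1 variable, the variance within each group follows from the group mean p as p(1 - p). A quick check using the rounded estimates from the regression above (a control mean of about 0.3387 and a treatment mean of about 0.3387 + 0.4506) gives an informal sense of whether the two groups share the same variance. This is a back-of-the-envelope sketch, not a formal test.

```r
# Rounded estimates taken from the regression output above
p_control   <- 0.3387            # intercept: share of control group that got results
p_treatment <- 0.3387 + 0.4506   # intercept + treatment coefficient

# Variance of a 0/1 outcome with mean p is p * (1 - p)
var_control   <- p_control * (1 - p_control)
var_treatment <- p_treatment * (1 - p_treatment)

print(round(c(control = var_control, treatment = var_treatment), 3))
```

The implied variances are roughly 0.224 for the control group and 0.166 for the treatment group, which hints that the equal-variance assumption may not hold here.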

To test for heteroskedasticity, we run the Breusch-Pagan / Cook-Weisberg test. It tests the null hypothesis of homoskedasticity against the alternative hypothesis of heteroskedasticity. In R, we need to install the lmtest package and use the bptest() function. Setting studentize = FALSE requests the original, non-studentized version of the test, which matches the default behavior of Stata's estat hettest.

Execution in R

library(lmtest)
bptest(reg, studentize = FALSE)


    Breusch-Pagan test

data:  reg
BP = 25.191, df = 1, p-value = 5.193e-07
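
To see what the test computes, the non-studentized Breusch-Pagan statistic can be reproduced by hand: scale the squared OLS residuals by their mean, regress them on the regressors, and take half of the explained sum of squares, which is compared against a chi-squared distribution. The sketch below does this in base R on simulated data. The seed, sample size, and probabilities are assumptions rather than the study data, so the statistic it prints will differ from the one reported above.

```r
set.seed(42)  # arbitrary seed

# Simulated 0/1 outcome whose variance differs across the two groups
n         <- 2000
treatment <- rbinom(n, size = 1, prob = 0.75)
got       <- rbinom(n, size = 1, prob = ifelse(treatment == 1, 0.79, 0.34))

fit <- lm(got ~ treatment)
e2  <- residuals(fit)^2

# Scale squared residuals by the variance estimate RSS / n and center at 0
sigma2 <- sum(e2) / n
f      <- e2 / sigma2 - 1

# Auxiliary regression of the scaled squared residuals on the regressor
aux <- lm(f ~ treatment)

# BP statistic: half the explained sum of squares, chi-squared with 1 df
bp      <- 0.5 * sum(fitted(aux)^2)
p_value <- pchisq(bp, df = 1, lower.tail = FALSE)

print(c(BP = bp, p_value = p_value))
```

The same pchisq() call applied to the reported statistic, pchisq(25.191, df = 1, lower.tail = FALSE), gives the p-value shown in the bptest() output up to rounding.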

Execution in Stata

We use the estat hettest command in Stata to test for heteroskedasticity.

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 
drop if missing(tinc) | missing(got)
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
regress got treatment
estat hettest



. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 

. drop if missing(tinc) | missing(got)
(1,986 observations deleted)

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. regress got treatment

      Source |       SS           df       MS      Number of obs   =     2,834
-------------+----------------------------------   F(1, 2832)      =    550.78
       Model |  98.6657682         1  98.6657682   Prob > F        =    0.0000
    Residual |  507.321529     2,832  .179138958   R-squared       =    0.1628
-------------+----------------------------------   Adj R-squared   =    0.1625
       Total |  605.987297     2,833  .213903035   Root MSE        =    .42325

------------------------------------------------------------------------------
         got |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   .4505519    .019198    23.47   0.000     .4129083    .4881954
       _cons |   .3386838   .0169571    19.97   0.000     .3054343    .3719333
------------------------------------------------------------------------------

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
         Ho: Constant variance
         Variables: fitted values of got

         chi2(1)      =    25.19
         Prob > chi2  =   0.0000

Note

The Breusch-Pagan test is designed to detect heteroskedasticity that is a linear function of the regressors, so it may not capture other forms of heteroskedasticity.

The low p-value suggests that we can reject the null hypothesis of homoskedasticity. In this case, it is better to use robust standard errors instead of unadjusted standard errors.

Running Regression with Robust Standard Errors

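To use robust standard errors in R, we need to install the sandwich package and pass the vcovHC() function to the vcov argument of the coeftest() function from the lmtest package. The type = "HC1" option gives the same standard errors as Stata's robust option.

Execution in R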

library(sandwich)
coeftest(reg, vcov = vcovHC(reg, type = "HC1"))


t test of coefficients:

                   Estimate Std. Error t value  Pr(>|t|)    
(Intercept)        0.338684   0.018968  17.856 < 2.2e-16 ***
treatmentTreatment 0.450552   0.020858  21.601 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
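
The HC1 estimator behind these numbers is a sandwich formula: (X'X)^-1 X' diag(e^2) X (X'X)^-1, multiplied by n / (n - k) as a small-sample correction, where e are the OLS residuals and k is the number of coefficients. The following base R sketch computes it by hand on simulated data (the seed, sample size, and probabilities are assumptions for illustration) and prints it next to the conventional standard errors. It is meant to show the mechanics and is not a replacement for vcovHC().

```r
set.seed(2024)  # arbitrary seed

# Simulated stand-in for the study data
n         <- 2000
treatment <- rbinom(n, size = 1, prob = 0.75)
got       <- rbinom(n, size = 1, prob = ifelse(treatment == 1, 0.79, 0.34))

fit <- lm(got ~ treatment)
X   <- model.matrix(fit)   # n x k design matrix (intercept, treatment)
e   <- residuals(fit)
k   <- ncol(X)

# Sandwich: bread = (X'X)^-1, meat = X' diag(e^2) X
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)  # row i of X is scaled by e[i] before the cross-product

# HC1 applies the degrees-of-freedom correction n / (n - k)
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_hc1       <- sqrt(diag(vcov_hc1))

print(rbind(classical = se_classical, HC1 = se_hc1))
```

With a smaller, higher-variance control group, as assumed here, the robust standard error of the treatment coefficient should come out larger than the conventional one, mirroring the pattern in the output above.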

Execution in Stata

To run a regression with robust standard errors, we run the regress command with the robust option.

cd "C://Data analysis"
use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 
drop if missing(tinc) | missing(got)
generate treatment = cond(tinc>0, 1, 0)
label define treatment 0 "Control" 1 "Treatment"
label val treatment treatment
regress got treatment, robust



. cd "C://Data analysis"
C:\Data analysis

. use "Thornton data/Data/Thornton HIV Testing Data.dta", clear 

. drop if missing(tinc) | missing(got)
(1,986 observations deleted)

. generate treatment = cond(tinc>0, 1, 0)

. label define treatment 0 "Control" 1 "Treatment"

. label val treatment treatment

. regress got treatment, robust

Linear regression                               Number of obs     =      2,834
                                                F(1, 2832)        =     466.60
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1628
                                                Root MSE          =     .42325

------------------------------------------------------------------------------
             |               Robust
         got |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treatment |   .4505519    .020858    21.60   0.000     .4096535    .4914502
       _cons |   .3386838   .0189675    17.86   0.000     .3014922    .3758754
------------------------------------------------------------------------------

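The coefficient estimates for the treatment and the constant term are unchanged, but both standard errors are larger: 0.0209 instead of 0.0192 for the treatment and 0.0190 instead of 0.0170 for the constant. As a result, the 95% confidence interval for the treatment effect widens from [0.413, 0.488] to [0.410, 0.491].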

Note

To learn about measuring impact in an experiment with partial compliance, click here