Comparing CTA to Boosted Regression for Estimating the Propensity Score (Invited)

Ariel Linden

Linden Consulting Group, LLC

Boosted regression (BR) has been recommended as a machine learning alternative to logistic regression for estimating the propensity score because of its greater accuracy. Also known as multiple additive regression trees, BR is a general, automated, data-adaptive modelling algorithm that can estimate the non-linear relationship between treatment assignment (the outcome variable) and a large number of covariates, including multi-level interaction terms. However, BR is a “black-box” approach that provides scant information about how its estimates are derived, and recent research has shown that BR can identify erroneous relationships between outcome and covariates in simulated random data. The present paper revisits the BR approach used by Linden et al. (2010) for estimating the propensity score and compares it with a propensity score model built by classification tree analysis (CTA), generated using the new Stata package for implementing CTA.
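To make the idea of a boosted propensity score model concrete, the following is a minimal Python sketch and not the analysis reported in the paper, which was carried out in Stata. It assumes simulated data, two hypothetical covariates (`age`, `female`), and a deliberately simplified form of boosting: each round fits a one-split regression tree (a stump) to the residuals of the current model and adds a shrunken copy of it to the log-odds of treatment. The function names `fit_stump` and `boosted_propensity` are illustrative only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimises squared
    error on the residuals; leaf values are the mean residual per side."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if sse < best_sse:
                best, best_sse = (j, thr, lm, rm), sse
    return best


def boosted_propensity(X, t, n_rounds=60, learning_rate=0.3):
    """Estimate P(treatment = 1 | covariates) by boosting stumps on the
    log-odds scale. Simplification: leaves use the mean residual rather
    than the Newton step used by full implementations."""
    p0 = sum(t) / len(t)
    F = [math.log(p0 / (1.0 - p0))] * len(X)  # start from the marginal log-odds
    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: observed minus predicted.
        residuals = [ti - sigmoid(fi) for ti, fi in zip(t, F)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature has more than one distinct value
            break
        j, thr, lv, rv = stump
        F = [fi + learning_rate * (lv if row[j] <= thr else rv)
             for fi, row in zip(F, X)]
    return [sigmoid(fi) for fi in F]


# Simulated data (an assumption for illustration): treatment is much more
# likely for ages 35 to 45, a non-linear pattern that a logistic model
# linear in age would miss.
random.seed(0)
X, t = [], []
for _ in range(300):
    age, female = random.randint(20, 60), random.randint(0, 1)
    p_treat = 0.8 if 35 <= age <= 45 else 0.2
    X.append([age, female])
    t.append(1 if random.random() < p_treat else 0)

scores = boosted_propensity(X, t)
in_band = [s for s, row in zip(scores, X) if 35 <= row[0] <= 45]
out_band = [s for s, row in zip(scores, X) if not 35 <= row[0] <= 45]
print("mean score, ages 35 to 45:", round(sum(in_band) / len(in_band), 3))
print("mean score, other ages:   ", round(sum(out_band) / len(out_band), 3))
```

The fitted model is a sum of many small trees, so the estimated scores cannot be read off as a single set of decision rules. This is the “black-box” property the abstract refers to, and it is the contrast with a single classification tree whose splits can be inspected directly.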
