Receiver operating characteristic (ROC) curve analysis in Stata

Let’s start with an example. A researcher is investigating the role of a new biomarker in predicting a particular cancer. The cancer outcome is binary (dichotomous), coded as 0 (cancer absent) and 1 (cancer present). Higher values of the new biomarker are associated with a higher chance of cancer. These two variables (named ‘Cancer’ and ‘Biomarker’, respectively) have been entered into Stata directly or imported from an Excel spreadsheet along with their column headings.

In Stata, the ROC plot and the area under the ROC curve (AUC) can be obtained as follows.

rocreg Cancer Biomarker
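As a rough illustration of how the plot and the AUC can be obtained with built-in commands, the sketch below simulates a small dataset and runs roctab, rocreg, and rocregplot on it. The simulated data, the seed, and the distribution parameters are assumptions made purely for illustration. With real data you would skip the simulation and use your own Cancer and Biomarker variables.

```stata
* Illustrative data only: 200 simulated subjects (not real results)
clear
set seed 12345
set obs 200
generate byte Cancer = runiform() < 0.4           // 1 = cancer present, 0 = cancer absent
generate Biomarker = rnormal(50 + 10*Cancer, 10)  // higher on average when Cancer == 1

* Nonparametric ROC curve: plot plus a table with the AUC, its standard error and 95% CI
roctab Cancer Biomarker, graph summary

* rocreg estimates the AUC with bootstrap confidence intervals;
* rocregplot then draws the ROC curve from those estimates
rocreg Cancer Biomarker
rocregplot
```

Note that rocreg bootstraps by default, so it can take noticeably longer than roctab on large datasets.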


This yields an ROC curve plot as well as the area under the ROC curve with its 95% confidence interval. It is very important that the higher code of the outcome variable corresponds to increasing values of the diagnostic predictor. A common mistake among novice researchers is reverse coding of the outcome variable. In the above example, if you code ‘cancer absent’ as 1 and ‘cancer present’ as 0, the higher outcome code no longer matches increasing values of the biomarker, and the ROC plot will misleadingly show the curve falling below the diagonal (an area under the curve below 0.5). Conversely, if increasing values of the biomarker are protective and lower values indicate the presence of cancer, then the outcome variable should be reverse coded as 0 for ‘cancer present’ and 1 for ‘cancer absent’.
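A minimal sketch of the recoding described above, assuming the outcome is stored as 0/1 in Cancer. The variable name NoCancer and the label name nocancer_lbl are hypothetical, chosen only for this example.

```stata
* Check the direction first: an AUC clearly below 0.5 suggests that the
* outcome coding and the direction of the biomarker do not match
roctab Cancer Biomarker

* Reverse-coded outcome (hypothetical name): 1 = cancer absent, 0 = cancer present
generate byte NoCancer = 1 - Cancer
label define nocancer_lbl 0 "Cancer present" 1 "Cancer absent"
label values NoCancer nocancer_lbl

* ROC analysis with the reverse-coded outcome
roctab NoCancer Biomarker, graph summary
```

Keeping the original Cancer variable and creating a new one, rather than overwriting it, makes it easier to see later which coding was used for which analysis.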

The next important question is how to find the best diagnostic cut-off for the biomarker in diagnosing cancer.

Any of the following three commands can be used, individually or one after another, to obtain a single cut-off value that gives the best trade-off between sensitivity and specificity for diagnosing cancer.

cutpt Cancer Biomarker, liu

cutpt Cancer Biomarker, youden

cutpt Cancer Biomarker, nearest
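If Stata reports cutpt as an unrecognized command, it probably has not been installed yet. The sketch below assumes the package is available from SSC under the name cutpt. It also lists sensitivity and specificity at every observed cut-off with the built-in roctab command, which helps when comparing the values returned by the three methods.

```stata
* cutpt is user-written; install it once (assumes it is hosted on SSC as "cutpt")
ssc install cutpt

* Sensitivity and specificity at every observed cut-off of the biomarker
roctab Cancer Biomarker, detail
```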

The cutpt command offers three methods (Liu, Youden, and nearest). Note that cutpt is a user-written command rather than part of official Stata, so it has to be installed before first use. The “nearest” option gives the cut-off whose point on the ROC curve lies closest to the upper left corner of the plot, that is, the point of perfect sensitivity and specificity. A researcher can then choose among the cut-offs from the three methods according to the sensitivity and specificity each one gives.
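For reference, the three criteria are usually defined as follows, where Se(c) and Sp(c) denote sensitivity and specificity at cut-off c. These are the standard textbook definitions. That cutpt implements them exactly this way is an assumption worth confirming in its help file.

```latex
\begin{align*}
\text{Liu:} \quad & c^{*} = \arg\max_{c}\; Se(c)\, Sp(c) \\
\text{Youden:} \quad & c^{*} = \arg\max_{c}\; \bigl[ Se(c) + Sp(c) - 1 \bigr] \\
\text{Nearest to } (0,1)\text{:} \quad & c^{*} = \arg\min_{c}\; \sqrt{\bigl(1 - Se(c)\bigr)^{2} + \bigl(1 - Sp(c)\bigr)^{2}}
\end{align*}
```

As a purely hypothetical example, a cut-off with sensitivity 0.80 and specificity 0.70 has a Liu product of 0.56, a Youden index of 0.50, and a distance to the upper left corner of about 0.36. Each method picks the cut-off that optimizes its own quantity, so the three can return different values.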

