"The success of case-mix adjustment for accurately predicting the outcome (discrimination) was evaluated using the area under the receiver operating characteristic curve (c statistic). The c statistic is the probability of assigning a greater risk of death to a randomly selected patient who died compared with a randomly selected patient who survived. A value of 0.5 suggests that the model is no better than random chance in predicting death. A value of 1.0 suggests perfect discrimination. In general, values less than 0.7 are considered to show poor discrimination, values of 0.7-0.8 can be described as reasonable and values above 0.8 suggest good discrimination."
and
"As a rank-order statistic, it is insensitive to systematic errors in calibration"
The second quote is particularly salient: one of the key flaws in Professor Jarman's US/UK comparison may be that there were systematic differences in admissions policies and coding between the two countries, and such differences would not be detected by the c-statistic.
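To make this concrete, here is a minimal sketch in Python (with made-up numbers, not Jarman's data) showing that the c-statistic is completely unchanged when every predicted risk is systematically inflated, even though the model's calibration, its predicted death rate versus the observed one, is now badly wrong:

```python
import numpy as np

def c_statistic(risk, died):
    """Probability that a randomly chosen patient who died was assigned
    a higher predicted risk than a randomly chosen survivor (ties count
    half) -- i.e. the area under the ROC curve."""
    risk = np.asarray(risk, dtype=float)
    died = np.asarray(died, dtype=bool)
    dead, alive = risk[died], risk[~died]
    # Compare every (died, survived) pair of predicted risks.
    greater = (dead[:, None] > alive[None, :]).mean()
    ties = (dead[:, None] == alive[None, :]).mean()
    return greater + 0.5 * ties

rng = np.random.default_rng(0)
true_risk = rng.uniform(0.01, 0.30, size=5000)  # hypothetical case-mix model
died = rng.random(5000) < true_risk             # simulated outcomes

# A systematic calibration error: every predicted risk is doubled,
# as might happen if one country's admissions were up-coded.
inflated = 2 * true_risk

print(c_statistic(true_risk, died))  # discrimination of the honest model
print(c_statistic(inflated, died))   # exactly the same number
print(died.mean(), inflated.mean())  # but predicted deaths are now ~double observed
```

Because the c-statistic depends only on the rank ordering of the predicted risks, any systematic distortion that preserves that ordering is invisible to it.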
It is then interesting to look at Dr Foster's data on the c-statistics they have obtained for the clinical conditions used to determine the HSMRs of UK hospitals. Dr Foster routinely uses 56 diagnostic groups (accounting for about 83% of UK deaths), defined according to ICD codes. Of note, much of Prof Jarman's UK/US comparison used only 9 diagnostic codes, which is a little strange in itself: why not use all 56? These 9 codes covered less than half of the deaths in both countries. I have listed the codes used, with their individual c-statistics, based on Dr Foster's 2012 report:
Septicaemia - 0.792 (reasonable)
Acute MI - 0.759 (reasonable)
Acute heart failure - 0.679 (poor)
Acute cerebrovascular event - 0.729 (reasonable)
Pneumonia - 0.838 (good)
COPD/Bronchiectasis - 0.714 (low end of reasonable)
Aspiration pneumonitis - 0.711 (low end of reasonable)
Fractured hip - 0.756 (reasonable)
Respiratory failure - 0.745 (reasonable)
These c-statistics are not very impressive; it must be remembered that 0.5 is effectively the zero of this scale, and many of them sit at the low end of reasonable. It is interesting that Professor Jarman quotes the overall c-statistic for his 9-code model as 0.921. Given that he compared the HSMRs code by code, surely he should also give the individual c-statistics for each country's subgroup within each specific code? Professor Jarman has not provided this data; it would certainly be interesting to see the c-statistics for the UK and the US for each code, to see whether there is a relationship between c-statistic disparity and HSMR disparity.
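For what it is worth, those per-country, per-code c-statistics would be trivial to compute if the patient-level predictions were released. A sketch of the calculation (the file and column names here are purely hypothetical, since the underlying dataset is not public):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical layout: one row per admission, with the model's
# predicted death risk attached. Column names are illustrative.
admissions = pd.read_csv("admissions_with_predictions.csv")
# columns: diagnosis_code, country ("UK"/"US"), predicted_risk, died (0/1)

# c-statistic for each country's subgroup within each diagnostic code
for (code, country), grp in admissions.groupby(["diagnosis_code", "country"]):
    auc = roc_auc_score(grp["died"], grp["predicted_risk"])
    print(f"{code:>12} {country}: c = {auc:.3f} (n = {len(grp)})")
```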
It is also interesting that 75 of Prof Jarman's 247 mortality models failed their statistical goodness-of-fit tests. The overall measure of how well the models fit is also pretty poor (mean R-squared of 0.25). It must also be reiterated that the c-statistic will not pick up errors in calibration, so if one country is systematically up-coded relative to another, the c-statistic will not detect this. The one key question I would like to see answered is just how Professor Jarman selected these 9 codes for the US/UK comparison. There are also other questions, such as which models failed the goodness-of-fit tests, and whether the codes assessed had reasonable R-squared values. There is so much beneath the surface here; I am convinced this story will run and run.
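The published material does not say which goodness-of-fit test was applied; the standard choice for logistic mortality models is the Hosmer-Lemeshow test, which probes exactly the calibration that the c-statistic ignores. A sketch, assuming that (or a similar) test was used:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(risk, died, n_groups=10):
    """Hosmer-Lemeshow chi-squared test: bin patients into groups by
    predicted risk and compare observed vs expected deaths in each bin.
    A small p-value means the model's calibration does not fit the data."""
    risk = np.asarray(risk, dtype=float)
    died = np.asarray(died, dtype=float)
    order = np.argsort(risk)
    bins = np.array_split(order, n_groups)  # risk deciles by default
    stat = 0.0
    for idx in bins:
        obs, exp, n = died[idx].sum(), risk[idx].sum(), len(idx)
        var = exp * (1 - exp / n)           # binomial-style variance in the bin
        stat += (obs - exp) ** 2 / var
    # n_groups - 2 degrees of freedom is the usual convention
    return stat, chi2.sf(stat, n_groups - 2)
```

A model can score a high c-statistic (good ranking) and still fail this test badly, which is why the two measures need to be reported together.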