Till then, my noble friend, chew upon this

A few weeks back I wrote a, uh, respectful critique of the Centers for Medicare & Medicaid Services (CMS) program that gives U.S. hospitals quality ratings. I was pleased to learn that CMS is now re-evaluating that program and requesting public input.

Well, I was pleased for a few minutes. The document soliciting feedback acknowledges the problems I (and more qualified others) pointed out, but I had some issues with it:

  • It downplays the most serious problems
  • It proposes small tweaks to problems that need major re-thinking
  • It talks about contracting with yet another Yale-affiliated group to work through this process (at least someone can benefit from this, I guess)

Below are the comments I sent in response to the request.

Regarding convergence with k-means clustering: I think k-means is not an appropriate way to group the hospital summary scores. The scores are smoothly distributed by construction, so there are no natural breaks for a clustering algorithm to find.

If k-means is going to be used, it is essential to use multiple iterations of the algorithm to achieve convergence. Not doing so is a serious error, and I think the document improperly downplays the importance of this issue. A single iteration is not "recommended" by any authority.
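
To make the stakes concrete, here is a minimal sketch using scikit-learn's KMeans as a stand-in for whatever SAS routine CMS uses (the simulated scores, the five-cluster setup, and the seeds are all my assumptions for illustration). It compares a single Lloyd iteration against letting the algorithm converge across multiple restarts:

```python
# One k-means iteration vs. running to convergence.
# scikit-learn stands in for the SAS routine; the scores are simulated.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
scores = rng.normal(size=(3000, 1))  # smooth "summary scores" with no natural breaks

one_pass = KMeans(n_clusters=5, init="random", n_init=1,
                  max_iter=1, random_state=0).fit(scores)
converged = KMeans(n_clusters=5, init="random", n_init=10,
                   max_iter=300, random_state=0).fit(scores)

def star_order(km):
    # Relabel clusters 1..5 by the order of their centers (1-D data),
    # so labels from different runs are comparable.
    order = np.argsort(km.cluster_centers_.ravel())
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(order) + 1)
    return rank[km.labels_]

print(one_pass.inertia_, converged.inertia_)  # one pass leaves higher within-cluster variance
moved = (star_order(one_pass) != star_order(converged)).mean()
print(f"{moved:.1%} of hospitals get a different star just by letting k-means converge")
```

The exact numbers don't matter; the point is that stopping after one iteration leaves the category boundaries wherever the random initialization happened to put them.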

Regarding Winsorisation of summary scores: As above, I think using k-means introduces more problems than it solves. If it is to be used, I don't think it matters much whether the summary scores are clipped at the extremes first. If pressed, I would say to remove the Winsorisation step to simplify the overall process.
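
For reference, Winsorisation clips the tails of the distribution rather than removing observations. A minimal sketch (the 0.5/99.5 percentile cut points are my assumption, not the CMS values):

```python
# Winsorisation: clip summary scores at chosen percentiles.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(size=3000)  # simulated summary scores

lo, hi = np.percentile(scores, [0.5, 99.5])
winsorised = np.clip(scores, lo, hi)

# Only the extreme tails move; the bulk of the distribution
# (and thus any clustering of it) is barely affected.
print((winsorised != scores).mean())  # fraction of scores clipped (~1%)
```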

Regarding re-sequencing reporting thresholds: The document proposes removing some hospitals' summary scores before applying clustering instead of after. I don't think this really matters much, but I'm weakly in favor of the change.

Suppose that a simple quintile rating were used instead of k-means. It would be odd to remove the non-reporting hospitals from the data after assigning the star ratings, because the expectation that each rating category holds 20% of hospitals would be violated.
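
A small simulation makes the point; the reporting behavior below is invented, and deliberately correlated with the score:

```python
# Quintile ratings assigned before vs. after dropping non-reporting hospitals.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({"score": rng.normal(size=3000)})
# Assumption: weaker hospitals are less likely to meet reporting thresholds.
df["reports"] = rng.random(3000) < (0.7 + 0.1 * (df["score"] > 0))

# Rate first, then drop: the star levels no longer hold ~20% each.
df["stars"] = pd.qcut(df["score"], 5, labels=[1, 2, 3, 4, 5])
print(df.loc[df["reports"], "stars"].value_counts(normalize=True).sort_index())

# Drop first, then rate: 20% per category holds by construction.
kept = df[df["reports"]].copy()
kept["stars"] = pd.qcut(kept["score"], 5, labels=[1, 2, 3, 4, 5])
print(kept["stars"].value_counts(normalize=True).sort_index())
```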

Regarding application of quadrature: The document isn't very clear about what the actual change to the SAS code will be. I think it's important to do two things:

(1) Update and correct the model specification in the methodology report. That report contains only a single paragraph describing the model, and that paragraph doesn't even define all of the variables.

(2) Post the proposed SAS code, and explain how it connects to the mathematical model. The SAS code that's currently on the QualityNet site (link) has little connection to the model in the methodology report. (There are no logarithms in the report, for example.)

Once the model and SAS code are properly documented, the question of which approximation technique to use (or whether one is needed at all; is there a proof that no analytic solution exists?) can be answered.
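
For context on what "quadrature" would be doing here: if the model integrates out a normally distributed latent variable, Gauss-Hermite quadrature is the standard approximation. A sketch with an invented logistic integrand (the real integrand is exactly what the methodology report needs to pin down):

```python
# Gauss-Hermite quadrature for E[f(z)] with z ~ N(0, 1).
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(30)

def expect(f):
    # Change of variables z = sqrt(2) * x turns the N(0,1) expectation
    # into the Gauss-Hermite form with weight exp(-x^2).
    return np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

f = lambda z: 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * z)))  # invented integrand
print(expect(f))

# Monte Carlo check of the same expectation
z = np.random.default_rng(3).standard_normal(1_000_000)
print(f(z).mean())
```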

Currently it's not clear what the SAS code is actually computing, and this undermines confidence in the entire star rating project. I wouldn't accept the current code for a student project, let alone for something as important as guiding health care decisions.

After seeing the problems with the k-means clustering, I think each SAS function call should be documented and its parameters explained, including what the default parameter values are. This would be standard practice for annotating computer code.
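
To illustrate the annotation style I mean, here is a Python sketch with scikit-learn's KMeans standing in for the SAS call (the principle is language-independent): pass every parameter explicitly and say what it does, so nothing rides on unstated defaults.

```python
from sklearn.cluster import KMeans

km = KMeans(
    n_clusters=5,      # one cluster per star level
    init="k-means++",  # spread-out starting centers
    n_init=10,         # random restarts; the best run (lowest inertia) is kept
    max_iter=300,      # Lloyd iterations allowed per restart
    tol=1e-4,          # convergence threshold on center movement
    random_state=42,   # fixed seed so the published ratings are reproducible
)
```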

This section represents my strongest request. Hospitals should be able to understand how they're being rated and how they can improve. Having to reverse-engineer an underspecified mathematical model and a poorly-documented computer program is wasteful and disadvantages small hospitals that don't have statisticians and data scientists on staff.

Regarding weighting of measure groups: The document proposes some different weights for the measure groups. While I think it makes sense to make "Effectiveness of care" more important than "Efficient use of imaging," I also think the separation of measures into weighted groups is an unnecessary and awkward component of the star rating process.

That is, the latent variable model seems like it might be useful precisely because it avoids subjective assumptions about which measures are the best indicators of quality. Why add a step that requires deciding how much worse mortality is than timeliness? There can be no correct answer.
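
Here is the weighting step in miniature, with invented group scores and weights; the weights encode exactly the kind of judgment I mean (here, mortality counts 5.5 times as much as timeliness), and no data can tell you the right values:

```python
# Summary score as a weighted average of measure-group scores.
# All numbers are invented for illustration.
group_scores = {"mortality": 0.8, "safety": 0.5,
                "timeliness": 0.9, "imaging": 0.4}
weights = {"mortality": 0.22, "safety": 0.22,
           "timeliness": 0.04, "imaging": 0.04}

total_weight = sum(weights.values())
summary = sum(weights[g] * group_scores[g] for g in group_scores) / total_weight
print(summary)
```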

Regarding negative loadings: I think the document improperly downplays the effect of negative loadings, and doesn't offer a conceptual defense for including them.

It could be the case that a higher-quality hospital performs worse on some metrics. For example, suppose a great hospital is located next to a nursing home and a terrible hospital is located next to a college campus. The great hospital's patients will be older and sicker on average, so it could have worse "mortality" statistics than the terrible hospital.

Is that actually the effect being captured by the star rating procedure? Looking at the HAI-6 measure, which is negatively weighted in each of the given analysis periods, I doubt it. It's implausible that an infection associated with poor hand-washing practices is actually a sign of quality.
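
To spell out what a negative loading does arithmetically (loadings and scores invented, with measures signed so that higher means better performance):

```python
# Effect of a negative loading on the latent "quality" score.
import numpy as np

loadings = np.array([0.6, 0.5, -0.3])  # mortality, readmission, HAI-6

better_hai6 = np.array([0.0, 0.0, 1.0])   # one s.d. better on HAI-6
worse_hai6 = np.array([0.0, 0.0, -1.0])   # one s.d. worse on HAI-6

# The hospital with more infections comes out "higher quality."
print(loadings @ better_hai6)  # -0.3
print(loadings @ worse_hai6)   # +0.3
```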

I think it's more likely that the negative loadings are indeed "problematic with respect to face validity," and call into question whether the latent variable model is actually producing useful output.

Regarding public reporting thresholds: I don't have much to add here.

Regarding stratification: The goal of the rating project is to produce an "overall" quality measure. Isn't using different ratings for different hospitals giving up on that goal?

Regarding measure inclusion: I don't have much to add here, except to say that it might be worth reviewing the measures with negative loadings to see if it's actually clear which direction is positive.