The data science behind how Amino rates hospitals by cost

Economists and consumers alike have often observed (and suffered from) extreme cost variation in healthcare, driven not only by the service itself and the doctor who performs it, but also by the facility where it is performed. Hospitals in particular are known to charge wildly different prices for similar care, and we’ve had little insight into why or how.

Our first step toward uncovering healthcare costs led us to physician-level cost estimates for more than 100 procedures, searchable by the doctor one might choose to perform that procedure. Such granularity can help people stay in network, calculate their out-of-pocket costs ahead of time, and budget for a procedure.

Today, we’re excited to announce that we’ve put a dollar sign on 45,000 facility-insurer combinations across the country (including hospitals, imaging centers, and urgent care centers) so that people can not only understand the total cost of a procedure, but also know where it might be cheaper (and of the same or better quality) nearby. Just as Yelp makes it easy to compare restaurants with its straightforward dollar sign ratings, we too are making it easy to compare hospitals by cost.

Starting today, we’re rolling out this new feature in our Amino Plus experience for employers and their employees. This provides a layer of cost granularity that can help employees make smarter decisions about where to get care. For example, you could choose the same ACL surgery with the same doctor, but depending on where they operate, you could pay thousands more or less.

How did we do this? It seems like a simple feature, but a huge amount of data exploration and discovery took place to make this possible. You can read the complete methodology for our Facility Cost Ratings here. Below is a behind-the-scenes look at how we cracked the code of hospital cost variation—plus why I think this is a huge step forward for our price transparency efforts.

The “aha” moment: a clear linear pattern

This project started as an exploration into the relational patterns that exist in healthcare pricing, not just across procedures under one insurer, but among different insurers. With Amino’s database of 9 billion health insurance claims, we were able to explore this on a large scale, covering nearly every large facility and insurer combination in the country.

We started by plotting the cost of hundreds of different outpatient procedures conducted at one hospital against a baseline of costs from a major insurer. Here’s an example of real data from a hospital in Washington D.C., with the baseline insurer cost for each procedure on the x-axis and reimbursements from another specific insurer on the y-axis.

From the plot alone, you can see a clear linear relationship between the observed reimbursements negotiated by this insurer and our baseline insurer costs. We found a similar linear pattern (with varying slopes) for nearly every facility-insurer combination we looked at.
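To make the idea of a slope concrete, here is a minimal sketch of how one might estimate it for a single facility-insurer combination. It assumes a least-squares line forced through the origin, which is our simplification for illustration and not necessarily the fitting method Amino used. The function name `fit_multiplier` and all of the numbers are hypothetical.

```python
def fit_multiplier(baseline_costs, observed_costs):
    """Least-squares slope of a line through the origin: observed ~ slope * baseline.

    Assumption: a no-intercept fit. The article does not say which method was used.
    """
    if len(baseline_costs) != len(observed_costs) or not baseline_costs:
        raise ValueError("need two equal-length, non-empty sequences")
    numerator = sum(x * y for x, y in zip(baseline_costs, observed_costs))
    denominator = sum(x * x for x in baseline_costs)
    if denominator == 0:
        raise ValueError("baseline costs are all zero")
    return numerator / denominator


# Made-up numbers for illustration only (not real claims data).
# Each position is one outpatient procedure.
baseline = [10.0, 70.0, 120.0, 300.0]   # baseline insurer cost (x-axis)
observed = [21.0, 138.0, 243.0, 598.0]  # another insurer's reimbursement (y-axis)

slope = fit_multiplier(baseline, observed)
print(f"estimated multiplier: {slope:.2f}")
```

A slope near 2 here would mean this insurer reimburses this facility about twice the baseline cost across procedures.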

It’s rare to find such a clear linear pattern—especially when you’re working with healthcare costs. It was truly an “aha” moment, and we knew that these slopes could be leveraged to quantify the relative costliness of hospitals for outpatient procedures. But first, we needed to construct an index to compare facilities side by side.

Constructing the cost index

By comparing a facility's reimbursements from each insurer to our baseline insurer costs, we were able to estimate multipliers for each facility-insurer combination, and extrapolate those to cost indices. Consider a hypothetical example of what one private insurer pays for two outpatient services at two facilities in the same region. We can use the multiplier to compare the two facilities:

	Facility A	Facility B
Outpatient service X	$20	$25
Outpatient service Y	$140	$175
Cost index	2	2.5

With this particular insurer, Facility B charges 1.25 times as much as Facility A for service X, and the same is true for service Y. Thus, we could say that Facility B is 1.25 times as expensive as Facility A. We capture this by assigning Facility A a cost index of 2 and Facility B a cost index of 2.5.

With Amino’s data, we performed this analysis across hundreds of data points per facility-insurer combination in order to get an accurate estimate of each cost index. Because each cost index is specific to one insurer, one facility could have multiple different cost indices.
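The arithmetic behind the hypothetical table can be sketched in a few lines. The table does not list the baseline insurer costs, so the values below ($10 and $70) are an assumption: they are simply the baselines that would produce indices of 2 and 2.5. Taking the median of the per-service ratios is also our simplification, and the names `BASELINE`, `REIMBURSEMENTS`, and `cost_index` are hypothetical.

```python
from statistics import median

# Assumed baseline insurer costs (not given in the table).
BASELINE = {"service X": 10.0, "service Y": 70.0}

# Reimbursements from the hypothetical table, keyed by (facility, insurer),
# since each cost index is specific to one facility-insurer combination.
REIMBURSEMENTS = {
    ("Facility A", "private insurer"): {"service X": 20.0, "service Y": 140.0},
    ("Facility B", "private insurer"): {"service X": 25.0, "service Y": 175.0},
}


def cost_index(prices, baseline):
    """Median ratio of a facility-insurer's prices to the baseline costs."""
    return median(prices[service] / baseline[service] for service in prices)


indices = {key: cost_index(prices, BASELINE) for key, prices in REIMBURSEMENTS.items()}
index_a = indices[("Facility A", "private insurer")]
index_b = indices[("Facility B", "private insurer")]
relative = index_b / index_a  # how expensive B is relative to A

print(index_a, index_b, relative)
```

Adding a second insurer for the same facility would just add another key, which is how one facility ends up with several cost indices.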

From cost index into actionable information

The moment we unlocked these cost indices, we knew we’d need to find the right way to communicate this information to users. The cost indices themselves are just numbers—not very useful to the everyday consumer. But the underlying meaning is incredibly useful. We could show people how expensive the hospitals in their area actually are—something that’s never been done at scale.

To do this, we compared each facility’s cost index to the national average cost index in order to determine its relative expensiveness. Facilities that charge near the national average cost index receive a Cost Rating of 2 dollar signs. Facilities lower than the national average cost index range from 1 dollar sign to 1.5 dollar signs, and facilities higher than the national average cost index range from 2.5 dollar signs to 4 dollar signs. Each facility can have more than one Cost Rating, since we calculate its cost index across all insurers and for each specific insurer.

Here’s a breakdown of what that looks like for all facilities across all insurers:

Dollar signs	Standard deviations from national average	Share of facilities with this rating
1	Less than -0.50	20.3%
1.5	-0.50	32.0%
2	-0.25	22.3%
2.5	0.10	11.2%
3	0.50	6.3%
3.5	1.00	3.2%
4	1.50	4.7%
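The mapping from a cost index to a Cost Rating can be sketched as a lookup on standard deviations from the national average. Two things here are assumptions on our part: that the seven rows correspond to ratings 1 through 4 in half-step increments (as the text describes), and that each threshold in the table is the lower bound of its bucket. Using an unweighted mean and population standard deviation is also a guess, and the facility names and indices are made up.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed lower bounds of each bucket, in standard deviations from the average.
LOWER_BOUNDS = [-0.50, -0.25, 0.10, 0.50, 1.00, 1.50]
# Assumed ratings, from 1 to 4 dollar signs in half steps.
RATINGS = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]


def rating_from_z(z):
    """Map a z-score to a rating; anything below -0.50 gets 1 dollar sign."""
    return RATINGS[bisect_right(LOWER_BOUNDS, z)]


def cost_ratings(indices):
    """Rate every facility relative to the average of the supplied indices."""
    values = list(indices.values())
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All facilities cost the same, so all sit at the average.
        return {name: rating_from_z(0.0) for name in indices}
    return {name: rating_from_z((value - average) / spread)
            for name, value in indices.items()}


# Made-up cost indices for three hypothetical facilities.
example = cost_ratings({"Hospital P": 1.0, "Hospital Q": 2.0, "Hospital R": 3.0})
print(example)
```

Under these assumptions a facility exactly at the average lands on 2 dollar signs, which matches the description in the text.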

Ultimately, this allows us to create a bird’s-eye view of hospital cost variation in any city, anywhere in the country. In Chicago, for example, cost indices range from 46% below the national average to 208% above it, all within just one city.
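To put the Chicago range in perspective, here is a short worked calculation. It assumes the national average cost index is normalized to 1.0 (our choice for illustration; any positive value gives the same ratio). The helper `pct_vs_national` is hypothetical.

```python
def pct_vs_national(cost_index, national_average):
    """Percent above (positive) or below (negative) the national average."""
    return (cost_index / national_average - 1.0) * 100.0


national = 1.0                    # assumed normalization
cheapest = national * (1 - 0.46)  # 46% below the national average
priciest = national * (1 + 2.08)  # 208% above the national average

# Ratio between the most and least expensive ends of the range.
spread = priciest / cheapest
print(f"{spread:.1f}x")
```

In other words, "208% above" means roughly three times the national average, and the top of the Chicago range is between five and six times the bottom.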

A watershed moment for price transparency

There are so many aspects of this new feature that I’m excited about. In addition to ranking facilities by expensiveness, it could improve the accuracy of our cost estimates for procedures, particularly in cases where we don’t have many observations for a given procedure and facility-insurer combination. It could also allow us to see when new contracts have kicked in, making our cost estimates more dynamic and accurate. Additionally, we plan to combine these ratings with quality measures to illuminate the highest-value care in any city in the country.

From a data science perspective, we’re just starting to pull the curtain back on healthcare costs. With the right data and the right team, we can use claims data to figure out what’s been an industry secret for years—and my hope is that this newfound transparency will shift our behavior as healthcare consumers.

As for what’s next, we will soon be publishing a series of reports that will unveil these facility cost ratings within key regions (starting with New York City). We will also be looking into inpatient pricing patterns for more complex healthcare procedures and services. Stay tuned!