They ID and track every website that joins #Cloudflare. It’s a huge effort but those guys are on top of it. A script could check the list of domains against their list. There is also this service (from the same devs) which does some checks:
but caveat: if a non-CF domain (e.g. example.tld) has a CF host (e.g. somehost.example.tld), that tool will return YES for the whole domain.
Manually adjusting availability is a can of worms that I don’t want to open
I would suggest not bothering with any complex math, and simply do the calculation as you normally do but then if a site is Cloudflare cap whatever the calculated figure is to 98%. Probably most (if not all) CF sites would be 100% anyway, so they would just be reduced by 2%. Though it would need to be explained somewhere – the beauty of which would be to help inform people that the CF walled garden is excluding people. Cloudflare’s harm perpetuates to a large extent because people are unaware that it’s an exclusive walled garden that marginalizes people.