That’s not entirely true. Only very recently have browsers started using a new system called Encrypted Client Hello (ECH), which hides the domain of the request. Before that, every connection had to send the domain unencrypted in the Server Name Indication (SNI) field of the TLS handshake, so the receiving server knows which certificate to respond with. I imagine quite a few servers still don’t support the new setup.
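To see why the domain leaks without ECH, here’s a rough sketch in Python. It uses the standard `ssl` module with in-memory BIOs to generate the ClientHello a client would send to `example.com` (an arbitrary host name; nothing goes over the network), then walks the raw bytes to pull the server name back out. The parser is deliberately simplified: it assumes the whole ClientHello sits in one TLS record and does only minimal bounds checking, so treat it as an illustration rather than a robust TLS parser.

```python
import ssl
from typing import Optional


def client_hello_bytes(hostname: str) -> bytes:
    """Generate the raw TLS ClientHello for `hostname` without touching the network."""
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()
    outgoing = ssl.MemoryBIO()
    # server_hostname is what ends up in the SNI extension.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello written, now waiting for a server reply
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the plaintext SNI host name from a ClientHello record (simplified sketch)."""
    if record[0] != 0x16:  # 0x16 = handshake record
        raise ValueError("not a TLS handshake record")
    hs = record[5:]  # skip record header: type (1) + version (2) + length (2)
    if hs[0] != 0x01:  # 0x01 = ClientHello
        raise ValueError("not a ClientHello")
    pos = 4  # handshake header: type (1) + length (3)
    pos += 2 + 32  # legacy_version + random
    pos += 1 + hs[pos]  # session_id (1-byte length prefix)
    pos += 2 + int.from_bytes(hs[pos:pos + 2], "big")  # cipher_suites
    pos += 1 + hs[pos]  # compression_methods
    ext_end = pos + 2 + int.from_bytes(hs[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= ext_end:
        ext_type = int.from_bytes(hs[pos:pos + 2], "big")
        ext_len = int.from_bytes(hs[pos + 2:pos + 4], "big")
        body = hs[pos + 4:pos + 4 + ext_len]
        if ext_type == 0x0000:  # server_name extension
            # body: list length (2), name_type (1), name length (2), name
            name_len = int.from_bytes(body[3:5], "big")
            return body[5:5 + name_len].decode("ascii")
        pos += 4 + ext_len
    return None
```

Anyone who can see the first packet of the connection can do the same thing, which is the gap ECH is meant to close. The HTTP `Host` header itself travels inside the encrypted channel; it’s the SNI in the handshake that gives the domain away.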
Probably don’t need to scrape it. Just query Wikidata for it:
https://wikidata.org
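A minimal sketch of what “just query it” could look like, assuming you go through the Wikidata Query Service SPARQL endpoint (`https://query.wikidata.org/sparql`) and ask for JSON results. The query itself is a hypothetical stand-in (the English label of item Q42); swap in whatever entity or property you were going to scrape. The `User-Agent` string is also a placeholder you should replace with your own.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical example query: the English label of item Q42.
# Replace with the data you actually need.
EXAMPLE_QUERY = """
SELECT ?label WHERE {
  wd:Q42 rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
"""


def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query as a GET request URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def run_query(sparql: str) -> list:
    """Send the query and return the list of result bindings (needs network access)."""
    req = urllib.request.Request(
        build_query_url(sparql),
        # Placeholder: the query service asks clients for a descriptive User-Agent.
        headers={"User-Agent": "example-script/0.1 (contact: you@example.org)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Calling `run_query(EXAMPLE_QUERY)` should return one dict per result row, with the value at `row["label"]["value"]`. That’s usually far less brittle than scraping rendered pages.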