Searching internet archive for URLs containing substring

A_Zythera · 2 years ago

Searching internet archive for URLs containing substring

WindowlessBasement · 2 years ago

I have tried using the in-built Pagination API to retrieve all relevant domain entries by splitting them into blocks but, due to the way the filters are applied, this only tells me if the entry is in the current block and I have to search each one manually. I have basically no coding knowledge

Short answer: you’re asking questions that will take a program requesting data (the whole internet archive?) non-stop for a month or more. You are gonna need to learn to code if you want to interact with that much data.

I definitely don’t have the ability to automate the search process for the paginated data.

You’re going to need to automate it. A rate-limiter is going to kick in very quickly if you are just spamming the API.

explain to me like I’m 5

You need to learn for yourself if this is a project you are tackling. Also will need to familiarize yourself with the terms of service of the archive, because most services would consider scraping every piece of data they have as abusive behavior and/or malicious.