Find all webpages hosted on a domain via an API

If you want to find all webpages hosted on a given domain, you can use the site: operator in Google or Bing (for example, site:webtropy.com).

However, let’s imagine you want a more extensive list, and perhaps you want the result back in JSON format so that you can use it in your own applications. This is where the Wayback Machine (Internet Archive) can be useful.

So, imagine that you want to see which pages are (or were) hosted on the domain webtropy.com; you’d use the URL

This returns a JSON array of webpages on that domain, divided by year. Of course, you can’t be sure that all of those pages are still live, but you do know that they were live in the year specified, so you can focus on the latest year. There will also be plenty of duplicates, so you’ll need to eliminate those too.
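As a rough illustration of that clean-up step, here is a minimal Python sketch. It assumes you have already parsed the JSON response into a list of (url, year) pairs; that shape is my assumption, not something the API guarantees, and the helper names normalize, dedupe_latest and pages_in_latest_year are hypothetical. It treats URLs that differ only by a leading "www.", the scheme, or a trailing slash as duplicates, which may be too aggressive for some sites.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key: lower-case host without 'www.', path without trailing slash."""
    # Archived URLs sometimes lack a scheme; add one so urlsplit finds the host.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def dedupe_latest(rows):
    """Collapse duplicates, keeping the most recent year seen for each page.

    rows: iterable of (url, year) pairs (assumed shape, see the text above).
    Returns a dict mapping the normalized key to (original_url, year).
    """
    latest = {}
    for url, year in rows:
        key = normalize(url)
        year = int(year)
        if key not in latest or year > latest[key][1]:
            latest[key] = (url, year)
    return latest


def pages_in_latest_year(rows):
    """Return the sorted, de-duplicated URLs last seen in the newest year present."""
    latest = dedupe_latest(rows)
    if not latest:
        return []
    newest = max(year for _, year in latest.values())
    return sorted(url for url, year in latest.values() if year == newest)
```

You would call pages_in_latest_year(rows) with whatever pairs you extract from the response. Pages returned this way were archived in the newest year, but they still need a live check if you want to be sure they exist today.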

This appears to be an unofficial API, so it is subject to change without warning. However, I hope it is useful!
