Asking because I have stood up a web crawler on my home server to back up "at-risk" smaller, older websites that might be gone tomorrow. I keep bumping into links to sites like http://landroverresource.com/ that are no longer online, and I hate to think about documentation that might be lost for those of us with older vehicles.
Those of you old salts who want to look through your bookmarks, feel free to post sites here so I and others can back them up. I will also look into ways of submitting them to the Wayback Machine so they're publicly accessible.
In particular, look for links that are http rather than https (which generally indicates they're older and haven't been updated), blogs or sites on free public services like WordPress.com or blogspot.com (free services can shut down with little notice), or sites with odd domains that suggest they're not easily found. I want to prioritize the ones that aren't likely to stay up long term.
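If anyone wants to sort their own bookmarks the same way, here is a rough sketch of that triage in Python. It only encodes two of the hints above (plain http, and hosting on a free service); the names `FREE_HOSTS`, `risk_score` and `prioritize` are made up for illustration, and the scoring is my assumption, not a tested rule.

```python
from urllib.parse import urlparse

# Assumed list of free hosting services; extend it with whatever you run into.
FREE_HOSTS = ("wordpress.com", "blogspot.com")


def risk_score(url: str) -> int:
    """Crude 'might vanish soon' score: +1 for plain http, +1 for a free host."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    score = 0
    if parts.scheme == "http":
        score += 1
    if any(host == h or host.endswith("." + h) for h in FREE_HOSTS):
        score += 1
    return score


def prioritize(urls):
    """Return the URLs with the highest scores first (ties keep their order)."""
    return sorted(urls, key=risk_score, reverse=True)


if __name__ == "__main__":
    bookmarks = [
        "https://www.lrfaq.org/",
        "http://landy.ee/manuals/",
        "https://mudtrekker.wordpress.com/",
    ]
    for url in prioritize(bookmarks):
        print(risk_score(url), url)
```

The "odd domain" hint is left out on purpose, since I don't know a reliable way to detect that automatically; those are easier to spot by eye.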
Here's what I have at the moment:
Code:
https://rangeroverclassic.net
http://range-rover-classic.com
http://www.red90.ca/rovers/index.html
http://www.glencoyne.co.uk/info.htm
https://mudtrekker.wordpress.com/
http://www.landroverstuff.com/index.htm
https://www.lrfaq.org/
https://new.lrcat.com/ #Having some trouble getting the crawler to work with this one
http://landy.ee/manuals/
http://smithies.co.nz/land_rover/ #This has Google Drive links that are harder to grab, should be a lot of overlap w/ previous link
http://www.expeditionlandrover.info/index.html
https://www.roverweb.org/
https://www.roverhaul.com/
http://www.ronjacob.com/ #someone's RRC project site, seems cool.
http://www.nickslandrover.co.uk/
http://www.jpurnell.com/RR/default.htm
For the curious, I'm using https://github.com/ArchiveTeam/grab-site to archive these. I can just pass it a domain, and it'll crawl the site and back up its dependencies as well, all into the standardized, open WARC format.
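For anyone who wants to run the same thing against a list like the one above, here is a minimal sketch of how I'd turn that list into commands. It assumes the list is pasted as plain text with optional ` #note` suffixes, and that the basic `grab-site URL` invocation is all you need; the helper names `parse_url_list` and `grab_site_command` are hypothetical, and the script only prints the commands rather than running them.

```python
import shlex


def parse_url_list(text):
    """Return the URLs from a pasted list, dropping blank lines and '#' notes."""
    urls = []
    for line in text.splitlines():
        # Treat " #" as the start of a note so URL fragments like /page#top survive.
        url = line.split(" #", 1)[0].strip()
        if url and not url.startswith("#"):
            urls.append(url)
    return urls


def grab_site_command(url):
    """Build a shell-safe 'grab-site URL' command line for one site."""
    return "grab-site " + shlex.quote(url)


if __name__ == "__main__":
    pasted = """
    https://www.lrfaq.org/
    https://new.lrcat.com/ #Having some trouble getting the crawler to work with this one
    http://landy.ee/manuals/
    """
    for url in parse_url_list(pasted):
        print(grab_site_command(url))
```

Printing instead of executing is deliberate: each crawl can run for a long time, so it seems safer to review the commands and start them one at a time.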