Web Backup
Sometimes web pages disappear. If a page contains important references, it is wise to keep a local copy. A local clone is also much faster to browse. Here is how to make a local copy with HTTrack, a tool available for Linux and Windows:
Say the web page to clone is http://www…/index.html. With HTTrack, making a local copy is as easy as this:
1) Install httrack, e.g. on Ubuntu via the command line:
> sudo apt-get install httrack
2) Download a local copy of the entire web page, including all links that point to pages within the same site:
> httrack http://www…/index.html
Some web servers do not allow web crawlers, only browsers.
In that case, add “-F firefox” to the above command line, so that HTTrack reports a browser-like user-agent string.
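For example, the full command might then look like this (the URL is the placeholder from above, and “firefox” is simply the user-agent string that HTTrack will report):
> httrack http://www…/index.html -F firefox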
Some web servers also restrict web crawlers via a robots.txt file.
In that case, add “-s0” to the above command line, so that these restrictions are ignored.
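If both workarounds are needed, they can be combined on one command line, e.g. (again using the placeholder URL):
> httrack http://www…/index.html -F firefox -s0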
If we want to include externally linked images, e.g. all directly linked images on an external image upload server, we use:
> httrack http://www…/index.html -n +*.jpg +*.png +*.gif
And if we do not want to download the entire web page structure, but only a particular page up to a certain link depth, we pass the -r option with the number of link levels to download, e.g. 2:
> httrack http://www…/index.html -r2 -n +*.jpg +*.png +*.gif
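As a final sketch, the options discussed above can be combined in a single call, e.g. a depth-limited mirror that reports a browser-like user agent, ignores robots.txt and fetches externally linked images (assuming the placeholder URL from above):
> httrack http://www…/index.html -F firefox -s0 -r2 -n +*.jpg +*.png +*.gif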
This only works for public pages. To download websites that require authentication, we can use the Firefox extension WebScrapBook. This extension provides options similar to HTTrack's, but can also authenticate via Firefox's password manager. If we only want to download certain images from a password-protected website, we can use the Firefox extension DownThemAll. This extension can search an open tab for media of a specific type, and then filter, rename and download them.