GNU/Linux Desktop Survival Guide
by Graham Williams
Wget Mirror Websites
20200526 A popular use case for wget is to make a complete copy of a website, perhaps for local perusal or archival. For example, we might back up a conference website for archival and historical purposes:
$ wget --mirror --convert-links --adjust-extension --page-requisites \
       --no-parent https://ausdm18.ausdm.org/
This creates a directory named ausdm18.ausdm.org in the current working directory. Browsing to this directory within a browser using a URL like file:///home/kayon/ausdm18.ausdm.org will interact with the local copy of the website.
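As a small sketch of how the local directory name follows from the URL, the snippet below derives the host part of the address and prints the corresponding file:// location. The variable names are illustrative, and it assumes wget was run from the current directory without options that change the directory layout (such as --no-host-directories).

```shell
# Illustrative only: the variable names are assumptions, not part of wget.
url='https://ausdm18.ausdm.org/'

# Strip the scheme (https://) and then everything after the host name.
host=${url#*://}
host=${host%%/*}

# wget --mirror saves pages under a directory named after the host.
local_url="file://$PWD/$host/"
echo "$local_url"
```

The printed address can be pasted into a browser to open the local copy.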
Another use case is to download all of the available Debian packages (.deb files) whose names start with r from a particular package mirror, here an Ubuntu archive:
$ wget --mirror --accept '.deb' --no-directories \
       http://archive.ubuntu.com/ubuntu/ubuntu/pool/main/r/
Useful command line options include -r (--recursive), which indicates that we want to recurse through the links found at the given URL. The --mirror option includes --recursive as well as some other options, namely timestamping and an unlimited recursion depth (see the wget manual for details). The -l 1 (--level=1) option specifies how many levels deep we should dive into the website; with a value of 1 we recurse only a single level. The -A (--accept) option restricts the download to just those files that have a deb extension. The extensions can be a comma separated list. The -nd (--no-directories) option tells wget to not create any directories locally, so the files are downloaded to the current directory.
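To make the option descriptions concrete, here is a hedged sketch that writes a depth-limited variant of the command above in both short and long form and only prints the results. The URL is the one from the example; the depth value of 2 is an assumption chosen for illustration and should be adjusted to how deep the wanted files sit below the starting URL.

```shell
# Same URL as the example above; nothing is downloaded by this snippet.
url='http://archive.ubuntu.com/ubuntu/ubuntu/pool/main/r/'

# Hypothetical depth limit, used in place of the unlimited depth of --mirror.
depth=2

# Short and long spellings of the same options: recurse to a fixed depth,
# keep only .deb files, and save everything into the current directory.
short="wget -r -l $depth -A .deb -nd $url"
long="wget --recursive --level=$depth --accept .deb --no-directories $url"

# Print both commands for review rather than running them.
echo "$short"
echo "$long"
```

Either printed line can be copied to the prompt and run once it looks right.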
For a website that no longer exists, the Internet Archive's Wayback Machine is useful. To copy a website from there, install the wayback_machine_downloader tool and then run:
$ wayback_machine_downloader http://ausdm17.azurewebsites.net/
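The downloader is distributed as a Ruby gem, so one way to install it, assuming Ruby and the gem command are already available, is gem install wayback_machine_downloader. The sketch below only checks whether the tool is on the PATH and prints that install hint if it is not; the status variable is an illustrative name.

```shell
# Check for the tool without running it; 'status' is an illustrative name.
if command -v wayback_machine_downloader >/dev/null 2>&1; then
  status=installed
else
  status=missing
  # Assumes a working Ruby installation that provides the gem command.
  echo "Install with: gem install wayback_machine_downloader"
fi
echo "wayback_machine_downloader: $status"
```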