GNU/Linux Desktop Survival Guide
by Graham Williams
Wget Mirror Websites
20200526 A popular use case for wget is to make a complete copy of a website, perhaps for offline perusal or local archival. For example, we might back up a conference website for historical purposes:
$ wget --mirror --convert-links --adjust-extension --page-requisites \
    --no-parent https://ausdm18.ausdm.org/
The result is a copy of the website stored under the directory ausdm18.ausdm.org in the current working directory. Browsing to this directory within a browser using a URL like file:///home/kayon/ausdm18.ausdm.org will interact with the local copy of the web site.
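As a quick check once the download completes we can open the local copy directly from the command line. A minimal example, assuming the site's front page was saved as index.html at the top of the mirrored directory:

$ xdg-open file://$PWD/ausdm18.ausdm.org/index.html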
Another use case is to download all of the available Debian packages beginning with r from a particular package mirror.
$ wget --mirror --accept '.deb' --no-directories \
    http://archive.ubuntu.com/ubuntu/pool/main/r/
Useful command line options include -r (--recursive) which indicates that we want to recurse through the given URL link. The --mirror option includes --recursive as well as some other options (see the manual page for details). The -l 1 (--level=1) option specifies how many levels we should dive into at the web site; with -l 1 we recurse only a single level. The -A .deb (--accept) option restricts the download to just those files that have a .deb extension. The extensions can be a comma separated list. The -nd (--no-directories) option requests wget to not create any directories locally; the files are all downloaded to the current directory.
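Putting these short options together, the package download above could equally well be expressed along the following lines, here restricting the recursion to a single level:

$ wget -r -l 1 -A .deb -nd http://archive.ubuntu.com/ubuntu/pool/main/r/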
For a website that no longer exists, the Wayback Machine is often useful. To copy a website from there, install the wayback machine downloader and then run:
$ wayback_machine_downloader http://ausdm17.azurewebsites.net/
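The downloader itself is distributed as a Ruby gem, so one way to install it, assuming Ruby and its gem command are available, is:

$ gem install wayback_machine_downloader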