41.8 Wget Mirror Websites
20200526
A popular use case for wget is to make a complete copy of a website, perhaps for local browsing or archiving. For example, we might back up a conference website for archival and historical purposes:
$ wget --mirror --convert-links --adjust-extension --page-requisites \
--no-parent https://ausdm18.ausdm.org/
This will create a directory called ausdm18.ausdm.org in the current working directory. Opening this directory in a browser, using a URL like file:///home/kayon/ausdm18.ausdm.org, will browse the local copy of the website.
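As a minimal sketch of where the local copy ends up, the helpers below derive the directory name from the URL and build the matching file:// URL. The function names mirror_dir and mirror_file_url are hypothetical, and the sketch assumes a plain URL with no port number or user name in it.

```shell
#!/bin/sh
# Sketch only: mirror_dir and mirror_file_url are hypothetical helpers,
# assuming a URL of the form scheme://host/path with no port or user name.

# Print the host part of a URL, which is the directory name wget creates.
mirror_dir() {
  url=$1
  rest=${url#*://}              # drop the scheme, e.g. https://
  printf '%s\n' "${rest%%/*}"   # drop everything from the first slash
}

# Print a file:// URL for the local copy stored under a parent directory.
mirror_file_url() {
  printf 'file://%s/%s\n' "$2" "$(mirror_dir "$1")"
}

# Show the URL to open for a mirror made in the current directory.
mirror_file_url https://ausdm18.ausdm.org/ "$PWD"
```

Run from /home/kayon, the last line prints the file:// URL quoted above.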
Another use case is to download all of the available Debian packages (.deb files) whose names start with r from a particular package mirror, here an Ubuntu archive.
$ wget --mirror --accept '.deb' --no-directories \
http://archive.ubuntu.com/ubuntu/ubuntu/pool/main/r/
Useful command line options include -r, or --recursive, which indicates that we want to recurse through the links found at the given URL. The --mirror option includes --recursive as well as some other options (see the manual page for details). The -l 1, or --level=1, option specifies how many levels we should descend into the website; a value of 1 recurses only a single level. The -A .deb, or --accept=.deb, option restricts the download to just those files that have a deb extension. The extensions can be given as a comma separated list. The -nd, or --no-directories, option requests wget not to create any directories locally, so the files are downloaded into the current directory.
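The short options described above can be combined into a single command. The sketch below only prints the command rather than running it, so it can be inspected first. The helper name deb_fetch_cmd and the host mirror.example.org are hypothetical placeholders. Note that --mirror sets an unlimited recursion depth, so the sketch uses -r with an explicit -l instead.

```shell
#!/bin/sh
# Sketch only: deb_fetch_cmd is a hypothetical helper that prints a wget
# command built from the short options -r, -l, -A and -nd.
deb_fetch_cmd() {
  url=$1     # directory URL to start from
  level=$2   # recursion depth passed to -l
  exts=$3    # comma separated extensions passed to -A, e.g. .deb,.udeb
  printf 'wget -r -l %s -A %s -nd %s\n' "$level" "$exts" "$url"
}

# mirror.example.org is a placeholder host, not a real mirror.
deb_fetch_cmd http://mirror.example.org/pool/main/r/ 1 .deb
```

Once the printed command looks right, it can be pasted at the prompt to perform the download.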
For a website that no longer exists, the Wayback Machine is useful. To copy a website from there, install the and then:
Unlike with wget, absolute links are not rewritten to be internally consistent, so that will need to be done by hand.
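Part of that manual work can sometimes be scripted. As a rough sketch, and not what any particular tool does, the hypothetical helper strip_site_prefix below uses sed to remove the http://host/ or https://host/ prefix from links in HTML read on standard input. The page name program.html is made up for illustration, and the result is only correct for pages stored at the top of the copied tree.

```shell
#!/bin/sh
# Sketch only: strip_site_prefix is a hypothetical helper. It reads HTML on
# stdin and writes the rewritten HTML to stdout, leaving files untouched.
strip_site_prefix() {
  # Escape dots so the host name matches literally in the regular expression.
  host_re=$(printf '%s' "$1" | sed 's/[.]/\\./g')
  # Remove http://host/ or https://host/ wherever it appears.
  sed -E "s#https?://${host_re}/##g"
}

# program.html is a made-up page name used only for this example.
printf '%s\n' '<a href="https://ausdm18.ausdm.org/program.html">Program</a>' |
  strip_site_prefix ausdm18.ausdm.org
```

Pages in subdirectories need a different relative prefix, so check the output before overwriting any file.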
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0