GNU/Linux Desktop Survival Guide
by Graham Williams


Wget UserAgent Browser Identification

20210211 Some sites check whether the client requesting a download identifies itself as a browser and, if it does not, return a 403 Forbidden response. This protects the site's bandwidth from the load of automated programs. By overriding the check we place a burden on the website's owner. The owner may also employ other mechanisms to identify robots and block them, and may even block your IP address temporarily or permanently! So do due diligence before deciding to override the website owner's choices.

Programs such as the command line tool wget may not report a UserAgent to the website from which they are downloading files, or they may report accurately what they are. By default wget identifies itself as Wget/version, which a site can easily recognise as a non-browser client.

The reported UserAgent can be changed with the -U (--user-agent) option to avoid the 403 error:

$ wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" https://example.com/paper.pdf
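If the override is used regularly it may help to wrap it in a small shell function that also limits the load placed on the server. The following is only a sketch: the names UA, URL, polite_fetch and DRY_RUN are hypothetical, chosen here for illustration, and the retry and rate limits are arbitrary values to adjust. The wget options themselves (--user-agent, --tries, --limit-rate) are standard.

```shell
#!/bin/sh
# Sketch only: UA, URL, polite_fetch and DRY_RUN are hypothetical names.
UA="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
URL="https://example.com/paper.pdf"

# Download one file with an overridden UserAgent, a single attempt, and a
# capped transfer rate. With DRY_RUN=1 the command is printed (unquoted,
# for inspection only) instead of being executed.
polite_fetch() {
  set -- wget --user-agent="$1" --tries=1 --limit-rate=200k "$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show what would be run without contacting the site.
DRY_RUN=1 polite_fetch "$UA" "$URL"
```

Calling polite_fetch "$UA" "$URL" without DRY_RUN set performs the actual download. Limiting the attempts and the transfer rate does not make the override any more welcome to the site owner, but it does reduce the burden described above.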

Hosted by Togaware, a pioneer of free and open source software since 1984.
Copyright © 1995-2020 Togaware Pty Ltd. Creative Commons ShareAlike V4.