Recursive Wget download
Wget can recursively download data or web pages. This is a key feature Wget has that cURL does not: while cURL is a library with a command-line front end, Wget is a standalone command-line tool. Recursive download requires several Wget options; a typical command looks like this:
wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off https://site.example/aaa/bbb/ccc/ddd/
This downloads the files to whatever directory you ran the command in.
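If you want the files to land somewhere else, Wget's -P (--directory-prefix) option sets the local destination directory. A minimal sketch, reusing the placeholder URL above with a hypothetical ~/data target:

# same recursive download, but place everything under ~/data
wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off -P ~/data https://site.example/aaa/bbb/ccc/ddd/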
To use Wget to recursively download over FTP, simply change https:// to ftp:// and give the FTP directory path.
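For example, the command above becomes (same placeholder host, assuming the server exposes the same path over FTP):

# recursive FTP download of the same directory tree
wget --recursive -np -nc -nH --cut-dirs=4 --wait 1 ftp://site.example/aaa/bbb/ccc/ddd/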
Wget recursive download options:
--recursive
- download recursively (recreating the remote directory tree on your PC)
--recursive --level=1
- recurse, but with --level=1 don't go below the specified directory
-Q 1g
- the total download --quota option, for example to stop downloading after 1 GB has been downloaded altogether
-np
- never get parent directories (sometimes a site will link upwards)
-nc
- no clobber – don’t re-download files you already have
-nd
- no directory structure on download (put all files in one directory, the one given by -P)
-nH
- don't create a vestigial hostname directory for the site on your PC
-A
- only accept files matching a globbed pattern (see the combined example after this list)
--cut-dirs=4
- don't recreate a vestigial hierarchy of directories above the desired directory on your PC. Set the number equal to the number of directory levels on the server (here aaa/bbb/ccc/ddd is four)
-e robots=off
- many sites use robots.txt to ask robots not to consume data. Here we tell Wget to ignore robots.txt, behaving (somewhat) like a human visitor.
--random-wait
- to avoid a flood of download requests (which can get you auto-banned from downloading), politely wait a randomized interval between file downloads
--wait 1
- sets the base wait time, making the random wait average about 1 second before the next file starts. This helps avoid anti-leeching measures.
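Putting several of these options together: a hypothetical sketch that accepts only *.csv files, stays at the starting directory level, stops after 1 GB in total, and flattens everything into a single local folder named csvdata:

# only *.csv, one level deep, stop after 1 GB total, all files into ./csvdata
wget --recursive --level=1 -np -nc -nd -A '*.csv' -Q 1g --random-wait --wait 1 -e robots=off -P csvdata https://site.example/aaa/bbb/ccc/ddd/

Note that even with -A, Wget may still fetch HTML index pages to discover links; pages that don't match the accept pattern are deleted after the traversal.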