Recursive Wget download
Wget can recursively download data or web pages. This is a key feature Wget has that cURL does not: while cURL is a library with a command-line front end, Wget is a standalone command-line tool. Recursive download requires several Wget options; a typical command looks like this:
wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off https://site.example/aaa/bbb/ccc/ddd/
This downloads the files to whatever directory you ran the command in.
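If you want the files to land somewhere else, Wget's -P (--directory-prefix) option sets the local destination directory. A minimal sketch, reusing the placeholder URL above with a hypothetical ~/data target:

# same recursive download, but place everything under ~/data
wget --recursive -np -nc -nH --cut-dirs=4 --random-wait --wait 1 -e robots=off -P ~/data https://site.example/aaa/bbb/ccc/ddd/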
To use Wget to recursively download over FTP, simply change https:// to ftp:// and give the FTP directory path.
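For example, the command above becomes (same placeholder host, assuming the server exposes the same path over FTP):

# recursive FTP download of the same directory tree
wget --recursive -np -nc -nH --cut-dirs=4 --wait 1 ftp://site.example/aaa/bbb/ccc/ddd/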
Wget recursive download options:
--recursive
- download recursively (recreating the remote directory tree on your PC)
--recursive --level=1
- recurse, but with --level=1 don't go below the specified directory
-Q 1g
- the total download --quota option, for example to stop downloading after 1 GB has been downloaded altogether
-np
- never get parent directories (sometimes a site will link upwards)
-nc
- no clobber – don’t re-download files you already have
-nd
- no directory structure on download (put all files in one directory, the one given by -P)
-nH
- don't create a vestigial hostname directory for the site on your PC
-A
- only accept files matching a globbed pattern (see the combined example after this list)
--cut-dirs=4
- don't recreate a vestigial hierarchy of directories above the desired directory on your PC. Set the number equal to the number of directory levels on the server (here aaa/bbb/ccc/ddd is four)
-e robots=off
- many sites use robots.txt to ask robots not to consume data. Here we tell Wget to ignore robots.txt, behaving (somewhat) like a human visitor.
--random-wait
- to avoid a flood of download requests (which can get you auto-banned from downloading), politely wait a randomized interval between file downloads
--wait 1
- sets the base wait time, making the random wait average about 1 second before the next file starts. This helps avoid anti-leeching measures.
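Putting several of these options together: a hypothetical sketch that accepts only *.csv files, stays at the starting directory level, stops after 1 GB in total, and flattens everything into a single local folder named csvdata:

# only *.csv, one level deep, stop after 1 GB total, all files into ./csvdata
wget --recursive --level=1 -np -nc -nd -A '*.csv' -Q 1g --random-wait --wait 1 -e robots=off -P csvdata https://site.example/aaa/bbb/ccc/ddd/

Note that even with -A, Wget may still fetch HTML index pages to discover links; pages that don't match the accept pattern are deleted after the traversal.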