Wget for website link checking
Wget can check a website's internal links with a recursive spider crawl:
wget --spider -r -nd -nv -w1 -o mysite.log https://www.yourwebsite.com
If everything is OK, near the bottom of mysite.log will be:

Found no broken links
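This check can be scripted. A minimal Python sketch, assuming wget is on PATH and reusing the placeholder URL and log name from the command above:

import subprocess
from pathlib import Path

site = "https://www.yourwebsite.com"  # placeholder site URL
log = Path("mysite.log")

# same spider crawl as above; wget exits non-zero when it finds errors,
# so don't treat a non-zero exit code as a fatal failure here
subprocess.run(
    ["wget", "--spider", "-r", "-nd", "-nv", "-w1", "-o", str(log), site],
    check=False,
)

# the summary line appears near the bottom of the log
if "Found no broken links" in log.read_text(errors="replace"):
    print("OK: no broken links found")
else:
    print(f"possible broken links -- inspect {log}")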
If broken links are reported, find where they occur in the site source. Assuming .html or .md source files, search for the broken URL with a program like Visual Studio Code or findtext.py:

findtext https://mysite.invalid "*.html"
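If findtext.py is not to hand, a rough stand-in in plain Python (the find_text function below is only illustrative; it is not the actual findtext.py interface):

from pathlib import Path

def find_text(text: str, pattern: str, root: str = ".") -> None:
    # print file, line number and matching line for each occurrence under root
    for path in sorted(Path(root).rglob(pattern)):
        for num, line in enumerate(path.read_text(errors="replace").splitlines(), start=1):
            if text in line:
                print(f"{path}:{num}: {line.strip()}")

find_text("https://mysite.invalid", "*.html")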
Wget spider options:
--spider
- don't save the retrieved files; just check that they exist
-r
- recursive: follow links within the site
-nd
- don't recreate the site's directory tree locally; anything written goes in the current directory
-nv
- non-verbose: minimal message output
-w1
- wait 1 second between requests (avoid being banned by your own server's scraping detection)
-o mysite.log
- write log messages to mysite.log instead of the terminal
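When the log does report broken links, the offending URLs can be extracted for the file search above. A sketch under the assumption that wget lists the broken URLs after the "Found N broken links" summary line (the exact log layout can differ between wget versions):

from pathlib import Path

lines = Path("mysite.log").read_text(errors="replace").splitlines()

for i, line in enumerate(lines):
    # summary line such as "Found 2 broken links." -- skip the all-clear message
    if "broken link" in line and "no broken links" not in line:
        # assume the lines that follow list the broken URLs, one per line
        for url in lines[i + 1:]:
            if url.strip():
                print(url.strip())
        break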
Related: Python internal / external website link checker