Simple wget crawler for a list of files

Posted on 6. August 2015

This script can be helpful for downloading a set of files from a web server that you don't have (S)FTP access to. The input file consists of a list of filenames, one name per line.
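An input file (here called files.txt; the filenames are only placeholders) could look like this:

report-2015.pdf
image01.jpg
archive.tar.gz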


#!/bin/bash
# Download every file listed in the input file ($1) from the web server below.
file=$1
WEBSERVER="http://webserver.tld/folder/"

while IFS= read -r line; do
    FULLURL="$WEBSERVER$line"
    # -nc (no-clobber): skip files that already exist in the current folder
    wget -nc "$FULLURL"
done < "$file"

First, the first command-line argument, the list of filenames, is saved in the variable file, and the web server address in WEBSERVER. IFS stands for Internal Field Separator; setting it to empty just for the read command makes read take each line verbatim, without trimming leading or trailing whitespace, while -r prevents backslashes from being interpreted as escape characters. The while loop, which ends in the last line, reads through the file line by line. Inside the loop, the read filename is concatenated with the web server address into FULLURL, which is then downloaded with wget. The -nc (no-clobber) option makes wget skip files that are already present in the current folder, so the script can safely be re-run.
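If you only want to check whether the files exist on the web server without downloading anything, wget's --spider option does exactly that; a minimal variant of the wget call inside the loop would be:

wget --spider "$FULLURL"

To run the script, assuming it is saved as spider.sh (the filename is just an example) and the list of filenames is in files.txt:

chmod +x spider.sh
./spider.sh files.txt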

You can find the script on GitHub.