wget question

Alan Jackson ajackson
Mon May 17 11:57:32 PDT 2004


On Mon, 29 Dec 2003 10:13:05 -0500
Joel Hammer <joel at hammershome.com> wrote:

> Some additional progress. I seem to be getting more files if I bypass
> the first couple of pages:
> 
> wget -r -p http://atlases.muni.cz/atlases.muni.cz/_images-common
> 
> This behavior has me puzzled. Could this be because these images are
> fetched by javascript?
> 

wget doesn't do javascript. It speaks HTTP and FTP itself and, when run
recursively, retrieves only files that are linked from the web pages it
encounters. It also respects robots.txt files.
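
As a rough illustration of why link-following misses such images, here is a
small shell sketch. The sample page, the file names in it, and the grep
pattern are all made up for the example, and the pattern is only a crude
stand-in for wget's real HTML parser.

```shell
#!/bin/sh
# Hypothetical sample page: one image is linked statically, the other only
# gets its address assigned by a script at view time.
cat > sample.html <<'EOF'
<html><body>
<a href="page2.html">next</a>
<img src="images/static.jpg">
<img id="plate">
<script>
var n = 7;
document.getElementById("plate").src = "images/plate" + n + ".jpg";
</script>
</body></html>
EOF

# Crude approximation of link extraction: keep only literal
# href="..." and src="..." attribute values found in the markup.
links=$(grep -Eo '(href|src)="[^"]*"' sample.html | sed -E 's/^[a-z]+="//; s/"$//')
printf '%s\n' "$links"
```

Only page2.html and images/static.jpg are listed. The image whose address is
assembled by the script never shows up as a link, so a recursive fetch has
nothing to follow.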

Retrieving a site that builds its pages with javascript or database
lookups is hard.

If you can decode the file structure, you could write a simple webpage
with links to all the images, and then run wget on *that*. I also have
a simple perl hack for sucking files off a site... as you can see by the
amount of stuff commented out, I always start with this file and hack
it until it works.


#!/usr/bin/perl -w

#	Fetch a numbered series of image files (began life as: retrieve a traffic map and display it)

#	initialize

use LWP::UserAgent;
#use HTML::Parse;
#use HTML::FormatText;

#$url = "http://soiweb/~rjmiller/ISMAP/houtraf/new.houtraf.gif";	# earlier target, superseded below
$url = "http://ccc.ece.utexas.edu/~thecap/pictures/Sewanee2000/2000-07-03/";

#print "$url\n";

#	set up and open connection

	$agent = LWP::UserAgent->new;
	#$agent->proxy('http','http://internet:80/'); # needed if going through firewall
	#$agent->proxy('http','http://134.163.248.80:80/'); # needed if going through firewall
	for ($i=1;$i<=50;$i++) {
		my $z="";
		#if ($i<100) {$z='0';};
		if ($i<10) {$z='0';};
		##next if $i < 40;
		#my $file = "c$z" . $i . 'a.jpg';
		#my $file = "amj$z" . $i . '.jpg';
		#my $file = "lezz$z" . $i . '.jpg';
		my $file = "tj$z" . $i . '.jpg';
		next if -s $file;
		print STDERR "fetch $file\n";
		my $newurl = $url . $file;
		$request = HTTP::Request->new('GET', $newurl);
		$response = $agent->request($request);
		if (!$response->is_success() ) {
			print STDERR "Couldn't get URL. Status code = ",$response->code,"\n";
			next;
		}

		$output = $response->content;


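#		Save the image to disk

		# three-arg open with an error check; binmode keeps the image bytes intact
		unless (open(GIF, '>', $file)) {
			print STDERR "Can't write $file: $!\n";
			next;
		}
		binmode(GIF);
		print GIF $output;
		close(GIF);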
		print STDERR "fetched $file\n";
	}
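
The other suggestion above, a hand-made page of links fed to wget, could be
sketched roughly as follows. It assumes the same base URL and tjNN.jpg naming
that the perl script uses; the output name links.html is just a placeholder,
and the pattern would need adjusting once the real file layout is decoded.

```shell
#!/bin/sh
# Assumed values: base URL and tjNN.jpg naming copied from the perl script;
# links.html is an arbitrary output name chosen for this example.
base="http://ccc.ece.utexas.edu/~thecap/pictures/Sewanee2000/2000-07-03/"
out="links.html"

{
	echo "<html><body>"
	i=1
	while [ "$i" -le 50 ]; do
		# zero-pad to two digits: tj01.jpg ... tj50.jpg
		file=$(printf 'tj%02d.jpg' "$i")
		echo "<a href=\"$base$file\">$file</a><br>"
		i=$((i + 1))
	done
	echo "</body></html>"
} > "$out"

echo "wrote $out"
```

Something like "wget -F -i links.html" should then fetch every file the page
links to; -i reads the URLs from a file and -F tells wget to treat that file
as HTML.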


-- 
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| alan at ajackson.org          | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------

