Using wget to snapshot a web site
Kurt Wall
kwall
Wed Dec 8 19:10:22 PST 2004
On Wed, Dec 08, 2004 at 09:35:37PM -0600, Alan Jackson took 38 lines to write:
> On Wed, 07 Dec 2005 15:00:49 -0600
> Michael Hipp <Michael at Hipp.com> wrote:
>
> > I'm trying to use wget to grab an offline copy of this website so I can
> > refer to it when doing development without Internet access.
> >
> > http://wiki.wxpython.org/index.cgi/FrontPage
> >
> > But all the links in that page look like this:
> >
> > <a href="/index.cgi/ObstacleCourse">ObstacleCourse</a>
> >
> > I can't find any combination of options for wget which will cause it to
> > follow these links. I presume it's because the link is written like an
> > absolute link when it is actually more of a relative link.
> >
> > Anyone know how to get wget to grab these or another tool which might do
> > the job?
How about wget's -E option?
-E
--html-extension
If a file of type application/xhtml+xml or text/html
is downloaded and the URL does not end with the regexp
\.[Hh][Tt][Mm][Ll]?, this option will cause the suffix
.html to be appended to the local filename. This is
useful, for instance, when you're mirroring a remote
site that uses .asp pages, but you want the mirrored
pages to be viewable on your stock Apache server.
Another good use for this is when you're downloading
CGI-generated materials. A URL like
http://site.com/article.cgi?25 will be saved as
article.cgi?25.html.
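
In practice, something along these lines (untested here, but -m, -k, -p,
-E, and -np are all standard wget flags for mirroring, link conversion,
page requisites, HTML extensions, and not ascending past the start URL)
should pull the wiki down in a form you can browse offline:

    wget -m -k -p -E -np http://wiki.wxpython.org/index.cgi/FrontPage

The -k/--convert-links pass runs after the download finishes and rewrites
those absolute /index.cgi/... hrefs to point at the local copies, while -E
gives the CGI-generated pages .html names your browser will open happily.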
Kurt
--
I am not now, nor have I ever been, a member of the demigodic party.
-- Dennis Ritchie