Printing web pages

ronnie gauthier ronnieg
Mon May 17 11:46:18 PDT 2004


Don't know much about PHP or Python, but with Perl I'd just make a socket and
grab the page I want. Look at:
http://www.itworld.com/nl/perl/05312001/pf_index.html
for a basic Perl example using LWP (an HTTP client library that handles the
socket work for you).
The URL for the book is
http://www.slackware.com/book/index.php
A script like that will get the page, since it can mask itself as a browser.
It does not care how the links are formatted, so there should be no problem
following the "next" link or putting the TOC in a hash. They are even nice
enough to give you an HR to pattern match on, so you can grab what's between
them.
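
Since the original question was about Python, here is roughly how the same
idea might look there. This is only a sketch under assumptions: the
User-Agent string, the function names, the page limit, and the guess that the
link text is literally "next" are all made up for illustration and have not
been checked against the actual Slackware book pages. It uses only the
standard library.

```python
import html
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumed value, not from the thread: any browser-like string should do.
USER_AGENT = "Mozilla/5.0 (X11; Linux i686)"

HR = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)
LINK = re.compile(
    r'<a\b[^>]*\bhref\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")


def fetch(url):
    """Fetch a page while sending a browser-like User-Agent header."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "latin-1"
        return response.read().decode(charset, errors="replace")


def between_rules(page):
    """Return the markup between the first and last <hr> tags.

    Falls back to the whole page when fewer than two rules are found.
    """
    rules = list(HR.finditer(page))
    if len(rules) < 2:
        return page
    return page[rules[0].end():rules[-1].start()]


def to_text(markup):
    """Drop tags and decode entities; crude, but enough for printing."""
    return html.unescape(TAG.sub("", markup)).strip()


def next_link(page, base_url):
    """Return the absolute URL of the first link labelled "next", else None."""
    for href, label in LINK.findall(page):
        if to_text(label).lower() == "next":
            return urljoin(base_url, html.unescape(href))
    return None


def crawl(start_url, limit=200):
    """Follow "next" links from start_url; return each page's text in order."""
    pages, url, seen = [], start_url, set()
    while url and url not in seen and len(pages) < limit:
        seen.add(url)
        page = fetch(url)
        pages.append(to_text(between_rules(page)))
        url = next_link(page, url)
    return pages
```

Something like "\n\n".join(crawl("http://www.slackware.com/book/index.php"))
could then be written to a file for printing. The regex tag stripping is
deliberately simple; a real HTML parser would cope better with messy markup.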

On Fri, 4 Apr 2003 20:39:46 -0700 - Collins Richey <erichey2 at attbi.com> wrote
the following
Re: Re: Printing web pages

>On Fri, 04 Apr 2003 17:39:19 -0800
>"Net Llama!" <netllama at linux-sxs.org> wrote:
>
>> On 04/04/03 17:35, Collins Richey wrote:
>> > Are there any generalized utility programs that will grab a web
>> > page, extract the text, convert to a text (or fill-in-the-blanks)
>> > file for printing?
>> > 
>> > I'm getting ready to work on some python code to do that for
>> > printing the Slackware users' manual, but it would be nice to have a
>> > real tool.
>> 
>> html2jpg creates JPEGs (basically screenshots) of web pages:
>> http://freshmeat.net/projects/html2jpg/
>> 
>> html2ps converts html to postscript
>> http://freshmeat.net/projects/html2ps/
>> 
>> html2pdf
>> http://freshmeat.net/projects/html2pdf/
>> 
>
>Thanks,
>
>Now that I've looked at the problem a little more closely, I probably
>need more than this.  The root of what I want to retrieve is
>www.slackware.com/book which is a php beast.  What I'm looking to do is
>
>1. Retrieve the base page and follow all Next links, strip out all the
>extra crap on each page, retain and format the text, and store the
>result for printing.
>
>2. I could do this with simple python tools for a normal html site, but
>the #$@! slackware site doesn't respond to simple http requests; even
>the links are php commands.  A browser, of course, can wade through this
>with ease, but I don't want to have to save each individual page as html
>just to format it.
>
>3. All this work because the Slack folks don't provide a printable
>version.
>
>Any thoughts?
>
>--
>Collins - Slack 9.0 EXT3