web scraper
Ric Moore
wayward4now
Tue Nov 21 20:50:24 PST 2006
On Tue, 2006-11-21 at 22:43 -0500, Kurt Wall wrote:
> On Fri, Nov 17, 2006 at 08:16:13AM +0100, Roger Oberholtzer wrote:
> >
> > I have been toying with the idea of setting up a web scraper. Not for
> > anything untoward. Just to track current information and activities
> > related to parameters we measure. Perhaps peek a bit at the competition.
> > IBM has a short paper on the concept with a few ruby examples. But they
> > are very limited. Mainly, it was how to read web documents and find HTML
> > tags. That is the easy part. The hard part is finding the docs in the
> > first place. I know google gets one very far. It is just that I want to
> > automate this for a number of interesting items. Perhaps I really need a
> > meta search engine. Early days here.
> >
> > Anyone been there, done that? Or know where it is being done?
>
> Google has an API you can use, but I have no idea if it will do what you want vis-a-vis
> finding pages of interest.
>
> A poor man's scraper might involve running "w3m -dump" on each URL of interest if all
> you want is content or "w3m -dump_source" on each interesting URL if you want the entire
> page. Perl has modules for scraping HTML screens, too.
>
> Conceptually, you need (at least) two components, one to find "interesting URLs" and
> one to suck up those URLs.
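For what it's worth, the two-component idea quoted above could be sketched
roughly like this in Python, using only the standard library. This is a
minimal sketch, not a finished scraper: the names LinkCollector,
find_interesting_urls and fetch are made up for illustration, and "interesting"
is assumed to mean nothing smarter than "the URL contains one of my keywords".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors with no (or empty) href
                self.links.append(urljoin(self.base_url, href))


def find_interesting_urls(html, base_url, keywords):
    """Component 1 (hypothetical): keep links whose URL mentions a keyword."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links
            if any(k.lower() in url.lower() for k in keywords)]


def fetch(url, timeout=10):
    """Component 2 (hypothetical): download one URL and return its text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would feed find_interesting_urls the HTML of some starting page (a
search results page, say) and then call fetch on whatever comes back.
Deciding what counts as "interesting" is still the hard part, and a keyword
match on the URL is only a placeholder for that.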
It's been done and is ready for transport, Captain! Just go to Firefox's
Tools / Add-ons and look for "Stumble"... it'll add a bar to
Firefox, and when you click on Stumble it'll take you to places you want
to go but didn't know about. It asks you a bunch of questions
about your preferences and has a rating system for when it's used. If
you like a site, click on the thumbs up icon. If not, click on thumbs
down. What's neat is that plenty of other people with similar interests
have already found and rated the dogs, so it won't take you there.
I have yet to hit a loser. The more people of like minds rate the sites,
the better the chances you'll go somewhere on the net you'll enjoy, but
never knew about. Really cool... you get what you described above, plus
the consensus of opinions from many others who have gone before you.
I give it a thumbs up! Plus, if you're the first to recommend the site,
you get to rate it and make a pithy comment. Ric