PDF documents <OT>
Bill Campbell
linux-sxs
Thu Sep 14 14:01:22 PDT 2006
On Thu, Sep 14, 2006, Ric Moore wrote:
>I've finally managed to scan a series of documents, so I have 9 scanned
>images saved as PDF documents. How would I go about making one PDF
>document that contains each of them saved as one page each? I'm trying
>OpenOffice but I cannot find the command to insert a new page to copy
>the image to. I RTFM but cannot find it. I know this is slightly
>off-topic, but I could sure use some help. Any other method would be
>welcome as well. Ric
It's been a while since I was extensively involved with scanning
and OCR of documents so this may be somewhat dated.
The OCR software I've used with Linux (Vividata) requires TIFF input
or perhaps other standard image formats for conversion. The gocr
program supports PostScript and a wide variety of image formats.
I think you could probably use pdf2ps to convert the PDF file to
PostScript, then use convert from ImageMagick to create single-page
image files:
``convert -monochrome document.ps tif:pages''
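The two steps above could be scripted roughly as follows. This is only a
minimal sketch, assuming pdf2ps (shipped with Ghostscript) and ImageMagick's
convert are both on the PATH. The function names, the "pages" output
directory, and the %03d file name pattern (which asks ImageMagick to write
one numbered file per page) are illustrative assumptions and are not part of
the command quoted above.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir="pages"):
    """Return the two command lines: PDF -> PostScript, PostScript -> TIFF pages."""
    pdf = Path(pdf_path)
    ps = pdf.with_suffix(".ps")
    # Assumption: a %03d pattern makes ImageMagick write one numbered file per page.
    pattern = str(Path(out_dir) / (pdf.stem + "-%03d.tif"))
    return [
        ["pdf2ps", str(pdf), str(ps)],
        ["convert", "-monochrome", str(ps), pattern],
    ]


def pdf_to_tiff_pages(pdf_path, out_dir="pages"):
    """Run both steps; raises CalledProcessError if either tool fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir):
        subprocess.run(cmd, check=True)
```

Calling pdf_to_tiff_pages("document.pdf") would then be expected to leave
document.ps next to the PDF and numbered TIFF files under pages/, ready to
feed to the OCR program.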
I played a bit with converting some court documents for Groklaw to
HTML, using Readiris Pro to OCR each PDF file and save it as an RTF
document (which required a fair amount of manual fiddling to get rid
of the line numbers and keep only the text of the document). I then
loaded that RTF into OpenOffice.org and into Microsoft Word from
Office 2004 for Mac, saving it as HTML.
Finally, I ran the HTML through a Python filter I wrote that
removes all the fancy formatting and fonts inserted by M$Word and
OpenOffice.org, leaving clean HTML. The results are
available for viewing here:
http://www.celestial.com/Members/bill/Drafts/pdf2html/
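To give a rough idea of what that kind of filter does, here is a minimal
sketch built on Python's standard html.parser module. It is not the filter
used for the pages above: the tag whitelist, the class name and the function
name are all hypothetical choices made for illustration. It keeps a small
set of structural tags, drops every attribute, throws away font and span
wrappers, and discards style and script blocks along with their contents.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist of structural tags worth keeping.
KEEP = {"html", "head", "title", "body", "p", "br", "h1", "h2", "h3",
        "ul", "ol", "li", "pre", "blockquote", "table", "tr", "td", "th"}
VOID = {"br"}                 # tags that never get a closing tag
SKIP = {"style", "script"}    # dropped together with their contents


class CleanHTML(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append("<%s>" % tag)   # all attributes are dropped

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag not in VOID and not self.skip_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so &, < and > stay valid in the output.
            self.out.append(escape(data, quote=False))


def clean_html(text):
    """Return text with only whitelisted, attribute-free tags left in."""
    parser = CleanHTML()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

A real filter would likely need more care with things like links, images
and character encodings, which this sketch simply drops or ignores.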
Bill
--
INTERNET: bill at Celestial.COM Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
FAX: (206) 232-9186 Mercer Island, WA 98040-0820; (206) 236-1676
The day-to-day travails of the IBM programmer are so amusing to most of
us who are fortunate enough never to have been one -- like watching
Charlie Chaplin trying to cook a shoe.