pdf data recovery
Kurt Wall
kwall
Fri Nov 5 09:12:38 PST 2004
On Fri, Nov 05, 2004 at 06:01:48AM -0800, Shawn Tayler took 14 lines to write:
> Hi Guys,
>
> Have any of you ever been able to recover text data from a pdf file? I
> have a pdf doc with a 4 column listing on it. I need to recover the data
> in those 4 columns. Typing it all back in is not really an option. The
> original doc that was used to make the pdf is not available.
pdftotext should do the trick. I don't know how well it handles columnar data,
but at least you'd get the text. You might try:
$ pdftotext -layout -nopgbrk infile.pdf
-layout tries to preserve the original layout (columns, tables, etc.)
-nopgbrk doesn't insert ^L (form feed) characters to mark page breaks
infile.pdf is your input PDF file; the output will be named infile.txt
By default, pdftotext outputs paragraphs without newlines, so I typically
pipe the output through fmt. Thus:
$ pdftotext -layout -nopgbrk infile.pdf - | fmt > infile.txt
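If you need the four columns as separate fields rather than just readable
text, something like the following might work. It's only a sketch:
columns_to_tsv is a name I made up, and it assumes the -layout output
separates columns with runs of two or more spaces while no single cell
contains two spaces in a row. Feed it the -layout output directly, without
fmt, since fmt refills lines into paragraphs and would run the rows together.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdftotext): convert `pdftotext -layout`
# output into tab-separated columns.
# Assumption: columns are separated by 2+ spaces; cells never contain 2+ spaces.
columns_to_tsv() {
    awk '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }  # trim leading/trailing blanks
        NF == 0 { next }                            # drop blank lines
        { gsub(/  +/, "\t"); print }                # each run of 2+ spaces -> one tab
    '
}
```

Then something like

$ pdftotext -layout -nopgbrk infile.pdf - | columns_to_tsv > infile.tsv

gives a tab-separated file you can load into a spreadsheet. Check the result
by eye: an empty cell, or a cell that wraps onto a second line, will leave
that row with the wrong number of fields.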
Kurt
--
"In short, N is Richardian if, and only if, N is not Richardian."
More information about the Linux-users mailing list