<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Good to hear. Glad I could give back.<br>
</p>
<div class="moz-cite-prefix">On 9/16/2019 11:27 AM, Laura Brody
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAMDnY5OB-VdLwvoByJncSyomP=KKehDFxrmzNPSf6yi8DbH1kw@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div>Just an update.... I got it all working on a Raspberry Pi 3
B+ with a 32 GB microSD card, running Debian Linux. I wrote a script
to upload the processed file to Dropbox automatically. It is
happily working on files and will probably be done in a week
or less (it has about 55 files to process, some with 2-6
pages, but most with 50-130 pages).</div>
<div><br>
</div>
<div>All of the software was free. I already had a few Raspberry
Pi boards, so my only investment was my time.</div>
<div><br>
</div>
<div>Thank you so much for pointing me in the right direction.
Left to my own devices, I would still be researching how to
tackle this project.</div>
<div><br>
</div>
<div>Laura Brody</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 9, 2019 at 10:44
PM Laura Brody <<a href="mailto:laura.k.brody@gmail.com"
moz-do-not-send="true">laura.k.brody@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>Yes, I see that. Now that I know that pdfsandwich and
tesseract will run on the Raspberry Pi and do what I need,
I have a clear idea what I need to do to get searchable
PDFs out of the files that I have. Thank you for pointing
me in the right direction. You saved me a boatload of time
and aggravation.</div>
<div><br>
</div>
<div>Laura Brody<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 9, 2019 at
10:38 PM Cesar Baquerizo <<a
href="mailto:ces@cescom.com" target="_blank"
moz-do-not-send="true">ces@cescom.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div>
<div>
<div>
<div style="direction:ltr">You're welcome. You'll also need
Tesseract. They are two different programs. Let me
know how it goes.
</div>
</div>
<div><br>
</div>
</div>
<div> </div>
<hr style="display:inline-block;width:98%">
<div
id="gmail-m_-84511995742781478gmail-m_6089203737040733023divRplyFwdMsg"
dir="ltr"><font
style="font-size:11pt" face="Calibri, sans-serif"
color="#000000"><b>From:</b> Laura Brody <<a
href="mailto:laura.k.brody@gmail.com"
target="_blank" moz-do-not-send="true">laura.k.brody@gmail.com</a>><br>
<b>Sent:</b> Monday, September 9, 2019 10:35 PM<br>
<b>To:</b> Cesar Baquerizo; Filepro_List<br>
<b>Subject:</b> Re: OT: Help getting PDF to OCR or
searchable form
<div> </div>
</font></div>
<div dir="ltr">
<div>I found a list of Linux flavors that
pdfsandwich has been ported to, and Raspbian
was on the list!</div>
<div><br>
</div>
<div>I will be working on this project tomorrow.
Thank you so much for this lead. I don't think I
would have found it by myself.</div>
<div><br>
</div>
<div>Laura Brody<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 9,
2019 at 10:27 PM Laura Brody <<a
href="mailto:laura.k.brody@gmail.com"
target="_blank" moz-do-not-send="true">laura.k.brody@gmail.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>This is very interesting.</div>
<div><br>
</div>
<div>The only Linux box I have running at the
moment is Raspberry Pi 3 B+. I have 64GB SD
card available, so space isn't an issue. Any
idea if it will work on it?</div>
<div><br>
</div>
<div>Laura Brody<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Sep 9,
2019 at 9:54 PM Cesar Baquerizo <<a
href="mailto:ces@cescom.com" target="_blank"
moz-do-not-send="true">ces@cescom.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div>
<div>
<div>
<div style="direction:ltr">Look up
Tesseract and pdfsandwich. They may
help you.</div>
</div>
<div><br>
</div>
</div>
<div> </div>
<hr style="display:inline-block;width:98%">
<div
id="gmail-m_-84511995742781478gmail-m_6089203737040733023gmail-m_8760251007502064830gmail-m_2990805021006981543divRplyFwdMsg"
dir="ltr">
<font style="font-size:11pt"
face="Calibri, sans-serif"
color="#000000"><b>From:</b>
Filepro-list
<filepro-list-bounces+ces=<a
href="mailto:cescom.com@lists.celestial.com"
target="_blank"
moz-do-not-send="true">cescom.com@lists.celestial.com</a>>
on behalf of Laura Brody via
Filepro-list <<a
href="mailto:filepro-list@lists.celestial.com"
target="_blank"
moz-do-not-send="true">filepro-list@lists.celestial.com</a>><br>
<b>Sent:</b> Monday, September 9, 2019
9:50 PM<br>
<b>To:</b> Filepro_List<br>
<b>Cc:</b> Laura Brody<br>
<b>Subject:</b> Re: OT: Help getting
PDF to OCR or searchable form
<div> </div>
</font></div>
Additional information.... <br>
<br>
I talked to the user and got some
history... <br>
<br>
The user scanned in legal documents and saved
the images as pages in a PDF. <br>
That is why I can't search on keywords for
most of the files. A few files <br>
were typed up and then exported as PDF.
Most are images of the pages. That <br>
means that OCR has to be part of the
solution. <br>
<br>
I discovered that Adobe Acrobat Reader has
a setting to search all PDFs in a <br>
directory for keywords. The problem is
that these files don't contain text. <br>
They contain images of text. Adobe can't
search images and find keywords. <br>
<br>
Laura Brody <br>
<br>
On Mon, Sep 9, 2019 at 8:03 PM Laura Brody
<<a
href="mailto:laura.k.brody@gmail.com"
target="_blank" moz-do-not-send="true">laura.k.brody@gmail.com</a>>
wrote:
<br>
<br>
> I am hoping that one of you has
solved this problem before..... <br>
> <br>
> I have over a thousand pages of text
in a dozen or so PDF files. Most <br>
> files are "read-only" and I cannot
do Ctrl-F to search for keywords. I <br>
> would like to be able to OCR the
files and put everything into one file <br>
> that is searchable. Or is there a
utility that will search all of the PDFs <br>
> in a directory for a keyword? <br>
> <br>
> Suggestions anyone? <br>
> <br>
> Laura Brody <br>
> <br>
_______________________________________________ <br>
Filepro-list mailing list <br>
<a
href="mailto:Filepro-list@lists.celestial.com"
target="_blank" moz-do-not-send="true">Filepro-list@lists.celestial.com</a>
<br>
Subscribe/Unsubscribe/Subscription Changes
<br>
<a
href="http://mailman.celestial.com/mailman/listinfo/filepro-list"
target="_blank" moz-do-not-send="true">http://mailman.celestial.com/mailman/listinfo/filepro-list</a>
<br>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>