<div dir="ltr">On Thu, Apr 11, 2013 at 11:36 AM, Lonni J Friedman <span dir="ltr"><<a href="mailto:netllama@gmail.com" target="_blank">netllama@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Apr 11, 2013 at 9:32 AM, Andrew Gould <<a href="mailto:andrewlylegould@gmail.com">andrewlylegould@gmail.com</a>> wrote:<br>
><br>
> On Thu, Apr 11, 2013 at 11:13 AM, Lonni J Friedman <<a href="mailto:netllama@gmail.com">netllama@gmail.com</a>><br>
> wrote:<br>
>><br>
>><br>
>> ><br>
>> > Does the script read in an entire data source file and parse each line?<br>
>> > Or<br>
>> > does it read one line at a time and parse/write it prior to reading the<br>
>> > next<br>
>> > line? If the entire source file is being read into memory, could it be<br>
>> > causing a bottleneck?<br>
>><br>
>> The script reads in an entire data source file, parsing line by line,<br>
>> putting the data into a hash (%hash_values). Once that is completed,<br>
>> the hash is passed to sqlInsert(). So everything is already read into<br>
>> memory at the point in time when performance tanks. I'd expect that<br>
>> this would be the fast path, since it never needs to read from disk.<br>
>> All of my systems have 2+GB RAM, and the data in question is always<br>
>> less than 30MB, so I can't imagine that this would be a swap issue, if<br>
>> that's what you mean? Unless querying a key/value pair in a hash is<br>
>> not a good performance path in perl?<br>
>> _______________________________________________<br>
><br>
><br>
><br>
> The script is holding the input file (>150k rows?) and the hash in memory<br>
> while it's reformatting the data and performing sqlInsert(). I was<br>
> wondering whether the combination of processing and RAM utilization could be<br>
> causing the slowdown.<br>
<br>
</div></div>Yes, that's how it's behaving. Is there a better way to do this in Perl?<br>
<div class="HOEnZb"><div class="h5"></div></div></blockquote><div><br></div><div style>I can't help with Perl specifics, but when I process large files in Python, I don't read the entire input file at once. I read, process, and write one line at a time:</div>
<div style><br></div><div style>1. Assign input and output files to file handles using open().</div><div style>2. Read one line from the input file, process it, and write the results to the output file. Repeat as necessary.</div>
<div style>3. Close the input and output files.</div><div style><br></div><div style>It takes less than an hour to process a large file (input file = 4.5 million rows, 2GB; output file approximately 890MB) on a system with a 2.9GHz processor and 4GB RAM running 32-bit WinXP. (They don't let me use Linux at work.)</div>
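The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not your actual script: the file names and the process_line() transformation are hypothetical placeholders.

```python
# Create a small sample input file so the example is self-contained.
with open("input.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

def process_line(line):
    # Hypothetical per-line transformation: strip whitespace and uppercase.
    return line.strip().upper() + "\n"

# Step 1: open input and output files.
# Step 2: read, process, and write one line at a time.
# Step 3: the 'with' block closes both files automatically on exit.
with open("input.txt") as infile, open("output.txt", "w") as outfile:
    for line in infile:  # iterating a file object yields one line at a time
        outfile.write(process_line(line))
```

Because only one line is held in memory at a time, memory use stays flat no matter how large the input file grows.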
<div style><br></div></div></div></div>