<div dir="ltr">On Thu, Apr 11, 2013 at 11:36 AM, Lonni J Friedman <span dir="ltr"><<a href="mailto:netllama@gmail.com" target="_blank">netllama@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Thu, Apr 11, 2013 at 9:32 AM, Andrew Gould <<a href="mailto:andrewlylegould@gmail.com">andrewlylegould@gmail.com</a>> wrote:<br>
><br>
> On Thu, Apr 11, 2013 at 11:13 AM, Lonni J Friedman <<a href="mailto:netllama@gmail.com">netllama@gmail.com</a>><br>
> wrote:<br>
>><br>
>><br>
>> ><br>
>> > Does the script read in an entire data source file and parse each line?<br>
>> > Or<br>
>> > does it read one line at a time and parse/write it prior to reading the<br>
>> > next<br>
>> > line? If the entire source file is being read into memory, could it be<br>
>> > causing a bottleneck?<br>
>><br>
>> The script reads in an entire data source file, parsing line by line,<br>
>> putting the data into a hash (%hash_values). Once that is completed,<br>
>> the hash is passed to sqlInsert(). So everything is already read into<br>
>> memory at the point in time when performance tanks. I'd expect that<br>
>> this would be the fast path, since it never needs to read from disk.<br>
>> All of my systems have 2+GB RAM, and the data in question is always<br>
>> less than 30MB, so I can't imagine that this would be a swap issue, if<br>
>> that's what you mean? Unless querying a key/value pair in a hash is<br>
>> not a good performance path in perl?<br>
>> _______________________________________________<br>
><br>
><br>
><br>
> The script is holding the input file (>150k rows?) and the hash in memory<br>
> while it's reformatting the data and performing sqlInsert(). I was<br>
> wondering whether the combination of processing and RAM utilization could be<br>
> causing the slowdown.<br>
<br>
</div></div>Yes, that's how it's behaving. Is there a better way to do this in Perl?<br>
<div class="HOEnZb"><div class="h5"></div></div></blockquote><div><br></div><div style>I can't help with Perl specifics, but when I process large files in Python, I don't read the entire input file at once. I read, process, and write one line at a time:</div>
<div style><br></div><div style>1. Assign input and output files to file handles using open().</div><div style>2. Read one line from the input file, process it, and write the results to the output file. Repeat as necessary.</div>
<div style>3. Close the input and output files.</div><div style><br></div><div style>It takes less than an hour to process a large file (input file = 4.5 million rows, 2GB; output file approximately 890MB) on a system with a 2.9GHz processor and 4GB RAM running 32-bit WinXP. (They don't let me use Linux at work.)</div>
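The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not your actual script: the file names and the process_line() transformation are hypothetical placeholders.

```python
# Create a small sample input file so the example is self-contained.
with open("input.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

def process_line(line):
    # Hypothetical per-line transformation: strip whitespace and uppercase.
    return line.strip().upper() + "\n"

# Step 1: open input and output files.
# Step 2: read, process, and write one line at a time.
# Step 3: the 'with' block closes both files automatically on exit.
with open("input.txt") as infile, open("output.txt", "w") as outfile:
    for line in infile:  # iterating a file object yields one line at a time
        outfile.write(process_line(line))
```

Because only one line is held in memory at a time, memory use stays flat no matter how large the input file grows.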
<div style><br></div></div></div></div>