<div dir="ltr">On Wed, Apr 10, 2013 at 6:03 PM, Lonni J Friedman <span dir="ltr"><<a href="mailto:netllama@gmail.com" target="_blank">netllama@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've got a perl script that is used to parse data from one format into<br>
another format. It works fairly well 99% of the time, however when<br>
the data that its parsing is large, the performance of the script gets<br>
awful. Unfortunately, my perl skills are marginal at best, so I'm<br>
lost on how to debug this problem.<br>
<br>
For example, for 99% of the cases, there are less than 1k rows of data<br>
to parse, and it completes in less than 10 seconds. However, for the<br>
remaining 1%, there are over 150k rows, and the script takes hours<br>
(3+) to finish. I'm hoping that this is due to something inefficient<br>
in my perl, that can be fixed easily, but I'm not sure what that might<br>
be.<br>
<br>
The slow part of the script is this subroutine:<br>
######<br>
sub sqlInsert {<br>
my ($fh, $app, $status, $entry, $table_testlist_csv_path,%hash_values) = @_;<br>
my $now=strftime("%Y-%m-%d %H:%M:%S", localtime) ;<br>
my $entryVals = join(',', map { "\"$$entry{$_}\""} qw(suiteid<br>
regressionCL cl os arch build_type branch gpu subtest osversion));<br>
my $testid = $hash_values{$app} ;<br>
<br>
# we need to add an escape character in front of all double quotes<br>
in a testname, or the dquotes will be stripped out when the SQL COPY<br>
occurs<br>
$app =~ s/"/~"/g ;<br>
print $fh <<END;<br>
"$now","$app","$status","$testid",$entryVals<br>
END<br>
}<br>
<br>
########<br>
<br>
The perl process spikes the CPU at 100% while its running. If I<br>
strace the perl process while its chugging away, the output looks like<br>
this:<br>
########<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
write(6, "ource/Dst_Offset11\",\"PASSED\",\"58"..., 4096) = 4096<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
write(6, "C3R_Tests/OrC_16u_C3R_Test/Sourc"..., 4096) = 4096<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<br>
########<br>
<br>
So for some crazy reason its polling /etc/localtime repeatedly, in<br>
between writing out rows of data that have been parsed. I have no<br>
clue why its behaving this way. I'm guessing that maybe querying the<br>
hash is the bottleneck? Is there a better data structure to use? Is<br>
there some way that I can profile this stuff?<br>
<br>
<br>
thanks<br>
_______________________________________________<br>
Linux-users mailing list<br>
<a href="mailto:Linux-users@linux-sxs.org">Linux-users@linux-sxs.org</a><br>
<a href="http://mailman.celestial.com/mailman/listinfo/linux-users" target="_blank">http://mailman.celestial.com/mailman/listinfo/linux-users</a><br>
</blockquote></div><br></div><div class="gmail_extra" style>Lonnie,</div><div class="gmail_extra" style><br></div><div class="gmail_extra" style>Caveat: I only know a little Python.</div><div class="gmail_extra" style><br>
</div><div class="gmail_extra" style>Is sqlinsert run for each line? If so, isn't localtime being accessed with each row? Can you derive $now once, at the beginning of the script, and then pass the value to sqlinsert? One side benefit of this would be that all of the rows processed in the same run would have the same $now, which you could use later to identify and remove the entire batch if needed.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra" style>Best of luck,</div><div class="gmail_extra" style><br></div><div class="gmail_extra" style>Andrew</div><div class="gmail_extra"><br></div></div>