any perl performance experts awake?

Lonni J Friedman netllama at gmail.com
Thu Apr 11 08:00:23 PDT 2013


On Wed, Apr 10, 2013 at 4:56 PM, Andrew Gould <andrewlylegould at gmail.com> wrote:
> On Wed, Apr 10, 2013 at 6:03 PM, Lonni J Friedman <netllama at gmail.com>
> wrote:
>>
>> I've got a perl script that is used to parse data from one format into
>> another format.  It works fairly well 99% of the time, however when
>> the data that its parsing is large, the performance of the script gets
>> awful.  Unfortunately, my perl skills are marginal at best, so I'm
>> lost on how to debug this problem.
>>
>> For example, for 99% of the cases, there are less than 1k rows of data
>> to parse, and it completes in less than 10 seconds.  However, for the
>> remaining 1%, there are over 150k rows, and the script takes hours
>> (3+) to finish.  I'm hoping that this is due to something inefficient
>> in my perl, that can be fixed easily, but I'm not sure what that might
>> be.
>>
>> The slow part of the script is this subroutine:
>> ######
>> sub sqlInsert {
>>     my ($fh, $app, $status, $entry, $table_testlist_csv_path,%hash_values)
>> = @_;
>>     my $now=strftime("%Y-%m-%d %H:%M:%S", localtime) ;
>>     my $entryVals = join(',', map { "\"$$entry{$_}\""} qw(suiteid
>> regressionCL cl os arch build_type branch gpu subtest osversion));
>>     my $testid = $hash_values{$app} ;
>>
>>     # we need to add an escape character in front of all double quotes
>> in a testname, or the dquotes will be stripped out when the SQL COPY
>> occurs
>>     $app =~ s/"/~"/g ;
>>     print $fh <<END;
>> "$now","$app","$status","$testid",$entryVals
>> END
>> }
>>
>> ########
>>
>> The perl process spikes the CPU at 100% while its running.  If I
>> strace the perl process while its chugging away, the output looks like
>> this:
>> ########
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> write(6, "ource/Dst_Offset11\",\"PASSED\",\"58"..., 4096) = 4096
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> write(6, "C3R_Tests/OrC_16u_C3R_Test/Sourc"..., 4096) = 4096
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>> ########
>>
>> So for some crazy reason its polling /etc/localtime repeatedly, in
>> between writing out rows of data that have been parsed.  I have no
>> clue why its behaving this way.  I'm guessing that  maybe querying the
>> hash is the bottleneck?  Is there a better data structure to use?  Is
>> there some way that I can profile this stuff?
>>
>>
>> thanks
> Lonnie,
>
> Caveat:  I only know a little Python.
>
> Is sqlinsert run for each line?  If so, isn't localtime being accessed with
> each row?  Can you derive $now once, at the beginning of the script, and
> then pass the value to sqlinsert?  One side benefit of this would be that
> all of the rows processed in the same run would have the same $now, which
> you could use later to identify and remove the entire batch if needed.

Yes, that was definitely a problem, thanks for spotting it.  I've
moved $now outside of sqlInsert() such that its only computed once.
Unfortunately, that didn't make any difference in performance.  If I
strace the process, I no longer see thousands of /etc/localtime
references, and the only output is:
write(6, "ULL\",\"PASSED\",\"1240\",\"78383\",\"0\""..., 4096) = 4096
write(6, "ataExchangeAndInitialization/Cop"..., 4096) = 4096
write(6, "/Src_Offset4\",\"PASSED\",\"1277\",\"7"..., 4096) = 4096

which corresponds to the "print $fh" invocations.  Also the speed of
each "print $fh" is around once every 3 seconds for no obvious reason.
 So something else seems to be a bottleneck?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama at gmail.com
LlamaLand                       https://netllama.linux-sxs.org


More information about the Linux-users mailing list