any perl performance experts awake?
Bill Campbell
linux-sxs at celestial.com
Wed Apr 10 17:15:01 PDT 2013
On Wed, Apr 10, 2013, Lonni J Friedman wrote:
>I've got a perl script that is used to parse data from one format into
>another format. It works fairly well 99% of the time, however when
>the data that its parsing is large, the performance of the script gets
>awful. Unfortunately, my perl skills are marginal at best, so I'm
>lost on how to debug this problem.
>
>For example, for 99% of the cases, there are less than 1k rows of data
>to parse, and it completes in less than 10 seconds. However, for the
>remaining 1%, there are over 150k rows, and the script takes hours
>(3+) to finish. I'm hoping that this is due to something inefficient
>in my perl, that can be fixed easily, but I'm not sure what that might
>be.
>
>The slow part of the script is this subroutine:
>######
>sub sqlInsert {
>    my ($fh, $app, $status, $entry, $table_testlist_csv_path, %hash_values) = @_;
>    my $now = strftime("%Y-%m-%d %H:%M:%S", localtime);
>    my $entryVals = join(',', map { "\"$$entry{$_}\"" } qw(suiteid regressionCL
>        cl os arch build_type branch gpu subtest osversion));
>    my $testid = $hash_values{$app};
>
>    # we need to add an escape character in front of all double quotes in a
>    # testname, or the dquotes will be stripped out when the SQL COPY occurs
>    $app =~ s/"/~"/g;
>    print $fh <<END;
>"$now","$app","$status","$testid",$entryVals
>END
>}
Somebody has already pointed out the cost of calling strftime/localtime
on every iteration. This reminds me of my first programming
experience in FORTRAN almost 50 years ago where the "I'm an
Engineer, not a Programmer and Proud of It" person who wrote the
program computed the square root of PI/2.0 every time in a
subroutine that was called over 20,000 times per run. I
calculated it once, put it in COMMON, and cut the run time from
30 minutes to 5 minutes.
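The same trick applies here: if every row in a batch gets the same
timestamp, compute it once before the loop instead of once per row.
A minimal sketch (the row data and field names are invented for
illustration):

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical rows standing in for the parsed data.
my @rows = map { { app => "test$_", status => 'PASS' } } 1 .. 3;

# Compute the timestamp once, before the loop, not once per row.
my $now = strftime("%Y-%m-%d %H:%M:%S", localtime);

for my $row (@rows) {
    # Reuse the cached $now for every row in this batch.
    print qq{"$now","$row->{app}","$row->{status}"\n};
}
```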
There are a few things you might do to improve this.
+ Use one of the database interfaces (DBI) available in Perl to connect
  to the database. It's been quite a while since I did this, as I'm
  primarily doing Python these days, so I don't remember the details,
  but the DBI libraries typically have facilities to properly quote
  values as necessary.
+ I think that most SQL databases have a now() function that will get
the current time, and that would probably be much more efficient than
doing it externally. I have a link to the PostgreSQL page on this
here.
http://www.postgresql.org/docs/8.2/static/functions-datetime.html
+ If the SQL back end has stored procedures, it might be most efficient
to have one handle the time automatically on insert.
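The first suggestion might look roughly like this. This is only a
sketch, using an in-memory SQLite database via DBD::SQLite as a
stand-in for the real back end (the table and column names are
invented); on PostgreSQL you would connect with DBD::Pg instead.
Placeholders quote values for you, which would also make the manual
s/"/~"/g escaping in sqlInsert unnecessary:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database as a stand-in for the real back end
# (assumes DBD::SQLite is installed).
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do(q{CREATE TABLE results (ts TEXT, app TEXT, status TEXT)});

# Prepare once; the ? placeholders quote values correctly, so no
# manual escaping of double quotes is needed.
my $sth = $dbh->prepare(
    q{INSERT INTO results (ts, app, status) VALUES (?, ?, ?)});

# Execute many times with different values; the statement is
# parsed by the database only once.
$sth->execute('2013-04-10 17:15:01', q{app with "quotes"}, 'PASS');

my ($app) = $dbh->selectrow_array(q{SELECT app FROM results});
print "$app\n";    # the embedded double quotes survive intact
```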
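For the second suggestion, letting the database supply the timestamp
removes the strftime/localtime call from Perl entirely. A hedged
sketch, again with an in-memory SQLite database standing in for the
real server (SQLite spells the function datetime('now'); on
PostgreSQL it would be now()):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

$dbh->do(q{CREATE TABLE results (ts TEXT, app TEXT)});

# The database fills in the time itself: datetime('now') here,
# now() on PostgreSQL. Perl never formats a timestamp.
my $sth = $dbh->prepare(
    q{INSERT INTO results (ts, app) VALUES (datetime('now'), ?)});
$sth->execute('sometest');

my ($ts) = $dbh->selectrow_array(q{SELECT ts FROM results});
print "$ts\n";
```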
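And for the third suggestion, short of a full stored procedure, a
column default gets the same effect: the server stamps every row on
insert and the INSERT statement never mentions the timestamp at all.
A sketch with the same in-memory SQLite stand-in (column names
invented):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                       { RaiseError => 1 });

# The DEFAULT clause fills in the time server-side on every insert.
$dbh->do(q{
    CREATE TABLE results (
        ts  TEXT DEFAULT CURRENT_TIMESTAMP,
        app TEXT
    )
});

# Note: no timestamp in the INSERT at all.
$dbh->do(q{INSERT INTO results (app) VALUES (?)}, undef, 'sometest');

my ($ts, $app) = $dbh->selectrow_array(q{SELECT ts, app FROM results});
print "$ts $app\n";
```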
Bill
--
INTERNET: bill at celestial.com Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/ PO Box 820; 6641 E. Mercer Way
Voice: (206) 236-1676 Mercer Island, WA 98040-0820
Fax: (206) 232-9186 Skype: jwccsllc (206) 855-5792
Property must be secured, or liberty cannot exist. -- John Adams