Python is not a Magic Wand

8 Jun 2015

The other day I overheard some people talking about their favourite tools in their jobs as security professionals. Leaving aside that asking which tool is “the best” is a ridiculous question (each tool is good for something), they reached consensus that Python was the most awesome tool they had in their arsenal. One of them even stated that Python was awesome in his day job, where he had to parse many logs. From my own experience, I want to show you that Perl is actually about three times faster than Python when it comes to parsing (large) text files and searching for strings in them using regular expressions.

I tested these short scripts on my MacBook with 4 GB of RAM, a 2.8 GHz i7 CPU and an SSD. I created a 1 GB text file containing only capital A’s and a ‘.’ every 1000th byte (so there are 1,000,000 dots in total). These scripts count how many dots there are in the 1 GB file:
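
For anyone who wants to reproduce the test, here is a minimal sketch of how such a file could be generated in Python. The line layout is an assumption, as the description above does not spell it out: because both scripts read the file line by line, each 1000-byte record is written here as 998 capital A's, one '.', and a newline. The function name `write_test_file` and its parameters are made up for this illustration.

```python
import os


def write_test_file(path, records=1000000, record_len=1000):
    """Write `records` lines of capital A's, each ending in '.' and a newline.

    Assumed layout (not spelled out in the text): every record is exactly
    `record_len` bytes, so the '.' recurs once per `record_len` bytes.
    """
    record = b"A" * (record_len - 2) + b".\n"
    chunk = record * 1000  # write in batches to cut down on write calls
    full_chunks, rest = divmod(records, 1000)
    with open(path, "wb") as out:
        for _ in range(full_chunks):
            out.write(chunk)
        out.write(record * rest)
    # Return the resulting file size in bytes.
    return os.path.getsize(path)
```

Calling `write_test_file("speedtest.txt")` with the defaults writes 1,000,000 records of 1000 bytes each, which is 10^9 bytes in total.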

Perl (using no libraries, just built-in functions):

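#!/usr/bin/perl
use strict;
use warnings;

# Open the test file for reading; stop if it cannot be opened.
open(my $fh, "<", "speedtest.txt") or die "Cannot open speedtest.txt: $!";

my $count = 0;
while ( my $line = <$fh> ) {
        # Matching in list context returns every '.', and assigning that
        # list to a scalar yields the number of matches on this line.
        my $tmp = () = $line =~ m/\./g;
        $count += $tmp;
}

print $count . "\n";
close($fh);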

Python (using pre-compiled regular expressions):

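import re

# Pre-compile the pattern once, outside the loop.
regex = re.compile(r'\.')
counter = 0

fh = open('speedtest.txt')
while True:
        line = fh.readline()
        if not line:
                break
        # search() stops at the first '.' on a line, so this counts
        # lines containing a dot rather than every dot.
        m = regex.search(line)
        if m:
                counter += 1
fh.close()

# The parenthesised form works with both Python 2 and Python 3.
print('Number of matches: %d' % counter)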
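
Note that the Perl script counts every dot, while the Python script counts lines that contain at least one dot; the two only agree when no line has more than one dot. As a sketch, a Python variant that counts every match the way the Perl version does could use `findall`. The function name `count_dots` is made up for this illustration, and this variant was not part of the timings in this post.

```python
import re

# Pre-compiled pattern for a literal dot.
DOT = re.compile(r'\.')


def count_dots(path):
    """Return the total number of '.' characters in the file at `path`."""
    total = 0
    with open(path) as fh:
        for line in fh:
            # findall() returns every non-overlapping match on the line.
            total += len(DOT.findall(line))
    return total
```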

Results: running the Perl script first gives:

$ time perl count.pl
1000000

real    0m2.718s
user    0m2.348s
sys     0m0.268s

Python achieves a whopping:

$ time python count.py
Number of matches: 1000000

real    0m9.194s
user    0m8.798s
sys     0m0.381s

So in this test Python (using a pre-compiled regular expression, which should be the fastest way of performing a regex match in Python) is more than 3 times slower when it comes to searching a text file: 9.2 seconds against 2.7 seconds for Perl. So use Perl the next time you parse logs ;-)

Max Duijsens About computer security and other hobbies
