Monthly Archives: November 2018

Perl vs Python

Here at Cranfield University we work a lot with data in our GIS and data-related teaching and research. A common challenge is in transferring a complex dataset that is in one format into another format to make it useable. Many times there are tools we can use to help in that manipulation, both proprietary and open source. For the spatial datasets we often work with, we can use the range of data convertors in ArcGIS and QGIS, we can use the fantastic ‘Feature Manipulation Engine’ (FME) from Safe inc., or its manifestation in ArcGIS – the data interoperability tool, then again we can look to libraries such as the Geospatial Data Abstraction Library (gdal) for scripted functionality. As ever in computing, there are many ways of achieving our objectives.

However, sometimes there is nothing for it but to hack away in a favourite programming scripting language to make the conversion. Traditionally we used the wonderfully eclectic ‘Perl‘ language (pathologically eclectic rubbish lister – look it up!!) More recently the emphasis has perhaps shifted to Python as the language of choice. Certainly, if we are asked by our students which general purpose programming language to use for data manipulation, we advise Python is the one to have experience with on the CV.

If we have a simple data challenge, for example, we might want to convert an ASCII text file with data in one format to another format and write it out to a new file. We might want to go say from a file in this format (in ‘input.csv’):

AL1 1AG,1039499.00,0
AL1 1AG,383009.76,10251

To this format (in ‘output.csv’) …

UK,Item 1,R,AL1 1AG,,,,,,,,,,1039499.00,0,,,,
UK,Item 2,R,AL1 1AG,,,,,,,,,,383009.76,10251,,,,

For this Perl is a great solution – integration the strengths of awk and sed. Perl can produce code which quickly chomps through huge data files. One has to be careful as to how the code is developed, to ensure its readability. Sometimes, coming back to a piece of code one can struggle to remember how it works for a while – and this is especially so where the code is highly compacted.

#!/usr/bin/env perl
# Call as ‘perl <in_file> > <out_file>‘
# e.g. perl input.csv > output.csv
use Text::CSV;
my $csv = Text::CSV->new({sep_char => ',' });
while (<>) {
  if ($csv->parse($_)) {
    my @fields = $csv->fields();
    printf("UK,Item %d,R,%s,,,,,,,,,,%s,%s,,,,\n",$j++,@fields[0],@fields[1],@fields[2]);

The equivalent task in Python is equally simple, and perhaps a little more readable…

#!/usr/bin/env python
# python3 code
# Call as 'python3'
import csv
o = open('output.csv','w')
with open('input.csv', 'r') as f:
   reader = csv.reader(f)
   mylist = list(reader)
j = 0
for row in mylist:
   o.write('UK,Item {:d},R,{:s},,,,,,,,,{:s},{:s},,,,\n'.format(j, row[0], row[1], row[2]$

Note the code above is Python3 not Python2. Like Perl (with cpan), Python is extensible (with pip) – and in fact one really needs to use extensions (modules, or imported libraries) to get the most out of it (and to help prevent you needing to reinvent the wheel and introducing unnecessary errors). There is no need to write lots of code for handling CSV files for example – the csv library above does this very efficiently in Python. Likewise, if say we want to write data back out to JSON (JavaScript Object Notation format), again the json library can come to the rescue:

import csv
import json
jsonfile = open('/folderlocation/output.json', 'w')
with open('/folderlocation/input.csv', newline='', encoding='utf-8-sig') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(', '.join(row))
        json.dump(row, jsonfile)

There is probably not really a lot in the difference between the two languages – it all rather depends on ones preferences. However, for GIS professionals, Python expertise is a must as it is adopted as the scripting language of choice in ArcGIS (in fact even being shipped with ArcGIS). Other alternatives exist of course for these sorts of tasks – ‘R‘ is one that comes to mind – again being equally extensible.