IoT Project – Using an ESP32 device to monitor a web service

There is a lot of interest in the Internet of Things here at Cranfield University, especially now that there is a new generation of super-cheap ESP8266 and ESP32 devices which can be deployed as IoT controllers. Many of these devices are now also available with in-built OLED screens – very helpful for showing messages and diagnostics.

We will use one of these devices to develop our project.

The project
The device
Coding the ESP32
Configuring the development environment
Connecting to the device
USB drivers
OLED Screen drivers
Configuring the Arduino IDE
Uploading the Source Code
The Source Code
– – Authorisation
In operation
Next steps

The project

Nowadays, web services are used for all sorts of applications – for providing access to data and functionality online. A useful application for the ESP32, then, is to act as a monitor for a web service, repeatedly polling the service to see if it is operating correctly. If the web service goes down, we need to know – the ESP32 can keep an eye on the service and report any problems.

This project describes how an ESP32 device can be configured and programmed to monitor a web service. The web service we will monitor is developed (in node.js) with an API that includes a ‘current status’ call – if all is well, calling this returns a success message which we can capture.

The device

For this project, we are using the TTGO-WiFi-Bluetooth-Battery-ESP32-Module-ESP32-0-96-inch-OLED-development-tool from Aliexpress, although these devices are widely available from many retailers. This particular model also has a battery holder for a long-use LIPO battery on the rear of the board.

Coding the ESP32

There are a few options for coding the ESP devices. Most easily, it is possible to use the Arduino development environment (with a few tweaks). Another possibility is the Atom-based PlatformIO environment. We used the Arduino tool.

Configuring the development environment

The Arduino development environment does not, by default, have the libraries and configurations needed to programme the ESP32. There are a few steps needed to enable this.

Connecting to the device

The ESP32 board has a micro-USB port, permitting connection to the programming computer. We used an Apple Mac laptop for programming – so a micro-USB to USB-C cable/convertor was required.

USB drivers

The ESP32 device needs a software driver installed on the programming computer. We used the Silicon Labs drivers, which are available online. There are other alternatives (some commercial) for drivers – notably the Mac-usb-serial drivers.

OLED Screen drivers

The ESP32 device also has an in-built OLED screen. Although very small, at 128*64 pixels, this mono display is quite large enough to show text messages with different fonts, simple bitmap graphics, progress bars and drawing elements (lines, rectangles etc.) – amazing! However, a library is needed to allow access to this device. We used the ThingPulse OLED library.

The library can be downloaded as a zip file from GitHub to the library folder in the Arduino installation folder. On the Mac for example this is:


This library comes with lots of example programmes showing how to programme the screen, how to encode bitmaps, add progress bars, create fonts etc. From the code below, it can be seen that the Sketch ‘SSD1306SimpleDemo.ino‘ offers a great starting point for learning, referencing for example the ways described at Squix for encoding images and fonts.

Configuring the Arduino IDE

Having installed these drivers and libraries, the Arduino IDE then needs to be configured.

To do this, the board was set as a device of type ‘ESP32 Arduino’ -> ‘ESP32 Dev Module’, the baud rate set to 115200. Having installed the USB driver above, the port could be set to ‘/dev/cu.SLAB_USBtoUART’.

Uploading the Source Code

The Arduino IDE allows one to compile and upload code to the device. Critically, one has to hold down the ‘Boot’ button on the device as the programme is uploaded (for a few seconds) to allow the device code to be uploaded. If the boot button is NOT held down, there will be errors reported and the code will not be uploaded! (this took ages to work out!!)

The Source Code

In coding the device in the Arduino development environment, one can refer usefully to the Arduino code reference. The final working code is shown below. Note the calls to the Serial monitor to allow debugging information to be shown while the device is connected to the computer.

// TTGO WiFi & Bluetooth Battery ESP32 Module - webservices checker
// Import required libraries
#include "Wire.h"
#include "OLEDDisplayFonts.h"
#include "OLEDDisplay.h"
#include "OLEDDisplayUi.h"
#include "SSD1306Wire.h"
#include "SSD1306.h"
#include "images.h"
#include "fonts.h"
#include "WiFi.h"
#include "WiFiUdp.h"
#include "WiFiClient.h"
// The built-in OLED is a 128*64 mono pixel display
// i2c address = 0x3c
// SDA = 5
// SCL = 4
SSD1306 display(0x3c, 5, 4);

// WiFi parameters
const char* ssid = "MYSSID";
const char* password = "MYWIFIKEY";

// Web service to check
const int httpPort = 80;
const char* host = "MYWEBSERVICE_HOSTNAME";

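void setup() {
	// Initialize the display
	display.init();
	display.clear();

	// Start Serial
	Serial.begin(115200);

	// Connect to WiFi
	display.drawString(0, 0, "Going online");
	display.drawXbm(34, 14, WiFi_Logo_width, WiFi_Logo_height, WiFi_Logo_bits);
	display.display(); // push the drawing buffer to the screen
	WiFi.begin(ssid, password);
	while (WiFi.status() != WL_CONNECTED) {
		// Wait for the connection to come up
		delay(500);
		Serial.print(".");
	}
	Serial.println("WiFi now connected at address");
	// Print the IP address
	Serial.println(WiFi.localIP());
}
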
void loop() {
	Serial.print("\r\nConnecting to ");
	Serial.println(host);
	display.clear();
	display.drawString(0, 0, "Check web service");
	display.display();
	Serial.println("Check web service");

	// Setup URI for GET request
	// if service is up ok, return string will contain: 'Service running'
	// MYWEBSERVICE_STATUS_PATH is a placeholder for the path of the 'current status' call
	String url = "/MYWEBSERVICE_STATUS_PATH";

	WiFiClient client;
	if (!client.connect(host, httpPort)) {
		Serial.println("Connection failed");
		display.clear();
		display.drawString(0, 0, "Connection failed");
		display.display();
		delay(5000); // wait a little before trying again
		return;
	}

	client.print("GET " + url + " HTTP/1.1\r\n");
	client.print("Host: " + (String)host + "\r\n");
	// If authorisation is needed it can go here
	//client.print("Authorization: Basic AUTHORISATION_HASH_CODE\r\n");
	client.print("User-Agent: Arduino/1.0\r\n");
	// Ask the server to close the connection after replying, so the read loop below can finish
	client.print("Connection: close\r\n");
	client.print("Cache-Control: no-cache\r\n\r\n");

	// Echo the same request to the Serial monitor for debugging
	Serial.print("GET " + url + " HTTP/1.1\r\n");
	Serial.print("Host: " + (String)host + "\r\n");
	// If authorisation is needed it can go here
	//Serial.print("Authorization: Basic AUTHORISATION_HASH_CODE\r\n");
	Serial.print("User-Agent: Arduino/1.0\r\n");
	Serial.print("Connection: close\r\n");
	Serial.print("Cache-Control: no-cache\r\n\r\n");

	// Here's an alternative form if the service API uses HTTP POST
	// (use it in place of the GET request above, not as well as it)
	//client.print("POST " + url + " HTTP/1.1\r\n");
	//client.print("Host: " + (String)host + "\r\n");
	// If authorisation is needed it can go here
	//client.print("Authorization: Basic AUTHORISATION_HASH_CODE\r\n");
	//client.print("User-Agent: Arduino/1.0\r\n");
	//client.print("Cache-Control: no-cache\r\n\r\n");

	// Read all the lines of the reply from server, giving up after a timeout
	bool running = false;
	unsigned long startTime = millis();
	unsigned long httpResponseTimeOut = 10000; // 10 sec
	while ((client.connected() || client.available()) && ((millis() - startTime) < httpResponseTimeOut)) {
		if (client.available()) {
			String line = client.readStringUntil('\n');
			line.trim(); // remove the trailing '\r' and any spaces
			if (line.indexOf("Service running") >= 0) {
				running = true;
			}
		} else {
			delay(50); // nothing to read yet
		}
	}

	// Here's an alternative method to read web output, one character at a time
	//while (client.available()) {
	//	char c = client.read();
	//	Serial.print(c);
	//}

	if (running == true) {
		display.drawString(0, 25, "Service up OK");
	} else {
		display.drawString(0, 25, "Service DOWN");
		// Text/email administrator
	}

	Serial.println("Closing connection");
	client.stop();
	display.drawString(0, 0, "Closing connection");
	display.display();
	// progress bar
	for (int i=1; i<=28; i++) {
		float progress = (float) i / 28 * 100;
		delay(500); // = all adds up to delay 14000 (14 sec)
		display.clear();
		// draw percentage as String
		display.drawProgressBar(0, 32, 120, 10, (uint8_t) progress);
		display.drawString(64, 15, "Sleeping " + String((int) progress) + "%");
		display.display();
		Serial.print((int) progress);Serial.print(",");
	}
	delay (1000);
}



Note that the connection to the service can use either HTTP GET or POST according to need (POST is often considered the better approach). A further embellishment for security is for the web service to require authorisation (a username and password to connect). If it does, then for HTTP Basic authorisation the username and password are combined and Base64-encoded (strictly an encoding rather than a hash, so it gives no secrecy on its own over plain HTTP), and passed in the ‘Authorization’ header as shown in the code. To work this out we used the excellent Postman tool. Postman allows one to manually create a connection conversation with an API server, including say a basic authorisation, and then view the full code of this – which can be copied into the Arduino code as shown above.

Note that it is critical to have a second carriage return and line feed at the end of the HTTP request (shown as ‘\r\n’ in the code – so the last header ends with ‘\r\n\r\n’, giving the blank line). Without this blank line the server will not treat the request as complete, and it will not work!
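
The request layout can also be checked away from the device. What follows is a minimal Python sketch, not part of the project code, that assembles the same GET request text. The function name build_status_request, the host example.com, the path /status and the credentials are all hypothetical, and it assumes HTTP Basic authorisation, where the header value is the Base64 encoding of ‘username:password’.

```python
import base64


def build_status_request(host, path, username=None, password=None):
    """Assemble the raw text of an HTTP/1.1 GET request, header by header."""
    lines = [
        "GET " + path + " HTTP/1.1",
        "Host: " + host,
    ]
    if username is not None and password is not None:
        # Basic authorisation: Base64 of "username:password" (an encoding, not encryption)
        credentials = (username + ":" + password).encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.append("Authorization: Basic " + token)
    lines.append("User-Agent: Arduino/1.0")
    lines.append("Cache-Control: no-cache")
    # Every header ends with \r\n; one more \r\n gives the blank line that ends the request
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical host, path and credentials, for illustration only
    print(build_status_request("example.com", "/status", "user", "pass"))
```

Printing the result shows the text that the client.print() calls would send, which makes it easy to compare against what Postman generates.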

In operation

Here is a short video of the device in operation. Excuse the use of image stabilisation – original video was filmed handheld.

The code is designed to open a connection, check on the status of the web service, then sleep for a period before repeating in an endless loop.

Next steps

This code currently only flashes up on the tiny screen when the service is found to be up or down. To be really useful, the tool should be able to alert one or more administrators – perhaps by push messaging to their mobile phones, or email.

The next stage can add this capability using services such as Prowl and Avviso – perhaps the subject of a future blog posting.

Using SQLite on a Raspberry Pi

There is a lot of interest in the amazing Raspberry Pi 3 computer here at Cranfield University. Sometimes an application we build, for example on the Raspberry Pi, needs to store data, such as readings from sensors. For this we need a database. However, the overhead of having a full-blown database server up and running is sometimes too much. We need a simple, localised way to store and retrieve data. Fortunately there is a great solution to this: ‘SQLite’.

This brief tutorial shows how to set up and configure SQLite on a Raspberry Pi, and gives an example of it in use.

To start with, as with every time we use the Pi, we should update and upgrade the distribution – to ensure all the software within the Debian distribution is up-to-date. As described on the Raspberry Pi pages, this is pretty straightforward.

First, update your system’s package list by entering the following command:

sudo apt-get update

Next, upgrade all your installed packages to their latest versions with the command:

sudo apt-get dist-upgrade

Once this is done, we are ready to install SQLite:

sudo apt-get install sqlite3

We will use the database to store data from within programming code – for example a software application that stores off sensor data. However, in the first instance, we can also run SQLite interactively to ‘create’, ‘read’, ‘update’ and ‘delete’ data directly in SQL tables. SQLite comes with a command line interpreter (CLI), that provides a command prompt to enter in commands. If we create a new database, we can try this out:

sqlite3 sensor_db

This starts up SQLite and will, at the same time, also create a new database – a single file, in this case called ‘sensor_db’ in your current folder.

Now we can create a database structure in our new database:

CREATE TABLE readings (
'temperature' REAL NOT NULL,
'humidity' REAL NOT NULL,
'datetime_int' INTEGER NOT NULL,
'sensor_id' INTEGER NOT NULL);

Note the semi-colon ending the statement. Now we can check that was created OK:

PRAGMA table_info([readings]);

Let’s place a couple of dummy data items (rows) into the table:

INSERT INTO readings (temperature, humidity, datetime_int, sensor_id) values (18.5,45.3,strftime('%s','now'),1);
INSERT INTO readings (temperature, humidity, datetime_int, sensor_id) values (19.4,42.8,strftime('%s','now'),1);

Now select the data to make sure it was stored correctly. Note that we are using one of the two means of storing dates, here using integers (seconds since the Unix epoch) rather than strings:

SELECT temperature, humidity, datetime(datetime_int, 'unixepoch'), sensor_id FROM readings;

The data should be shown…

To quit the interactive mode:

.quit

SQLite is a powerful database solution for small applications. Programming it is just like its larger server-based equivalents. It is worth spending time reading the SQLite tutorials for further information.
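
To illustrate storing sensor data from within programming code, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions rather than a finished application: the function names open_database and store_reading are hypothetical, and it creates the same ‘readings’ table as above if it is not already present.

```python
import sqlite3
import time


def open_database(path):
    """Open (or create) the database and make sure the readings table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "temperature REAL NOT NULL, "
        "humidity REAL NOT NULL, "
        "datetime_int INTEGER NOT NULL, "
        "sensor_id INTEGER NOT NULL)"
    )
    return conn


def store_reading(conn, temperature, humidity, sensor_id, when=None):
    """Insert one reading, storing the time as integer seconds since the Unix epoch."""
    if when is None:
        when = int(time.time())
    conn.execute(
        "INSERT INTO readings (temperature, humidity, datetime_int, sensor_id) "
        "VALUES (?, ?, ?, ?)",
        (temperature, humidity, when, sensor_id),
    )
    conn.commit()


if __name__ == "__main__":
    # ":memory:" keeps this demo self-contained; use "sensor_db" for the file created earlier
    conn = open_database(":memory:")
    store_reading(conn, 18.5, 45.3, 1)
    store_reading(conn, 19.4, 42.8, 1)
    query = (
        "SELECT temperature, humidity, datetime(datetime_int, 'unixepoch'), sensor_id "
        "FROM readings"
    )
    for row in conn.execute(query):
        print(row)
    conn.close()
```

The ‘?’ placeholders let sqlite3 handle quoting of the values, which is generally safer than building the SQL text by hand.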

Google Earth 3D on the Oculus Rift

There is a lot of interest in the area of virtual reality and visualisation of synthetic environments here at Cranfield University. A few years ago, Cranfield University was fortunate to receive support from the UK Natural Environment Research Council (NERC) through a Big Data Capital Equipment Award (NE/LO12774/1), which provided for a state-of-the-art virtual reality suite comprising a 3D projection system. This award included a 3D software package called Geovisionary from the company Virtalis.

Geovisionary offers a participatory experience in virtual reality; a back-projection system throws up images on a screen for a group of people to see together using 3D goggles. Geothread has a post about the use of this system and a video of it in use.

However, for a more immediate experience, a virtual reality headset is required. Our facility has now taken delivery of the amazing Oculus Rift environment. With the ‘Rift’, one wears a full headset with high-resolution stereo viewing screens, together with built in headphones. Orientation is achieved through a range of hard controllers, from the basic controller which is rather like a TV remote control, to a gaming Xbox controller, to the new 3D hand controllers which come in pairs and allow really intuitive hand gestures. These gestures can include actions such as picking up items and even throwing items (use of the retaining cord is advised!)

There is a wide range of apps available which use virtual reality, from the obvious games, to personal productivity tools, data visualisation and spatial data interaction.

Perhaps one of the most exhilarating experiences for those with cartographic interests is the port of the Google Earth app to the Oculus Rift. This places you, to all appearances, within 3D city landscapes and natural environments – with the most intuitive ability to zoom and fly around. A special mode enforces ‘human scale’ viewing – meaning you can ‘walk’ along streets, viewing the world around you really as if you were there – completely amazing!

Here are some screenshots of city scapes captured from the system to give an impression of the experience. Views are shown respectively of Milton Keynes, Manchester, Bristol, Peterborough and Birmingham.

This approach was recently used successfully to support the publication of a Smart City frameworks paper:

Sally P. Caird & Stephen H. Hallett (2018) Towards evaluation design for smart city development, Journal of Urban Design, DOI:

Exploring traffic times data

A recent investigation here at Cranfield University considered the sources of road journey traffic time data, and this blog recounts some of that investigation. First of all come the sources of the data.

Highways England Data

Thanks to the fantastic open data revolution we now have a huge wealth of public data available via the open data portal. Here, for example, we can source data on traffic times from the Highways England agency.

This data series provides average journey time, speed and traffic flow information for 15-minute periods since April 2009 on all motorways and ‘A’ roads managed by the Highways Agency in England, known as the Strategic Road Network. Journey times and speeds are estimated using a combination of sources, including Automatic Number Plate Recognition (ANPR) cameras, in-vehicle Global Positioning Systems (GPS) and inductive loops built into the road surface.

For example, we downloaded the CSV file ‘Feb15.csv’, relating to February 2015 data. The header line and one example row read:

LinkRef,Link Description,Date,Time Period,AverageJT,Average Speed,Data Quality,Link Length,Flow
AL215,A120 between A133 and A1232 (AL215),2015-02-10 00:00:00,67,305.47,105.12,1,8.9200000762939453,286.50

This line of data relates to a stretch of the A120 north of Colchester, UK. The key information here is that on 10th February 2015, this c.9 km stretch of road took on average 305 seconds (c.5.1 mins) to drive, at an average speed of about 105 km/h (the final value, 286.50, is the traffic flow). The time of day is given as 67. This number identifies one of the 96 15-minute intervals in the day (0-95, where 0 indicates 00:00 to 00:15). Period 67 is therefore 4:45:00 PM to 5:00:00 PM (see the table at the end of this article for working this out).
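
The period number can also be converted in code rather than by looking it up. The following is a minimal Python sketch; the function names are hypothetical. It also cross-checks the example row: dividing the link length by the average journey time reproduces the average speed column, which is consistent with the journey time being in seconds and the length in kilometres.

```python
def _hhmm(minutes):
    """Format minutes after midnight as a 24-hour 'HH:MM' string."""
    return "%02d:%02d" % divmod(minutes, 60)


def period_to_times(period):
    """Return (start, end) times for a 15-minute period number between 0 and 95."""
    if not 0 <= period <= 95:
        raise ValueError("period must be between 0 and 95")
    start = period * 15              # minutes after midnight
    end = (start + 15) % (24 * 60)   # period 95 ends at midnight
    return _hhmm(start), _hhmm(end)


def speed_kmh(length_km, journey_time_s):
    """Average speed implied by a link length and a journey time."""
    return length_km / journey_time_s * 3600


if __name__ == "__main__":
    print(period_to_times(67))                    # ('16:45', '17:00')
    print(round(speed_kmh(8.92, 305.47), 2))      # close to the 105.12 in the row above
```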

Google Traffic Data

Another useful source of data is from Google. The Google routing and traffic functions can be used by making a call to the Google ‘Distance Matrix’ API, described here:

Using the excellent ‘Postman’ tool, we can formulate and test a REST call to the Google distancematrix API:

…{s{{H{ovD:&destinations=enc:g{t{HqtiE:&departure_time=now&traffic_model=best_guess&key=<API KEY>

Parameters for this API are as follows:
units = metric values (e.g. km)
origins = starting point (encoded)
destinations = finish point (encoded)
departure_time = can’t be historical, ‘now’ = keyword
traffic_model = best_guess (not optimistic/pessimistic)
key = the personal API key
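
As a rough illustration of how these parameters fit together, the following Python sketch builds a request URL for the Distance Matrix JSON endpoint. The function name is hypothetical and MY_API_KEY is a placeholder. The coordinates are the approximate decimal-degree values that the two encoded fragments in the call above decode to; they are given here in plain ‘lat,lng’ form rather than encoded.

```python
from urllib.parse import urlencode

DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_distance_matrix_url(origin, destination, api_key):
    """Build a GET URL asking for a traffic-aware journey time starting now."""
    params = {
        "units": "metric",
        "origins": origin,
        "destinations": destination,
        "departure_time": "now",        # cannot be in the past
        "traffic_model": "best_guess",
        "key": api_key,
    }
    # urlencode percent-encodes characters such as the comma in "lat,lng"
    return DISTANCE_MATRIX_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # MY_API_KEY is a placeholder, not a real key
    print(build_distance_matrix_url("51.92014,0.93966", "51.88548,1.03769", "MY_API_KEY"))
```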

The parameters origins and destinations hold locations in latitude and longitude. As an alternative to decimal degree values, encoded values can be used in the URL, prefixed with ‘enc:’ and terminated with a colon. To encode locations, the polyline utility can be used.
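
For reference, the encoding itself is short enough to sketch. The following Python is an illustrative implementation of the encoded polyline scheme, not Google's own utility, and the function names are hypothetical. Each coordinate is scaled by 1e5 and rounded, stored as a difference from the previous point, and written out in 5-bit chunks as printable characters.

```python
def _encode_value(value):
    """Encode one signed integer as a run of printable characters."""
    # Shift left one bit; negative numbers are also inverted
    value = ~(value << 1) if value < 0 else (value << 1)
    chars = []
    while value >= 0x20:
        # Take 5 bits at a time, flagging that more chunks follow
        chars.append(chr((0x20 | (value & 0x1F)) + 63))
        value >>= 5
    chars.append(chr(value + 63))
    return "".join(chars)


def encode_polyline(points):
    """Encode a list of (latitude, longitude) pairs given in decimal degrees."""
    encoded = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        lat_e5 = int(round(lat * 1e5))
        lng_e5 = int(round(lng * 1e5))
        # Each point is stored as the difference from the previous one
        encoded.append(_encode_value(lat_e5 - prev_lat))
        encoded.append(_encode_value(lng_e5 - prev_lng))
        prev_lat, prev_lng = lat_e5, lng_e5
    return "".join(encoded)


if __name__ == "__main__":
    # Reproduces the destination value used in the call above
    print("enc:" + encode_polyline([(51.88548, 1.03769)]) + ":")
```

Run as a script, this prints the same ‘enc:g{t{HqtiE:’ value that appears as the destination in the REST call.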

The response to this REST call, sent as a GET request using Postman, is:

{
    "destination_addresses": [
        "A120, Colchester CO7, UK"
    ],
    "origin_addresses": [
        "A120, Ardleigh, Colchester CO7, UK"
    ],
    "rows": [
        {
            "elements": [
                {
                    "distance": {
                        "text": "8.0 km",
                        "value": 7993
                    },
                    "duration": {
                        "text": "5 mins",
                        "value": 278
                    },
                    "duration_in_traffic": {
                        "text": "5 mins",
                        "value": 301
                    },
                    "status": "OK"
                }
            ]
        }
    ],
    "status": "OK"
}

The key information here is that at the time of making the call (‘now’), this c.8 km stretch of road was estimated to take 278 seconds (c.4.6 mins) to drive as a plain ‘duration’, and 301 seconds (c.5.0 mins) once traffic is taken into account (‘duration_in_traffic’). The point of interest is the difference between these two values. Google note that the API allows you to ‘receive a route and trip duration (response field: duration_in_traffic) that take traffic conditions into account’. Note that ‘the departure_time must be set to the current time or some time in the future. It cannot be in the past’.

So in this way the Google approach allows a definition of the delays in drive time caused by traffic conditions. Although this cannot be determined retrospectively, a speculative future date can be selected whereby a prediction is made based on previous traffic conditions.
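
The delay attributable to traffic can be pulled out of the response with a few lines of code. This is a minimal Python sketch with a hypothetical function name; the embedded text is a trimmed copy of the response above, keeping only the fields that are used.

```python
import json

# Trimmed copy of the response shown above (only the fields used here)
RESPONSE_TEXT = """
{
  "rows": [
    {"elements": [
      {"distance": {"text": "8.0 km", "value": 7993},
       "duration": {"text": "5 mins", "value": 278},
       "duration_in_traffic": {"text": "5 mins", "value": 301},
       "status": "OK"}
    ]}
  ],
  "status": "OK"
}
"""


def traffic_delay_seconds(response):
    """Return duration_in_traffic minus duration, in seconds, for the first route."""
    if response.get("status") != "OK":
        raise ValueError("request failed: %s" % response.get("status"))
    element = response["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        raise ValueError("route failed: %s" % element.get("status"))
    return element["duration_in_traffic"]["value"] - element["duration"]["value"]


if __name__ == "__main__":
    delay = traffic_delay_seconds(json.loads(RESPONSE_TEXT))
    print("Delay due to traffic: %d seconds" % delay)
```

For the example response this gives 23 seconds (301 minus 278).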


The table used to calculate the time period for the Highways England data, described above:

Period From To
0 12:00:00 AM 12:15:00 AM
1 12:15:00 AM 12:30:00 AM
2 12:30:00 AM 12:45:00 AM
3 12:45:00 AM 1:00:00 AM
4 1:00:00 AM 1:15:00 AM
5 1:15:00 AM 1:30:00 AM
6 1:30:00 AM 1:45:00 AM
7 1:45:00 AM 2:00:00 AM
8 2:00:00 AM 2:15:00 AM
9 2:15:00 AM 2:30:00 AM
10 2:30:00 AM 2:45:00 AM
11 2:45:00 AM 3:00:00 AM
12 3:00:00 AM 3:15:00 AM
13 3:15:00 AM 3:30:00 AM
14 3:30:00 AM 3:45:00 AM
15 3:45:00 AM 4:00:00 AM
16 4:00:00 AM 4:15:00 AM
17 4:15:00 AM 4:30:00 AM
18 4:30:00 AM 4:45:00 AM
19 4:45:00 AM 5:00:00 AM
20 5:00:00 AM 5:15:00 AM
21 5:15:00 AM 5:30:00 AM
22 5:30:00 AM 5:45:00 AM
23 5:45:00 AM 6:00:00 AM
24 6:00:00 AM 6:15:00 AM
25 6:15:00 AM 6:30:00 AM
26 6:30:00 AM 6:45:00 AM
27 6:45:00 AM 7:00:00 AM
28 7:00:00 AM 7:15:00 AM
29 7:15:00 AM 7:30:00 AM
30 7:30:00 AM 7:45:00 AM
31 7:45:00 AM 8:00:00 AM
32 8:00:00 AM 8:15:00 AM
33 8:15:00 AM 8:30:00 AM
34 8:30:00 AM 8:45:00 AM
35 8:45:00 AM 9:00:00 AM
36 9:00:00 AM 9:15:00 AM
37 9:15:00 AM 9:30:00 AM
38 9:30:00 AM 9:45:00 AM
39 9:45:00 AM 10:00:00 AM
40 10:00:00 AM 10:15:00 AM
41 10:15:00 AM 10:30:00 AM
42 10:30:00 AM 10:45:00 AM
43 10:45:00 AM 11:00:00 AM
44 11:00:00 AM 11:15:00 AM
45 11:15:00 AM 11:30:00 AM
46 11:30:00 AM 11:45:00 AM
47 11:45:00 AM 12:00:00 PM
48 12:00:00 PM 12:15:00 PM
49 12:15:00 PM 12:30:00 PM
50 12:30:00 PM 12:45:00 PM
51 12:45:00 PM 1:00:00 PM
52 1:00:00 PM 1:15:00 PM
53 1:15:00 PM 1:30:00 PM
54 1:30:00 PM 1:45:00 PM
55 1:45:00 PM 2:00:00 PM
56 2:00:00 PM 2:15:00 PM
57 2:15:00 PM 2:30:00 PM
58 2:30:00 PM 2:45:00 PM
59 2:45:00 PM 3:00:00 PM
60 3:00:00 PM 3:15:00 PM
61 3:15:00 PM 3:30:00 PM
62 3:30:00 PM 3:45:00 PM
63 3:45:00 PM 4:00:00 PM
64 4:00:00 PM 4:15:00 PM
65 4:15:00 PM 4:30:00 PM
66 4:30:00 PM 4:45:00 PM
67 4:45:00 PM 5:00:00 PM
68 5:00:00 PM 5:15:00 PM
69 5:15:00 PM 5:30:00 PM
70 5:30:00 PM 5:45:00 PM
71 5:45:00 PM 6:00:00 PM
72 6:00:00 PM 6:15:00 PM
73 6:15:00 PM 6:30:00 PM
74 6:30:00 PM 6:45:00 PM
75 6:45:00 PM 7:00:00 PM
76 7:00:00 PM 7:15:00 PM
77 7:15:00 PM 7:30:00 PM
78 7:30:00 PM 7:45:00 PM
79 7:45:00 PM 8:00:00 PM
80 8:00:00 PM 8:15:00 PM
81 8:15:00 PM 8:30:00 PM
82 8:30:00 PM 8:45:00 PM
83 8:45:00 PM 9:00:00 PM
84 9:00:00 PM 9:15:00 PM
85 9:15:00 PM 9:30:00 PM
86 9:30:00 PM 9:45:00 PM
87 9:45:00 PM 10:00:00 PM
88 10:00:00 PM 10:15:00 PM
89 10:15:00 PM 10:30:00 PM
90 10:30:00 PM 10:45:00 PM
91 10:45:00 PM 11:00:00 PM
92 11:00:00 PM 11:15:00 PM
93 11:15:00 PM 11:30:00 PM
94 11:30:00 PM 11:45:00 PM
95 11:45:00 PM 12:00:00 AM

Connecting a Raspberry Pi to eduroam wifi

Raspberry Pi connected to eduroam wifi

Connecting to eduroam

One of the things we want to do here at Cranfield University is to connect Raspberry Pi computers to the University WiFi network, called ‘eduroam’. The eduroam system is used by universities across the UK. However, the built-in WiFi on board the new Raspberry Pi 3 doesn’t seem to connect to the eduroam WiFi network on campus out of the box. Clicking on the WiFi icon in the top right of the Pi’s desktop shows the eduroam network as a greyed-out option. On the Cranfield campus, the local networks ‘Cranfield Web’ and ‘Cranfield Setup’ are both available to join. Here are some instructions on how to connect the Pi to the eduroam WiFi network.

Select Cranfield Web/Setup, open a browser and go to the eduroam installer download page. You may be prompted for a username and password; your Cranfield University username and password should be supplied here.

Once there, select Cranfield University from the list of institutions when prompted and then press the button titled ‘Linux’. This should download a small script which you should run. You can either run this by clicking the file once it has downloaded at the bottom of your browser, or by navigating to its location within your filesystem and running it from there.

If the script does not initially run (but simply opens in a text editor), you may need to add executable permissions to the downloaded script. Navigate to the location of your script in a terminal window and run the command:

sudo chmod u+x <filename>

When the script runs it will prompt you with some dialogue boxes that ask for your eduroam username and password. Your eduroam username should take the form <username> (where <username> is your regular Cranfield username) and you should use your regular Cranfield password.

The script will eventually tell you that it was unable to update your network settings; however, the important config we need will be stored in a .cat_installer folder in your home directory. Using a terminal window, navigate to this folder:

cd .cat_installer

You will see a file cat_installer.conf. The contents of this file need to be copied and pasted into the file wpa_supplicant.conf which resides in /etc/wpa_supplicant/. Navigate to this folder:

cd /etc/wpa_supplicant

You will need to edit the wpa_supplicant.conf file using superuser permissions. You can edit this file using the text editor nano:

sudo nano wpa_supplicant.conf

Using your right mouse button (ctrl+v doesn’t work here), paste in the config that you copied earlier at the bottom of the file.

Save this file (in nano, press ctrl+O), then exit (ctrl+x).

You will now need to restart the system:

sudo shutdown -r now

When the system boots back up to desktop again, you should now be connected to eduroam. Note, clicking on the wifi icon in the top right of the screen will show that you are connected to eduroam, but it will still be greyed out. This is fine.

Be aware that this configuration will have your password stored in plaintext in the wifi configuration file. It is, therefore, essential that access to your Pi is configured with a secure password, as anyone able to run the sudo command on the Pi will be able to read this file. If you are planning for several people to have access to a terminal/ssh on your Pi, it is recommended you connect the device to the network via a wired ethernet connection.

Apache Spark, Zeppelin and geospatial big data processing

There is much interest here at Cranfield University in the use of Big Data tools, and with our parallel interests in all things geospatial, the question arises – how can Big Data tools process geospatial data?

In this blog, we investigate the use of Apache Spark, Apache Zeppelin and a couple of geospatial libraries. In an earlier blog, we set up Spark and Zeppelin, and now we extend this to use these additional tools. Note that this exercise is undertaken with a MacBook, although the instructions should work with Linux just as well.

There are few geospatial libraries for Big Data processing that work with Spark/Hadoop. Some of those that exist include the Hadoop offering from ESRI, Magellan, and GeoSpark.


To set up GeoSpark, we downloaded the library ‘geospark-0.3.2-spark-2.x.jar’ and saved the file locally.


Next, in the Apache Spark installation ‘conf’ folder, we copied the template file ‘spark-defaults.conf.template’ to ‘spark-defaults.conf’ ready for editing – we need to tell Spark to use the GeoSpark jar library.

We then edited the ‘spark-defaults.conf’ configuration file, adding a line at the end to reference the jar, e.g.

spark.jars /Users/sparkuser/spark/jars/geospark-0.3.2-spark-2.x.jar

Sourcing data

We need some spatial data for our test. We downloaded sample data files ‘zcta510-small.csv‘ and ‘arealm-small.csv‘ (online as above), to a local data location, e.g. /Users/sparkuser/spark/data/geospark.

The datasets take the following form:




The code

We now followed exactly the GeoSpark example tutorial code, in the Scala language.
First, we need to ensure the correct libraries are loaded and available:

import org.datasyslab.geospark.spatialOperator.RangeQuery
import org.datasyslab.geospark.spatialRDD.PointRDD
import org.datasyslab.geospark.spatialOperator.JoinQuery
import org.datasyslab.geospark.spatialRDD.RectangleRDD
import com.vividsolutions.jts.geom.Envelope
import org.datasyslab.geospark.spatialOperator.KNNQuery
import org.datasyslab.geospark.spatialRDD.PointRDD
import com.vividsolutions.jts.geom.Coordinate
import com.vividsolutions.jts.geom.GeometryFactory
import com.vividsolutions.jts.geom.Point

Now we can run the following code and observe the following:

// Start an example Spatial Range Query without Index
val queryEnvelope=new Envelope (-113.79,-109.73,32.99,35.08);
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
val resultSize = RangeQuery.SpatialRangeQuery(objectRDD, queryEnvelope, 0).getRawPointRDD().count(); /* The 0 means consider a point only if it is fully covered by the query window when doing query */

queryEnvelope: com.vividsolutions.jts.geom.Envelope = Env[-113.79 : -109.73, 32.99 : 35.08]
objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@52b8d9a6
resultSize: Long = 445

// Start an example Spatial Range Query with Index
val queryEnvelope=new Envelope (-113.79,-109.73,32.99,35.08);
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
objectRDD.buildIndex("rtree"); /* Build R-Tree index */
val resultSize = RangeQuery.SpatialRangeQueryUsingIndex(objectRDD, queryEnvelope,0).getRawPointRDD().count(); /* The 0 means consider a point only if it is fully covered by the query window when doing query */

queryEnvelope: com.vividsolutions.jts.geom.Envelope = Env[-113.79 : -109.73, 32.99 : 35.08]
objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@2c3e8ebf
resultSize: Long = 445

// Start an example Spatial KNN Query without Index
val fact=new GeometryFactory();
val queryPoint=fact.createPoint(new Coordinate(-109.73, 35.08));
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
val resultSize = KNNQuery.SpatialKnnQuery(objectRDD, queryPoint, 5); /* The number 5 means 5 nearest neighbors */

fact: com.vividsolutions.jts.geom.GeometryFactory = com.vividsolutions.jts.geom.GeometryFactory@35f6b599
queryPoint: com.vividsolutions.jts.geom.Point = POINT (-109.73 35.08)
objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@76d6439b
resultSize: java.util.List[com.vividsolutions.jts.geom.Point] = [POINT (-109.538914 35.123446), POINT (-108.729849 37.196678), POINT (-117.105253 33.48551), POINT (-120.679839 35.25764), POINT (-120.860368 35.398047)]

// Start an example Spatial KNN Query with Index
val fact=new GeometryFactory();
val queryPoint=fact.createPoint(new Coordinate(-109.73, 35.08));
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
objectRDD.buildIndex("rtree"); /* Build R-Tree index */
val resultSize = KNNQuery.SpatialKnnQueryUsingIndex(objectRDD, queryPoint, 5); /* The number 5 means 5 nearest neighbors */

fact: com.vividsolutions.jts.geom.GeometryFactory = com.vividsolutions.jts.geom.GeometryFactory@24046396
queryPoint: com.vividsolutions.jts.geom.Point = POINT (-109.73 35.08)
objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@6db7719d
resultSize: java.util.List[com.vividsolutions.jts.geom.Point] = [POINT (-109.538914 35.123446), POINT (-108.729849 37.196678), POINT (-108.135158 37.242491), POINT (-107.596572 37.000003), POINT (-107.79524 37.225479)]

// Start an example Spatial Join Query without Index
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0 ,"csv","rtree",4); /* The 0 means spatial attribute starts at Column 0, number 4 means 4 RDD partitions, "rtree" means use R-Tree Spatial Partitioning Grid */
val rectangleRDD = new RectangleRDD(sc, "/Users/sparkuser/spark/data/geospark/zcta510-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
val joinQuery = new JoinQuery(sc,objectRDD,rectangleRDD);
val resultSize = joinQuery.SpatialJoinQuery(objectRDD,rectangleRDD).count();
objectRDD.totalNumberOfRecords  /* see for API */

objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@730e3723
rectangleRDD: org.datasyslab.geospark.spatialRDD.RectangleRDD = org.datasyslab.geospark.spatialRDD.RectangleRDD@2bf31c8c
joinQuery: org.datasyslab.geospark.spatialOperator.JoinQuery = org.datasyslab.geospark.spatialOperator.JoinQuery@36cecee7
resultSize: Long = 9989

// Start an example Spatial Join Query with Index
val objectRDD = new PointRDD(sc, "/Users/sparkuser/spark/data/geospark/arealm-small.csv", 0 ,"csv","rtree",4); /* The 0 means spatial attribute starts at Column 0, number 4 means 4 RDD partitions, "rtree" means use R-Tree Spatial Partitioning Grid */
val rectangleRDD = new RectangleRDD(sc, "/Users/sparkuser/spark/data/geospark/zcta510-small.csv", 0, "csv"); /* The 0 means spatial attribute starts at Column 0 */
val joinQuery = new JoinQuery(sc,objectRDD,rectangleRDD);
objectRDD.buildIndex("rtree"); /* Build R-Tree index */
val resultSize = joinQuery.SpatialJoinQueryUsingIndex(objectRDD,rectangleRDD).count();

objectRDD: org.datasyslab.geospark.spatialRDD.PointRDD = org.datasyslab.geospark.spatialRDD.PointRDD@1301fbdd
rectangleRDD: org.datasyslab.geospark.spatialRDD.RectangleRDD = org.datasyslab.geospark.spatialRDD.RectangleRDD@ebfb5e7
joinQuery: org.datasyslab.geospark.spatialOperator.JoinQuery = org.datasyslab.geospark.spatialOperator.JoinQuery@197ff4a6
resultSize: Long = 9989
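
To make clear what the range and KNN queries compute, here is a brute-force sketch in plain Python, without Spark or GeoSpark. It is an illustration only and says nothing about how GeoSpark works internally: the function names are hypothetical, distance is simple planar distance in degrees, and the sample points are the eight distinct points that appear in the two KNN outputs above.

```python
import math

# The eight distinct points appearing in the two KNN outputs above (x = longitude, y = latitude)
SAMPLE_POINTS = [
    (-109.538914, 35.123446), (-108.729849, 37.196678),
    (-117.105253, 33.48551), (-120.679839, 35.25764),
    (-120.860368, 35.398047), (-108.135158, 37.242491),
    (-107.596572, 37.000003), (-107.79524, 37.225479),
]


def in_envelope(point, min_x, max_x, min_y, max_y):
    """True if the point lies inside the envelope (same argument order as the Envelope above)."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def range_query_count(points, min_x, max_x, min_y, max_y):
    """Count the points covered by the query window."""
    return sum(1 for p in points if in_envelope(p, min_x, max_x, min_y, max_y))


def nearest_neighbours(points, query, k):
    """Return the k points closest to the query point, nearest first."""
    def distance(p):
        return math.hypot(p[0] - query[0], p[1] - query[1])
    return sorted(points, key=distance)[:k]


if __name__ == "__main__":
    print(range_query_count(SAMPLE_POINTS, -113.79, -109.73, 32.99, 35.08))
    for point in nearest_neighbours(SAMPLE_POINTS, (-109.73, 35.08), 5):
        print(point)
```

On these eight sample points, the five nearest neighbours found by brute force are the same five points, in the same order, as those returned by the indexed KNN query above. None of the sample points falls inside the query envelope, so the count printed is 0.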