Category Archives: Big Data

Matters relating to the handling of large datasets

Arduino – making a simple BlueTooth data logger

Introduction – Arduino
Another area of informatics interest here at Cranfield University is the use of the amazing Arduino microcontroller board for various projects. With the increasing emergence of the ‘Internet of Things’, ‘big data’ and machine-to-machine communication, the Arduino represents a great starting point for learning about this field.

We wanted to develop the basis for a simple data logger: an Arduino ‘Uno’ with a simple temperature and humidity sensor module, taking readings that can be retrieved remotely via BlueTooth.

This post assumes you have already installed the Arduino IDE and are able to build and run programmes, or ‘sketches’. The Arduino site has an excellent Getting Started page if not.

The first thing is to get the BlueTooth working. For this we bought an inexpensive JY-MCU module online. This unit has 4 pins: VCC (3.3-6v), TX, RX and Gnd. Typically the other two connectors, State and Key, do not have pins soldered in.

The BlueTooth JY-MCU unit is advertised as being able to take power at either 3.3 or 5v. Many designs on the web for using this unit use resistors to split the voltage, but for this application we connected VCC directly to the Arduino 3.3v pin, and Gnd to Gnd. The module's TX and RX pins were connected to the Arduino's digital pins 10 and 11 respectively. In the software sketch below, pin 10 is the SoftwareSerial RX and pin 11 the SoftwareSerial TX, so the BlueTooth TX goes to the SoftwareSerial RX, and the BlueTooth RX to the SoftwareSerial TX.

Configuring the BlueTooth module
Once connected, you can create a new sketch to allow communications. The first thing needed is configuration of the BlueTooth module settings – achieved by sending simple ‘AT’ commands to the unit. Byron’s Blog documents these codes really clearly, for example sending the module the command ‘AT+BAUD4’ sets its internal serial baud rate to 9,600bps. Note the device must be in an ‘unpaired’ state before these settings can be received.

A sketch can be set up to configure the module the way required, thus:

/* Include the software serial port library */
#include <SoftwareSerial.h>
/* to communicate with the Bluetooth module's TXD pin */
#define BT_SERIAL_TX 10
/* to communicate with the Bluetooth module's RXD pin */
#define BT_SERIAL_RX 11
/* Initialise the software serial port */
SoftwareSerial BluetoothSerial(BT_SERIAL_TX, BT_SERIAL_RX);

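void setup() {
  // NOTE: the statements in this function were missing from the listing and
  // are reconstructed from its comments, assuming the stock JY-MCU (HC-06)
  // 'AT' command set. Check them against your own module's documentation.

  /* Set the baud rate for the hardware serial port */
  Serial.begin(9600);
  /* Set the baud rate for the software serial port */
  BluetoothSerial.begin(9600);
  delay(1000);

  // Should respond with OK
  BluetoothSerial.print("AT");
  waitForResponse();

  // Should respond with its version
  BluetoothSerial.print("AT+VERSION");
  waitForResponse();

  // Set pin to 1234
  BluetoothSerial.print("AT+PIN1234");
  waitForResponse();

  // Set the name to BLU
  BluetoothSerial.print("AT+NAMEBLU");
  waitForResponse();

  // Set baudrate from 9600 (default) to 57600
  // * Note of warning * - many people report issues after increasing JY-MCU
  // baud rate upwards from the default 9,600bps rate (e.g. 'AT+BAUD4')
  // so you may want to leave this and not alter the speed!!
  BluetoothSerial.print("AT+BAUD7");
  waitForResponse();
}

// Function to pass BlueTooth output through to serial port output
void waitForResponse() {
  delay(1000); // give the module time to reply (assumed interval)
  while (BluetoothSerial.available()) {
    Serial.write(BluetoothSerial.read());
  }
  Serial.println();
}

void loop() { }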

Alternatively, in contrast to the pre-programmed statements above, Clinertech’s great sketch allows AT values to be typed in and set interactively.

The warning in the code above highlights the potential pitfalls of changing the speed of the internal BlueTooth serial communication to be higher than the default 9,600bps. We found that 57,600bps worked OK (‘AT+BAUD7’) on our unit. However, note that once this change is made, the code connecting to the Software Serial port also needs its speed adjusting to the new rate selected.
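One way to keep the two speeds in step is to derive the ‘AT+BAUDn’ command from the same number that is passed to the Software Serial port. The helper below is a minimal sketch of that idea, not part of the original sketches: the function name baudCommandFor is hypothetical, and only codes 4 (9,600bps) and 7 (57,600bps) are confirmed in this post. The remaining entries are an assumption taken from the commonly published JY-MCU (HC-06) table, so check them against your own module.

```cpp
// Plain C++ with no Arduino headers, so it can be pasted into a sketch or
// compiled and checked on a desktop machine.

// Returns the 'AT+BAUDn' command for a given rate, or nullptr when the rate
// is not in the table. Codes 4 and 7 are quoted in this post; the other
// codes are assumed from the commonly published JY-MCU (HC-06) table.
const char* baudCommandFor(long baud) {
  switch (baud) {
    case 1200:   return "AT+BAUD1";
    case 2400:   return "AT+BAUD2";
    case 4800:   return "AT+BAUD3";
    case 9600:   return "AT+BAUD4";
    case 19200:  return "AT+BAUD5";
    case 38400:  return "AT+BAUD6";
    case 57600:  return "AT+BAUD7";
    case 115200: return "AT+BAUD8";
    default:     return nullptr;  // unsupported rate: send nothing
  }
}
```

In a sketch, a single constant (say, 57600) could then be passed both to BluetoothSerial.begin() and to baudCommandFor(), so the two rates cannot drift apart.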

Temperature and Humidity
Once the BlueTooth module is configured correctly, the next step is to introduce the temperature/humidity module to the Arduino Uno. For this, we used a ‘DHT-11’ module. This has three pins: + (vcc), – (gnd) and signal (we connected this to Digital pin 12).

Note that a software library ‘TinyDHT’ was used to manage the communications with this sensor. The library is included in the code below in the same way the SoftwareSerial library is included.

// BT Data Logger
// BlueTooth Configuration
/* Include the software serial port library */
#include <SoftwareSerial.h>
/* to communicate with the Bluetooth module's TXD pin */
#define BT_SERIAL_TX 10
/* to communicate with the Bluetooth module's RXD pin */
#define BT_SERIAL_RX 11
/* Initialise the software serial port */
SoftwareSerial BluetoothSerial(BT_SERIAL_TX, BT_SERIAL_RX);

// DHT-11 Configuration
#include <TinyDHT.h> // lightweight DHT sensor library
// Uncomment whatever type sensor you are using!
#define DHTTYPE DHT11 // DHT 11
//#define DHTTYPE DHT22 // DHT 22 (AM2302)
//#define DHTTYPE DHT21 // DHT 21 (AM2301)
#define TEMPTYPE 0 // Use 0 for Celsius, 1 for Fahrenheit
#define DHTPIN 12
DHT dht(DHTPIN, DHTTYPE); // Define Temp Sensor

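void setup() {
  /* Set the baud rate for the software serial port */
  BluetoothSerial.begin(57600); // Initialise BlueTooth
  dht.begin(); // Initialise DHT temperature sensor
  BluetoothSerial.print("Starting ...");
}

void loop() {
  // Take readings
  int8_t h = dht.readHumidity(); // Read humidity
  int16_t t = dht.readTemperature(TEMPTYPE); // Read temperature

  if ( t == BAD_TEMP || h == BAD_HUM ) { // if error conditions (see TinyDHT.h)
    BluetoothSerial.println("Bad read on DHT"); // message text is an assumption
  } else {
    BluetoothSerial.print("Temperature: ");
    BluetoothSerial.print(t); // value output reconstructed; missing from the listing
    BluetoothSerial.print(", Humidity: ");
    BluetoothSerial.println(h); // value output reconstructed; missing from the listing
  }
  delay(2000); // assumed pause between readings; the DHT-11 is a slow sensor
}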

This code sets up the ‘software serial’ port used to send output to the BlueTooth module. Readings of temperature and humidity are then taken and output continuously.

Accessing the data
To access the data being sent by the Arduino Uno, a few steps are required. First you need a computer with BlueTooth capability. If your computer doesn’t have BlueTooth, inexpensive ‘USB BlueTooth dongles’ can be bought.

For example, the steps using a MacBook are as follows. Turn the laptop’s BlueTooth ‘on’ and open the ‘System preferences’ -> ‘Network’ dialogue. Select the ‘BlueTooth’ option; the Arduino module should then appear in the list of available devices, where it can be selected and ‘paired’ (using the pairing number set earlier, e.g. the default ‘1234’).

Finally, with BlueTooth still connected and paired, open a serial monitor window connected to the BlueTooth port to watch the data being generated on the BlueTooth Software Serial port. To do this, open the Arduino IDE, select the menu ‘Tools’ -> ‘Port’ and choose the BlueTooth device (with the name set earlier), then select ‘Tools’ -> ‘Serial Monitor’. As long as the baud rate matches that of the BlueTooth module, the data readings should be shown.

What next?
Having this all working is just the start of a bigger project. One next option would be to attach an SD card writer to the Arduino, to allow data to be saved locally, with BlueTooth then used to access the data periodically. Data could be time-stamped using a separate clock module, or geo-positioned with a GPS module.

To do something more useful with the data being received, there are also a number of options. The ‘Processing’ language is gaining interest, and can be used to extract data (perhaps re-formatted as a data stream) suitable for graphing or further analysis. Of interest, the Arduino IDE itself is derived from the Processing IDE. One excellent example using Processing that we followed and adapted is Bhatt’s P2_DHT11_Logger project.
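As an illustration of re-formatting the readings as a data stream, the sketch below parses one received line into numbers and writes it back out as a comma-separated row. It is host-side C++ rather than Processing, and it assumes each reading arrives as a single line of the form ‘Temperature: 21, Humidity: 45’, built from the two labels printed by the logger sketch. The names Reading, parseReading and toCsv are hypothetical.

```cpp
#include <cstdio>
#include <string>

// One parsed reading. The field names are illustrative only.
struct Reading {
  int temperature;  // in the unit selected by TEMPTYPE in the logger sketch
  int humidity;     // relative humidity, percent
};

// Parses a line such as "Temperature: 21, Humidity: 45" (assumed format).
// Returns true and fills 'out' only when both values are found.
bool parseReading(const std::string& line, Reading& out) {
  int t = 0;
  int h = 0;
  if (std::sscanf(line.c_str(), "Temperature: %d, Humidity: %d", &t, &h) != 2) {
    return false;  // e.g. the start-up banner or a bad-read message
  }
  out.temperature = t;
  out.humidity = h;
  return true;
}

// Formats a reading as a CSV row, temperature first, then humidity.
std::string toCsv(const Reading& r) {
  return std::to_string(r.temperature) + "," + std::to_string(r.humidity);
}
```

Lines that do not match, such as the start-up message, are simply rejected, so the same loop can read everything arriving on the BlueTooth port and keep only the valid rows for graphing.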

A further development beyond this could be to write a mobile device ‘app’ to make the connection via the mobile device’s BlueTooth. Future posts here may develop on these themes.

‘DREAM’ – a new Centre for Doctoral Training in ‘Data, Risk And Environmental Analytical Methods’

‘DREAM’ is a new Centre for Doctoral Training in ‘Data, Risk And Environmental Analytical Methods’, established between four leading Universities – Cranfield University, Newcastle University, the University of Cambridge, and the University of Birmingham – that over the next several years will support 30 PhD students undertaking doctoral research, seizing the opportunities in ‘big data’ and analytics, and designing and implementing effective risk mitigation strategies across the environmental sciences for academe, industry, NGOs and government.

‘Big data’ refers to extremely large data sets, often sourced in real-time through sensors and networks. The term emerged describing the volume, variety and velocity of data sets produced by our use of the ubiquitous information devices around us, from mobile phones to software logs, and from unmanned aerial vehicles and satellite remote sensing devices to embedded sensor networks.

Being viewed as difficult to curate, collate, process, and analyse using conventional techniques, ‘big data’ presents significant challenges. As we learn to collect and interpret ‘big data’ intelligently and purposefully, we can deliver significant benefits for Governments and industry. This presents a major opportunity to use ‘big data’ effectively to better manage risks in complex environmental systems. Understanding how these systems become unstable and create unpredictable situations is a key challenge as the complexity and interactions of our networked world increase. Different informatics and analytical techniques in extracting knowledge are now required to understand environmental risks and inform decision making.

Example DREAM research will focus on themes such as the real-time assessment of the geohazard risks posed by flooding, drought, heatwaves and ground instabilities, and the mechanisms for mitigation and response through improved observation and monitoring technology, coupled forecasting and catastrophe modelling techniques, all aimed to underpin balanced decision support strategies. Research will consider the use of ‘big data’ in understanding systemic failures in critical infrastructure systems, drawing on novel sources of environmental data and multi-hazard assessment. The challenges of the changing global climate, and the threats posed by extreme meteorological events, exert direct challenges in managing the long-term protection of geobiophysical systems. Probabilistic modelling techniques, drawing on ensembles of climatic projections will inform new thinking about risk analysis and modelling, and particularly the human social impacts and consequences. The research will draw on a new generation of ‘big data’ computational, analytical and visualisation technologies to address these themes.

DREAM represents a pedigree consortium of four leading Universities, led by Cranfield University, with Newcastle University, the University of Cambridge, and the University of Birmingham – having deep expertise in environmental risk management and the application of intelligent technologies to large data sets. With the support of NERC, the universities have come together to train the next generation of risk specialists addressing the opportunities from ‘big data’, to support decision-making in industry, Government and beyond, and so releasing benefits for business and society.

DREAM postgraduates will span environmental risk scientists and informaticians, whose interdisciplinary research will encompass themes including spatial epidemiology, environmental geohazard assessment, offshore energy, climate change impacts, and geo-demographic enquiry: and whose expertise will maximise the untapped potential of fundamental data capture, assimilation and management, the operation of multi-sensor instrumented environments, real time data trapping, advanced analytics, and decision science.

Key to the DREAM consortium is the extensive partnership with Big Data and policy and risk sector institutions, including: EU-JRC, ESRI-UK, Atkins Global, BGS; CEFAS; CEH; Defra; OS; EA; Herbert Smith Freehill Lawyers; Marine Scotland; LAs; MMO; SNH; James Hutton Institute; and Landmark Information Group, permitting the research to address stakeholder-driven, challenging themes, and allowing talented researchers to develop skills to support insightful, high impact, industrially-relevant Doctorates.

Merry Christmas from all at Geothread

Once again, we’re preparing to wrap up for the year here at Cranfield University. Before we sign off, we’ll leave you with something that’s become a bit of a tradition with the Geothread team over the past couple of years; our Christmas Twitter map. Last year we mapped the spread of festive cheer across the country, according to Twitter users. Once again, we’ve collected a sample of 60,000 georeferenced tweets mentioning the word Christmas, along with a handful of other related keywords. These have been grouped by county and then normalised against a random sample of tweets taken earlier in the year to eliminate the effects of population density.
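The exact normalisation is not spelled out here, so the following is only a sketch of one plausible approach: divide each county's share of the Christmas tweets by its share of the baseline sample. The names Counts and normalise, and the choice to skip counties with no baseline coverage, are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Tweet counts keyed by county name.
using Counts = std::map<std::string, int>;

// For each county, divides its share of the keyword tweets by its share of
// the baseline tweets. A score above 1.0 means the county produced more
// keyword tweets than its usual level of activity would suggest.
// Counties with no baseline coverage are skipped, leaving a hole in the map.
std::map<std::string, double> normalise(const Counts& keyword,
                                        const Counts& baseline) {
  long keywordTotal = 0;
  long baselineTotal = 0;
  for (const auto& kv : keyword) keywordTotal += kv.second;
  for (const auto& kv : baseline) baselineTotal += kv.second;

  std::map<std::string, double> scores;
  if (keywordTotal == 0 || baselineTotal == 0) return scores;

  for (const auto& kv : keyword) {
    auto it = baseline.find(kv.first);
    if (it == baseline.end() || it->second == 0) continue;  // no baseline data
    double keywordShare = static_cast<double>(kv.second) / keywordTotal;
    double baselineShare = static_cast<double>(it->second) / baselineTotal;
    scores[kv.first] = keywordShare / baselineShare;
  }
  return scores;
}
```

Working with shares rather than raw counts means a densely populated county does not score highly simply because it tweets more about everything.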

Comparing against the same map last year, it’s clear that there are several areas that are consistent in their anticipation of the festive season; Central Wales and North Yorkshire in particular. Anglesey and Conwy in North Wales certainly seem to be getting excited about things this year, whilst Cumbria and the Scottish Highlands don’t seem to be feeling the same level of enthusiasm that they did 12 months ago.

Christmas tweets 2014
Apologies to anyone we missed out in Ireland. Unfortunately the random sample of tweets we used for normalisation did not include coverage of all of the British Isles, which is why there are some holes in the final map.

As a bonus this year, we’ve included an interactive map of the raw data collected from twitter, for you to explore below.

Merry Christmas and a happy New Year from all at Geothread!

UK Soil Observatory wins Geospatial Excellence Award

AGI Award Winner

Following on from the UK Soil Observatory’s recent nomination for a Geospatial Excellence Award, Cranfield is delighted to announce that the UKSO won the AGI award for Excellence with Impact. The award recognises projects which have achieved outstanding success or impact – whether this be within an organisation or at a local, national or international scale.

Commenting on the UKSO, the judges described it as “An ambitious project with huge potential as a spatial research resource for a range of fields including agriculture and geotechnical engineering”.

Cranfield University’s National Soil Resources Institute (NSRI) was pleased to play a part in the development of the UKSO, contributing several of its soil related datasets to the project. The UKSO draws together soils data from institutions such as the British Geological Survey (BGS), the James Hutton Institute (JHI) and the Agri-Food and Biosciences Institute (AFBI) and provides a unified starting point for accessing consolidated soil datasets via a series of interactive web maps and other web based resources. Further information on the UKSO is available on the project website.

UK Soil Observatory nominated for Geospatial Excellence award

UK Soil Observatory

Cranfield is pleased to announce that the UK Soil Observatory (UKSO) has been shortlisted by the Association for Geographic Information (AGI) in their upcoming 2014 Awards for Geospatial Excellence. The UKSO was nominated for the AGI Award for Excellence with Impact. AGI describe this award as recognising projects which have achieved outstanding success or impact, measured against societal, humanitarian, environmental or financial benchmarks.

Cranfield University’s National Soil Resources Institute (NSRI) was pleased to play a part in the development of the UKSO, contributing several of its soil related datasets to the project. The UKSO draws together soils data from institutions such as the British Geological Survey (BGS), the James Hutton Institute (JHI) and the Agri-Food and Biosciences Institute (AFBI) and provides a unified starting point for accessing consolidated soil datasets via a series of interactive web maps and other web based resources. Further information on the UKSO is available on the project website.

The AGI awards have been launched to mark the AGI’s 25th anniversary. They are due to take place on Tuesday 11th November 2014. Further details on the awards are available here.

Flood Risk Modelling of Rail Infrastructure

An MSc student group project, recently concluded at Cranfield University, Bedfordshire, and run on behalf of Network Rail, has investigated novel methodologies for integrated flood risk modelling of rail infrastructure.

Delays are costly for Network Rail. 2012/13 was the second wettest year in the UK national record and resulted in significant disruption to rail services and infrastructure. Some £136 million in compensation was paid to train operators in consideration of unplanned delays and cancellations in that year. Winter 2013/14 saw more challenging weather conditions and impacts on delays. In February 2014 the Department for Transport announced it would provide £31 million to fund rail resilience projects in the South West including the installation of rainfall, river flow and groundwater monitoring at key risk locations.

Flooding is a major contributor to rail delays. To help develop a proactive approach to flood risk assessment, a project was commissioned at Cranfield University to develop methods and tools to help Network Rail. The project was conducted by students from the Masters courses in both Environmental Informatics and Geographical Information Management. The project set out to address four key objectives: first, to evaluate existing flood risk assessment methods and flood models to identify techniques applicable to Network Rail’s infrastructure; second, to develop approaches for flood risk modelling utilising datasets provided by Network Rail, as well as other available data within three selected study areas (fluvial, coastal and surface run-off); third, to implement the approach within a GIS framework; and fourth, to develop a web tool to enable visualisation of risk assessments by non-GIS experts.

Given the size and scale of Network Rail’s operations, it is unlikely that there is a single solution to predicting flood risk to Network Rail’s assets. However, this project saw the development and use of a data analytical technique from the world of ‘Big Data’, called CART, or ‘Classification and Regression Tree’. Use of CART ‘inference algorithms’ has helped ascertain the key contributory factors that explain the flooding events in the case study areas selected. CART profiles were used to examine both static ‘legacy’ data and more dynamic time-series data. The use of these techniques has helped identify a customised data-oriented approach to flood risk modelling that shows considerable promise, and which could now potentially be extended to other parts of the network beyond the case study areas, as well as to other types of incident (for example, landslips or embankment failures). The approach adopted should be seen as complementary to traditional hydrological modelling approaches that would need to be undertaken for specific site requirements. However, further development of the data driven method, and a systematic approach to reviewing incidents and communicating flood risk to stakeholders, may provide further opportunities to reduce the costs of delays.
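For readers unfamiliar with CART, the core step is choosing the threshold on a factor that best separates the outcomes, then repeating on each resulting group. The sketch below shows that single step for one numeric factor, using Gini impurity as the measure. It is an illustration of the general technique only, not the project's implementation; the names Sample, Split, gini and bestSplit, and the idea of a single ‘feature’ such as rainfall, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One historical record: a single numeric factor and whether flooding occurred.
struct Sample {
  double feature;  // hypothetical factor, e.g. rainfall
  bool flooded;
};

// Gini impurity of a two-class group: 0 when pure, 0.5 when evenly mixed.
double gini(std::size_t positives, std::size_t total) {
  if (total == 0) return 0.0;
  double p = static_cast<double>(positives) / total;
  return 2.0 * p * (1.0 - p);
}

struct Split {
  double threshold;  // samples with feature <= threshold go to the left group
  double impurity;   // size-weighted Gini impurity of the two groups
};

// Tries the midpoint between each pair of adjacent distinct feature values
// and returns the one with the lowest weighted impurity. If no candidate
// beats the unsplit data, the unsplit impurity is returned together with the
// smallest feature value as a placeholder threshold.
Split bestSplit(std::vector<Sample> samples) {
  assert(samples.size() >= 2);
  std::sort(samples.begin(), samples.end(),
            [](const Sample& a, const Sample& b) { return a.feature < b.feature; });

  const std::size_t n = samples.size();
  std::size_t totalPos = 0;
  for (const Sample& s : samples) totalPos += s.flooded ? 1 : 0;

  Split best{samples.front().feature, gini(totalPos, n)};
  std::size_t leftPos = 0;
  for (std::size_t i = 0; i + 1 < n; ++i) {
    leftPos += samples[i].flooded ? 1 : 0;
    if (samples[i].feature == samples[i + 1].feature) continue;  // no gap to cut
    const std::size_t leftN = i + 1;
    const std::size_t rightN = n - leftN;
    const double weighted = (leftN * gini(leftPos, leftN) +
                             rightN * gini(totalPos - leftPos, rightN)) / n;
    if (weighted < best.impurity) {
      best = {(samples[i].feature + samples[i + 1].feature) / 2.0, weighted};
    }
  }
  return best;
}
```

A full tree would apply the same search across every available factor and then recurse into each group; the factors chosen nearest the root are the ones that explain most of the outcomes.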
As the project concluded, a number of key recommendations emerged that would improve information used for strategic decision-making, as well as providing a platform for cost effective data driven flood risk mitigation. Firstly, the importance of clean and categorised incident data has become evident. Appropriate operational processes are therefore required to ensure that records of new incidents capture and codify locations and, where known, the root causes of flooding. The data driven approach adopted in this study has delivered impressive and promising results, but further studies should now be undertaken to develop data driven prediction of asset flood risk further. Such work could commence, for example, with a target route network and use an iterative approach. Another outcome of the work has been in identifying the importance of adopting the means to visualise and communicate the modelling results. The web-based portal developed for dissemination of the flood risk profiles, flooding alerts and other data sources, direct from GIS, has proved a powerful means to communicate risk. Further to this, the project has also usefully trialled the use of 3D ‘virtual reality’ visualisation and projection techniques for analysing flood incidents, and educating stakeholders in improved flood risk management. The benefits of a range of software tools were evaluated. Overall, it is seen that the techniques and tools developed during this project can contribute usefully to managing the rail network and related national critical infrastructure.

Dr Stephen Hallett, whose students undertook the project, said: “This project has provided Network Rail with a powerful methodology for undertaking integrated flood risk assessment, made all the more timely after the recent extreme flooding events. The approach adopted highlights how a data-driven approach can help account for contributory factors to flooding, both proximal to the track, but also in the surrounding catchment areas, such as soil type, landuse, land cover and meteorological conditions.”

Student Project Leader: David Medcalf
Student Team Members: Usman Muhammad Buhari, David Cavero Montaner, Jose J. Cavero Montaner, Santiago Gamiz Tormo, Life Magobeya, Kerry Mazhindu-Page, Alan Yates.
Academic Supervisors: Dr Stephen Hallett, Tim Brewer

About Cranfield University
Cranfield University is a globally significant centre of expertise and enterprise in science, technology, engineering and management. The University is an exceptional environment for strongly business-engaged research and innovation and for postgraduate and post-experience education and training.
‘Environment’ is a key strategic theme at Cranfield. We have been contributing to the ‘green economy’ for over 40 years with deep expertise in environmental governance and sustainability, natural resource management, agriculture and land management, energy and the environment, environmental engineering for the treatment of water, wastes and contaminated soils and environmental health and food.