Sunday, November 20, 2016

Confusing presidential ballot

I am lucky enough to have an absentee ballot in US presidential elections. Of course I knew that Clinton and Trump were battling it out for the Democrats and Republicans. I had vaguely heard that there were additional candidates for the Libertarian and Green parties, so I was expecting to see 4 names on the ballot. However, when I got my absentee ballot, I was surprised to see 8 candidates that I could vote for. Closer scrutiny revealed that many of the candidates were listed more than once.

Hillary Clinton is the candidate for the Democratic Party, but is also listed separately as the candidate for the Women's Equality Party and the candidate for the Working Families Party (neither of which I had heard of before). Likewise, Donald Trump is listed as the Republican Party candidate and separately as the candidate for the Conservative Party (which I thought was based in the UK).

I had vaguely heard that Gary Johnson was running as the Libertarian candidate, but the ballot also lists him as the candidate for the Independence Party. Poor Jill Stein of the Green Party is the only candidate who has to make do with a single listing on the ballot.

What I am not sure of is whether it makes any difference if I tick the box beside the Democratic Party or the box beside the Working Families Party. Could it be that Hillary and/or Donald are splitting their vote, or do the counters amalgamate both sets of ticks?

Tuesday, November 8, 2016

My weather forecast tracking project on BlueMix

In Ireland people are obsessed with the accuracy (or otherwise) of weather forecasts. This is why I previously did a small project to compare the accuracy of three different online weather forecast services for Ireland. That project was limited: it only looked at Ireland over a short time period. Because the way I collected data was only partially automated, it was impractical to extend it without a radical change.

IBM has recently launched the BlueMix cloud platform with lots of cool features which make it ideal to do what I wanted i.e. collect, store and process weather data. Therefore I decided to re-implement my earlier idea with a much wider scope.

There are lots of services available on the internet for current weather data and forecasts. As I mentioned before, there are even some which offer historic weather data, but none which offer access to historic forecasts - i.e. they will tell you what they predict tomorrow's weather is going to be, but they won't tell you what they predicted yesterday for today's weather. Therefore I need to collect and store the weather forecasts so I can later analyse them.

There are so many different services offering weather forecasts on the internet that it was hard to decide which ones to use. I picked the services to use mainly based upon the ease of use of their API:
I started by collecting data for cities with which I had some association, but this collection didn't cover the globe very well. So I then added a few random cities, which widened the geographic spread and the range of climate types (e.g. Antarctica covered an extreme in both directions).

The cities currently monitored are:
  • Antarctica (not really a city, but I picked a point beside the South Pole).
  • Canterbury, New Zealand
  • Cape Town, South Africa
  • Coral Springs, Florida, USA
  • Clemmons, North Carolina, USA
  • Dublin, Ireland
  • Galway, Ireland
  • Honolulu, Hawaii, USA
  • Isle of Wight, UK
  • Lanzarote, Canary Islands, Spain
  • Luxor, Egypt
  • Perth, Western Australia
  • Rio de Janeiro, Brazil
I know that most of these services update their forecasts several times a day, but I decided to make things simple by collecting weather data and forecasts once a day at midday. Working out midday in local time would be complex, especially since some of the cities implement daylight saving time at different times of the year and in different directions. Instead I decided to use the concept of solar midday - i.e. the time when the sun is highest at the city. This had the advantage of spreading out the sampling so that the system is not overloaded by taking many samples at the same time - e.g. the Isle of Wight has a solar midday at 5 minutes past 12 UTC, while Lanzarote has a solar midday at 54 minutes past 12 UTC, despite the fact that the two locations use the exact same local time.
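
As a rough illustration of the solar midday idea, here is a minimal Python sketch (not the Node-Red flow the project actually uses). It assumes that mean solar noon depends only on longitude, at 4 minutes per degree, and it ignores the equation of time, which can shift true solar noon by up to about a quarter of an hour over the year. The function name and the longitudes are my own approximate, illustrative values.

```python
from datetime import datetime, timedelta, timezone

# Approximate longitudes in degrees (east positive). Illustrative values only.
LONGITUDES = {
    "Isle of Wight": -1.3,
    "Lanzarote": -13.6,
}


def approx_solar_noon_utc(longitude_deg, day):
    """Return an approximate (mean) solar noon in UTC for the given day.

    The Earth turns 360 degrees in 24 hours, i.e. 4 minutes per degree,
    so places west of Greenwich reach solar noon after 12:00 UTC.
    """
    noon_utc = datetime(day.year, day.month, day.day, 12, 0, tzinfo=timezone.utc)
    return noon_utc - timedelta(minutes=4.0 * longitude_deg)


if __name__ == "__main__":
    sample_day = datetime(2016, 11, 8)
    for city, lon in LONGITUDES.items():
        print(city, approx_solar_noon_utc(lon, sample_day).strftime("%H:%M"), "UTC")
```

With these assumed longitudes the sketch gives roughly 12:05 UTC for the Isle of Wight and 12:54 UTC for Lanzarote, in line with the figures quoted above.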

All of the weather services I used allow me to specify the latitude and longitude of the place I want data or a forecast for. For larger cities I have a reasonably wide choice of where exactly within the city boundary I specify. However, the services don't have enough weather stations to provide coverage of everywhere. When I request data about a specific location, the services will instead return information about the closest location that they have information about. This possibly explains some of the differences between the services.
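
I don't know how each service actually picks the "closest location", but one plausible approach is a great-circle (haversine) distance to each known station. The sketch below is only that assumption written out in Python; the station names and coordinates are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the station dict closest to the requested point."""
    return min(stations, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


if __name__ == "__main__":
    # Hypothetical stations: one north of the city, one far to the west.
    stations = [
        {"name": "station_north", "lat": 53.43, "lon": -6.25},
        {"name": "station_west", "lat": 52.70, "lon": -8.92},
    ]
    print(nearest_station(53.35, -6.26, stations)["name"])
```

Two services with different station lists could easily resolve the same coordinates to different stations, which is one way the same request can produce different readings.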

I added Antarctica to the list of cities so that I would have an extremely cold climate to compare with warmer locations like Luxor, Egypt. In fact, Antarctica is not a city as such: there are no cities in such an inhospitable place. Unsurprisingly, not all of the weather services provide data or forecasts for Antarctica. Only OpenWeatherMap and DarkSky provide weather reports for it, although, surprisingly, BlueMix does provide forecasts for it.

Technology Overview

As a computer geek, I am naturally most interested in the inner workings of how the data gets collected and processed. Therefore I decided to include a brief overview here, with the detail saved for a later post. However, if you are only interested in weather then jump to the next section which describes how to access the data.
  • As I mentioned, the project used the Bluemix platform. More specifically, I used the "Internet-of-Things (IOT) template" which provided me with the Node-Red programming environment and a Cloudant NoSQL database service to store my data. This was my first significant application to develop with Node-Red, but I had no problems getting to grips with this programming environment. It was also my first significant use of a NoSQL database. I created a database for the forecasts, but I also created additional ones to store logs of the various steps in the process. Putting data into the database was really easy, but getting it back out again was not so easy - on a few occasions I was tempted to use a traditional SQL database.
  • All of the weather services report the weather data in different formats and so I needed to extract the data into a common format so they could be compared. This inevitably meant that I needed to discard some interesting information which is only supplied by one service. An example of the type of record I store is shown below:
    {
      "provider": "darksky",
      "request_date": "2016-11-01T00:32:00.980Z",
      "city": "Canterbury",
      "days_in_advance": 7,
      "temp_max": 18.26,
      "temp_min": 7.54,
      "temp": 18.26,
      "pressure": 997.09,
      "humidity": 68,
      "wind_speed": 11.62,
      "rain": true,
      "forecast_date": "2016-11-08T00:32:00.980Z"
    }
    
    The sample shown is a forecast from the darksky service for the city of Canterbury, New Zealand. The forecast was retrieved shortly after midnight GMT (which is close to midday locally) on the 1st of November. Since the forecast is for 7 days in advance, it is a prediction of what the weather would be like on the 8th of November. Most of the fields are self-explanatory, but it should be noted that the precision differs from service to service. DarkSky gives its temperature prediction to 2 decimal places, while most give only 1 decimal place or even round to the nearest degree centigrade. The value "rain": true implies that they predicted it would be raining on the 8th of November. However, the value of the rain field currently comes with a serious health warning because I don't have a good grasp of the different ways that the various services report on rain. This will be improved in the future.
  • The example I gave is of a forecast. A historical weather reading is similar with the only difference being that it won't contain days_in_advance or forecast_date fields.
  • Most of the effort went into collecting the data, but I also needed to provide a way to get the data back out again. I implemented a simple web page (described below) which contains a Google chart widget and some buttons to control what data is displayed on the chart. In addition to this web page (accessed by the /ui URL) I also needed to implement 2 other URLs. The /data URL is a service, called by the /ui page, which returns raw weather data in JSON format. The /check URL counts the number of weather reports and forecasts which have been retrieved today for each city. This URL is used by a simple script, running on a machine in the Amazon cloud, which sends me an email if the count doesn't match what is expected. I thought it was important to have this running outside of BlueMix because it is better not to have a service monitoring itself.
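
To make the common record format more concrete, here is a minimal Python sketch of how such a record could be assembled. It is not the actual Node-Red code: the function names and the COMMON_FIELDS list are my own, inferred from the sample record above. The one real piece of logic is that forecast_date is request_date plus days_in_advance, and that a historical reading simply omits both forecast fields.

```python
from datetime import datetime, timedelta, timezone

# Measurement fields kept in the common format (taken from the sample record).
COMMON_FIELDS = ("temp_max", "temp_min", "temp", "pressure",
                 "humidity", "wind_speed", "rain")


def iso_millis(dt):
    """Format a UTC datetime like 2016-11-01T00:32:00.980Z."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (dt.microsecond // 1000)


def make_record(provider, city, request_date, readings, days_in_advance=None):
    """Build a common-format record; pass days_in_advance only for forecasts."""
    record = {
        "provider": provider,
        "request_date": iso_millis(request_date),
        "city": city,
    }
    # Keep only the fields every provider can be mapped onto; the rest is dropped.
    for field in COMMON_FIELDS:
        if field in readings:
            record[field] = readings[field]
    if days_in_advance is not None:
        record["days_in_advance"] = days_in_advance
        record["forecast_date"] = iso_millis(
            request_date + timedelta(days=days_in_advance))
    return record


if __name__ == "__main__":
    requested = datetime(2016, 11, 1, 0, 32, 0, 980000, tzinfo=timezone.utc)
    print(make_record("darksky", "Canterbury", requested,
                      {"temp_max": 18.26, "temp_min": 7.54}, days_in_advance=7))
```

Calling make_record without days_in_advance produces the shape of a historical reading, with no days_in_advance or forecast_date fields.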

How to Access the Data

The simple instruction is to go to http://bodonovan.mybluemix.net/ui and, if I did a reasonable job, you should be able to figure it out yourself. So just go ahead and try it; the description here is just in case you get stuck.

The bulk of the page comprises a Google chart widget with a line chart of the data you have chosen. Below it there are a number of controls to select data for charting.

  • You must specify the date range you want to chart by means of the from and to date fields. If you view the page in Google Chrome or Internet Explorer, these fields will have a fancy date picker associated with them, but if you use Firefox they act like simple text entry fields. You can specify dates as far back as August 2016, which is when I started collecting data, but the data is very patchy before October 2016, when I implemented the full range of providers and cities.
  • There are drop-downs to specify which provider you want data from, what city you are interested in and what feature you want to chart. The features supported are temperature, pressure, humidity, wind speed and wind direction. In the future, I hope to support rainfall, but this is not working yet because of the different ways that the different services report rainfall. You also have an option to select actual data or forecasts. If you select forecasts, you must also specify how many days in advance you want the forecast for (not all service providers provide forecasts as many days in advance).
  • When you have made all of the selections, you can click on the "load data" button to add the relevant line to the chart. You can then change your selections and add more lines to compare anything you want. For example you might want to compare actual data for your city to what the forecast was 1/5/7 days in advance. You can also compare what the different providers are forecasting to see if there is consensus - or you can compare the actual data they report to see if they even agree on this. You can even go crazy and compare temperature in one city with pressure in another city, but comparing different features is unlikely to be satisfactory because I don't support rescaling the different measurement scales.
  • If the database is missing data about some of the days you requested, it will draw a line with gaps. However, if you select something which we have no data about (e.g. asking for a 7 day forecast from a service which only supports 5 days) you will get a "no data" warning.
  • With so many possibilities it is inevitable that you will make some mistakes. Therefore there is a "remove data" button to remove the most recently added line from the chart. There is also a "clear chart" button which removes all lines from the chart.
  • When you have interesting data displayed, there is an "export data" button which allows you to export the displayed readings to a CSV file so you can analyse them further with other tools.
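
Once you have records for both actual readings and forecasts, one simple way to put a number on forecast accuracy is the mean absolute error. The Python sketch below is just one possible analysis, not something the site does for you. It assumes records in the common format described earlier, loaded into dictionaries, and it matches each forecast to the actual reading taken on the same UTC date. The temperatures in the example are invented.

```python
def mean_absolute_error(actuals, forecasts, field="temp_max"):
    """Average absolute gap between forecast and actual values of one field.

    Actual readings are keyed by the date part of request_date; forecasts
    are matched on the date part of forecast_date. Forecasts with no
    matching actual reading are skipped. Returns None if nothing matches.
    """
    actual_by_day = {r["request_date"][:10]: r[field] for r in actuals}
    errors = []
    for forecast in forecasts:
        day = forecast["forecast_date"][:10]
        if day in actual_by_day:
            errors.append(abs(forecast[field] - actual_by_day[day]))
    return sum(errors) / len(errors) if errors else None


if __name__ == "__main__":
    # Invented numbers, purely to show the calculation.
    actuals = [
        {"request_date": "2016-11-08T00:32:00.980Z", "temp_max": 17.0},
        {"request_date": "2016-11-09T00:32:00.980Z", "temp_max": 15.0},
    ]
    forecasts = [
        {"forecast_date": "2016-11-08T00:32:00.980Z", "temp_max": 18.0},
        {"forecast_date": "2016-11-09T00:32:00.980Z", "temp_max": 14.0},
        {"forecast_date": "2016-11-10T00:32:00.980Z", "temp_max": 13.0},
    ]
    print(mean_absolute_error(actuals, forecasts))
```

Running the same calculation separately for 1, 5 and 7 days in advance would show how quickly each provider's accuracy falls off.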

Future work

  • The UI for accessing the data is very basic so I intend to implement an improved UI soon.
  • The check service regularly informs me that certain services have failed to report data for a particular city, but I have no idea why. If I implement better error tracking and logging I should be able to reduce the error rate.
  • From an Irish perspective, one of the most important weather features is rain. Therefore, I intend to improve the way rain data and forecasts are handled.
  • I reckon that there is a reasonably wide spread of cities and weather forecast services supported, but I will add more if there is a demand.
  • Last, but not least, I hope to get some real users so I can learn how the service can be useful.
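
As a first step towards better error tracking, the /check count could be broken down per city and provider, so the alert email says exactly which combination is missing. This is a minimal Python sketch of that idea rather than the existing check code; the function name and the "openweathermap" provider identifier are assumptions ("darksky" is the identifier used in the sample record).

```python
from collections import Counter


def missing_reports(records, cities, providers, expected_per_pair=1):
    """List the (city, provider) pairs with fewer records today than expected."""
    counts = Counter((r["city"], r["provider"]) for r in records)
    return sorted(
        (city, provider)
        for city in cities
        for provider in providers
        if counts[(city, provider)] < expected_per_pair
    )


if __name__ == "__main__":
    todays_records = [
        {"city": "Dublin", "provider": "darksky"},
        {"city": "Dublin", "provider": "openweathermap"},
        {"city": "Galway", "provider": "darksky"},
    ]
    print(missing_reports(todays_records,
                          cities=["Dublin", "Galway"],
                          providers=["darksky", "openweathermap"]))
```

Logging the output of a check like this every day would at least show whether the failures cluster around particular providers or cities.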