Clem Henrikson, senior marketing analyst at Esri (the Environmental Systems Research Institute), the GIS market leader, states:
“When you use a computer to map, you need data, because if the user doesn’t have data the computer won’t have anything to show. It’s such an obvious statement that it so often gets overlooked.”
“In the 21st century we are seeing more and more data appear, and in the case of geospatial data, getting the data becomes easier and easier.”
So what data can be mapped by a computer?
Essentially there are two types of geospatial data, which are layered on top of each other:
Raster data is the base mapping and imagery: tiles made of pixels, like a television screen, which commonly come as a GeoTIFF.
Vector data is geo-referenced points, lines and polygons that represent features on the map, such as roads or events. It usually comes in shapefile (.shp) format, or as a comma-separated values (.csv) spreadsheet with co-ordinate data attached that the GIS can turn into a shapefile.
Both types carry a third kind of geospatial data in the background: attribute data, which the GIS can interrogate when it performs spatial analysis.
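The relationship between a vector feature’s geometry and its background attribute data can be sketched with a GeoJSON-style feature. This is a minimal illustration; the names and values here are invented, not drawn from any real dataset:

```python
# A hypothetical vector feature in GeoJSON style: geometry (where it is)
# plus attribute data (what it is) that a GIS interrogates during analysis
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},  # lon, lat
    "properties": {"name": "Example Road", "surface": "asphalt", "lanes": 2},
}

# A simple attribute query of the kind spatial analysis relies on
if feature["properties"]["lanes"] >= 2:
    print(feature["properties"]["name"], "is a multi-lane road")
```

Spatial analysis in a GIS combines queries like this on the attributes with operations on the geometry itself.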
This data is expensive, complex and time-consuming to create and the majority was created by governmental agencies such as the Ordnance Survey in the UK and the Census Bureau in the US.
David Donald is data editor at the Center for Public Integrity and one of the world’s foremost authorities on data journalism methods. He believes that in the past governments were reluctant to release this data.
“They feel the need to recoup the costs and so GIS data has often had a heavier price tag put on it by governments,” he said.
Another important source of data arises because most GIS can now geo-code tabular data that comes in spreadsheets with no geo-referencing.
A .csv spreadsheet can be joined to a geo-referenced vector file if the two share a common column in their attribute fields, much as Google Fusion Tables works.
This has opened up a lot of public data to the GIS user, as such data is far more readily available and in greater supply. A public authority boundary shapefile from the Office for National Statistics, for example, can geo-code a great deal of open tabular data, especially as public bodies work with standard area codes.
However, this source can often lack the precision and attribute data of purpose-built GIS data. The process of geo-coding is also inherently prone to errors, and there is little way of checking the geographical integrity of the result.
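The join described above can be sketched in plain Python. The area codes, names and figures below are invented for illustration; in practice a GIS, or a library such as GeoPandas, performs the same merge on a shared code column and carries the geometry along:

```python
# Hypothetical attribute records from a boundary shapefile; in a real GIS
# each record would also carry its polygon geometry
boundaries = {
    "E06000001": {"name": "Hartlepool"},
    "E06000002": {"name": "Middlesbrough"},
}

# Tabular rows with no geo-referencing, keyed by the same standard area codes
csv_rows = [
    {"area_code": "E06000001", "rate": 6.1},
    {"area_code": "E06000002", "rate": 7.4},
    {"area_code": "E0600XXXX", "rate": 5.0},  # a typo in the code: won't match
]

joined, unmatched = [], []
for row in csv_rows:
    area = boundaries.get(row["area_code"])
    if area is None:
        unmatched.append(row)  # geo-coding errors surface here, not on the map
    else:
        joined.append({**row, **area})  # row inherits the boundary record

print(len(joined), "rows geo-coded;", len(unmatched), "unmatched")
# → 2 rows geo-coded; 1 unmatched
```

Checking the unmatched rows, as the last step does, is the simplest guard against the silent geo-coding errors mentioned above.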
The emergence of the open source age, and changing attitudes in government towards transparency and access to data, have also improved the availability of digital data that can be manipulated by a GIS.
In relation to the US, Steve Doig, now a professor and Knight Chair in Journalism specialising in CAR methods at Arizona State University, believes:
“We certainly have had an advantage for a long time here in the US in there being at least good lip service to a culture of open government data, public records and laws … journalists have learnt to make use of these laws and to fight to get spatial data through public records requests.”
Donald also believes that in the US: “We have a lot of good national shape files. A lot of that comes from the National Security Agency (NSA), and a lot of it also comes because the Census Bureau has been such a leader in providing US-based shape files and is really a breath of fresh air in our government: they are open, you can get things online, there are experts on the phone. They like to share, they like to make sure you understand things.”
However, in the UK, Jennifer LaFleur, currently director of CAR at ProPublica, has noted: “I carry out training in the UK every summer and the big issue has just been getting good shape files.”
David Herzog is a professor at the Missouri School of Journalism and academic advisor to the National Institute for Computer-Assisted Reporting (NICAR), a joint programme of the Missouri School of Journalism and Investigative Reporters and Editors.
He agrees with LaFleur: “I know you guys have the Ordnance Survey (OS), where a lot of the data is heavily restricted and licensed, which is sad, as a lot of the work journalists and consulting companies do is not possible without these shapefiles.”
However Doig has noticed:
“The Ordnance Survey and government organisations in the UK on their own initiative are starting to put useful spatial data online … Governments are discovering that in many ways it is more efficient to be transparent about at least some of the data they have by putting it on websites, rather than to require people to come down to see it and to put it on paper. So there’s a saving in time and effort.”
In the past the OS was seen as the pantomime villain when it came to the public release of government data in particular the raster and vector mapping that could be used by a GIS.
Now the OS, along with other public organisations such as the Office for National Statistics, is releasing its tabular and geospatial data:
“As part of an encouragement of governmental transparency and fostering of innovation under the guidance of the newly formed Open Data Institute (ODI),” says Ian Holt, head of developer outreach at the OS.
The aim of the ODI is to “catalyse the evolution of an open data culture to create economic, environmental, and social value. It will unlock supply, generate demand, create and disseminate knowledge to address local and global issues.”
The ODI says on its website: “Coupled with the UK’s presidency of the Open Government Partnership, the signing of the G8 Open Data Charter, and the US Presidential Executive Order coincided to push a single message: data is open and machine-readable by default.”
In the UK we also have the Association for Geographic Information (AGI), which further encourages best practice in the release of open geospatial data to the public.
The result of these initiatives is the release of more government data, not just in the UK but globally.
So where can we get this open source data?
At the top there are the open data supermarkets such as Data.gov, Data.gov.uk, the Office for National Statistics, StatsWales, Data.Police.UK, the Guardian World Data Store and Publicdata.eu.
All of these sites can be searched either directly or through opendatasource.org, an aggregator of open data which also includes data held in the World Data Bank.
Another place to look for geospatial data is the independent geo-specialist organisations such as OS OpenData, GeoCommons, GeoComm, GoGeo and ShareGeo, as well as DBpedia, which pulls geographical information across from Wikipedia.
GIS departments of educational institutions like that at Edinburgh University are also valuable sources of geo-spatial data.
Proprietary desktop GIS like ArcMap have their own background raster mapping built in with their licences which can be of excellent quality.
Alternatively, they and open source GIS such as QGIS have plug-ins that allow them to import open raster mapping and imagery from web mapping services such as OpenStreetMap, Google, Bing and Yahoo, and online GIS such as CartoDB.
Mapping from these sources is usually of good quality, but the user may face issues with depth of analysis, resolution, metadata, date of capture and limited offline use.
Donald remains pragmatic when it comes to sources of geospatial information. He believes a journalist should always be looking everywhere, not just at the recognised sources above:
“There are lots of avenues for finding data and, having done a fair amount of international training, my attitude is always that a journalist should assume the data is out there; he or she just needs to go and find it. It doesn’t mean that it’s always going to be held by a government agency, or by the agency you think will have it.”
Ian Holt at the OS shares this belief:
“If the data isn’t available pre-packaged and catalogued, you need to head out foraging across the internet. There is a lot of open data in the wild – you just need to know how to spot it,” he said.
Journalists have several options. They can search for it using internet search engines like Google or Google Docs, or specialist data search engines such as getthedata.org, or use a data scraper such as ScraperWiki. Alternatively, they can submit a Freedom of Information request.
Books by leading UK data journalists such as Simon Rogers, Heather Brooke, Paul Bradshaw and Claire Miller, and the internationally collaborative Data Journalism Handbook, also give good advice and ideas on where to source all kinds of data, and are well worth a read if you are new to data journalism.
An important caveat is that just because data comes from a reliable source does not mean that it will necessarily be accurate or suitable to use.
Philip Meyer (Precision Journalism: A Reporter’s Introduction to Social Science Methods, 2002) notes:
“The world has become so complicated, the growth of available information so explosive, that the journalist needs to be a filter, as well as a transmitter, an organiser and interpreter, as well as one who gathers and delivers facts.”
LaFleur believes: “There will always be problems with any data. You need to integrity-check it the same as you would any dataset, looking for things that are not right. We may have guidelines but we also need to be observant.”
Henrikson warns: “At the time of Hurricane Katrina there was difficulty in using Google, for example; there was confusion over when the imagery was taken. So as a journalist you need to be very sensitive to the fact that data does come from different vintages, and you as a journalist need to know what that vintage is.”
Holt believes: “When you look at these datasets, be very wary of what they are, what they are providing, how often they are updated and the quality of that data; these are all things that you should consider.”
Holt is also keen to stress that “free isn’t always open”: just because data is free doesn’t necessarily mean the user is allowed to use it. Copyright and licensing caveats may apply, as well as the limitation that “open data is not always good data” suitable for consumption.
Donald also warns the journalist: “He or she as a reporter needs to be aware of that and either attempt to find these problems and clean the data, or, if that proves prohibitive, to make sure the caveats are out there and that they understand the limitations of the data and what they can report from it.”
Heather Brooke (The Silent State, 2011) questions the availability and validity of data and notes that “government statistics make up four-fifths of all official statistics.”
If we are to use this data for reporting then we must ensure that it is free of “such tactics as statistical manipulation used by the powerful to spin or slant information, so that we are able to judge the effectiveness of a policy or decision,” states Brooke.
To counter these allegations the UK Government has published a Code of Practice for Official Statistics.
These standards are:
- Meeting users’ needs
- Impartiality and objectivity
- Sound methods and assured quality
- Resources (sufficient to provide the data)
- Proportionate burden (does the benefit outweigh the cost?)
- Frankness and accessibility
If these standards can be assured then most, if not all, of the issues surrounding the release of official data should be minimised, but journalists should always be aware of the concerns about data outlined by the experts above.
Thank you to the following for their contributions to this blog: