Wednesday, December 18, 2013

Surface Temperatures at EGU 2014 - now in PICO format

A guest post by Stephan Matthiesen of the Earthtemp Initiative

Following last year's success, the EarthTemp Network is again organising a session at EGU 2014 and is looking for abstract submissions. The deadline is 16 January, and the EGU General Assembly runs from 27 April to 2 May 2014.

The session "Taking the temperature of the Earth: Temperature Variability and Change across all Domains of Earth's Surface" is motivated by the need for a better understanding of in-situ measurements and satellite observations for quantifying surface temperatures (ST). It invites contributions that emphasize sharing knowledge and making connections across different domains and sub-disciplines. These can include, but are not limited to, topics such as:
  • How to improve remote sensing of ST in different environments
  • Challenges from changes of in-situ observing networks over time
  • Current understanding of how different types of ST inter-relate
  • Nature of errors and uncertainties in ST observations
  • Mutual/integrated quality control between satellite and in-situ observing systems.
  • What do users of surface temperature data require for practical applications (e.g. in environmental or health sciences)?
We are also excited to try out the new interactive PICO format. PICO (Presenting Interactive COntent) is a new session format developed by EGU specifically to encourage more interaction between presenters and the audience.

In practice, this means there will be no traditional split between oral presentations and a poster session. Each author gets a 2-minute oral presentation in front of the whole audience. Once these short presentations are over, each author gets their "own" screen to show the complete presentation, with plenty of time for discussion.

We are thrilled to try out this new format, as it supports the vision of the EarthTemp Network to experiment with new ways of encouraging dialogue and collaborations.

We hope you will help us make this session a big success again with your submissions. The convener is happy to answer all questions about the new PICO format and to help with technical issues, so that your contribution will have an impact.

Abstracts can be submitted (deadline 16 January 2014) through the session website.

Some more information on the EarthTemp Network can be found on the EarthTemp Network Website. We have also just published the final version of the EarthTemp position paper, arising from the first workshop in Edinburgh, in the EGU open access journal Geoscientific Instrumentation, Methods and Data Systems:

Merchant, C. J., Matthiesen, S., Rayner, N. A., Remedios, J. J., Jones, P. D., Olesen, F., Trewin, B., Thorne, P. W., Auchmann, R., Corlett, G. K., Guillevic, P. C., and Hulley, G. C.: The surface temperatures of Earth: steps towards integrated understanding of variability and change, Geosci. Instrum. Method. Data Syst., 2, 305-321, doi:10.5194/gi-2-305-2013, 2013.

Thursday, November 7, 2013

Summary of Regional Inhomogeneities in Surface Temperature

A few months ago the ISTI Benchmarking group asked the homogenisation community to submit the timing and nature of known inhomogeneities occurring in different regions.

Thanks to the many contributors we now have an overview for a number of countries:
Central Europe

There is always room for more. If you come across any useful information or would just like to see what is there so far please go to:

This is an online editable document so please do add info. A static version will be scraped once a month.


Tuesday, October 29, 2013

Request for support sent from World Meteorological Organization to National Met Services

Earlier this month a letter was circulated from WMO on behalf of the Director General to the Permanent Representatives (PRs) of its member services (such as NOAA, Met Office, KNMI, Meteo France, BoM, CMA etc.). This letter results from discussions at this year's session of the Global Climate Observing System's Atmospheric Observations Panel for Climate (report here) and was facilitated by GCOS. The letter focuses on the databank aspects of the Initiative. Specifically, it asks for help in:
  • Confirming the holdings for stations under the auspices of the national service;
  • Sharing any metadata that is associated with these holdings;
  • Sharing any other national data, whether collected by the national service or not held directly by it; and
  • Making available any parallel measurement holdings undertaken by the service as part of their network operations to manage / understand change.
As the letter is official, it has been translated into several additional languages. These translations are currently available online, although they are hosted by a third party, so their availability in perpetuity is clearly not guaranteed. Versions are available in English, French, Russian, Spanish and Arabic. To my knowledge, although the letter is clearly addressed to the national PRs, there is no formal restriction on using it to support appropriate requests through other channels. Any reuse should, of course, be appropriate and accompanied by a clear cover letter making plain that it is supporting material.

While we are discussing the databank: we are now very much in the final stretch of developing a stable release. The methods paper has been accepted, and as soon as it is available in AOP (probably in the next week or two) we will announce it here. In the meantime, i's are being dotted and t's crossed so that we can move out of beta to a release candidate version when the paper appears. Barring the discovery of major issues during process forensics, the release candidate will become version 1 as soon as the due processes have been completed.

Tuesday, August 20, 2013

Benchmarking and assessment workshop

Cross-posted from the benchmarking blog.

The workshop agenda and full report can be found here

Below is the executive summary.

Benchmarking Working Group Workshop Report: Executive Summary
1st–3rd July 2013, National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA), Asheville, NC, USA

Attended in person:
Kate Willett (UK), Matt Menne (USA), Claude Williams (USA), Robert Lund (USA), Enric Aguilar (Spain), Colin Gallagher (USA), Zeke Hausfather (USA), Peter Thorne (USA), Jared Rennie (USA)

Attended by phone:
Ian Jolliffe (UK), Lisa Alexander (Australia), Stefan Brönnimann (Switzerland), Lucie A. Vincent (Canada), Victor Venema (Germany), Renate Auchmann (Switzerland), Thordis Thorarinsdottir (Norway), Robert Dunn (UK), David Parker (UK)

A three day workshop was held to bring together some members of the ISTI Benchmarking working group with the aim of making significant progress towards the creation and dissemination of a homogenisation algorithm benchmark system. Specifically, we hoped to have: the method for creating the analog-clean-worlds finalised; the error-model worlds defined and a plan of how to develop these; and the concepts for assessment finalised including a decision on what data/statistics to ask users to return. This was an ambitious plan for three days with numerous issues and big decisions still to be tackled.

The complexity of much of the discussion throughout the three days really highlighted the value of this face-to-face meeting. It was important to take time to ensure that everyone understood and had come to the same conclusion. This was aided by whiteboard illustrations and software exploration, which would not have been possible over a teleconference.

Overall, we made significant progress in developing and converging on concepts and important decisions. We did not complete the work of Team Creation as hoped, but the necessary exploration of existing methods was undertaken; this revealed significant weaknesses and produced ideas for new avenues to explore.

The blind and open error-worlds concepts are 95% complete and progress was made on the specifics of the changepoint statistics for each world. Important decisions were also made regarding missing data, length of record and changepoint location frequency. Seasonal cycles were discussed at length and more research has been actioned. A significant first go was made at designing a build methodology for the error-models with some coding examples worked through and different probability distributions explored.
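The error-model build methodology was still being designed at the time, so the following is only an illustrative sketch of the general idea of corrupting a clean series. It is not the group's method. It assumes, hypothetically, that changepoints arrive at a fixed average rate per year (applied month by month) and that each break shifts all later values by a normally distributed step. The function and parameter names are invented for this example.

```python
import random


def insert_changepoints(clean, rate_per_year, size_sd, months_per_year=12, seed=None):
    """Add random step changes to a clean monthly series (illustrative only).

    Hypothetical error model: in each month after the first, a break occurs
    with probability rate_per_year / months_per_year, and each break shifts
    all subsequent values by a Normal(0, size_sd) step.
    Returns (corrupted_series, breaks), where breaks is a list of (index, step).
    """
    rng = random.Random(seed)
    p_break = rate_per_year / months_per_year
    offset = 0.0
    corrupted, breaks = [], []
    for i, value in enumerate(clean):
        if i > 0 and rng.random() < p_break:
            step = rng.gauss(0.0, size_sd)
            offset += step
            breaks.append((i, step))
        corrupted.append(value + offset)
    return corrupted, breaks


# 50 years of a flat clean series, with on average one break every 20 years
corrupted, breaks = insert_changepoints([0.0] * 600, rate_per_year=0.05, size_sd=0.8, seed=1)
print(breaks)
```

Missing data, record length and seasonally varying break effects, all of which were discussed at the workshop, would need further layers on top of this.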

We converged on what we would like benchmark users to return for the assessment and worked through some examples of aggregating station results over regions. We will assess the retrieval of trends and climate characteristics as well as the ability to detect changepoints. Contingency tables of some form will also be used. We also hope to make some assessment software available, possibly online, so that users can make their own assessment of the open worlds and of past versions of the benchmarks. We plan to collaborate with the VALUE downscaling validation project where possible.
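The report does not specify which contingency tables will be used. Here is a minimal sketch of one plausible form. It assumes a detected changepoint counts as a hit when it falls within a hypothetical tolerance window (±12 time steps by default) of a true changepoint that has not yet been matched. Summing the per-station tables is one simple way to aggregate results over a region.

```python
def changepoint_contingency(true_breaks, detected_breaks, tolerance=12):
    """Count hits, misses and false alarms for one station (illustrative only).

    Detections are processed in time order; each is matched to the closest
    still-unmatched true break within +/- tolerance, so every true break is
    matched at most once.
    """
    unmatched = sorted(true_breaks)
    hits = false_alarms = 0
    for d in sorted(detected_breaks):
        candidates = [t for t in unmatched if abs(t - d) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda t: abs(t - d)))
            hits += 1
        else:
            false_alarms += 1
    return {"hits": hits, "misses": len(unmatched), "false_alarms": false_alarms}


def aggregate(tables):
    """Sum per-station contingency tables, e.g. over all stations in a region."""
    total = {"hits": 0, "misses": 0, "false_alarms": 0}
    for table in tables:
        for key in total:
            total[key] += table[key]
    return total


print(changepoint_contingency([100, 250, 400], [95, 260, 300]))
```

Here the detections at 95 and 260 match the true breaks at 100 and 250. The detection at 300 is a false alarm, and the true break at 400 is missed.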

It was an intense three days, from which all participants, both in person and by teleconference, gained a better understanding of what we're trying to achieve and how we are going to get there. This was a highly valuable three days, not least through its effect of focussing our attention before the meeting and motivating further collaborative work after it. Two new members have agreed to join the effort, and their expertise is a fantastic contribution to the project.

Specifically, Kate and Robert are to work on their respective methods for Team Creation, utilising GCM data and the vector autoregressive method. This will result in a publication describing the methodology. We aim to finalise this work in August.
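The vector autoregressive approach is only named here. The sketch below is therefore a generic first-order vector autoregressive (VAR(1)) simulation, x_t = A x_{t-1} + e_t, for a few hypothetical stations. It is not Robert's actual method. The coefficient matrix A is an assumption made for illustration, as is the split of the noise e_t into a shared regional shock plus an independent local shock. The shared term is what makes the synthetic stations correlated with one another.

```python
import random


def simulate_var1(coeffs, n_steps, noise_sd=1.0, shared_sd=0.0, seed=None):
    """Simulate x_t = A x_{t-1} + e_t for k stations, starting from x_0 = 0.

    coeffs: k x k list of lists (the matrix A).
    e_t: one shared regional shock (same for all stations) plus an
         independent local shock per station; both are Gaussian.
    Returns a list of n_steps state vectors (lists of length k).
    """
    rng = random.Random(seed)
    k = len(coeffs)
    x = [0.0] * k
    out = []
    for _ in range(n_steps):
        shared = rng.gauss(0.0, shared_sd)
        # The comprehension reads the previous state x before it is rebound
        x = [sum(coeffs[i][j] * x[j] for j in range(k)) + shared + rng.gauss(0.0, noise_sd)
             for i in range(k)]
        out.append(x)
    return out


# Two hypothetical stations with persistence and a weak coupling term
A = [[0.5, 0.1], [0.0, 0.7]]
series = simulate_var1(A, n_steps=120, noise_sd=0.3, shared_sd=0.5, seed=42)
```

A real analog-clean-world generator would also need realistic seasonal cycles and spatial structure, both of which were still under discussion at the workshop.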

Following on via teleconferences, Team Corruption will focus on completing the distribution specifications and building the probability model to allocate station changepoints. This work is planned for completion by October 2013. Release of the benchmarks is scheduled for November 2013.

Team Validation will continue to develop the specific assessment tests and work these into a software package that can be easily implemented. We hope to complete this work by December 2013, but there is more time available, as assessment will take place at least one year after benchmark release.

Friday, July 19, 2013

New Initiative Implementation Plan published covering 2013-2015

This afternoon we have published a new version of the Initiative over-arching Implementation Plan to cover the period 2013-2015. The document is available from here (pdf, 3.3 MB). The plan covers the following major aspects (amongst others):
  • Completion of first version databank release
  • Formulation of databank updates strategy including real-time updates and period of record updates
  • Work on metadata collection
  • Work on construction of a database of parallel (collocated) measurements 
  • Construction of the first set of benchmark analogs and their documentation
  • Development of assessment methods
  • Promotion of benchmarking activities and attempts to ensure multiple groups participate to realize the scientific benefits
  • Communications and outreach
  • Cooperation with other relevant activities
  • Creation of a working group concerned with dataset dissemination and user support
You are, of course, welcome to comment on any aspect of the plan here including things you think we are missing that we should be trying to cover (within the stated remit of the Initiative obviously).

Friday, June 28, 2013

Earthtemp position paper on surface temperatures published - comments are welcome

The EarthTemp community position paper, which reviews the state of the art of measuring the surface temperatures of Earth and recommends steps to improve our understanding, has been published as a discussion paper in the open access journal Geoscientific Instrumentation, Methods and Data Systems (Discussions) (GI(D)). As the paper aims to capture a community position and influence future science funding and policy decisions, all interested colleagues are encouraged to submit comments or reviews during the discussion phase (until 6 August). These will then be considered, together with the invited peer reviews, when preparing the final version of the paper.

The workshop and the paper 
The paper captures the ideas and recommendations developed during the first annual EarthTemp Network meeting in Edinburgh (25-27 June 2012), which had 55 invited participants from five continents. The meeting placed particular emphasis on encouraging discussion, networking and collaboration between the participants. It featured networking activities to build relationships across the new community, overviews of the state of the art in the field, and a series of 20 intensive small-group discussions on current gaps in our knowledge and on scientific priorities over 5- to 10-year timescales across a number of themes. Chris J. Merchant, the PI of the EarthTemp Network, drafted a first version of the paper (available online) based on notes and presentations from the chairs of the breakout discussions. The current version for peer-reviewed publication was then developed in several stages, incorporating comments from the Network meeting participants as well as more details, examples and references. The wider community of surface temperature providers and users now has an opportunity to add comments and reviews during the open review phase of the journal Geoscientific Instrumentation, Methods and Data Systems (Discussions) (GI(D)). Like all open access journals published by the European Geosciences Union (EGU), GI(D) has a two-stage publication process. First, submitted papers are published very rapidly as discussion papers after the editor's access review. Each discussion paper is then sent out for peer review to reviewers assigned by the handling editor. At the same time, all interested colleagues have the opportunity to submit comments or reviews through the GI website. Both the official invited peer reviews and the spontaneous comments are public (though the official reviewers may choose to remain anonymous), and the authors are required to respond publicly and address all comments adequately in the final version of the paper.
Besides being open access, this open review system adds another aspect of transparency to the work, as readers can monitor the quality of the reviews and the authors' responses. The deadline for comments on the GID journal website is 6 August 2013. We plan to communicate the paper and its recommendations widely to a range of scientific organisations, funding and policy bodies.  

The recommendations 
The workshop identified the following needs for progress towards meeting societal needs for surface temperature understanding and information, which are summarised in the chart and explained in more detail in the paper:
  1. Develop more integrated, collaborative approaches to observing and understanding Earth’s various surface temperatures
  2. Build understanding of the relationships between different surface temperatures, where presently inadequate
  3. Demonstrate novel underpinning applications of various surface temperature datasets in meteorology and climate
  4. Make surface temperature datasets easier to obtain and exploit, for a wider constituency of users
  5. Consistently provide realistic uncertainty information with surface temperature datasets
  6. Undertake large-scale systematic intercomparisons of surface temperature data and their uncertainties
  7. Communicate differences and complementarities of different types of surface temperature datasets in readily understood terms
  8. Rescue, curate and make available valuable surface temperature data that are presently inaccessible
  9. Maintain and/or develop observing systems for surface temperature data
  10. Build capacities to accelerate progress in the accuracy and usability of surface temperature datasets.
Your views and comments 
We want to encourage all users and providers of surface temperature data, in all domains and for all applications, to contribute your views on the recommendations made in the paper, through the open review on the GID journal website. In view of recommendation 3, we would particularly like to hear from users from outside the climatological and meteorological communities, for example scientists who apply temperature data to questions in ecology, health and epidemiology, society, or public engagement: what kind of data is useful for you and what are barriers to their use in your work?  

Merchant, C. J.; Matthiesen, S.; Rayner, N. A.; Remedios, J. J.; Jones, P. D.; Olesen, F.; Trewin, B.; Thorne, P. W.; Auchmann, R.; Corlett, G. K.; Guillevic, P. C.; Hulley, G. C. (2013): The surface temperatures of the earth: Steps towards integrated understanding of variability and change. Geoscientific Instrumentation, Methods and Data Systems Discussions, 3(1), 305–345. DOI:

You can find out more about Earthtemp at and follow on twitter at

Friday, June 21, 2013

IDL users of the netcdf files databank take note

If you are using certain versions of IDL (mine, for example, is 8.0) and sequentially reading, manipulating and then writing out the files, there is a pretty spectacular bug in IDL that causes random crashes. It seems that IDL is not dynamically clearing memory between reads for certain unfathomable file combinations. If you are getting random failures in such a loop, set all the read-in variables to 0 at the end of each pass through the loop and these crashes will simply disappear. Ours is clearly not to wonder why, particularly on a Friday ...

Friday, June 14, 2013

Databank beta 4 release

Today we made a fourth beta release of the databank holdings. The associated README file is available on the databank FTP site. This fourth beta release involves changes to the metadata format (see the README) and the provision of files in netcdf format. We have made every effort to make these netcdf files compliant with the CF conventions. Put more honestly, we have managed to fool the available online Python script-based checker into agreeing that they are CF compliant. But we would greatly appreciate anyone taking these for a road test and telling us if we are missing things or if things break, so that we can modify the netcdf file formats as necessary.

The formal release is still planned to occur upon acceptance of the methods paper, which at this time remains in submitted status at the journal.

The databank still holds over 30,000 stations, many of them with longer records than are available in GHCNMv3.

Tuesday, June 11, 2013

Notes from meeting open to all initiative participants

Last week we held a call with an open invitation to all those actively participating in the initiative. This call concentrated upon where we stand today and where we want to aim for in the coming two years. It is hoped that we can release a new Implementation Plan to replace the current one by the end of July.

The call notes provide a reasonable snapshot of where we stand today and of the open issues, so I am publicizing them via the blog. There are very many meetings of the steering committee and the various groups, and highlighting each one here would get more than a tad repetitive (and boring), but we will highlight salient or general-interest calls. Comments on the call notes and constructive suggestions for what we should be aiming to do in the coming two years are most welcome.

The call notes are available from here.

As a heads up we also intend to release a fourth beta of the databank this week. This will include a modicum of further station blacklisting but also a (hopefully) CF-compliant netcdf format version so that people can test this and highlight any issues in our netcdf conversion before formal release of a first version - the timing of which is still dependent upon methods paper acceptance. We'll have a further post when this is up.

Thursday, May 16, 2013

Meeting announcement: Characterising surface temperatures in data-sparse and extreme regions (with a focus on high-latitude domains)

2nd Annual EarthTemp Network Meeting 
12-14 June 2013, Copenhagen 

Data-sparse and extreme regions is the topic of the second year of the network. A focus will be the high-latitude domains, but the network remains inclusive and open to surface temperature researchers of all backgrounds who are interested in sharing knowledge and making connections across sub-discipline boundaries. 

The workshop aims to facilitate collaborations between researchers and will have substantial dialogue and networking activities as well as invited overview presentations, panel discussions and the opportunity to present your work in poster sessions. Sessions planned for the programme are: 

Overview presentations by invited speakers, followed by panel discussions on 
- High-latitude surface temperatures: synthesis of datasets and what they tell us (Kevin Wood) 
- Arctic Land Surface Temperature: Variability and Change (Claude Duguay) 
- Sea Surface Temperature Changes in Polar Regions (Pierre Le Borgne) 
- Sea-Ice Surface Temperature Measurements: Status and Utility (Jacob Hoyer) 
Plenary discussion: Combined interpretation of Arctic temperatures 

Networking activities 

Breakout discussion groups on 
- Techniques for matching measurements and retrievals across different platforms 
- Measurement of high-latitude Sea Surface Temperature (SST): Why is it difficult, and how can we do it better? 
- Satellite Land Surface Temperatures (LST) in high latitudes and high altitudes: How can we exploit them better? 
- The EarthTemp White Paper: Turning recommendations into actions 

Poster sessions & Poster discussions 

There is no attendance fee. 

The workshop is limited to 50 participants. 

Monday, May 13, 2013

Call for regional inhomogeneity info

To create realistic benchmarks we would like to reproduce the times and locations of known sources of inhomogeneity as best we can. Please can you help us. If you know of any regional or countrywide changes to the observing system over time, please can you list them here or point us to some documentation or a reference. Any information is valuable, even if it's quite vague.

Ideally we'd like to know:

WHEN - specific date or month or year or even decade etc.
WHERE - a region, a country, an international GTS/WMO change etc.
WHAT - a change in shelter, thermometer type, automation, observing time/practice etc.
HOW - are there any estimates of the size/direction/nature of the effect of this change?
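If you are collating several such reports, keeping one field per question makes them easier to compare. The record below is a hypothetical layout, not a submission format the working group has asked for, and the example values are placeholders rather than real inhomogeneity information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InhomogeneityReport:
    """One reported observing-system change (hypothetical layout)."""
    when: str                        # specific date, month, year or decade, e.g. "1950s"
    where: str                       # region, country, or an international GTS/WMO change
    what: str                        # shelter, thermometer type, automation, observing time/practice
    how: Optional[str] = None        # estimated size/direction/nature of the effect, if known
    reference: Optional[str] = None  # supporting documentation or citation, if any

    def is_complete(self) -> bool:
        """True when all four questions (WHEN/WHERE/WHAT/HOW) have an answer."""
        return all([self.when, self.where, self.what, self.how])


# Placeholder values only
report = InhomogeneityReport(when="1950s", where="Country X", what="Change of shelter type")
print(report.is_complete())
```

Vague answers are still worth recording. The optional fields simply stay empty until more is known.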

Please post here and encourage others to do so. We then hope to reward you with some realistic error-worlds to play with.

Kate (and the Benchmarking working group)

Thursday, April 18, 2013

Initiative posters at 2013 EGU

There were two posters presented at EGU this year outlining Initiative progress and promoting the use of the databank by groups interested in the challenge of surface temperature data homogenization. I didn't attend myself, but the second-hand feedback I have received from the named presenters is that they were generally well received.

One thing that was requested by some was help in getting funding. Sadly, we don't have funding for anything directly, but we are more than happy to write letters of support to funding bodies for any work that furthers the aims of the initiative.

Friday, March 22, 2013

Initiative progress report published

The initiative progress report has been published and shared with our 'sponsors'. This provides a useful overview of what has been achieved and what is intended to occur in the next year. It's been delayed due to demands on folks' time. But still, better late than never. Comments and feedback are welcome. All progress reports are archived at

Monday, March 18, 2013

Databank Release: Beta #3

We are nearing an official version 1 release of the global land surface databank. However, because there have been major changes since the last beta release in December, it seemed appropriate to push out one more beta for the public to comment on.

The beta3 release can be found here: Within that directory one can find all the data and code used, along with some graphics depicting the results of all the merge variants.

In addition, the previous betas are still available, should anyone wish to run comparisons.
The next couple of posts will highlight changes and additions to this beta release; here are the highlights:
  • A blacklist of candidate stations was generated to either fix known errors in their metadata/data or withhold the station completely. This is a required input file for the code to run and is provided with this beta release.
  • Some minor code changes were applied, including withholding stations when the metadata probability was near perfect, but the data comparisons were so poor the station became unique (when it should have merged). In addition, odd characters were removed from the station name before the Jaccard Index was run.
  • The format of stage 3 data was changed so that it was consistent with all stage 2 data. In addition, all data provenance flags have been ported over in order to be open and transparent
  • Algorithm output is included with each variant result, in order to provide information about each candidate station and how the algorithm decided whether to merge, keep it unique or withhold it. A future post will go into great detail about each output file.
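The beta 3 notes mention stripping odd characters from station names before the Jaccard index is computed, but do not give the details. Here is a minimal sketch. It assumes that names are transliterated to ASCII, upper-cased, reduced to letters, digits and spaces, and then compared as sets of character bigrams. The merge code's actual tokenisation and thresholds may well differ.

```python
import re
import unicodedata


def clean_name(name: str) -> str:
    """Transliterate to ASCII, upper-case, keep only A-Z, 0-9 and single spaces."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^A-Z0-9 ]", "", ascii_name.upper())
    return " ".join(cleaned.split())


def bigrams(text: str) -> set:
    """Set of overlapping two-character substrings."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard index |A & B| / |A | B| of the cleaned names' bigram sets."""
    ca, cb = clean_name(a), clean_name(b)
    sa, sb = bigrams(ca), bigrams(cb)
    if not sa or not sb:
        # Names too short for bigrams: fall back to an exact comparison
        return 1.0 if ca == cb else 0.0
    return len(sa & sb) / len(sa | sb)


print(jaccard("Asheville*", "ASHEVILLE"))  # odd character removed, so identical
```

Without the cleaning step, a trailing "*" or an accented letter would add spurious bigrams and lower the score for what is clearly the same station name.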
As usual these are not considered the final revisions prior to an official Version 1 release. In addition all documentation provided on the FTP site will be superseded with a published version of the databank merge methodology paper, which we are working hard to submit to a peer-reviewed journal soon.

If you wish to provide comments, please feel free to send an e-mail to

Thursday, February 7, 2013

A database with daily climate data for more reliable studies of changes in extreme weather

Repost from the blog Variable Variability.

In summary:
  • We want to build a global database of parallel measurements: observations of the same climatic parameter made independently at the same site
  • This will help research in many fields
    • Studies of how inhomogeneities affect the behaviour of daily data (variability and extreme weather)
    • Improvement of daily homogenisation algorithms
    • Improvement of robust daily climate data for analysis
  • Please help us to develop such a dataset


One way to study the influence of changes in measurement techniques is by making simultaneous measurements with historical and current instruments, procedures or screens. This picture shows three meteorological shelters next to each other in Murcia (Spain). The rightmost shelter is a replica of the Montsouris screen, in use in Spain and many European countries in the late 19th and early 20th centuries. In the middle is a Stevenson screen equipped with automatic sensors; on the left, a Stevenson screen equipped with conventional meteorological instruments.
Picture: Project SCREEN, Center for Climate Change, Universitat Rovira i Virgili, Spain.

We intend to build a database with parallel measurements to study non-climatic changes in the climate record. This is especially important for studies on weather extremes where the distribution of the daily data employed must not be affected by non-climatic changes.

There are many parallel measurements from numerous previous studies analysing the influence of different measurement set-ups on average quantities, especially average annual and monthly temperature. Increasingly, changes in the distribution of daily and sub-daily values are also being investigated (Auchmann and Brönnimann, 2012; Brandsma and Van der Meulen, 2008; Böhm et al., 2010; Brunet et al., 2010; Perry et al., 2006; Trewin, 2012; Van der Meulen and Brandsma, 2008). However, the number of such studies is still limited, while the number of questions that can and need to be answered is much larger for daily data.

Unfortunately, the current common practice is not to share parallel measurements and the analyses have thus been limited to smaller national or regional datasets, in most cases simply to a single station with multiple measurement set-ups. Consequently there is a pressing need for a large global database of parallel measurements on a daily or sub-daily scale.

Datasets from pairs of nearby stations, although not strictly parallel measurements, are also useful for studying the influence of relocations. In particular, typical types of relocation, such as moving weather stations from urban areas to airports, could be studied this way. The influence of urbanization can also be studied using pairs of nearby stations.

Daily data

Daily datasets are essential for studying the variability of and extremes in weather and climate. Looking at the physical causes of inhomogeneities, one would expect that many of the effects are amplified on days with special weather conditions and thus especially affect the tails of the distribution of the daily data. Now that the interest in extreme weather and thus in daily data has increased, more and more people are also working on the homogenization of daily data. Increasingly, developers of national and regional temperature datasets have homogenised the temperature distribution (see e.g., Nemec et al., 2012; Auer et al., 2010; Brown et al., 2010; Kuglitsch et al., 2009, 2010). Further improvements in the quantity and quality of such datasets, and a deeper understanding of remaining deficiencies, are important for climatology.
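One widely used family of techniques for adjusting the whole distribution, rather than just the mean, is quantile (percentile) matching. The sketch below is a generic illustration of that idea under stated assumptions. It is not the method of any of the studies cited above. It estimates a correction at each decile from a period of parallel old/new measurements and interpolates between those corrections when adjusting an old-instrument value. The function names and the choice of deciles are assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def quantile_corrections(old_overlap, new_overlap, n=10):
    """Estimate corrections (new - old) at the n-1 quantile cut points of a parallel period."""
    q_old = quantiles(old_overlap, n=n)
    q_new = quantiles(new_overlap, n=n)
    return q_old, [qn - qo for qo, qn in zip(q_old, q_new)]


def adjust(value, q_old, corrections):
    """Adjust one old-instrument value by interpolating between quantile corrections.

    Values beyond the outermost cut points get the nearest end correction.
    """
    if value <= q_old[0]:
        return value + corrections[0]
    if value >= q_old[-1]:
        return value + corrections[-1]
    i = bisect_left(q_old, value)  # q_old[i-1] < value <= q_old[i], so hi > lo
    lo, hi = q_old[i - 1], q_old[i]
    w = (value - lo) / (hi - lo)
    return value + (1 - w) * corrections[i - 1] + w * corrections[i]
```

Because the correction varies with the quantile, warm and cold days can be adjusted differently. This is the point of homogenising the distribution rather than only the mean.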

Application possibilities of parallel measurements

The most straightforward application of such a dataset would be a comparison of the magnitude of the non-climatic changes to the magnitude of the changes found in the climate record. We need to know whether the non-climatic changes are large enough to artificially hide or strengthen any trends or perturb decadal variability. In addition, such a dataset would help us to better understand the physical causes of inhomogeneities. A large and quasi-global dataset would enable us to analyse how the magnitude and nature of inhomogeneities differ depending on the geographical region and the microclimate.
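As a rough illustration of the most straightforward comparison, here is a minimal sketch assuming two aligned daily series from one site (for example an old and a new screen), with missing days stored as None. It reports the mean difference and the differences at a low and a high percentile, since the tails are where daily inhomogeneities are expected to matter most. The default 5th/95th percentile choice is an assumption.

```python
from statistics import mean, quantiles


def parallel_summary(old, new, tail=5):
    """Compare two parallel daily series recorded at the same site.

    Days where either value is missing (None) are skipped. Returns the mean
    difference (new - old) and the new-minus-old differences at the tail-th
    and (100 - tail)-th percentiles.
    """
    pairs = [(o, n) for o, n in zip(old, new) if o is not None and n is not None]
    olds = [o for o, _ in pairs]
    news = [n for _, n in pairs]
    q_old = quantiles(olds, n=100)  # 99 cut points: q[k - 1] is the k-th percentile
    q_new = quantiles(news, n=100)
    return {
        "mean_diff": mean(n - o for o, n in pairs),
        "low_tail_diff": q_new[tail - 1] - q_old[tail - 1],
        "high_tail_diff": q_new[100 - tail - 1] - q_old[100 - tail - 1],
    }
```

A constant offset shows up equally in all three numbers. A change that affects extremes more than averages shows up as tail differences that diverge from the mean difference.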

The dataset would also benefit homogenisation science in multiple ways. It may reveal typical statistical characteristics of inhomogeneities that would allow for a more accurate detection and correction of breaks. The dataset would facilitate the development of physical homogenisation methods for specific types of breaks that are able to take the weather conditions into account; similar to the method developed for the transition of Wild screens to Stevenson screens for Switzerland by Auchmann and Brönnimann (2012). It would also allow for the development of generalised physical correction methods suitable for multiple climatic regions. Finally, the dataset would improve the ability to create realistic validation datasets, thus improving our estimates of the remaining uncertainties. This in turn again benefits the development of better homogenisation methods.

Organisational matters

As an incentive to contribute to the dataset, initially only contributors will be able to access the data. After joint publications, the dataset will be opened for academic research as a common resource for the climate sciences. These two stages will also enable us to find errors in the dataset before the dataset is published.

The International Surface Temperature Initiative (ISTI) and the European Climate Assessment & Dataset (ECA&D) are willing to host the dataset. This is great, because it makes the dataset more visible for contributors and users alike. We are still looking for an organisational platform that could facilitate the building of such a dataset. Any ideas for this are appreciated.

A preliminary list with parallel measurements can be found in our Wiki.

If you have any ideas or suggestions for such an initiative, if you know of further parallel datasets, or if you just want to be kept informed, please update our Wiki, comment at Variable Variability or send an email to Furthermore, if you know someone who might be interested, please inform him or her about this initiative. Thank you.

Scientists involved in this initiative are:

  • Enric Aguilar (University of Tarragona, Spain)
  • Renate Auchmann (University of Bern, Switzerland)
  • Ingeborg Auer (Zentralanstalt für Meteorologie und Geodynamik, Austria)
  • Andreas Becker (Global Precipitation Climatology Centre, Deutscher Wetterdienst, Germany)
  • Stefan Brönnimann (Institute of Geography, University of Bern, Switzerland)
  • Michele Brunetti (Institute of Atmospheric Sciences and Climate of the National Research Council, Italy)
  • Sorin Cheval (National Meteorological Administration, Romania)
  • Peter Domonkos (University of Tarragona, Spain)
  • Aryan van Engelen (Royal Netherlands Meteorological Institute, The Netherlands)
  • José Guijarro (Agencia Estatal de Meteorología, Spain)
  • Franz Gunther Kuglitsch (GFZ German Research Centre for Geosciences, Germany)
  • Monika Lakatos (Hungarian Meteorological Service, Hungary)
  • Øyvind Nordli (Meteorologisk institutt, Norway)
  • David Parker (UK Met Office, United Kingdom)
  • Mário Gonzalez Pereira (Universidade de Trás-os-Montes e Alto Douro, Portugal)
  • Tamas Szentimrey (Hungarian Meteorological Service, Hungary)
  • Peter Thorne (National Climatic Data Center, USA; International Surface Temperature Initiative)
  • Victor Venema (University of Bonn, Germany)
  • Kate Willett (UK Met Office, United Kingdom)

Related posts

Future research in homogenisation of climate data – EMS 2012 in Poland
A discussion on homogenisation at a Side Meeting at EMS2012
What is a change in extreme weather?
Two possible definitions, one for impact studies, one for understanding.
HUME: Homogenisation, Uncertainty Measures and Extreme weather
Proposal for future research in homogenisation of climate network data.
Homogenization of monthly and annual data from surface stations
A short description of the causes of inhomogeneities in climate data (non-climatic variability) and how to remove it using the relative homogenization approach.
New article: Benchmarking homogenization algorithms for monthly data
Raw climate records contain changes due to non-climatic factors, such as relocations of stations or changes in instrumentation. This post introduces an article that tested how well such non-climatic factors can be removed.

References

Auchmann, R., and S. Brönnimann. A physics-based correction model for homogenizing sub-daily temperature series, J. Geophys. Res., 117, art. no. D17119, doi: 10.1029/2012JD018067, 2012.

Auer, I., J. Nemec, C. Gruber, B. Chimani, and K. Türk. HOM-START. Homogenisation of climate series on a daily basis, an application to the StartClim dataset. Klima- und Energiefonds, Projektbericht, Wien, 34 pp., 2010.

Brandsma, T., and J.P. van der Meulen. Thermometer screen intercomparison in De Bilt (the Netherlands), Part II: Description and modeling of mean temperature differences and extremes. Int. J. Climatol., 28, pp. 389–400, 2008.

Brown, P. J., R. S. Bradley, and F. T. Keimig. Changes in extreme climate indices for the northeastern United States, 1870–2005. J. Climate, 23, pp. 6555–6572, doi: 10.1175/2010JCLI3363.1, 2010.

Böhm, R., P.D. Jones, J. Hiebl, D. Frank, M. Brunetti, M. Maugeri. The early instrumental warm-bias: a solution for long central European temperature series 1760–2007. Climatic Change, 101, pp. 41–67, doi: 10.1007/s10584-009-9649-4, 2010.

Brunet, M., J. Asin, J. Sigró, M. Banón, F. García, E. Aguilar, J. Esteban Palenzuela, T.C. Peterson and P. Jones. The minimization of the screen bias from ancient Western Mediterranean air temperature records: an exploratory statistical analysis. Int. J. Climatol., doi: 10.1002/joc.2192, 2010.

Klein Tank, A.M.G., Wijngaard, J.B., Können, G.P., Böhm, R., Demarée, G., Gocheva, A., Mileta, M., Pashiardis, S., Hejkrlik, L., Kern-Hansen, C., Heino, R., Bessemoulin, P., Müller-Westermeier, G., Tzanakou, M., Szalai, S., Pálsdóttir, T., Fitzgerald, D., Rubin, S., Capaldo, M., Maugeri, M., Leitass, A., Bukantis, A., Aberfeld, R., van Engelen, A. F.V., Forland, E., Mietus, M., Coelho, F., Mares, C., Razuvaev, V., Nieplova, E., Cegnar, T., Antonio López, J., Dahlström, B., Moberg, A., Kirchhofer, W., Ceylan, A., Pachaliuk, O., Alexander, L.V. and Petrovic, P. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. Climatol., 22, pp. 1441–1453. doi: 10.1002/joc.773, 2002. Data and metadata available at

Kuglitsch, F.G., A. Toreti, E. Xoplaki, P.M. Della-Marta, J. Luterbacher, and H. Wanner. Homogenisation of daily maximum temperature series in the Mediterranean. J. Geophys. Res., 114, art. no. D15108, doi: 10.1029/2008JD011606, 2009.

Kuglitsch, F.G., A. Toreti, E. Xoplaki, P.M. Della-Marta, C.S. Zerefos, M. Türkes, and J. Luterbacher. Heat wave changes in the eastern Mediterranean since 1960. Geophys. Res. Lett., 37, L04802, doi: 10.1029/2009GL041841, 2010.

van der Meulen, J.P., and T. Brandsma. Thermometer screen intercomparison in De Bilt (The Netherlands), Part I: Understanding the weather-dependent temperature differences. Int. J. Climatol., 28, pp. 371–387, 2008.

Nemec, J., Ch. Gruber, B. Chimani, I. Auer. Trends in extreme temperature indices in Austria based on a new homogenised dataset. Int. J. Climatol., doi: 10.1002/joc.3532, 2012.

Perry, M., J. Prior, and D.E. Parker. An assessment of the suitability of a plastic thermometer screen for climatic data collection. Int. J. Climatol., 27, pp. 267–276, 2006.

Trewin, B. A daily homogenized temperature data set for Australia. Int. J. Climatol., doi: 10.1002/joc.3530, 2012.

Thorne, P.W., K.M. Willett, R.J. Allan, S. Bojinski, J.R. Christy, N. Fox, S. Gilbert, I. Jolliffe, J.J. Kennedy, E. Kent, A. Klein Tank, J. Lawrimore, D.E. Parker, N. Rayner, A. Simmons, L. Song, P.A. Stott, and B. Trewin. Guiding the creation of a comprehensive surface temperature resource for twenty-first-century climate science. Bull. Amer. Meteor. Soc., 92, ES40–ES47, doi: 10.1175/2011BAMS3124.1, 2011. More information at:

Tuesday, February 5, 2013

Databank highlighted in EOS issue 5th Feb

A brief communication in EOS appeared today outlining the databank available from*. The piece is by Jay Lawrimore, who heads the task team; Jared Rennie, who has done the bulk of the work; and myself, as a quasi-passenger on the whole enterprise.

This seems an apposite time to give an update on where we stand vis-à-vis a full first version release. We have done a first blacklisting sweep and are now making a second pass, informed by what we learned, to see whether we can catch any further issues.

The in-house development version, a modification of beta 2, now stands at just over 32,500 stations. We have removed 'Atlantis' stations (those placed in the ocean) and resolved a large number of geolocation errors. We almost certainly won't have caught them all by the time we release, whenever that is; that's inevitable. But we are increasingly confident that we will have resolved the truly low-hanging fruit.

At the same time we have been revising the longer methods paper that Jared is leading, both to address author feedback and to reflect the blacklisting. We plan to submit it to the journal soon.

Bottom line: we are currently aiming to release version 1 of the databank in mid-to-late March, once the necessary approval procedures have been completed in mid-March. Of course, that schedule is subject to change if we find any additional issues in the interim.

* The actual paper, for now, is behind a paywall. We will investigate whether we can post a copy and if so will provide a link to such an unrestricted copy in an update to this post.

Monday, January 28, 2013

More on efforts at data rescue and digitization - reposted press release

Note: This is a copy of a press release by two sister organizations, ACRE and IEDRO, that I am reposting for them on request. Please send any further inquiries to the named contacts. Peter

Contact: Malia Murray
Cell: 301.938.9894

The two independent organizations agree to work together on the digital preservation of, and access to, global historical weather and climate data through international data rescue and digitization efforts, and to undertake and facilitate the recovery of historical, instrumental, surface terrestrial and marine global weather observations.

November 26, 2012, Toulouse, France: Over fifty international climate, social and humanities scientists and representatives from the archival and library communities, with common interests in climate services, gathered at Météo-France for the 5th Atmospheric Circulation Reconstructions over the Earth (ACRE) Workshop from 28 to 30 November 2012. There they witnessed the signing of the Memorandum of Understanding (MoU) that will join efforts within the global climate services industry. At the same time, the 18th session of the Conference of the Parties to the United Nations Framework Convention on Climate Change (UNFCCC) and the 8th Session of the Conference of the Parties serving as the Meeting of the Parties to the Kyoto Protocol were meeting at the Qatar National Convention Centre in Doha, Qatar.

This agreement forms the foundation of the User Interface Platform (UIP), a pillar of the World Meteorological Organization (WMO) Global Framework for Climate Services (GFCS). The partnership unites the highest caliber of international experience and resources to ensure the finest of climate services. The partnership specifically enhances data rescue (DARE), the establishment of an International Data Rescue (I-DARE) Inventory, and the identification of high-priority weather and climate datasets. The arrangement also opens opportunities for collaborative funding for vital historical and contemporary weather and climate data, which is essential to the provision of, and access to, climate services.

Together, the ACRE and IEDRO communities and their various partners will develop the largest single source of primary weather and climate services. This will create opportunities to access long records of weather data that will be available for the full range of analyses. Dr. Rob Allan, International ACRE Project Manager, said: “The merger of ACRE and IEDRO under a new MoU is a major step towards building the infrastructure and funding support needed to reinvigorate and sustain international data rescue activities. It will create a platform for wider partnerships with the global community and encourage funders to see the potential value in long, historical databases of global weather for use by the climate science and applications community, policy and decision makers, educators, students and the wider public.”

The resulting climate services data will contribute high-quality, high-resolution weather and climate data, available through free and open exchange via the National Oceanic and Atmospheric Administration (NOAA), the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), the International Surface Pressure Databank (ISPD) and the International Surface Temperature Initiative (ISTI) databases.

The delegates also expressed the need for an additional database in which all hydrometric data rescue and digitization (DR&D) efforts would be listed and updated by their sponsors or program managers. IEDRO will begin building this new database once funding is secured.

For more information about this topic, or to schedule an interview with Dr. Richard Crouthamel, please contact him directly at 410.867.1124 or e-mail him at

Friday, January 25, 2013


Things have gone a bit quiet of late, I realize. In part this is due to real life, which has a habit of getting in the way. But in large part it's because we have been grappling with the creation of a blacklist. 'We' here is the very definition of the royal we, as it would be fairer to say that Jared has been grappling with this issue.

There be gremlins in the data decks that make up some of the input to the databank algorithm: both dubious data and dubious geolocation metadata. We knew this from the start but held off on blacklisting until we had the algorithm doing sort of what we thought it should and everyone was happy with it. We have now been attacking the problem for several weeks. Here are the four strands of attack:

1. Running a moving-window F-test along the merged series to find jumps in variance. This found a handful of intra-source cases of craziness. We will delete these stations through blacklisting.

2. Running the series through NCDC's pairwise homogenization algorithm to see whether any really gigantic breaks are apparent. This found no such breaks (but rest assured there are breaks; the databank is a raw data holding and not a data product per se).

3. Correlating first-difference series with those of nearby neighbors. We looked for cases where correlation was high and distance was large, where correlation was low and distance was small, and where correlation was perfect and distance was small. These were then examined manually. Many are longitude / latitude assignment errors. For example, we know that Dunedin, on the South Island of New Zealand, is in the Eastern Hemisphere:
This is Dunedin. Beautiful place ...

And not the Western Hemisphere:
This is not the Dunedin you were looking for ... Dunedin is not the new Atlantis

But sadly two sources have the sign switched. The algorithm does not know where Dunedin is, so it is doing what it is supposed to. We therefore need to tell it to ignore or correct the metadata for these sources so that we don't end up with a phantom station.

These checks also picked up issues other than simple sign errors in latitude / longitude. One of the data decks has many of its French stations' longitudes inflated by a factor of 10, so a station at 1.45 degrees East is wrongly placed at 14.5 degrees East. Pacific island stations appear to have been recorded under multiple names and IDs, which confounds the merging in many cases.

4. As should be obvious from the above, we also needed to look at stations proverbially 'in the drink', so we have pulled a high-resolution land-sea mask and checked all stations against it. All cases that are demonstrably wet (more than 10 km offshore, which is roughly 0.1 degree at the equator; many sources only give positions to 0.1 degree accuracy) are being investigated.
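Two of the automated screens above, the running F-test of strand 1 and the land-sea test of strand 4, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the databank code: the window length and the fixed threshold of 4.0 stand in for a proper F critical value, and the land-sea mask is represented by a hypothetical list of land-cell centres rather than a real high-resolution product. The 10 km threshold follows the post; one degree of latitude is about 111.2 km, so 0.1 degree is about 11 km.

```python
import math
import statistics

# --- Strand 1: running variance-ratio (F) statistic ---------------------
def running_f_test(series, window=24, threshold=4.0):
    """Flag positions where the variance ratio between the `window` values
    before and the `window` values after exceeds `threshold`.

    The fixed threshold stands in for a proper F critical value; both
    `window` and `threshold` are illustrative choices.
    """
    flags = []
    for i in range(window, len(series) - window + 1):
        v_before = statistics.variance(series[i - window:i])
        v_after = statistics.variance(series[i:i + window])
        if v_before == 0 or v_after == 0:
            continue  # ratio undefined for a constant window
        f_stat = max(v_before, v_after) / min(v_before, v_after)
        if f_stat > threshold:
            flags.append((i, f_stat))
    return flags


# --- Strand 4: distance from a station to the nearest land cell ---------
KM_PER_DEG = 2 * math.pi * 6371.0 / 360  # about 111.2 km per degree at the equator


def distance_to_land_km(lat, lon, land_cells):
    """Brute-force distance (equirectangular approximation) to the nearest
    land-cell centre; adequate for short distances away from the poles and
    the dateline."""
    best = math.inf
    for cell_lat, cell_lon in land_cells:
        dy = (cell_lat - lat) * KM_PER_DEG
        dx = (cell_lon - lon) * KM_PER_DEG * math.cos(math.radians(lat))
        best = min(best, math.hypot(dx, dy))
    return best


def wet_stations(stations, land_cells, threshold_km=10.0):
    """Return ids of (id, lat, lon) stations lying more than threshold_km
    from every land cell."""
    return [sid for sid, lat, lon in stations
            if distance_to_land_km(lat, lon, land_cells) > threshold_km]
```

Flagged stations would still go to manual investigation; the screens only decide what gets looked at.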

Investigations have generally followed the trusty Google Maps and Wikipedia route, with other approaches where helpful. It's time-consuming and thankless. The good news is that 'we' (Jared) are (is) nearly there.
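Before the manual checking, the third strand of attack (first-difference correlation against distance) can be sketched as a screening step, together with a helper that tries the error types described above: a flipped sign and a longitude inflated by a factor of 10. Everything here is a hypothetical illustration: the thresholds (50 km, 1000 km, r of 0.8, 0.2 and 0.999) and the idea of picking the candidate closest to a well-correlated reference station are assumptions, not the procedure the team used, which was manual.

```python
import math


def first_differences(series):
    """Step-to-step changes, which suppress offsets and slow drifts."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a spherical Earth of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, a)))


def classify_pair(series_a, series_b, loc_a, loc_b, near_km=50.0,
                  far_km=1000.0, high_r=0.8, low_r=0.2, perfect_r=0.999):
    """Return a flag for manual inspection, or None if the pair looks plausible."""
    r = pearson(first_differences(series_a), first_differences(series_b))
    d = haversine_km(*loc_a, *loc_b)
    if r >= perfect_r and d < near_km:
        return "perfect_and_near"  # possible unmerged duplicate
    if r > high_r and d > far_km:
        return "high_and_far"      # possible geolocation error
    if r < low_r and d < near_km:
        return "low_and_near"      # possible bad data or wrong location
    return None


def best_coordinate_fix(lat, lon, ref_lat, ref_lon):
    """Try the error types seen above and return (name, (lat, lon)) for the
    candidate landing closest to a well-correlated reference station."""
    candidates = {
        "as_reported": (lat, lon),
        "lon_sign_flipped": (lat, -lon),
        "lat_sign_flipped": (-lat, lon),
        "lon_divided_by_10": (lat, lon / 10),
    }
    return min(candidates.items(),
               key=lambda item: haversine_km(*item[1], ref_lat, ref_lon))
```

For a Dunedin record reported at 170.50 W, the helper would suggest flipping the longitude sign; a French station reported at 14.5 E with a neighbour near 1.5 E would get the divide-by-10 candidate. Any suggestion would still need human confirmation.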

The blacklist will consist of one small text file that the algorithm reads and one very large PDF that justifies each line in that text file. As people find other issues (and they undoubtedly will; even after several weeks on this we will only catch the worst and most obvious offenders), we can update the blacklist and rerun.
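The post does not give the layout of the blacklist text file, so the following is only a sketch of how such a file might be read and applied, assuming a hypothetical one-rule-per-line format with two actions, `delete` and `set_coords`, and a `#` comment that could point to the matching justification in the PDF. The station ids are made up.

```python
from dataclasses import dataclass


@dataclass
class Station:
    station_id: str
    lat: float
    lon: float


def parse_blacklist(lines):
    """Parse a hypothetical blacklist format, one rule per line:
        <station_id> delete
        <station_id> set_coords <lat> <lon>
    Anything after '#' is a comment (e.g. a pointer to the justification)."""
    rules = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        station_id, action = parts[0], parts[1]
        if action == "delete":
            rules[station_id] = ("delete",)
        elif action == "set_coords":
            rules[station_id] = ("set_coords", float(parts[2]), float(parts[3]))
        else:
            raise ValueError(f"unknown action {action!r} for {station_id}")
    return rules


def apply_blacklist(stations, rules):
    """Return the surviving stations, with corrected metadata where needed."""
    kept = []
    for st in stations:
        rule = rules.get(st.station_id)
        if rule is None:
            kept.append(st)
        elif rule[0] == "set_coords":
            kept.append(Station(st.station_id, rule[1], rule[2]))
        # rule[0] == "delete": the station is simply not carried forward
    return kept


# Usage with made-up ids: correct one sign error, drop one bad station.
rules = parse_blacklist([
    "STN_A  set_coords  -45.87 170.50   # see justification 1",
    "STN_B  delete                      # see justification 2",
])
kept = apply_blacklist(
    [Station("STN_A", -45.87, -170.50), Station("STN_B", 0.0, 0.0),
     Station("STN_C", 1.0, 1.0)],
    rules,
)
```

Keeping the rules in a plain file that the merge reads means that, as new issues are reported, updating the blacklist and rerunning is all that is needed.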

Tuesday, January 15, 2013

First public talk on Databank Merge Results: AMS Annual Meeting

While there have been previous talks and posters about the International Surface Temperature Initiative, as well as about the overall structure of the Global Land Surface Databank, last week marked one of the first times our group presented work on the merged product publicly. I was given the opportunity to present at the 93rd Annual Meeting of the American Meteorological Society. The room was full of climatologists working at national and international scales, and we even gained a few contacts in the hope of receiving more data for our databank effort.

In keeping with our aim to be open and transparent, the abstract from the conference can be found here, and the presentation used can be found here. The presentation was also recorded, and once AMS puts the audio online, we will try to link to it as well.

Wednesday, January 9, 2013

How should one update global and regional estimates and maintain long-term homogeneity?

Prompted by recent discussions on various blogs and elsewhere (I'm writing this from a flaky airport connection on a laptop, so no links, sorry), it seems that, for maybe the umpteenth time, there are questions about how the various current global and some national estimates are updated. Having worked in two organizations that take two very distinct approaches, I thought it worth giving some perspective. It may also help inform how others who come in and use the databank to create new products choose to approach the issue.

The fundamental issue of how to curate and update a global, regional or national product whilst maintaining homogeneity is a vexed one. Non-climatic artifacts are not the sole preserve of the historical portion of the station records. Even today stations move, instruments change, times of observation change, etc., often for very good and understandable reasons (and often not ...). There is no obvious best way to deal with this issue. If large, very recent biases are ignored for long enough, station, local and even regional series can become highly unrealistic.

The problem is also intrinsically linked to the question of which period of the record we should adjust for non-climatic effects. Here, at least, there is general agreement that adjustments should be made to match the most recent apparently homogeneous segment, so that today's readings can be easily and readily compared with our estimates of past variability and change without performing mental gymnastics.
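A minimal worked form of that convention, assuming a single break at time $t_b$ in a station series $x_t$ and a reference series $r_t$ built from well-correlated neighbours (both are illustrative assumptions; operational algorithms detect and size breaks in more elaborate ways): the step is estimated from the station-minus-reference differences after and before the break, and it is added only to the earlier segment, so the most recent segment is left as observed.

$$
\hat{\Delta} = \overline{(x_t - r_t)}_{\,t \ge t_b} - \overline{(x_t - r_t)}_{\,t < t_b},
\qquad
\tilde{x}_t =
\begin{cases}
x_t + \hat{\Delta}, & t < t_b,\\
x_t, & t \ge t_b.
\end{cases}
$$

For example, if the station reads 0.4 °C warmer relative to its neighbours after the break than before it, then $\hat{\Delta} = +0.4$ °C and every value before $t_b$ is raised by 0.4 °C, while today's readings are untouched.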

At one extreme of the set of approaches is the CRUTEM method. Here, real-time data updates are made only to a recent period (I think still just post-2000), and no explicit assessment of homogeneity is made at the monthly update granularity (though QC is applied). Rather, adjustments and new pre-2000 data are effectively caught up with at major releases or network updates (e.g. entirely new station record additions / replacements / assessments, normally associated with a version increment and a manuscript). This ensures that values prior to the most recent decade or so remain static for most month-to-month updates, but at a potential cost if a station inhomogeneity occurs in the recent past, which is then de facto unaccounted for. This can only be caught up with through a substantive update.
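A minimal sketch of the CRUTEM-style routine update described above, assuming monthly values keyed by `(year, month)` and the post-2000 cutoff mentioned in the post. It illustrates the update policy only; it is not CRU's code, the function name is hypothetical, and the QC step is omitted.

```python
def monthly_update(archive, incoming, cutoff_year=2000):
    """Routine update in the CRUTEM style (illustrative sketch).

    archive, incoming: dicts mapping (year, month) -> value.
    Months from cutoff_year onwards are overwritten or appended; earlier
    months are held back until the next major, versioned release.
    Returns (updated_archive, deferred_changes); the input is not modified.
    """
    updated = dict(archive)
    deferred = {}
    for (year, month), value in incoming.items():
        if year >= cutoff_year:
            updated[(year, month)] = value
        else:
            deferred[(year, month)] = value  # caught up at the next major release
    return updated, deferred
```

The pre-cutoff part of the record stays bit-identical between routine updates, which is the attraction. The cost is that anything in `deferred`, including a correction for a recent inhomogeneity, waits for the next substantive release.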

At the other extreme is the approach taken in GHCN / USHCN. Here the entire network is reassessed every night, based upon new data receipts, using the automated homogenization algorithm. New modern periods of record can change the identification of recent breaks in the stations that contribute to the network. Because the adjustments are time-invariant deltas applied to all points prior to an identified break, the impact is to change values in the deep past to better match modern data. So, the addition of station data for Jan 2013 may change values estimated for Jan 1913 (or July 1913) because the algorithm now has enough data to find a break that occurred in 2009. This may then affect the nth significant figure of the national / global calculation for 1913 on a day-to-day basis. This is why, with GHCNv3, a version control scheme of the form v3.x.y.z.ddmmyyyy was introduced and each version archived. If you want your analysis to be bit-reproducible, explicitly reference the version you used.
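To see why adding 2013 data can move a 1913 value, here is a sketch of time-invariant step adjustments applied on two successive nightly runs. The constant series, the 2009 break and its size of -0.4 are invented for illustration, and the break detection itself (the hard part, done by NCDC's pairwise algorithm) is not shown. The `version_tag` helper only mimics the v3.x.y.z.ddmmyyyy pattern and is hypothetical.

```python
from datetime import date


def apply_break_adjustments(values, breaks):
    """Apply time-invariant step adjustments.

    values: series in time order.
    breaks: (index, delta) pairs; delta is added to every value *before*
    index, so that earlier segments are brought into line with the most
    recent one.
    """
    adjusted = list(values)
    for index, delta in breaks:
        for i in range(index):
            adjusted[i] += delta
    return adjusted


def version_tag(x, y, z, run_date):
    """Build a tag in the v3.x.y.z.ddmmyyyy pattern."""
    return f"v3.{x}.{y}.{z}.{run_date:%d%m%Y}"


years = list(range(1913, 2014))
raw = [10.0] * len(years)  # hypothetical constant annual values

# Night 1: not enough post-break data yet, so no break is identified.
night1 = apply_break_adjustments(raw, [])
# Night 2: new data let the algorithm find a 2009 break (size made up).
night2 = apply_break_adjustments(raw, [(years.index(2009), -0.4)])
print(night1[0], night2[0])  # the 1913 value differs between the two runs
```

Recording the tag of the run you used is what makes such an analysis reproducible bit for bit.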

What is the optimal solution? Perhaps this is a 'How long is a piece of string?' class of question. There are very obvious benefits to either approach, or to any number of others. In part it depends upon the intended uses of the product. If you are interested in serving homogeneous station series as well as aggregated area-averaged series using your best knowledge as of today, then perhaps something closer to NCDC's approach is preferable. If you are interested mainly in large-scale averages, under a reasonable null hypothesis that, at least on timescales of a few years, the inevitable new systematic artifacts average out as Gaussian noise over broad enough spatial scales, then the CRUTEM approach makes more sense. And that, perhaps, is fundamentally why they chose these different routes ...


Saturday, January 5, 2013

High School Students Engage in Climate Research

Please note that this is a guest post by Rich Kurtz, a teacher from Commack, New York.

A few years ago I had a student who was interested in climate change; my job as a science teacher was to work with her to help her develop a project. In a circuitous way, my student and I were introduced to Mr. John Buchanan, the Climate Change Student Outreach Chairperson for the Casualty Actuarial Society. Mr. Buchanan helped us develop a project using weather logbooks from the 1700s recorded by a Philadelphia farmer, Phineas Pemberton.
Phineas Pemberton sample log page Jan. 1790, Philadelphia

My student was given the opportunity to present her data at the 3rd ACRE Workshop, Reanalysis and Applications conference in Baltimore, MD. That meeting opened the door to authentic learning opportunities for my students. At the meeting I had the privilege of meeting scientists and educators from a broad spectrum of organizations. Those professionals inspired me to investigate the possibility of introducing my students to the issues of climate change using historical weather data. This has been a fruitful avenue of authentic learning experiences for my high school students. With the help of outside mentors and ambitious, hardworking students, we have been able to locate and use historical weather data for science research projects.
Currently we are engaged in two projects. One project involves digitizing weather records from logbooks kept at Erasmus Hall School in Brooklyn, NY, between 1826 and 1849. Cary Mock of the University of South Carolina told me about the logbooks, which are housed at the New York City Historical Society. One of my students photographed the entire set of logbooks and is using those photos to digitize the data and to explore and compare weather trends and changes.
Erasmus High sample log entry from January 1852 (Brooklyn, New York)
Another project involves a group of students who have volunteered to digitize weather and lake-level data from Mohonk Preserve in the New Paltz area of New York State. After reading about a presentation on climate change given by the director of the preserve, I contacted her and asked whether there was anything related to weather data that my students could volunteer to help with. She was excited to get our students involved in digitizing the preserve's weather and lake water level records, which go back to the 1880s. The students are currently entering the data from the logs into a database, from which they will develop research questions and formulate an investigation.
Sample log entry from Mohonk Lake Preserve area (upstate New York), January 1890
I think there is a lot of interest among teachers in getting their students involved with authentic projects. The advantage of working on historical weather projects is that they merge many aspects of learning. A historical weather project can bring together history, science and math, and it helps students with their organizational skills. My students sometimes also have the opportunity to consult with a professional scientist. These areas all touch upon skills that we want our students to acquire.
I would like to acknowledge some of the people who have helped me with my work with students: Mr. John Buchanan, the Climate Change Student Outreach Chairperson for the Casualty Actuarial Society; Mr. Eric Freeman, from the National Climatic Data Center; Mr. Gilbert Compo, from the Climate Diagnostics Center, NOAA; and Cary Mock of the Department of Geography, University of South Carolina.

Wednesday, January 2, 2013

Databank update - nearby 'duplicates' issue raised by Nick Stokes

Climate blogger Nick Stokes provided some additional feedback on the beta 2 release, alerting us to a case where two records for the same station were still present. This was not a bug per se. The station data in one of the data decks presented to the merge program had been adjusted, and hence the data disagreed. So the merge program was doing what it should: based upon the combination of metadata and data agreement, the probability that the two records were distinct was large enough for the second record to be treated as a new station.

One of the issues arising from the historically fragmented way data have been held and managed is that many versions of the same station may exist across multiple holdings. Often a holding will itself be a consolidation of multiple other sources and, like a Russian doll (well, you get the picture), it's a mess. So, in many cases we have little idea what has been done to the data between the original measurement and our receipt. These decks are given low priority in the merge process, but ignoring them entirely would be akin to cutting off one's nose to spite one's face: they may well contain unique information.

To investigate this further and more globally, we ran a variant of the code with only a one-line change (Jared will attest that my estimates of line changes are always underestimates, but in this case it really was one line): if the metadata and the data disagreed strongly, we withheld the station. We then ran a diff on the output with and without the check. The check flagged only stations that were likely bona fide duplicates (641 in all). This additional check will be enacted in the version 1 release (and hence there will be 641 fewer stations).
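The extra check and the with/without diff can be sketched as follows. This is a hypothetical illustration: the post does not describe how the merge program scores metadata and data agreement, so the two "probability of being the same station" inputs, the gap of 0.8 that counts as strong disagreement, and all function names are assumptions.

```python
def should_withhold(meta_same_prob, data_same_prob, min_gap=0.8):
    """True when metadata and data disagree strongly about whether a record
    matches an existing station (hypothetical rule and threshold)."""
    return abs(meta_same_prob - data_same_prob) >= min_gap


def run_merge(candidates, extra_check):
    """candidates: (station_id, meta_same_prob, data_same_prob) tuples for
    records the merge would otherwise add as new stations."""
    return [sid for sid, meta_p, data_p in candidates
            if not (extra_check and should_withhold(meta_p, data_p))]


def diff_runs(without_check, with_check):
    """Stations present only in the run without the extra check."""
    return sorted(set(without_check) - set(with_check))


# Made-up candidates: X1 looks like an adjusted copy of an existing record.
candidates = [("X1", 0.95, 0.05), ("X2", 0.10, 0.20), ("X3", 0.05, 0.90)]
withheld = diff_runs(run_merge(candidates, extra_check=False),
                     run_merge(candidates, extra_check=True))
```

Inspecting the ids in `withheld` is the analogue of checking the diff by eye to confirm that the check catches only likely duplicates.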

Are we there now? Not quite. We still have to do the blacklisting. This is labor-intensive stuff. We will have a post on this (what we are doing and how we plan to document the decisions in a transparent manner) early next week, time permitting.

We currently expect to release version 1 no sooner than February. But it will be better for the feedback we have received, and the extra effort is worth it for a more robust set of holdings.