Thursday, December 11, 2014

Why we need max and min temperatures for all stations

I'm doing an analysis of Diurnal Temperature Range (DTR; more on that when published) but as part of this I just played with a little toy box model and the result is sufficiently of general interest to highlight here and maybe get some feedback.

So, for most stations in the databank we have data for maximum (Tx) and minimum (Tn) that we then average to get Tm. Now, that is not the only transform possible - there is also DTR which is Tx-Tn. Although that is not part of the databank archive its a trivial transform. In looking at results running NCDC's pairwise algorithm distinct differences in breakpoint detection efficacy and adjustment distribution arise, which have caused great author team angst.

This morning I constructed a simple toy box where I just played what if. More precisely what if I allowed seeded breaks in Tx and Tn in the bound -5 to 5 and considered the break size effects in Tx, Tn, Tm and DTR:
The top two panels are hopefully pretty self explanatory. Tm and DTR effects are orthogonal which makes sense. In the lowest panel (note colours chosen from colorbrewer but please advise if issues for colour-blind folks):
red: Break largest in Tx
blue: Break largest in Tn
purple: break largest in DTR
green: break largest in Tm (yes, there is precisely no green)
Cases with breaks equal in size are no colour (infintesimally small lines along diagonal and vertices at Tx and Tn =0)

So …

if we just randomly seeded Tx and Tn breaks in an entirely uncorrelated manner into the series then we would get 50% of breaks largest in DTR and 25% each in Tx and Tn. DTR should be broader in its overall distribution and Tm narrower with Tx and Tn intermediate.

if we put in correlated Tx and Tn breaks such that they were always same sign (but not magnitude) then they would always be largest in either Tx or Tn (or equal with Tm when Tx=Tn)

If we put in anti-correlated breaks then they would always be largest in DTR.

Perhaps most importantly, as alluded to above, breaks will only be equal largest for Tm in a very special set of cases where Tx break = Tn break. Breaks, on average will be smallest in Tm. If breakpoint detection and adjustment is a signal to noise problem its not sensible to look where the signal is smallest. This has potentially serious implications for our ability to detect and adjust for breakpoints if we limit ourselves to Tm and is why we should try to rescue Tx and Tn data for the large amount of early data for which we only have Tm in the archives.

Maybe in future we can consider this as an explicitly joint estimation problem of finding breaks in the two primary elements and two derived elements and then constructing physically consistent adjustment estimates from the element-wise CDFs. Okay, I'm losing you now I know so I'll shut up ... for now ...


Bonus version showing how much more frequently DTR is larger than Tm:



  1. I am too stupid to understand these plots or it the resolution not sufficient. (Is there data somewhere?)

    Anyway the hopefully productive comment I wanted to make is that the update may be biased. Detection is performed on Tm (not on DTR, right?). Thus there may be missing breaks with a big DTR and a small Tm jump.

    For Germany we found that DTR has a better signal to noise ratio than Tm. Thus we should probably use it more for detection of breaks.

  2. Victor,

    there are no data so I suspect I didn't make this clear enough. I do have data that shows the results of applying PHA to the databank. It was trying to explain these that led me to step back and consider a toy model. What would the breaks in the primary elements and the two derivatives be if I allowed the break in each of the primary elements to be any value within some reasonable bounds?

    My point is that we may be looking in precisely the wrond direction if we use Tm. Tm unless the breaks in Tx and Tn are almost perfectly correlated will always be the least optimal of the 4 possible parameters to look at unless it somehow has some magical SNR aspect that means the noise in pairwise Tm comparisons is considerably lower than in DTR, Tx or Tn. I find this doubtful. If we want to catch and correct breakpoints properly arguably we should be looking at this as a multi-element problem.

  3. Okay, it was just a theoretical argument. Then I lost you at the update: "Bonus version showing how much more frequently DTR is larger than Tm". That sounds like an empirical claim for which you need data.

    1. Ah, that was just on the basis of a random draw. If I randomly draw a pair of Tx and Tn breaks then it is in effect a biased coin toss chance of Tm or DTR being greatest. There is a 2/3 chance of DTR being greater than Tm for any random draw pair of Tx and Tn breakpoints.

      However, given the toy model basis in this post you could actually look at the counts and magnitudes of breaks in different elements returned when applied to real-world data to say something. If I ran PHA on all four and I found more and larger breaks in Tm than DTR then I would conclude that the breaks in Tx and Tn strongly tend to be correlated etc. etc.

  4. There will also be certain types of inhomogeneities which cause opposite shifts in Tx and Tn, and thus a weak or no signal in Tm as the Tx/Tn changes cancel each other out. The most obvious example would be a site move towards (away from) a coastline, which would be expected to result (all other things being equal) in a decrease (increase) in Tx and an increase (decrease) in Tn. Depending on the site and local topography, you could also get a similar signal with a move from a valley site to a more exposed, elevated site (with lower Tx, and depending on site exposure potentially higher Tn).

    1. Agreed. Assuming that the performance of an algorithm is roughly equal across elements (e.g. the signal to noise ratio of the pairwise difference series is of similar magnitude) then one could use the resulting population size and distribution to try to ascertain an over-arching tendancy whether breaks are same sign or opposite sign in Tx and Tn and how strong that overall tendancy is.

  5. First of all, thank you for this interesting observation. Please forgive me if my comments start at the most basic points before they get around to the detailed comment.

    Firstly, the reason why this matters.
    Weather stations do not record the average temperature of the air above the region of the Earth’s surface that they have been chosen to represent. This is the fundamental problem.
    • At any instant of time, the particular distribution of air temperatures in a region a few kilometres by a few kilometres in extent is probably ‘a few’ degrees. For reasons of being specific, let’s call the actual range 5 °C i.e. ± 2.5 °C.
    • Well-sited weather stations are likely to read temperatures near the middle of the range – i.e. they will ideally avoid the extremes of the temperature distribution.
    • Now consider the daily maxima, minima and mean readings from the station. Are these representative of the daily maxima, minima and mean readings in the area they have been chosen to represent? Typically they will have some relationship to the true (and unobtainable) averages of maxima, minima and daily means e.g. the station will be in the 67th percentile of the local distribution.

    In trying to estimate past climates by examination of meteorological data, we are assuming that the ‘representativeness’ of each station has remained the same over the period of the record. Because the variability within an area is so large, this key assumption is critical to the meaningfulness of the endeavour. If the representativeness of a weather station has changed then it is possible to record spurious changes in averaged readings which are not related at all to climatological changes.

    This is why homogenisation is so critical to the validity of conclusions drawn from analysis of meteorological data. Homogenisation can be seen as a search within the data itself for signs of non-climatological changes.

    Secondly, implications of your experiment.
    If we focus on the idea that a weather station data needs to maintain its ‘degree of representativeness’, then we really want to find the most sensitive ways to detect any potential change in this ‘degree of representativeness’. So it seems very interesting that running the PHA on Tmax or T min gives more candidate breakpoints.

    However in addition to wanting to optimally homogenise the data, we also want to conservatively homogenise the data. In other words we don’t want to see breakpoints where there are none.

    So let us consider the case where a ‘breakpoint’ is detected in Tmax or T min, but not in Tmean. As I understand it, the statistical tests in the PHA would then be applied to Tmean and it would then (very likely) be rejected as a breakpoint. Why rejected? Because if it was statistically significant in Tmean it would most likely have been picked up in the initial PHA search for candidate breakpoints.

    So what can be done with this extra sensitivity to candidate breakpoints? Well I guess the key thing is that if there is a detectable imhomogeneity in Tmax or Tmin but not Tmean, then that could signal a change in the ‘degree of representativeness’ of T mean and thus is a cause for concern. Ideally, as you say, we would run the PHA across the each of the data series and so yes, keeping Tmax and Tmin could be potentially very significant.

    Re-reading this seems only just coherent. Does it make sense to you?

    1. It makes sense to me at least :-)

      In all applications of PHA there is a 'missing middle' of small breaks. We know that the real-world break distribution will not look like a bivariate distribution. The typical distribution of PHA adjustments looks like a normal distribution with a bite taken out of the middle. In the Willett et al humidity analyses the populated wings of the distribution can be used to fit a true distribution that is underlying. See e.g.

      One way of looking at the problem is that these undetectable breaks still 'matter'. They are not small. Typically for surface temperatures there are few adjustments applied smaller in magnitude than 0.5C but that is still a large break vis-a-vis the multi-decadal signal. By looking at Tx, Tn, their mean and their difference in theory we could actually fill in some of this missing middle for each element by applying breaks that minimise some penalty functions that include the need to average and difference appropriately to retain consistency and as such get a fairer assessment of the truth.

      Could ...