Understanding Hsiang and Sekar's analysis

My previous post critiqued Hsiang and Sekar’s paper, which claims that the legal sale of ivory in 2008 led to a sharp increase (or discontinuity) in elephant poaching between 2007 and 2008. Of most relevance to this post, I argued that their analyses were flawed. Hsiang and Sekar have since published a response, and their replication files are also available.

As a member of the MIKE-ETIS Technical Advisory Group, I was asked to look in more detail at their work. The aim was to understand two things: why their analysis differs from other analyses of the MIKE data (they claim it is because we smoothed our estimates), and their claim that a simple summary of the data shows the discontinuity. This report describes what I found. I outline the key points below, but if you are interested I suggest reading the report, which leads you through the whole argument in non-technical language.

The data, PIKE, are the proportion of all elephant carcasses found at a site that were illegally killed. The total number of carcasses found at each site differs hugely, both between sites and over time. When dealing with proportions, it is essential that any summary or analysis of the data accounts for this (details in the report).
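To see why the number of carcasses matters, here is a minimal Python sketch. It assumes, for illustration, that the illegally killed carcasses at a site behave like a binomial count, and it computes the approximate standard error of the observed PIKE. The function names and site counts are hypothetical, not taken from the MIKE data or the report.

```python
import math


def pike(illegal: int, total: int) -> float:
    """PIKE for one site-year: illegally killed carcasses / all carcasses found."""
    return illegal / total


def pike_standard_error(illegal: int, total: int) -> float:
    """Approximate binomial standard error sqrt(p(1-p)/n) of an observed PIKE."""
    p = pike(illegal, total)
    return math.sqrt(p * (1 - p) / total)


# Hypothetical sites that all have an observed PIKE of 0.5 but very different carcass counts.
for illegal, total in [(2, 4), (20, 40), (200, 400)]:
    print(f"{total:>3} carcasses: PIKE = {pike(illegal, total):.2f}, "
          f"standard error ~ {pike_standard_error(illegal, total):.3f}")
```

The same PIKE of 0.5 has a standard error of 0.250 when it rests on 4 carcasses but only 0.025 when it rests on 400, so treating every site's PIKE as equally informative throws away this information.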

PIKE: straight average (black, solid, filled dots) and weighted average or aggregate (blue, dashed, open line)

If you account for the variability in the number of carcasses found when summarising the data, you get the blue dashed line in the plot above rather than the black line. The black line is Hsiang and Sekar’s average PIKE across all sites in each year, which ignores differences between sites and which they say shows the discontinuity between 2007 and 2008. The blue line, which they also present in their paper (they call it the aggregate), is a weighted average of the data, the correct way of summarising proportions, and it does not show the discontinuity between 2007 and 2008.
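The difference between the two summaries can be sketched in a few lines of Python. The carcass counts below are hypothetical (three sites in a single year) and the function names are illustrative only, not Hsiang and Sekar's code. Weighting each site's PIKE by its number of carcasses is algebraically the same as pooling all carcasses first, since the sum over sites of n × (y/n) divided by the sum of n equals the sum of y divided by the sum of n; that is why the weighted average and the aggregate are the same line.

```python
from typing import List, Tuple

# (illegally killed carcasses, total carcasses) for one site in one year.
SiteYear = Tuple[int, int]


def straight_average_pike(sites: List[SiteYear]) -> float:
    """Unweighted mean of per-site PIKE: every site counts equally."""
    return sum(y / n for y, n in sites) / len(sites)


def weighted_average_pike(sites: List[SiteYear]) -> float:
    """Mean of per-site PIKE weighted by the number of carcasses found at each site."""
    return sum(n * (y / n) for y, n in sites) / sum(n for _, n in sites)


def aggregate_pike(sites: List[SiteYear]) -> float:
    """Pool all carcasses across sites, then take the proportion illegally killed."""
    return sum(y for y, _ in sites) / sum(n for _, n in sites)


# Hypothetical counts for three sites in the same year.
year_data = [(1, 1), (3, 4), (20, 100)]
print(f"straight average: {straight_average_pike(year_data):.3f}")
print(f"weighted average: {weighted_average_pike(year_data):.3f}")
print(f"aggregate:        {aggregate_pike(year_data):.3f}")
```

Here the site with a single carcass pulls the straight average up to 0.650, while the weighted average and the aggregate, dominated by the site with 100 carcasses, are both 0.229.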

If you account for the variability in the number of carcasses found when modelling the data, you get the three lines in the graph below, which pretty much coincide, rather than the black line of Hsiang and Sekar. I used three different strategies to fit very simple models that ignore both smoothing and the fact that some sites come from the same country (details are in the report, and the code is here).

Simple models of PIKE data. Hsiang & Sekar regression (black), weighted regression (blue), Binomial (red), Poisson (purple)

The Binomial model is the correct way to model proportions (as noted in my previous post); this approach is widely used in many fields of application and is uncontroversial. The other two are approximations, but reasonable ones for producing simple models of proportions. The fact that all three tell the same story gives confidence in these results.
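The three modelling strategies can be sketched in pure Python. This is not the code behind the report: it assumes hypothetical data for two sites over three years and a simple design with site and year as factors (the actual model specifications are in the report). The binomial model uses a logit link fitted by iteratively reweighted least squares (IRLS), the standard algorithm for generalized linear models; the Poisson model treats the illegal count as Poisson with an offset of log(total carcasses); and the weighted regression fits PIKE by least squares with each site-year weighted by its carcass count. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def wls(X: Matrix, z: List[float], w: List[float]) -> List[float]:
    """Weighted least squares: solve (X'WX) beta = X'Wz."""
    p, obs = len(X[0]), range(len(X))
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in obs) for k in range(p)] for j in range(p)]
    xtwz = [sum(w[i] * X[i][j] * z[i] for i in obs) for j in range(p)]
    return solve(xtwx, xtwz)


def linear_predictor(X: Matrix, beta: List[float]) -> List[float]:
    """X beta, computed row by row."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def fit_weighted_linear(X: Matrix, y: List[int], n: List[int]) -> List[float]:
    """Least squares on PIKE = y/n, weighting each site-year by its carcass count n."""
    return wls(X, [yi / ni for yi, ni in zip(y, n)], [float(ni) for ni in n])


def fit_binomial(X: Matrix, y: List[int], n: List[int], iterations: int = 50) -> List[float]:
    """Binomial GLM with logit link, fitted by IRLS."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        eta = linear_predictor(X, beta)
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]  # fitted PIKE
        w = [ni * m * (1 - m) for ni, m in zip(n, mu)]  # IRLS working weights
        z = [e + (yi / ni - m) / (m * (1 - m)) for e, yi, ni, m in zip(eta, y, n, mu)]
        beta = wls(X, z, w)
    return beta


def fit_poisson(X: Matrix, y: List[int], n: List[int], iterations: int = 50) -> List[float]:
    """Poisson GLM for the illegal count y, log link with offset log(n), fitted by IRLS."""
    beta = [0.0] * len(X[0])
    for _ in range(iterations):
        eta = linear_predictor(X, beta)  # log of fitted PIKE (offset excluded)
        mu = [ni * math.exp(e) for ni, e in zip(n, eta)]  # fitted illegal count
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        beta = wls(X, z, mu)  # for the log link the working weights are mu
    return beta


# Hypothetical (site, year, illegally killed, total carcasses) for two sites over three years.
site_years = [
    ("A", 1, 5, 20), ("A", 2, 8, 25), ("A", 3, 6, 15),
    ("B", 1, 30, 100), ("B", 2, 45, 120), ("B", 3, 2, 4),
]
# Design matrix columns: intercept, site B, year 2, year 3.
X = [[1.0, float(s == "B"), float(t == 2), float(t == 3)] for s, t, _, _ in site_years]
y = [row[2] for row in site_years]
n = [row[3] for row in site_years]

fitted_pike = {
    "weighted regression": linear_predictor(X, fit_weighted_linear(X, y, n)),
    "binomial": [1.0 / (1.0 + math.exp(-e)) for e in linear_predictor(X, fit_binomial(X, y, n))],
    "poisson": [math.exp(e) for e in linear_predictor(X, fit_poisson(X, y, n))],
}
for name, pike in fitted_pike.items():
    print(f"{name:>19}: site A fitted PIKE by year", [round(p, 3) for p in pike[:3]])
```

Because year enters as a factor, each of the three fits reproduces the observed number of illegally killed carcasses in every year of this toy example, which is one reason simple models that account for carcass numbers tend to agree with one another.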

None of these three models show the discontinuity between 2007 and 2008.

To conclude: it is not the fact that we have smoothed our estimates (as Hsiang and Sekar claim) that leads to the discrepancy between their results and all other analyses of MIKE data. As I have shown above, and described in this report, simple analyses of the data give results similar to other analyses of the MIKE data, with no discontinuity. The discrepancy (and the discontinuity) arises because Hsiang and Sekar have ignored a fundamental feature of the data: they are proportions(*), and the number of carcasses found varies hugely between sites and over time.

(*) See p53 of Dave Collett’s book Modelling Binary Data for a more technical explanation of why you have to be careful modelling proportions using linear regression.


Comments

Solomon Hsiang says

Tuesday 20th September 2016 at 19:09

We have responded to the updated set of arguments presented in Dr. Underwood’s blog posts at

http://www.g-feed.com/2016/09/normality-aggregation-weighting.html

where we formally derive various errors in Dr. Underwood’s proposals and test necessary assumptions against the data. This is the summary conclusion from our post:

“The three critiques/suggestions offered by Dr. Underwood are not logically consistent with themselves, since they make contradictory assumptions about whether PIKE corrects for elephant populations/surveyor effort and whether or not the total count of carcasses discovered at each site is actually a random variable or not. Furthermore, each of the three points raised is itself independently erroneous, either because the mathematical assumptions Dr. Underwood makes cannot possibly be true or because these assumptions are clearly overturned by the data. We therefore conclude that none of the critiques offered by Dr. Underwood are valid.”
