Diving into Hillary Clinton’s emails is not easy, and access to them is in urgent need of improvement
The State Department released about 300 of Hillary Clinton’s emails on 22 May 2015 under the Freedom of Information Act (FOIA). The former Secretary of State’s emails provoked much debate earlier this year, when it was revealed that she had used a personal email account to conduct official government business instead of a State Department account. The emails are available in the Virtual Reading Room on the US Department of State’s website. They concern the attack on the US diplomatic compound in Benghazi that took place on 11 September 2012.
This collection of emails reflects many of the issues that I discussed in an earlier post, when the story first broke. It also highlights the frustrations usually associated with research into digitised archives. These emails are not in their original form; they are instead what seem to be scans of printouts of the emails. The printouts have apparently been OCRed, and the text recognition seems to be of good quality (at least, no major mistake appeared in the few instances that I checked). However, no explanation is provided as to how these emails have been processed.
But apart from running ad hoc searches across the whole set of emails, the reader can do little else. Emails can be downloaded individually, but I don’t quite see how they could all be downloaded together so as to run a larger text analysis of the set using standalone software. That option seems to exist only thanks to the Wall Street Journal, which makes the whole batch of emails available as a single PDF document on its website. The inability to download the whole set of emails is likely to be the major issue with future releases.
The search options present other weaknesses. I may be missing something, but I could not find a way to select emails by sender only, for instance to see only the emails sent by Hillary Clinton herself. Of course, it is possible to sort them alphabetically by clicking on the “from” button, but nothing more. Typing “Hillary Clinton” in the search box returns every occurrence of the name in the whole body of emails, but does not isolate the messages that she sent. More generally, we cannot know what has been left out of the selection, although this last problem is admittedly common to any sort of archival research.
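To make the difference between a full-text search and a filter by sender concrete, here is a minimal sketch in Python. It assumes that each email has already been extracted as plain text and that it begins with a printed “From:” header line; I have not verified that the released files follow this layout. The function names and the sample messages are invented for illustration and do not come from the release.

```python
import re

# Assumption: every email is plain text with a "From:" header line,
# as is usual for printed emails. The real files may be laid out differently.
FROM_LINE = re.compile(r"^From:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def sender_of(email_text):
    """Return the content of the first "From:" line, or None if there is none."""
    match = FROM_LINE.search(email_text)
    return match.group(1).strip() if match else None


def emails_from(emails, name):
    """Keep only the emails whose "From:" line contains `name` (case-insensitive)."""
    wanted = name.lower()
    selected = []
    for text in emails:
        sender = sender_of(text)
        if sender is not None and wanted in sender.lower():
            selected.append(text)
    return selected


# Invented placeholder messages, not taken from the release.
SAMPLE = [
    "From: Sender A\nTo: Sender B\nSubject: Schedule\n\nPlease call Sender B.",
    "From: Sender B\nTo: Sender A\nSubject: Re: Schedule\n\nSender A, will do.",
]

matches = emails_from(SAMPLE, "Sender A")
print(len(matches))  # only the first message was sent by "Sender A"
```

A full-text search for “Sender A” would return both sample messages, since the name also appears in the recipient line and body of the second one; the filter above returns only the first.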
All these issues are already a nuisance for about 300 emails. But they may hinder any serious research into the whole lot of messages mentioned at the beginning of “emailgate” (55,000 pages!). On the day the emails were published online, the Wall Street Journal organised a “Live Dive” into the messages. This is certainly doable for 296 emails; it will be quite a diving adventure for the tens of thousands of messages that are to be released on 15 January 2016. The WSJ has also created a gizmo that allows readers to tag the emails themselves. As it stands, the research tools for these documents therefore rely heavily on the initiatives of a private company, the WSJ. Any research into Hillary Clinton’s emails, published under the conditions that the State Department’s FOIA website currently offers, risks being at best impressionistic, at worst useless.