An exercise in text mining and distant reading: Helmut Schmidt’s visit to the Bundesbank in 1978
A visit by a German chancellor to the Bundesbank was unusual enough to be carefully recorded. The transcript of the discussion has now been declassified, which gave me the opportunity to try a detailed text analysis of the record. I am new to these tools and to this type of analysis, so comments are more than welcome.
Despite my frustration at how little use I could make of online digital sources when writing A Europe Made of Money, here is an example of the sort of text mining I wish I could do more often. Why choose this particular source and meeting? It may be obscure to the uninitiated, but German chancellors very rarely go to the Bundesbank. This is mostly because the Bundesbank jealously guards its independence and does not want to risk any sort of political interference with its policymaking. Helmut Schmidt, West German chancellor from 1974 to 1982, made an appearance at the Bundesbank on 30 November 1978, during the negotiations over the creation of the European Monetary System (EMS). The Bundesbank was known to be unenthusiastic (to phrase it very diplomatically) about the EMS; Schmidt came to Frankfurt to try to convince the Bundesbank Council that the EMS was not something they should oppose. The visit was therefore a significant event in itself, and it added weight to the importance of the creation of the EMS.
What is particularly interesting for the historian is that a word-for-word transcript of this important meeting exists. When I went to the Bundesbank’s archives in 2008 to carry out my research, the archivist told me that I was the second person to look at this very record. The record had been left in the Emminger Nachlass, that is, the personal papers of Otmar Emminger, then head of the Bundesbank. The first person to see the record was David Marsh, for his book on the creation of the euro. The document was then passed on for translation into English and made available online on the Margaret Thatcher archive website, both in translation and in its original form.
It is worth insisting on how fantastic a source this document is: it is a genuine verbatim record of the audio recording of the meeting. Everything that was said was put down in this text. While in the archives, I was able to check it against the intermediary document, namely the first transcription of the tapes, which still contained approximations of the words that had been used. I could therefore see that the transcription had been done extremely carefully and was reliable. On top of this, there is no OCR issue, unlike in many other cases: the version I used for this post is the English translation of the text, which is available on a webpage in a very neat version (unlike the photocopy of the German original, the OCR of which does not look 100% reliable).
Methodology
I used iramuteq and R for the various statistics on the text. Frédéric Clavert helped me greatly in adjusting my formatting of the text (the procedure itself is not very complicated, but there are a few tricks to be aware of that I had overlooked, in particular the proper use of variables). The software applies lemmatisation, that is, different forms derived from a word are counted as the same word: “European” will thus be treated as the same word as “Europe.” Various posts by Frédéric proved very useful in working on the text, as did the iramuteq users’ questions on sourceforge.net. I used Gephi to visualise the relationships between words (similitude analysis); its website explains well how the software works, but Martin Grandjean’s introduction to Gephi and Clément Levallois’ series of tutorials are also very useful. Wordle helped me produce a nice word cloud. The text itself is composed of 22,094 words, of which 936 are hapax (forms that occur only once in the corpus), that is, 4.24% of the whole corpus.
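The hapax figure can be checked with a few lines of Python. This is only a minimal sketch, not what iramuteq does internally: the regular-expression tokeniser, the function name `hapax_stats` and the sample sentence are assumptions made for illustration, and no lemmatisation is applied.

```python
import re
from collections import Counter


def hapax_stats(text):
    """Return (token count, sorted list of hapax forms, hapax share in %)."""
    # Assumption: a "word" is any run of letters; case is ignored.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # A hapax is a form that occurs exactly once in the corpus.
    hapax = sorted(word for word, n in counts.items() if n == 1)
    share = 100 * len(hapax) / len(tokens) if tokens else 0.0
    return len(tokens), hapax, share


# Hypothetical sample sentence, not taken from the transcript.
sample = "The Chancellor spoke, but the Council listened, but doubts remained."
total, hapax, share = hapax_stats(sample)

# The share reported for the transcript: 936 hapax out of 22,094 words.
reported_share = round(100 * 936 / 22094, 2)  # 4.24
```

On a very short sample the share of hapax is naturally much higher than the 4.24% found in the full transcript, because most words have not yet had a chance to recur.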
Word clouds
The easiest way into text analysis is probably the word cloud (this one was made with Wordle). It looks very nice, but it merely shows the respective weights of the words in the text. It is little surprise, given the nature of the meeting, that words such as “Federal,” “Chancellor” and “Herr” feature so prominently. If anything, the latter highlights the massive predominance of men in the central banking community.
For a reason unknown to me, the word cloud produced by iramuteq does not omit the word “but.” Interestingly, though, this shows that it is the most widely used word in the text (“and” and “or” aside), which highlights the argumentative dimension of the discussion.
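Whether “but” shows up in a word cloud depends on the list of stop words that is filtered out before counting. The sketch below illustrates the principle only: the stop list, the function name `top_words` and the sample text are assumptions, and neither Wordle's nor iramuteq's actual lists are reproduced here.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop list that, like the iramuteq cloud,
# does not contain "but".
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "it", "that"}


def top_words(text, stopwords=STOPWORDS, n=3):
    """Return the n most frequent words that are not in the stop list."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Hypothetical sample text, not taken from the transcript.
sample = (
    "The system is useful, but the risk is real. "
    "But the Chancellor insisted, and the Council agreed, but with conditions."
)

with_but = top_words(sample, n=1)                          # [('but', 3)]
without_but = top_words(sample, STOPWORDS | {"but"}, n=50)  # "but" filtered out
```

Adding “but” to the stop list removes it from the ranking entirely, which is presumably what happens in the Wordle cloud.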
Similitude analysis
But what about the relationships between these different words? The similitude analysis (picture below; click on it to zoom in and see the details) shows how specific words are linked to one another.
“Federal”, “chancellor” and “government” are so closely interlinked that they actually overlap. Some nodes clearly stand out: one around “Herr,” which mostly reflects the different governors who spoke in the discussion; one around “European,” related to EEC matters; one around “policy,” which nicely underscores the multi-dimensional nature of monetary affairs, as the link to “foreign” shows; and obviously a big one around “but,” which points to all sorts of arguments and qualifications used in the discussion. A few of them are worth noting: the relationship with the dollar, the possibility of intervening in the system, and the issue of the exchange rate and the fight against inflation.
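The basic idea behind such a graph is co-occurrence: two words are linked when they appear in the same segment of text, and the link gets stronger the more often this happens. The following is a minimal sketch of that counting step under stated assumptions: the segments are invented and already tokenised, the function name `cooccurrences` is hypothetical, and iramuteq computes its own indices and layout, which are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(segments):
    """Count, for each pair of words, the segments in which both appear."""
    pairs = Counter()
    for segment in segments:
        # Each word counts once per segment; sorting gives every pair
        # a single canonical order, e.g. ("chancellor", "federal").
        words = sorted(set(segment))
        pairs.update(combinations(words, 2))
    return pairs


# Hypothetical, already lemmatised text segments.
segments = [
    ["federal", "chancellor", "government"],
    ["federal", "chancellor", "policy"],
    ["european", "community", "policy"],
    ["federal", "government", "but"],
]

edges = cooccurrences(segments)

# Rows of (source, target, weight), strongest links first.
edge_list = sorted(
    ((a, b, w) for (a, b), w in edges.items()),
    key=lambda row: (-row[2], row[0], row[1]),
)
```

Written out with the column names Source, Target and Weight, such rows can be loaded into Gephi as an edge list.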
Profiles and dendrogram
A further step is to identify the groups of words that are frequently associated (profiles), and then to see how these groups relate to one another (dendrogram). The GNEPA method makes this possible in iramuteq. The profiles simply show which words are most often used together. Five profiles emerge from the text of the Schmidt meeting at the Bundesbank, as shown in the figure below.
Of the five classes that emerge, three come out very distinctively:
- in green (class 2), one group related to the Federal Republic of Germany (with words such as “minister,” “federal,” “government”);
- in black (class 3), a group concerning the technical working of the exchange rate system (“basket,” “rate,” “obligation,” “threshold”);
- and in red (class 1), the European dimension of German policy (“European,” “Italy,” “Community,” “stability,” “alliance”).
Two further groups exist:
- in purple (class 5), a class bringing together words related to the formal legal dimension (“treaty,” “write,” “freedom”); words such as “temporary” and “automatic” refer to the duty to intervene in the exchange rate system to defend a parity, and to the interpretation of how binding this duty is;
- and in blue (class 4), the international dimension (“America,” “Bretton Woods,” “trade,” “world”).
Classes 2 and 3 (green and black on the graph) are very close, indeed overlapping, which underscores that a central issue in the discussion was the intertwining of the technical working of the EMS and the FRG’s economy. Similarly, classes 1 and 4 (red and blue on the graph) are closely interlinked, highlighting that the EMS was a European, regional response to international problems. By contrast, class 5, on the legal dimension, looks rather isolated, perhaps reflecting the isolation of the topic itself in the discussions, which mostly concerned the legalistic interpretation of the working of the EMS.
The dendrogram is a tree diagram that shows the hierarchical arrangement of the different groups of words. The group in black above is coloured here in a yellowish green (class 2, 11.7%). The dendrogram built on the corpus is again interesting, as it shows the very clear hierarchical relationships between the different profiles that were outlined above, in particular the relationship between classes 1 and 4.
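To give an idea of how a tree of this kind can be built, here is a minimal sketch using a different and much simpler technique than the one iramuteq applies: a bottom-up (agglomerative) clustering in which the two groups of words sharing the largest proportion of text segments are merged at each step. The Jaccard distance, the function names and the toy data (segment numbers per word) are all assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """1 minus the share of segments that two groups have in common."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def agglomerate(occurrences):
    """Repeatedly merge the two closest groups; return the merge history.

    A group is described by the set of segments in which any of its
    words occurs. Each history entry is (group_x, group_y, distance).
    """
    clusters = {(word,): set(segs) for word, segs in occurrences.items()}
    history = []
    while len(clusters) > 1:
        names = sorted(clusters)
        # Closest pair first; ties are broken alphabetically.
        dist, x, y = min(
            (jaccard_distance(clusters[x], clusters[y]), x, y)
            for i, x in enumerate(names)
            for y in names[i + 1:]
        )
        merged = clusters.pop(x) | clusters.pop(y)
        clusters[tuple(sorted(x + y))] = merged
        history.append((x, y, round(dist, 2)))
    return history


# Hypothetical data: the segment numbers in which each word occurs.
occurrences = {
    "federal": {1, 2, 3},
    "government": {1, 2, 3},
    "basket": {4, 5},
    "rate": {3, 4, 5, 6},
    "treaty": {7},
}

merges = agglomerate(occurrences)
# "federal" and "government" merge first (distance 0.0); "treaty",
# which shares no segment with the other words, joins last (distance 1.0).
```

Read from first merge to last, the history is the dendrogram: words merged early sit on neighbouring branches, while a word that joins only at the very end, like “treaty” here, is the isolated branch of the tree.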
Conclusions
Now the tricky but inescapable final question: is this text mining analysis of the record of Schmidt’s visit to the Bundesbank missing from my book? This “distant reading” does not really change my previous “close reading” of the meeting (pages 238-241), but it clearly provides a useful quantification of my interpretation and thereby makes better and more concrete use of my source as evidence. Vague qualifiers such as “often” or “frequently” certainly give a good idea of the importance of a phenomenon, but it is much better to be able to describe it precisely. The above graphs thus neatly represent the various aspects tackled in the discussion: European, international and domestic.
In addition, the visualisation of the results clearly does justice to the findings. Again, giving orders of magnitude is definitely helpful and often enough, but being able to show them on a graph makes the findings much more tangible and understandable. The pre-eminence of the word “but,” however odd, does highlight the argumentative dimension of the discussion. This visualisation can prove to be a very valuable asset: it may complement the analysis; it may help clarify the author’s prose (let’s face it: an academic’s writing is not always crystal clear); it may also help a reader better grasp an argument because he or she has a visual memory. Put differently: now that the tools exist, why not make use of them?
But it also highlights the limits and shortcomings of such a distant reading or text mining analysis. One of the most significant insights gained from a close reading of this text, and the one that I highlight most in my book, is the “European mantle” image used by Schmidt to talk about the relationship between West Germany and European integration. The Federal Republic still needed, Schmidt argued, the EEC and the EMS to “cover” its economic and political successes which, because of Germany’s recent tragic past, could not yet come out into the open. In qualitative terms, this is a fundamental insight gained from reading the record of this meeting. But it does not come up very well, if at all, in the text analysis above (you may spot it by zooming in on the top left-hand corner of the similitude analysis graph). The reason for this is pretty simple: Schmidt does not use the words “European mantle” often during his speech, so their statistical weight does not correspond to their intellectual and symbolic weight. The so-called “Emminger letter” does not appear at all either, although it is another key element revealed by this meeting: the fact that Schmidt and Emminger had agreed from the start that the Bundesbank should not feel over-committed by its EMS participation and should be free to leave whenever it wanted. A careful combination of distant and close reading therefore remains paramount to any text analysis.