Mapping WCIT with Twitter: Issue and Hashtag Profiles
Noortje Marres, Carolin Gerlitz, Esther Weltevrede, Erik Borra, Bernhard Rieder
In November 2012, we started capturing all tweets referring to the World Conference on International Telecommunications (WCIT-12) in Dubai, with the aim of finding out what issues were emerging as relevant in relation to this event. WCIT, a global summit hosted from 3-14 December 2012 by the ITU, the UN body for the regulation of telecommunications, was widely expected to make potentially important decisions about the governance of the internet. To get a sense of the relevant issue language, we invited a number of advocates and experts engaged with internet governance and activism to tell us what the most important issues were in relation to WCIT. By the time the summit started, we had received a total of 90 suggested issue terms.
Below we present the Twitter profiles of a selection of these issue terms. We produced the profiles by querying our data set of all tweets that contain the word WCIT, WCIT12 or ITU for these terms during the relevant period (108781 tweets between 23/11/12 and 19/12/12). To analyse this data, we used an experimental online tool for textual analysis, called the Associational Profiler (aka the Co-word Machine), which we are currently developing. Our prototype tool at the moment only accepts hashtag queries, and we therefore identified a corresponding hashtag for each of the issues suggested by our respondents for profiling purposes: human rights became #humanrights, deep packet inspection became #dpi, and so on. In addition, we produced profiles for hashtags that are relevant in relation to WCIT according to Twitter itself. That is, we also profiled hashtags that are most often used together with other hashtags in our WCIT data set. These are hashtags with high overall co-occurrence frequency within the WCIT Twitter data set (top 20). For an overview of the profiled issue terms and hashtags, see the Appendix below.
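The selection of hashtags by overall co-occurrence frequency can be illustrated with a minimal sketch. It is not the Associational Profiler itself: the function names, the hashtag pattern and the sample tweets are hypothetical, and we assume that a hashtag's co-occurrence frequency is the number of times it appears in a tweet together with another hashtag.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is "#" followed by word characters; case is ignored.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of lowercased hashtags found in one tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def cooccurrence_frequency(tweets):
    """Count, per hashtag, how often it appears alongside another hashtag."""
    totals = Counter()
    for text in tweets:
        tags = extract_hashtags(text)
        # Every unordered pair of hashtags in a tweet counts once for each member.
        for a, b in combinations(sorted(tags), 2):
            totals[a] += 1
            totals[b] += 1
    return totals


# Hypothetical sample tweets, for illustration only.
sample_tweets = [
    "#WCIT delegates discuss #dpi and #privacy",
    "Stop #censorship at #wcit",
    "Following the #ITU summit",
]

frequencies = cooccurrence_frequency(sample_tweets)
top_20 = frequencies.most_common(20)  # hashtags ranked by co-occurrence frequency
```

A hashtag that only ever appears on its own, like #ITU in the sample, receives no count and would therefore have no profile.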
In order to analyse issue dynamics over time, we divided the Twitter data set into four intervals: before the summit, first week of the summit, second week of the summit and after the summit. Our profiling tool determines which other hashtags a given hashtag is associated with in each of the four intervals, and shows how frequently these words and hashtags co-occur during these periods. The resulting profiles provide an indication of the overall activity of the hashtag over the course of the WCIT summit.
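The interval-based profiling can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual implementation: the summit dates (3-14 December 2012) come from the text above, but the split between the first and second week on 10 December, the function names and the sample data are our own hypothetical choices.

```python
from collections import Counter
from datetime import date

FIRST_WEEK_START = date(2012, 12, 3)    # summit opens
SECOND_WEEK_START = date(2012, 12, 10)  # assumed boundary between the two weeks
AFTER_START = date(2012, 12, 15)        # first day after the summit closes


def interval_of(day):
    """Map a date to interval 1 (before), 2 (week one), 3 (week two) or 4 (after)."""
    if day < FIRST_WEEK_START:
        return 1
    if day < SECOND_WEEK_START:
        return 2
    if day < AFTER_START:
        return 3
    return 4


def hashtag_profile(tweets, target):
    """Build {interval: Counter of hashtags co-occurring with `target`}.

    `tweets` is an iterable of (date, set_of_hashtags) pairs.
    """
    profile = {i: Counter() for i in (1, 2, 3, 4)}
    for day, tags in tweets:
        if target in tags:
            profile[interval_of(day)].update(tags - {target})
    return profile


# Hypothetical sample data, for illustration only.
tweets = [
    (date(2012, 11, 30), {"netneutrality", "ebu", "wcit"}),
    (date(2012, 12, 4), {"netneutrality", "dubai"}),
    (date(2012, 12, 4), {"wcit", "dpi"}),
    (date(2012, 12, 12), {"netneutrality", "dubai"}),
    (date(2012, 12, 16), {"netneutrality", "p2"}),
]

profile = hashtag_profile(tweets, "netneutrality")
```

Each interval's counter can then be read as one column of a profile: the terms associated with the hashtag in that period and how often they co-occur with it.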
As we worked with Twitter data in this initial study, a key question for us is whether and how Twitter can be used for the analysis of wider issue dynamics. Can the detected hashtag patterns tell us something about wider issue dynamics in relation to WCIT or do they primarily inform us about the role of Twitter in communicating the summit?
Profiling WCIT Hashtags
The above figure visualises the profile for #WCIT, showing which words are most prominently associated with this hashtag on Twitter over the course of the summit. Two types of hashtags feature prominently in the #WCIT profile: campaign hashtags and institutional hashtags. The first group includes hashtags associated with civil society campaigning (#netfreedom, #ituvideo), hashtags associated with the activist group Anonymous (#opwcit, #anonymous, #opbigbrother), and one connected with the company Google, which had launched its own WCIT advocacy campaign (#freeandopen). The second group includes references to organisations (#icann, the US-based private organisation that manages internet addresses, and #isoc, the Internet Society) and to regulation (#itrs or #itr). Only a few of the issue terms suggested by experts and advocates figure prominently in the #WCIT profile: deep packet inspection (interval 2), internet regulation (itrs in intervals 3 and 4) and censorship (interval 4). Overall, the #WCIT profile suggests that campaigning activity dominated Twitter during the summit. However, methods of hashtag profiling can also be used to investigate specific issue terms in more detail.
Using hashtags for issue analysis
Again, the overall profile for #WCIT shows that issue terms are _not_ the most prominent hashtags in our Twitter data set. Of the suggested issue terms, a limited number feature in the WCIT profile above: regulation, censorship, surveillance, dpi, etno, governance, human rights, security. Some of the issue terms provided by experts and advocates don’t have a corresponding hashtag profile at all: senderpays, personal data, transparency (note that this could change if we looked at words rather than hashtags).
However, when we consider the associational profiles of specific hashtags rather than the general #WCIT, the issue terms prove rather more interesting. Some of the not so frequently occurring terms, it turns out, have especially active and diverse profiles on Twitter. Below we present some initial analysis of these hashtag profiles, focusing in particular on hashtag activity, its associations, and how these vary over time.
Hashtag profiling may provide a way of answering our research question above: how much can hashtags tell us about Twitter, and how much about wider issue dynamics? As we will show, some of the dynamics disclosed by hashtag profiles are relatively specific to Twitter, such as the prominence of campaign words like #anonymous. In other cases there seems to be an interplay between Twitter- and issue dynamics, as in the case of #netfreedom shown below. While this term itself may be fairly specific to Twitter, its profile contains issue terms with wider resonance (censorship, human rights, surveillance, regulation). We therefore are confident that at least some issue dynamics can be gauged by studying Twitter.
1. Variation over time
The first thing to note is that hashtag profiles show distinct temporal patterns. While some topics were especially relevant before the summit started, others only gained prominence once the summit was well underway.
1.1. Before the summit
Some hashtags are at their most lively in advance of the event and decline in frequency and connection once the summit starts, for instance #netneutrality and #freedom.
In the case of #netneutrality, note that the term starts out with a relatively granular and diverse profile, including specific organisations (#ebu, a European public media organisation, and #ep, the European Parliament) and technical terms (internet roaming, dpi). Later on, more general hashtags take over (dubai, itu (in French) and the general progressive hashtag p2).
This declining specificity of the issue profile can also be expressed numerically, and this is indicated in the figure with colour coding: the red in the first interval indicates that the words in the profile are relatively specific to #netneutrality, that is, they occur more often with that hashtag than with others in the WCIT data set. This specificity decreases in the later intervals, which may offer an additional indication of declining issue activity. The profile shows very little continuity.
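One simple way to express specificity numerically is sketched below. The exact measure used by our tool is not given here, so the formula is an assumption chosen for illustration: the share of all tweets containing a term that also contain the profiled hashtag. The function name and sample data are hypothetical.

```python
def specificity(tweets, target, term):
    """Share of tweets containing `term` that also contain `target`.

    `tweets` is a list of hashtag sets. A value near 1.0 means the term
    occurs almost only with `target`; a value near 0.0 means it mostly
    occurs with other hashtags.
    """
    with_term = [tags for tags in tweets if term in tags]
    if not with_term:
        return 0.0
    return sum(1 for tags in with_term if target in tags) / len(with_term)


# Hypothetical sample data, for illustration only.
tweets = [
    {"netneutrality", "ebu"},
    {"netneutrality", "ebu"},
    {"netneutrality", "dubai"},
    {"wcit", "dubai"},
    {"itu", "dubai"},
    {"wcit", "dubai"},
]

ebu_score = specificity(tweets, "netneutrality", "ebu")      # occurs only with the target
dubai_score = specificity(tweets, "netneutrality", "dubai")  # mostly occurs elsewhere
```

Under this assumed measure, a term like ebu would be coloured as highly specific, while a general term like dubai would not, even if both co-occur with the hashtag equally often.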
1.2. Beginning of summit
A large number of hashtags are especially active at the beginning of the summit. This applies to media campaign words like #anonymous, #freeandopen, #ituvideo, #opbigbrother, #opwcit, but also to issue terms like #telecommunications, #humanrights, #access, #censorship, #dpi, #cybersecurity and #privacy. These terms were often mentioned at first, but do not endure throughout the event (some of these profiles are included further below).
1.3. End of summit
Other hashtags only become prominent towards the end of the conference. These include the campaign hashtag #handsoffinternet and its Spanish equivalent #manosfueradeinter, #spam, the issue term censorship (see further down) and the German censorship hashtag #zensur. Institutional hashtags including #icann, #isoc, #itr, #itrs, #dubai and #etno (the European Telecommunications Network Operators' Association, a lobby group) gain increasing prominence in the later stages of WCIT or even thereafter.
1.4. Continuous or returning hashtags
Hashtags that have lively associations throughout the entire event are: #netfreedom, #governance and #un. In a significant number of tweets, governance is repeatedly mentioned with other hashtags: #wcit #wake #up #call #time to #broaden the #discussion on #internet #governance #newworldorder. This type of tweet functions as a call for action during the event, and creates strong co-word connections between the hashtags on a highly specific level. A closer look at the data shows that this particular tweet was retweeted by a diverse set of users.
A few hashtags have their liveliest associations only before and after the event. This includes #regulation, which contrasts interestingly with governance. The only continuing association for #regulation is tech. It is also associated with words that are not unique to the summit but refer to previous internet governance issues such as #acta.
2. Hashtag and issue composition
It is not just the activity of hashtags that varies over time; there are also relevant differences in terms of the associations composing the hashtag profile. Whilst some hashtags are related to only a few words that occur frequently, other hashtags come with multiple and granular associations. Profiles may also vary in their thematic coherence. Campaign hashtags such as #opwcit, #anonymous, #freeandopen and #ituvideo occur frequently, but they mainly appear with other campaign words.
Privacy is connected to both campaign words and other issue terms (security, dpi, surveillance, human rights). Its profile is strongest in the second week of the conference, with surveillance and the more generic netfreedom the only remaining issue terms after that.
As mentioned, most campaign hashtags are rather thinly constituted, with #anonymous for instance connected with a string of operation words (opwcit, opbigbrother, and so on). A few diverge from this pattern, such as #handsoffinternet. This hashtag gained lively and specific connections only towards the end of the summit, but it also connects with a range of campaigning issues including #censorship, #petition, #stop, #freespace and #netfreedom, followed by a request to follow on Twitter, suggesting an action-oriented hashtag.
#dpi (deep packet inspection) is a lively hashtag, both in terms of overall activity and in terms of its constituent word associations over time. This hashtag brings together some major campaign words including censorship, privacy, surveillance and un, all with low specificity, but it also comes with more specific connections to more technical terms like y2270, wtsa, rfid, cctv, spying and drones. Interestingly, #dpi is not connected at all before the event. In the last week of the summit, the key specific connections are leak and lobbyism, which overtake issue-related and technical framings of the issue, suggesting a political event. Note that overall specificity is fairly low, possibly signalling that this is not a “very commanding” issue.
Figure: #dpi (deep packet inspection)
Thematically more diverse and granular associations can be found across the intervals for the hashtag #censorship, which connects campaign words, countries and issue terms. Here, the blue colour of the bars indicates non-specific relations between #censorship and the campaign words.
Fittingly, the generic-sounding hashtag #internet has a fairly generic composition, as it brings together the majority hashtags, i.e. those with the most frequent co-hashtag connections: campaign words and key actors, without containing the more specific actors. The blue colour of the bars indicates that these connections, whilst being frequent, are not very specific, suggesting that #internet functions as a loosely applied framing hashtag.
Finally, #spam, another expert-suggested hashtag, has a low number of co-word connections and low continuity across time, and as such may appear fittingly disorganised. However, the terms making up its profile are interestingly technical (such as osi: Open Systems Interconnection) and also include overarching words like security and censorship. The technical term (osi) has fairly high specificity, which makes sense. Overall the profile signals low but steady Twitter activity. Could we say this matches the profile of a policy issue: fairly technical, some relevance and steady but low levels of public mobilization?
Profiling WCIT with Twitter brings into view some overall issue and Twitter dynamics in relation to this event. On the most basic level, there are hashtags with profiles, and those without, i.e. hashtags that aren’t significantly connected with other hashtags (such as #senderpays). There are some clear temporal dynamics, with some issues especially active before the summit (#netneutrality), others during (#governance) and a few only waking up towards the end (#handsoffinternet). There are also significant differences when it comes to profile composition. Some profiles are much more granular than others: #dpi for instance is much more widely and intricately connected than #spam. And while some hashtags are mostly associated with campaigns (again: see #governance), others have a more technical profile.
Perhaps most interestingly, profiling hashtags with Twitter provides a way to identify happening words and issues, and a way to determine _how_ they are happening. Based on the above, we define happening words as terms that are active, i.e. highly associated with other terms across the intervals, and granular, i.e. occurring with many different terms. The thematic make-up of profiles is also relevant in this respect: some hashtags are primarily associated with issue language (surveillance, censorship), while others have a more technical (rfid, roaming, fitrage), organisational (ep, etno) or political (lobbyism) content, and some combine most of the above.
As mentioned, the question is what this says about Twitter and what it can tell us about wider issue dynamics in relation to WCIT and internet politics. Some profiling effects are probably specific to the medium (e.g. the prominence of media campaign words). But the rise and fall of issue terms like netneutrality and dpi is likely to be related to wider political dynamics relating to the summit. To clarify this further, we will need to expand our analysis, for instance by considering how hashtags are associated with external sources by way of hyperlinks in tweets, or by comparing issue profiles on Twitter with the profiles of similar issue terms in the news or summit documents.
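The definition of happening words could be operationalised along the following lines. This is only a sketch: the thresholds, the function names and the two sample profiles are hypothetical, and we assume a profile is stored as a mapping from interval number to co-occurrence counts.

```python
from collections import Counter


def active_intervals(profile, min_cooccurrences=1):
    """Number of intervals in which the hashtag co-occurs with other terms."""
    return sum(
        1 for counts in profile.values()
        if sum(counts.values()) >= min_cooccurrences
    )


def granularity(profile):
    """Number of distinct terms the hashtag occurs with across all intervals."""
    terms = set()
    for counts in profile.values():
        terms.update(counts)
    return len(terms)


def is_happening(profile, min_intervals=3, min_terms=4):
    """Active (associated across intervals) and granular (many different terms).

    Both thresholds are arbitrary illustrative values, not empirical cut-offs.
    """
    return (
        active_intervals(profile) >= min_intervals
        and granularity(profile) >= min_terms
    )


# Hypothetical profiles, loosely modelled on the contrast between a widely
# connected hashtag and a thinly connected one; the counts are invented.
widely_connected = {
    1: Counter(),
    2: Counter({"privacy": 3, "censorship": 2}),
    3: Counter({"surveillance": 1, "rfid": 1}),
    4: Counter({"leak": 2}),
}
thinly_connected = {
    1: Counter(),
    2: Counter({"osi": 1}),
    3: Counter(),
    4: Counter({"security": 1}),
}

widely_connected_happening = is_happening(widely_connected)
thinly_connected_happening = is_happening(thinly_connected)
```

Keeping activity and granularity as separate measures makes it possible to distinguish hashtags that are frequent but thinly constituted from those that are less frequent but richly connected.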
|Terms suggested by issue experts and advocates (top 20)
|Hashtags with top overall co-word frequency (top 20)
|deep packet inspection
|freedom of expression
|cost of service