last updated Feb 28, 2016; originally written July 1, 2015
To start with, you will need a Twitter account and credentials (i.e. API key, API secret, access token, and access token secret) obtained on the Twitter developer site to access the Twitter API, following these steps:
We will be using a Python library called Python Twitter Tools to connect to the Twitter API and download data from Twitter. There are many other libraries in various programming languages that let you use the Twitter API. We chose Python Twitter Tools for this tutorial because it is simple to use yet fully supports the Twitter API.
Download the Python Twitter tools at https://pypi.python.org/pypi/twitter.
Install the Python Twitter Tools package by running the following commands:
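If you use pip, the installation is a one-liner (this assumes pip is already on your path; use pip3 on systems where Python 3 is not the default):

```shell
# Install Python Twitter Tools from PyPI; the package name is "twitter".
pip install twitter
```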
The Streaming APIs give access to (usually a sample of) all tweets as they are published on Twitter. On average, about 6,000 tweets per second are posted on Twitter, and normal developer accounts receive only a small proportion (<=1%) of them. The Streaming APIs are one of the two types of Twitter APIs. The other type, the REST APIs (which we will cover later in this tutorial), is more suitable for singular searches, such as searching historic tweets, reading user profile information, or posting tweets. The Streaming API only delivers real-time tweets, while the Search API (one of the most popular REST APIs) returns historical tweets up to about a week old, with a maximum of a few hundred per request. You may request elevated access (e.g. Firehose, Retweet, Link, Birddog or Shadow) for more data by contacting Twitter's API support.
Create a file called twitter_streaming.py, and copy the code below into it. Make sure to enter the credentials obtained in Step 1 above into ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, and CONSUMER_SECRET.
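For reference, a minimal sketch of what twitter_streaming.py can look like with Python Twitter Tools; the four credential strings are placeholders you must replace with your own keys from Step 1, and statuses.sample() is used here to request the public sample stream:

```python
# twitter_streaming.py -- minimal sketch using Python Twitter Tools.
import json

from twitter import OAuth, TwitterStream

# Replace these placeholders with your own credentials from Step 1.
ACCESS_TOKEN = 'YOUR ACCESS TOKEN'
ACCESS_SECRET = 'YOUR ACCESS TOKEN SECRET'
CONSUMER_KEY = 'YOUR API KEY'
CONSUMER_SECRET = 'YOUR API SECRET'

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

# Connect to the public sample stream and print each message as one
# JSON object per line.
twitter_stream = TwitterStream(auth=oauth)
for tweet in twitter_stream.statuses.sample():
    print(json.dumps(tweet))
```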
If you run the program by typing in the command:
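Assuming the file name used above, that command is simply:

```shell
python twitter_streaming.py
```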
You will see tweets keep flowing across your screen. They are a sample of the public data (tweets as well as other events) flowing through Twitter at the moment. The data returned is in JSON format. It may look like a lot for now; how to read and process this data will become clearer in the next step. Below is one example tweet:
and its JSON representation (often output without line breaks to save space, which makes it difficult to read and make sense of; you can pretty-print it with line breaks to make it more readable, as shown in Step 5):
You can run the program and save the data into a file for later analysis using the following command:
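One way to do this is plain shell redirection; the file name below matches the one used later in this tutorial, and you stop collection with Ctrl-C once you have enough data:

```shell
# Save the streamed JSON to a file instead of printing it to the screen.
python twitter_streaming.py > twitter_stream_1000tweets.txt
```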
The streaming API provides more advanced functions.
First, you can set different parameters (see here for a complete list) to define what data to request. For example, you can track certain tweets by specifying keywords, location, language, etc. The following code will get tweets in English that include the term "Google":
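A sketch of that code with Python Twitter Tools; the credential strings are placeholders to replace with your own keys, and track and language are passed straight through to the statuses/filter endpoint:

```python
from twitter import OAuth, TwitterStream

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')

twitter_stream = TwitterStream(auth=oauth)
# track: comma-separated keywords to follow; language: a BCP 47 language code.
for tweet in twitter_stream.statuses.filter(track='Google', language='en'):
    print(tweet.get('text', ''))
```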
Location is a bit tricky. Read here for a simple guide, and here for a complete guide. Finding tweets by location can be done either with the Streaming API (only geolocated tweets are returned) or the Search API (the user's location field is also used).
Second, by default the Streaming API connects to the "public streams", that is, all public data on Twitter, as in the example above. There are also "user streams" and "site streams" that contain data specific to the authenticated user or users (see here for more details). For conducting research on Twitter data, you usually only need the "public streams" to collect data. In case you do need another stream, here is how to specify it:
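With Python Twitter Tools this is done by pointing the client at a different domain; the sketch below targets the user stream endpoint (credentials are placeholders):

```python
from twitter import OAuth, TwitterStream

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')

# Connect to the "user streams" endpoint instead of the default public one.
twitter_userstream = TwitterStream(auth=oauth, domain='userstream.twitter.com')
for msg in twitter_userstream.user():
    print(msg)
```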
The Streaming API returns tweets, as well as several other types of messages (e.g. tweet deletion notices, user profile update notices, etc.), all in JSON format. Here we demonstrate in detail how to read and process tweets. Other data in JSON format can be processed similarly.
Tweets are also known more generically as "status updates". This map, made by Raffi Krikorian, explains a tweet in JSON format:
Although this map is from 2010 and somewhat out of date, it is a good visualization of the tweet JSON format. You can find up-to-date information on the tweet format here.
Use the Python library json or simplejson to read in the data in JSON format and process it:
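A minimal sketch of such a script; it assumes one JSON object per line in the collected file, and parse_tweets is a helper name introduced here for illustration only:

```python
import json

def parse_tweets(lines):
    """Parse one JSON object per line, skipping blank or broken lines."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweets.append(json.loads(line))
        except ValueError:
            # An incomplete line, e.g. cut off when collection was stopped.
            continue
    return tweets

# Usage with the file collected earlier:
# with open('twitter_stream_1000tweets.txt') as f:
#     for tweet in parse_tweets(f):
#         if 'text' in tweet:  # skip deletion notices and other messages
#             print(tweet['user']['screen_name'], ':', tweet['text'])
```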
If, instead of the file twitter_stream_1000tweets.txt, you use the example tweet in JSON format from the Streaming API section, this piece of code will output the following results:
Note that the same url will have a few different versions in the Twitter stream: http://t.co/rcygyEowqH in the text, http://noisy-text.github.io as the expanded full version, noisy-text.github.io as the display version.
For long-term data collection, you can set up a cron job. If you are interested in running a long-term collection of one or more streaming queries, consider using Mark Dredze's Twitter streaming library. This library wraps the basic Streaming API with several helpful features, such as organizing data into files by date, support for multiple feed types, and ensuring feeds remain active after interruptions. You can run this library under crontab or supervisord.
Besides the Streaming APIs, Twitter also provides another type of API: the REST APIs. They provide two main functionalities: GETting data from Twitter and POSTing data (e.g. a tweet from your account) to Twitter. In this tutorial, we will demonstrate the three APIs most useful for collecting social media research data: Search (tweets containing certain words), Trends (trending topics), and User (a user's tweets, followers, friends, etc.). For explanations of these key types of data offered by Twitter, see the lecture slides on this course website.
Similar to the Streaming API, you first import the necessary Python packages and OAuth credentials as in Step 2. Then you can use the Search API as follows:
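A sketch of a basic search with Python Twitter Tools; the query term and the credential strings are placeholders:

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# The Search API returns a dict whose 'statuses' entry is a list of tweets.
results = twitter.search.tweets(q='Google')
for tweet in results['statuses']:
    print(tweet['text'])
```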
Alternatively, you can search with more parameters (see a full list here). For example, search for the 10 latest tweets about "#nlproc" in English:
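That search might be sketched like this (same placeholder credentials as before; result_type, lang, and count are standard search/tweets parameters):

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# The 10 most recent English tweets containing "#nlproc".
results = twitter.search.tweets(q='#nlproc', result_type='recent',
                                lang='en', count=10)
for tweet in results['statuses']:
    print(tweet['created_at'], tweet['text'])
```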
Twitter provides global as well as localized trends. The easiest and best way to see what trends are available, together with the place ids (which you will need to know in order to query localized trends or tweets), is to use this command to request worldwide trends:
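One way to sketch this request with Python Twitter Tools is via the trends/available endpoint, which lists every place Twitter serves trends for along with its WOEID (credentials are placeholders):

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# Every location Twitter offers trends for, together with its WOEID.
places = twitter.trends.available()
for place in places:
    print(place['woeid'], place['name'], place['country'])
```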
It returns all the trends that are offered by Twitter at the time. Here is part of the returned results:
The place ids are WOEIDs (Where On Earth IDs), which are 32-bit identifiers provided by the Yahoo! GeoPlanet project. And yes, Twitter is very international.
After you know the ids for the places you are interested in, you can get the local trends like this:
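A sketch of that call; 2487956 is assumed here to be the WOEID for San Francisco (1 is worldwide), and the leading underscore in _id is Python Twitter Tools' convention for arguments that are placed into the request URL:

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# Trends for one place; the response is a one-element list whose 'trends'
# entry holds the trending topics.
local_trends = twitter.trends.place(_id=2487956)
for trend in local_trends[0]['trends']:
    print(trend['name'])
```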
The trends are returned in JSON again. This time we reformat the JSON data in a prettier way to make it easier for humans to read:
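Pretty printing needs nothing beyond the standard json module; the helper name and the sample data below are illustrative only:

```python
import json

def pretty(data):
    """Re-serialize parsed JSON with indentation and sorted keys."""
    return json.dumps(data, indent=4, sort_keys=True)

# Illustrative fragment shaped like a trends/place response.
sample = [{"trends": [{"name": "#nlproc", "query": "%23nlproc"}]}]
print(pretty(sample))
```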
If you want to get the real tweets in each trend, use the Search API to get them.
How often do Twitter trending topics change? Twitter does not disclose this, but based on my experience, you will get most of them by querying every 5 minutes.
Another popular use of the API is to obtain the social graph of a user's followers and friends, as well as a particular user's tweets. Below we show two common usage examples of the User APIs:
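Two sketches with Python Twitter Tools (the credentials and the screen name are placeholders):

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# 1) A particular user's most recent tweets.
timeline = twitter.statuses.user_timeline(screen_name='twitterapi', count=10)
for tweet in timeline:
    print(tweet['text'])

# 2) One page (up to 5,000) of that user's follower ids.
followers = twitter.followers.ids(screen_name='twitterapi')
print(followers['ids'][:10])
```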
Unlike the Streaming API, the REST APIs have strict rate limits on how many requests you can send within a time window and how many tweets each request can return (and these limits have grown stricter over time). The limits vary from one API function to another. Twitter's developer website gives a list of the rate limits here.
You can also query the API to check your remaining quota, though you will rarely need this command:
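A sketch of that query (placeholder credentials again); application/rate_limit_status reports the remaining quota per endpoint:

```python
from twitter import Twitter, OAuth

# Placeholders; use your own credentials from Step 1.
oauth = OAuth('ACCESS_TOKEN', 'ACCESS_SECRET', 'CONSUMER_KEY', 'CONSUMER_SECRET')
twitter = Twitter(auth=oauth)

# Quota per endpoint, grouped by resource family; shown here for search.
status = twitter.application.rate_limit_status()
print(status['resources']['search']['/search/tweets'])
```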
This tutorial is meant to help you get started. To learn more about the Twitter APIs, here are two resources I found quite helpful:
Check out more example Python scripts that demonstrate interactions with the Twitter API via the Python Twitter Tools: https://github.com/ideoforms/python-twitter-examples