To start with, you will need a Twitter developer account and credentials (API key, API secret, access token, and access token secret) to access the Twitter API. Follow these steps:
A pop up window will appear for reviewing Developer Terms. Click the “Create” button again.
On the next page, click the “Keys and Access Tokens” tab, and copy your “API key” and “API secret” from the Consumer API keys section.
Scroll down to the “Access token & access token secret” section and click “Create”. Then copy your “Access token” and “Access token secret”.
2. Installing Twitter library
We will use a Python library called Tweepy to connect to the Twitter API and download data from Twitter. Many other libraries, in various programming languages, let you use the Twitter API. We chose Tweepy for this tutorial because it is simple to use yet fully supports the Twitter API.
Install Tweepy by using pip to pull it from PyPI (for example, pip install tweepy).
You may also use Git to clone the repository from GitHub and install it manually:
3. Connecting to Twitter Streaming APIs
The Streaming APIs give access to tweets as they are published on Twitter, usually as a sample of all tweets. On average, about 6,000 tweets per second are posted on Twitter, and normal developer accounts receive a small proportion (<=1%) of them. The Streaming APIs are one of the two types of Twitter APIs. The other type, the REST APIs (discussed later in this tutorial), is more suitable for one-off requests, such as searching historic tweets, reading user profile information, or posting tweets. The Streaming API only sends out real-time tweets, while the Search API (one of the popular REST APIs) returns historical tweets going back about a week, up to a maximum of a couple of hundred. You may request elevated access (e.g. Firehose, Retweet, Link, Birddog or Shadow) for more data by contacting Twitter’s API support.
Basic Uses of Streaming APIs
Create a file called twitter_streaming.py, and copy the code below into it. Make sure to enter the credentials you obtained in Step 1 into ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, and CONSUMER_SECRET.
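If you would rather not paste secrets directly into the script, one option is to read them from environment variables. The sketch below is only an illustration: the environment variable names simply mirror the four constants above, and the helper load_credentials is a hypothetical name, not part of Tweepy.

```python
import os

# Assumption: the environment variables are named after the four constants
# used in this tutorial. Pick whatever names fit your own setup.
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

In the streaming script you could then call load_credentials() once at startup and pass the four values to the OAuth setup, so the file itself never contains secrets.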
If you run the program by typing in the command:
You will see tweets from your homepage on your screen. They are the most recent statuses, including retweets, posted by you and your friends. The data is returned in JSON format. It may look like too much for now; it will become clearer in the next step how to read and process this data. Below is one example tweet:
and its JSON format (often output without line breaks to save space, which makes it difficult to read; pretty printing, as shown in Step 5, adds line breaks to make it more readable):
You can run the program and save the data into a file for later analysis using the following command:
Advanced Uses of Streaming APIs
The streaming API provides more advanced functions.
First, you can set different parameters (see here for a complete list) to define what data to request. For example, you can track certain tweets by specifying keywords, location, language, and so on. The following code will get the tweets in English that include the term “google”:
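To get a feel for what such a keyword and language filter selects, here is a rough client-side analogue that works on tweets you have already saved. It is a sketch under assumptions: the server applies its own matching rules, which this simple case-insensitive substring test does not reproduce, and matches_filter and the sample tweets are made up for illustration. The "text" and "lang" fields are the ones found in tweet JSON.

```python
def matches_filter(tweet, keywords, lang=None):
    """Return True if the tweet text contains any keyword (case-insensitive)
    and, when lang is given, the tweet's language code equals it."""
    if lang is not None and tweet.get("lang") != lang:
        return False
    text = tweet.get("text", "").lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical, heavily trimmed tweets (real tweets carry many more fields).
saved_tweets = [
    {"text": "Google released a new model today", "lang": "en"},
    {"text": "Nouveau produit chez Google", "lang": "fr"},
    {"text": "Nothing to see here", "lang": "en"},
]

english_google = [t for t in saved_tweets if matches_filter(t, ["google"], lang="en")]
```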
Location is a bit tricky. Read here for a simple guide, and here for a complete guide. Finding tweets by location can be done with either the Streaming API (geolocated tweets only) or the Search API.
Second, by default, the Streaming API connects to the “public streams”, that is, all public data on Twitter, as shown in the example above. You can also use the Streaming API to get tweets by specific users: the follow parameter of the filter function takes an array of user IDs to stream.
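Part of what makes location tricky is coordinate order. A bounding box is written as longitude/latitude pairs with the southwest corner first, and a geolocated tweet stores its point in GeoJSON order (longitude, then latitude). The sketch below checks whether a saved tweet falls inside a box. It only looks at the exact "coordinates" field and ignores the coarser "place" information, and both the helper name and the box values are illustrative assumptions.

```python
def in_bounding_box(tweet, box):
    """box = (west_lon, south_lat, east_lon, north_lat), southwest corner first."""
    point = tweet.get("coordinates")
    if not point:  # most tweets carry no exact location
        return False
    lon, lat = point["coordinates"]  # GeoJSON order: longitude, then latitude
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


# Approximate box around San Francisco, used here only as an example.
SF_BOX = (-122.75, 36.8, -121.75, 37.8)

# Hypothetical trimmed tweets.
tweet_in_sf = {"coordinates": {"type": "Point", "coordinates": [-122.4, 37.77]}}
tweet_in_nyc = {"coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]}}
tweet_no_geo = {"coordinates": None}
```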
4. Reading and Processing Tweets in JSON format
The streaming API returns tweets, as well as several other types of messages (e.g. a tweet deletion notice, a user profile update notice, etc.), all in JSON format. Here we demonstrate how to read and process tweets in detail. Other data in JSON format can be processed similarly.
Tweets are also known more generically as “status updates”. This map made by Raffi Krikorian explains a tweet in JSON format:
Although this map is from 2010 and a bit out of date, it is a good visualization of a tweet’s JSON format. You can find up-to-date information on the tweet format here.
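Because tweets and other notices arrive interleaved on the same stream, it helps to sort each line by type before processing it. The following is a minimal sketch, assuming one JSON message per line, that deletion notices carry a top-level "delete" key, and that limit notices carry a top-level "limit" key. The function name classify_message is hypothetical.

```python
import json


def classify_message(line):
    """Return 'tweet', 'delete', 'limit', or 'other'.
    Blank lines (used by the stream to keep the connection open) return None."""
    line = line.strip()
    if not line:
        return None
    message = json.loads(line)
    if "delete" in message:
        return "delete"
    if "limit" in message:
        return "limit"
    if "text" in message and "user" in message:
        return "tweet"
    return "other"


# Hypothetical, trimmed sample lines.
sample_lines = [
    '{"id": 1, "text": "hello", "user": {"screen_name": "someone"}}',
    '{"delete": {"status": {"id": 1234, "user_id": 3}}}',
    '{"limit": {"track": 10}}',
    "",
]
kinds = [classify_message(line) for line in sample_lines]
```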
Use the Python library json or simplejson to read the JSON data and process it:
If you use the example tweet in JSON format from the Streaming API section instead of the file twitter_stream_200tweets.txt, this piece of code will output the following results:
Note that the same url will have a few different versions in the Twitter stream: http://t.co/rcygyEowqH in the text, http://noisy-text.github.io as the expanded full version, noisy-text.github.io as the display version.
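The three URL versions live side by side in the tweet's "entities" field. As a minimal sketch, the helper below pulls out hashtags and all three URL forms from one parsed tweet. The sample tweet is a hypothetical, trimmed stand-in that reuses the URLs mentioned above; its other values are made up, and summarize_tweet is an illustrative name.

```python
import json


def summarize_tweet(tweet):
    """Collect a few commonly used fields from one parsed tweet (a dict)."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet["id"],
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        # (shortened t.co link, expanded link, display form)
        "urls": [
            (u["url"], u["expanded_url"], u["display_url"])
            for u in entities.get("urls", [])
        ],
    }


# Hypothetical trimmed tweet; only the URL values come from the text above.
sample_line = json.dumps({
    "id": 1,
    "text": "Workshop page: http://t.co/rcygyEowqH #nlproc",
    "user": {"screen_name": "example_user"},
    "entities": {
        "hashtags": [{"text": "nlproc"}],
        "urls": [{
            "url": "http://t.co/rcygyEowqH",
            "expanded_url": "http://noisy-text.github.io",
            "display_url": "noisy-text.github.io",
        }],
    },
})

summary = summarize_tweet(json.loads(sample_line))
```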
For long-term data collection, you can set up a cron job. If you are interested in running a long-term collection of one or more streaming queries, consider using Mark Dredze’s Twitter streaming library. This library wraps the basic streaming API with several helpful features, such as organizing data into files by date, supporting multiple feed types, and ensuring feeds remain active after interruptions. You can run this library under crontab or supervisord.
5. Using Twitter Search API, Trends API, and User API
Besides the Streaming APIs, Twitter also provides another type of API: the REST APIs. They offer two main functions: GET data from Twitter and POST data (e.g. a tweet from your account) to Twitter. In this tutorial, we demonstrate the three most useful APIs for collecting data for social media research: Search (tweets that contain certain words), Trends (trending topics), and User (a user’s tweets, followers, friends, etc.). For explanations of these key types of data offered by Twitter, see the lecture slides on this course website.
Search API
As with the Streaming API, you first import the necessary Python packages and OAuth credentials as in Step 3 and create the api. Then you can use the Search API as follows:
Alternatively, you can search with more parameters (see a full list here). For example, search for the 10 latest tweets about “#nlproc”:
You can also restrict the search by language. For example, search for the 10 latest tweets about “#nlproc” in English:
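Whichever client library you use, a search like this comes down to a handful of query parameters. The sketch below only builds the request URL with the standard library so you can see how the parameters fit together. It assumes the v1.1 search/tweets endpoint and its q, count, lang, and result_type parameters, does not perform the OAuth signing a real request needs, and uses a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the v1.1 REST search endpoint.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=10, lang=None, result_type="recent"):
    """Return the search URL for a query; lang is only added when given."""
    params = {"q": query, "count": count, "result_type": result_type}
    if lang:
        params["lang"] = lang
    return SEARCH_URL + "?" + urlencode(params)


url_any_language = build_search_url("#nlproc")
url_english = build_search_url("#nlproc", lang="en")
```

Note that the "#" in the hashtag has to be percent-encoded, which urlencode takes care of.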
Trends API
Twitter provides global trends as well as localized trends. The easiest way to see which trends are available, along with the place IDs (which you will need to query localized trends or tweets), is to use this command to request worldwide trends:
It returns all the trends that are offered by Twitter at the time. Here is part of the returned results:
The place IDs are WOEIDs (Where On Earth IDs), which are 32-bit identifiers provided by the Yahoo! GeoPlanet project. And yes! Twitter is very international.
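Once you have the list of available places, finding the WOEID for a given place is a simple lookup. Here is a minimal sketch, assuming each entry is a dict with at least "name" and "woeid" keys. The sample list is a hand-trimmed illustration rather than real API output, and find_woeids is a hypothetical helper.

```python
def find_woeids(places, name):
    """Return the WOEIDs of all places whose name matches (case-insensitive)."""
    wanted = name.lower()
    return [place["woeid"] for place in places if place["name"].lower() == wanted]


# Illustrative, trimmed entries; a real response lists many more places
# and more fields per place.
available_places = [
    {"name": "Worldwide", "woeid": 1, "country": ""},
    {"name": "London", "woeid": 44418, "country": "United Kingdom"},
    {"name": "New York", "woeid": 2459115, "country": "United States"},
]

london_ids = find_woeids(available_places, "london")
```

The function returns a list because place names are not guaranteed to be unique.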
After you know the ids for the places you are interested in, you can get the local trends like this:
The trends are returned in JSON again. This time we reformat the JSON data to make it easier for people to read:
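A minimal sketch of that reformatting with the standard json module is shown here. The payload is a made-up, trimmed stand-in for a trends response, and pretty is a hypothetical helper name.

```python
import json


def pretty(data):
    """Serialize data with line breaks and 4-space indentation."""
    return json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)


# Hypothetical trimmed payload, not real API output.
sample_trends = [{"locations": [{"name": "Worldwide", "woeid": 1}],
                  "trends": [{"name": "#nlproc", "tweet_volume": None}]}]

print(pretty(sample_trends))
```

Here sort_keys=True is a design choice that keeps the output stable between runs, which makes it easier to compare two responses by eye.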
and get:
If you want the actual tweets for each trend, use the Search API to get them.
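To feed trends into the Search API, you first need the trend names out of the response. The sketch below assumes the response is a one-element list whose first item has a "trends" array, and that each trend has a "name" and a "tweet_volume" that may be null. The sample data and the helper top_trend_names are hypothetical.

```python
def top_trend_names(trends_response, n=10):
    """Return up to n trend names, highest tweet_volume first.
    Trends without a reported volume are treated as volume 0."""
    trends = trends_response[0]["trends"]
    ranked = sorted(trends, key=lambda t: t.get("tweet_volume") or 0, reverse=True)
    return [t["name"] for t in ranked[:n]]


# Hypothetical trimmed response.
sample_response = [{
    "trends": [
        {"name": "#topicA", "tweet_volume": 500},
        {"name": "#topicB", "tweet_volume": None},
        {"name": "#topicC", "tweet_volume": 12000},
    ],
    "locations": [{"name": "Worldwide", "woeid": 1}],
}]

names = top_trend_names(sample_response, n=2)
```

Each returned name can then be used as the query string of a search.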
How often do the Twitter trending topics change? It is not disclosed by Twitter, but based on my experience, you will get most of them by querying every 5 minutes.
User API
Another popular use of the API is to obtain the social graph of users’ followers and friends, as well as a particular user’s tweets. Below we show two common usage examples of the User APIs:
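Follower and friend lists are returned one page at a time, so collecting a full list means following cursors. The following sketch shows only the paging logic and leaves the network call to you: fetch_page is a hypothetical callable standing in for whatever client function you use. It assumes each page is a dict with "ids" and "next_cursor", that -1 requests the first page, and that a next_cursor of 0 marks the end.

```python
def collect_ids(fetch_page):
    """Follow cursors until the last page and return all IDs in order.

    fetch_page(cursor) is a hypothetical callable that returns a dict
    with an "ids" list and a "next_cursor" integer.
    """
    ids = []
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids


# Fake two-page result used to exercise the loop without any network access.
fake_pages = {
    -1: {"ids": [11, 12], "next_cursor": 99},
    99: {"ids": [13], "next_cursor": 0},
}

all_ids = collect_ids(lambda cursor: fake_pages[cursor])
```

In a real run, each call to fetch_page counts against the rate limit, so long lists may need pauses between pages.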
Rate Limit
Twitter’s streaming API allows a limited number of attempts to connect. If clients exceed the maximum allowed attempts in a window of time, they will receive error 420. The amount of time a client has to wait after receiving error 420 will increase exponentially each time they make a failed attempt.
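A reconnecting client should therefore wait longer after each consecutive error 420. The generator below is a sketch of such a schedule. The one-minute starting delay, the doubling factor, and the 16-minute cap are assumptions chosen for illustration, so check the current guidance before relying on them.

```python
def backoff_delays(attempts, initial=60, factor=2, cap=960):
    """Yield the wait in seconds before each reconnection attempt.

    Assumed defaults: start at 60 s, double each time, never exceed 960 s.
    """
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


schedule = list(backoff_delays(6))
```

A reconnect loop would call time.sleep() with each value in turn and start over from the first delay once a connection succeeds.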
The REST APIs have strict rate limits on how many requests you can send in a given time window and how many tweets each request can return (and these limits have become stricter over time). The limits vary from one API function to another. Twitter’s developer website gives a list of the rate limits here.
You can also query the API to check your remaining quota, though you will rarely need this command:
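The quota information comes back as nested JSON, grouped by resource family and endpoint. As a sketch, the helper below reads the remaining calls and the seconds until the window resets for one endpoint. It assumes the layout resources → family → endpoint → {limit, remaining, reset}, with reset given as a Unix timestamp. The sample values and the helper name are made up.

```python
import time


def quota_for(status, family, endpoint, now=None):
    """Return (remaining_calls, seconds_until_reset) for one endpoint."""
    now = time.time() if now is None else now
    info = status["resources"][family][endpoint]
    return info["remaining"], max(0, info["reset"] - now)


# Hypothetical trimmed response.
sample_status = {
    "resources": {
        "search": {
            "/search/tweets": {"limit": 180, "remaining": 0, "reset": 1000300}
        }
    }
}

remaining, wait_seconds = quota_for(sample_status, "search", "/search/tweets", now=1000000)
```

When remaining is 0, sleeping for wait_seconds before the next request avoids hitting the limit.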
6. Learning More about Twitter APIs
This tutorial is meant to help you get started. To learn more about Twitter APIs, here are two approaches I have found sufficient:
Look up the documentation of Twitter APIs to find the function you would like to use, then search in the source code of the Twitter Python Tools to see how to call it in your program.
Check more example Python scripts that demonstrate interactions with the Twitter API via the Tweepy library:
https://github.com/tweepy/examples
Please contact us if you have any suggestions or notice any information that becomes out-of-date.