set up Reddit API

tl;dr: set up the Reddit API and use it to generate interest data

cost: $0

setup time: 5 minutes


Reddit can be a great place to waste an hour or to explore new hobbies. It can also be a valuable source of free information about your customers. In the following walkthrough I'll explain how to set up the basic Reddit API (Docs) and how to use it to generate novel ad targeting options for Reddit and all other paid digital channels.


#1 - create an 'App'

#1.1 - sign in to your Reddit account and go to the Apps pane

#1.2 - create an app on the script tier

(for the redirect URI, you can use http://localhost:8080 if you don't have a personal domain lying around)


#2 - create an access token (Docs)

#2.1 - Fetch an access token.

Reddit access tokens expire after an hour. Below I've included a function that you can put at the top of your code to generate and set a new one. It uses the password grant, which is the OAuth flow available to script-tier apps.

Grab the client ID and client secret from the app you just created. The function reads them, along with your Reddit username and password, from environment variables. The new access token will be scoped down to the minimum read permissions (full list here).

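import os
import requests

def generate_temporary_access_token():
    # Reddit expects a unique, descriptive User-Agent on every request
    headers = {"User-Agent": "Your Specific UA - u/username"}
    # The app's client ID and secret are sent as HTTP Basic auth
    client_auth = requests.auth.HTTPBasicAuth(os.environ['REDDIT_CLIENT_ID'], os.environ['REDDIT_CLIENT_SECRET'])
    # Password grant for a script app, limited to the read scope
    post_data = {"grant_type": "password", "username": os.environ['REDDIT_USERNAME'], "password": os.environ['REDDIT_PASSWORD'], "scope": "read"}

    response = requests.post("https://www.reddit.com/api/v1/access_token", auth=client_auth, data=post_data, headers=headers)
    # Fail loudly on an HTTP error instead of a confusing KeyError below
    response.raise_for_status()

    # Store the token where the calls below expect to find it
    os.environ['REDDIT_ACCESS_TOKEN'] = response.json()['access_token']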
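
Because the token only lasts an hour, a long-running script will eventually need a fresh one. Here is a minimal sketch of one way to handle that, assuming the token response includes an expires_in field in seconds. TokenCache, fetch_token, and the 60-second safety margin are hypothetical names and values of mine, not part of Reddit's API; fetch_token stands in for any function that performs the POST above and returns the parsed JSON.

```python
import time


class TokenCache:
    """Hypothetical helper: caches an access token and refreshes it before expiry."""

    def __init__(self, fetch_token, margin_seconds=60, clock=time.time):
        # fetch_token: callable returning the parsed JSON of the token response,
        # assumed to contain "access_token" and (optionally) "expires_in" seconds.
        self._fetch_token = fetch_token
        self._margin = margin_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Re-fetch when there is no token yet or the cached one is about to expire.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Fall back to one hour if the response omits "expires_in".
            self._expires_at = self._clock() + payload.get("expires_in", 3600)
        return self._token
```

Calling cache.get() right before each request would then return the cached token most of the time and quietly fetch a new one near the hour mark.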


#3 - helpful API calls

#3.1 - wrappers

Most languages have an API wrapper library (for example, Python has PRAW).

I won't be using one today. That said, they are occasionally more useful than the API docs. For example, you can find a listing of every endpoint with a plain text description here

#3.2 - keyword subreddit search

The most basic operation is to search subreddits by keyword.

headers = {"Authorization": f"Bearer {os.environ['REDDIT_ACCESS_TOKEN']}", "User-Agent": "Your Specific UA - u/username"}
data = "python"
url = f"https://oauth.reddit.com/subreddits/search?q={data}&limit=100&sort=relevance&sr_detail=true&show=true"

response = requests.get(url, headers=headers)
response_data = response.json().get('data', {}).get("children")

As with most of the following endpoints, you can fetch 100 results at a time and paginate through with &after=after_token, where the after_token is retrieved from response.json()['data']['after']
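
The pagination described above can be sketched as a small generator. This is an illustration under my own assumptions: iter_listing and fetch_page are hypothetical names, and fetch_page is any callable that takes an after token (None for the first page) and returns the parsed JSON of one listing response. The max_pages cap is a safety limit I added, not a Reddit requirement.

```python
def iter_listing(fetch_page, max_pages=10):
    """Yield every child from a Reddit listing, following the `after` cursor.

    fetch_page is a hypothetical callable: given an `after` token (or None for
    the first page), it returns the parsed JSON of one listing response.
    """
    after = None
    for _ in range(max_pages):
        listing = fetch_page(after).get("data", {})
        yield from listing.get("children") or []
        after = listing.get("after")
        if not after:
            # A missing or null `after` means there are no further pages.
            break
```

With requests, fetch_page could be a short function that appends &after=... to the search URL above whenever the token is not None and returns response.json().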

#3.3 - substring subreddit search

This particular query looks for subreddits whose names start with a given substring. It's a POST, for reasons.

headers = {"Authorization": f"Bearer {os.environ['REDDIT_ACCESS_TOKEN']}", "User-Agent": "Your Specific UA - u/username"}
body_data = {'query': "tech", "include_over_18": False, "include_unadvertisable": True}
url = "https://oauth.reddit.com/api/search_subreddits"

response = requests.post(url, headers=headers, data=body_data)
response_data = response.json().get('subreddits', [])

Note that this endpoint is equivalent (and preferable) to /api/search_reddit_names.
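
Once you have the subreddits list back, you'll probably want to drop the tiny communities and look at the biggest ones first. A minimal sketch follows, assuming each entry carries a name and a subscriber_count field (check this against a real response before relying on it). top_subreddits and its thresholds are hypothetical choices of mine.

```python
def top_subreddits(subreddits, min_subscribers=1000, limit=10):
    """Return (name, subscriber_count) pairs, largest communities first."""
    rows = [
        # Treat a missing or null subscriber count as zero.
        (s["name"], s.get("subscriber_count") or 0)
        for s in subreddits
        if "name" in s
    ]
    rows = [row for row in rows if row[1] >= min_subscribers]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

For example, top_subreddits(response_data) would give a short, ranked list of candidate communities for the substring you searched.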

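You can search Reddit posts in one of two ways: restricted to just one subreddit, or across all of Reddit (but prioritizing one). The snippet below reuses the headers and data variables from the keyword search: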

subreddit = "all"
just_one_subreddit = 1 if subreddit != "all" else 0
url = f"https://oauth.reddit.com/r/{subreddit}/search?limit=100&show=all&sort=relevance&sr_detail=1&t=all&type=link&include_facets=1&q={data}"
url += f"&restrict_sr={just_one_subreddit}"

response = requests.get(url, headers=headers)
response_data = response.json().get('data', {}).get("children")
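
One way to turn post search results into interest data is to count which subreddits the matching posts live in. The sketch below assumes each child is a post whose data object has a subreddit field; subreddit_counts is a hypothetical helper name.

```python
from collections import Counter


def subreddit_counts(children):
    """Count how many posts in a search result come from each subreddit."""
    counts = Counter()
    for child in children or []:
        name = child.get("data", {}).get("subreddit")
        if name:
            counts[name] += 1
    # Most frequent subreddits first.
    return counts.most_common()
```

Running subreddit_counts(response_data) on a search across all of Reddit gives a rough picture of where a keyword's audience hangs out.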

#3.4 - using the subreddit data

SEOs and ad buyers will want to know the keywords and topics associated with their communities. Once you've generated a list of subreddits, use keyworddit to generate keywords and traffic estimates for each.

Folks who are more familiar with NLP can productionalize a keyword extraction model to scale even further. An example with Non-negative Matrix Factorization is provided here.
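
To give a feel for what NMF does, here is a toy, pure-Python sketch using the standard multiplicative update rules. It is not the linked example, and in practice you would more likely reach for a library such as scikit-learn. It assumes you've already built a non-negative document-by-term count matrix V (one row per post, one column per vocabulary term); nmf and top_terms are hypothetical helper names.

```python
import random


def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (list of rows) into W @ H.

    Uses multiplicative updates that minimize squared reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]

    def matmul(A, B):
        cols = list(zip(*B))
        return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return W, H


def top_terms(H, vocabulary, n_terms=3):
    """For each topic (row of H), return the highest-weighted vocabulary terms."""
    return [
        [vocabulary[j] for j in sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:n_terms]]
        for row in H
    ]
```

Each row of H is a topic expressed as term weights, so top_terms(H, vocabulary) gives candidate keyword groups, while each row of W says how strongly a post belongs to each topic.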


#4 - considerations

There are technically 3 API suites:

#4.1.1 - Ads API

Reddit has an Ads API. You have to spend $200k/yr in order to get access. The docs do not explain that, but rest assured, I DM'd their ads support staff. Fortunately for you, it has pretty limited functionality, so you're not missing out on much.

#4.1.2 - .json API

Reddit supports a meaningful amount of functionality in the .json suffix, which exposes site data as plain JSON.

e.g., you can search by title, with no authentication needed:

https://www.reddit.com/search.json?q=title:aws
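
If you're building these URLs in code, it's safer to let the standard library handle the escaping. A small sketch, where json_search_url is a hypothetical helper and the limit parameter is my addition (it is the same paging parameter used in the OAuth examples above):

```python
from urllib.parse import urlencode


def json_search_url(query, field=None, limit=25):
    """Build a search.json URL, optionally restricting the query to one field."""
    # A field prefix produces queries like "title:aws", as in the example above.
    q = f"{field}:{query}" if field else query
    return "https://www.reddit.com/search.json?" + urlencode({"q": q, "limit": limit})
```

Even without authentication, it's still worth sending a descriptive User-Agent header when you fetch the result.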

#4.1.3 - OAuth API

(it's the one used above)

#4.2 - deprecations

Reddit has silently deprecated API endpoints in the past, notably including Topic Search (/api/subreddits_by_topic), Subreddit Recommend (/api/recommend/sr/), and Trending Subreddits (/api/trending_subreddits). Some are still listed in the official API docs. Some have partial functionality still accessible through the .json 'API'.


Thanks for reading. Questions or comments? 👉🏻 alec@contextify.io