scraping tweets for fun and profit

Thu 23 July 2020

tagged software, non-technical, python, growth hacks, scraping, dorking, how-to, organic social

tl;dr: find spontaenous interactions by scraping tweets with given keywords

cost: $0

build time: 5 minutes (MVP)

Twitter can be a great way to connect with alike minds. But if you're here, you're (probably) not yet a Thought Leader™. Let's walk through a quick way to speed that up.

Imagine you're releasing a blog post soon. Or perhaps you're looking to connect with people joining an upcoming webinar. Or maybe you just want to grow your following in a niche.

With the below tool, you can find hundreds of relevant conversations to join¹ more or less instantly²

Option	Description
'-k' or '--keywords'	A list of keywords (separated by spaces) that you want to search for; required=True
'-o' or '--output_filename'	Set the output filename to something other than the default
'-g' or '--output_gsheet'	Write to Google Sheets with the spreadsheet name you specify
'-d' or '--deduplicate'	Remove duplicates from the output (uses tweet_id)
'-s' or '--since'	Filter by posted date since a given date. Format is 2019-12-20 20:30:15
'-u' or '--until'	Filter by posted date until a given date. Format is 2019-12-20 20:30:15
'-l' or '--limit'	Limit the results per keyword provided
'-m' or '--min_likes'	Limit the results to only tweets with a given number of likes
'-n' or '--near'	Limit the results to tweets geolocated near a given city
'-v' or '--verified'	Limit the results to tweets made by accounts that are verified
'-q' or '--hide_output'	If you want to disable routine results logging; default=True
'-r' or '--resume'	Have the search resume at a specific Tweet ID

A list of all the twint supported args in at the bottom of get_tweets_by_keyword.py, as well.

#3 - write to Google Sheets

If you've read my Google Sheets API walkthrough, you can use those credentials to have this script easily write to Google Sheets. If you haven't, go set up the auth, as described there.

The beauty of the Google Sheets write is you can have one person responsible for running the script (or put it on a cronjob!) and have it write to a Sheet that is shared with others (e.g. a whole marketing team)

When setting up the Sheet:

Don't forget: each write will overwrite the first tab.
Remember you need to share the Sheet with your gserviceaccount email.
You'll need to export the GSheets keys to the local environment:

export GSHEETS_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----superlongkeywithabunchofstuffinit"
export GSHEETS_CLIENT_EMAIL="theemailyousharedthesheetwith"

#4 - use cases

The core Twint library makes it easy to scrape lots of things from Twitter - a user's tweets, a user's followers, tweets that have emails in them, etc.

This code wraps Twint and does only one thing (and does it well): scrape tweets that contain any of a number of keywords.

If the former sounds more up your alley than the latter, I recommend you check out the Twint documentation.

A few use cases I like using this tool for:

getting reviews for your / your competitors' product
surfacing relevant convos to promote your content
finding coupons
getting software recommendations
building Twitter lists of folks to follow
get volume estimates (i.e. how many people are talking about [X,Y,Z])
conference and webcast attendees

¹ I strongly recommend against trying to automate responses to scraped tweets. It will come across as inauthentic (which it is) and it will hurt your brand. If you really want to scale this to the moon, hire a social media manager to run it.

² The script processes 1500-2000 tweets/minute

Thanks for reading. Questions or comments? 👉🏻 alec@contextify.io

scraping tweets for fun and profit

table of contents:

#1.1 - setup if you haven't used the Terminal before

#1.2 - setup if you're familiar with the Terminal

#2 - usage

#3 - write to Google Sheets

#4 - use cases