dorking (how to find anything on the Internet)
tl;dr: Use advanced Google Search to find any webpage, emails, info, or secrets
cost: $0
time: 2 minutes
Software engineers have long joked about how much of their job is simply Googling things
Now you can do the same, but for free
Below, I'll cover dorking, the use of search engines to find very specific data
For each example, you can paste it directly into Google to see the result
webpages
Inspired by this Twitter exchange with Gumroad CEO Sahil Lavingia, the next few examples will cover Gumroad and Sahil.
find specific pages within a website (ex: for DynamoDB e-books)
site:gumroad.com dynamodb
find specific pages that must include a phrase in the Title text
allintitle:"support this" site:gumroad.com
find similar sites (Google only)
related:gumroad.com
you can chain operators together (ex: looking for bug bounties with either security or bug-bounty in the URL)
(inurl:security OR inurl:bug-bounty OR site:hackerone.com) + "gumroad"
you can restrict to certain top-level domains (ex: lists of teachers)
site:.edu filetype:xls inurl:"email.xls"
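If you'd rather assemble chained queries in code than by hand, here is a minimal sketch. The helper names (`buildDork`, `anyOf`, `toSearchUrl`) are made up for illustration, and it assumes the usual `https://www.google.com/search?q=` URL form. It joins terms with a plain space instead of the `+` used above, since Google treats a space between terms as AND.

```javascript
// Hypothetical helper: join query parts with spaces (a space acts as AND).
function buildDork(parts) {
  return parts.filter(Boolean).join(" ");
}

// Hypothetical helper: wrap alternatives in parentheses, separated by OR.
function anyOf(alternatives) {
  return "(" + alternatives.join(" OR ") + ")";
}

// Turn a query into a search URL. encodeURIComponent escapes the quotes,
// colons, and spaces so the whole query travels as a single parameter.
function toSearchUrl(query) {
  return "https://www.google.com/search?q=" + encodeURIComponent(query);
}

const query = buildDork([
  anyOf(["inurl:security", "inurl:bug-bounty", "site:hackerone.com"]),
  '"gumroad"',
]);

console.log(query);
// (inurl:security OR inurl:bug-bounty OR site:hackerone.com) "gumroad"
console.log(toSearchUrl(query));
```

Paste the logged URL into your browser's address bar to run the search.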
emails
find Gmail accounts
alec barrett-wilsdon "@gmail.com"
find work accounts (you'll need to find their domain first)
alec "@contextify.io"
not finding what you're looking for with either of those? Try to guess the format of the email (try going to this site, searching the domain, and clicking Identified Name Formats)
"abarrett.wilsdon@"
you can always find every page with emails on it (and then use the next snippet below)
site:alec.fyi intext:"@"
find every email on a web page that you're on - inject it into a site with Chrome DevTools (more here)
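// Match email-like strings anywhere in the page's HTML (no ^/$ anchors, so
// addresses embedded in longer text or in mailto: links are found too)
var re = /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+/g;
var matches = document.body.innerHTML.match(re) || [];
// Drop duplicates, then log each address on its own line
var emails = Array.from(new Set(matches));
for (var i = 0; i < emails.length; i++) {
  console.log(emails[i]);
}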
this will log every email found, without you having to scan through the whole page.
finally, you can validate if an email that you found or guessed is real by hovering over it in a new compose window in Gmail:
if you look carefully, you'll notice the chat and video call options are greyed out on an invalid email.
files
find spreadsheets
filetype:csv OR filetype:xlsx OR filetype:xls OR filetype:xltx OR filetype:xlt OR inurl:airtable.com/universe/
find Google Docs and Google Sheets
site:docs.google.com "gumroad"
find where your competitor's logo is (ex: partners or customers' websites)
"Gumroad Logo.png"
find your competitors' sales pitches and whitepapers
site:intercom.com (filetype:pdf OR filetype:ppt)
find case studies written about competitors
inurl:hubspot-case-study -site:hubspot.com
SEO
find sites with specific keywords in the anchor text
inanchor:"cyber security"
research blog posts with specific keywords in their title
inposttitle:"diy slime"
find backlinks (ex: other sites that link to a particular blog post). note: the link operator is now deprecated
intext:intercom.com/intercom-api-reference/reference
find keyword permutations with the wildcard operator
* design tools
find companies using a given widget
intext:"Powered by Intercom" -site:intercom.com
coupons!
search the site itself for codes
site:curology.com ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")
next, try Twitter
site:twitter.com + "meundies" + ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")
next, try Mailchimp emails
site:campaign-archive.com + "blueapron" + ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")
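The three coupon searches share one template, so a small sketch can generate all of them for any brand. `couponQueries` and `couponGroup` are hypothetical helpers, and `examplebrand` / `example.com` are placeholders you'd swap for the real brand name and domain.

```javascript
// The keyword list used in the queries above.
const COUPON_TERMS = ["coupon", "referral code", "affiliate code", "discount code", "VIP"];

// Hypothetical helper: quote each term and join with | (OR) inside parentheses.
function couponGroup(terms) {
  return "(" + terms.map((term) => '"' + term + '"').join(" | ") + ")";
}

// Hypothetical helper: the brand's own site first, then Twitter, then
// Mailchimp's campaign archive, mirroring the order above.
function couponQueries(brand, brandDomain) {
  const group = couponGroup(COUPON_TERMS);
  return [
    "site:" + brandDomain + " " + group,
    'site:twitter.com + "' + brand + '" + ' + group,
    'site:campaign-archive.com + "' + brand + '" + ' + group,
  ];
}

const couponSearches = couponQueries("examplebrand", "example.com");
couponSearches.forEach((q) => console.log(q));
```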
secrets
cybersecurity experts use dorking, as one tool among many, to find potential vulnerabilities in a company. I will not be covering any such queries, out of concern for their potential for misuse.
operator review
operators are components of a search query that narrow the results down. You can combine as many as you want in one query. The most useful ones you'll want to know are:
| operator | description |
|---|---|
| "phrase" | results must include "phrase" |
| -phrase | exclude results with phrase |
| phrase1 AND phrase2 | phrase1 and phrase2 must both be included |
| phrase1 OR phrase2 | one of phrase1 and phrase2 must be included (or both) |
| site:example.com | results must be on domain example.com |
| filetype:jpg | results must be of type .jpg |
AND/OR logic can be used to combine distinct queries; use parentheses to make the grouping explicit
"phrase1" OR ("phrase 2" AND "phrase3")
# equivalent to running these two searches and merging the results
>> "phrase1"
>> "phrase 2" AND "phrase3"
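To see why one combined query can stand in for two separate searches, here is a toy sketch that reads the query as phrase1 OR (phrase 2 AND phrase3). The `pages` array is a made-up stand-in for a search index, and `has`, `and`, `or`, and `search` are hypothetical helpers, not anything Google exposes.

```javascript
// Made-up "index" standing in for the web.
const pages = [
  "phrase1 only",
  "phrase 2 together with phrase3",
  "phrase 2 alone",
  "nothing relevant",
];

// Each helper returns a predicate that tests one page's text.
const has = (phrase) => (text) => text.includes(phrase);
const and = (a, b) => (text) => a(text) && b(text);
const or = (a, b) => (text) => a(text) || b(text);

// A "search" keeps the pages that satisfy the predicate.
const search = (predicate) => pages.filter(predicate);

// One combined query...
const combined = search(or(has("phrase1"), and(has("phrase 2"), has("phrase3"))));

// ...versus two separate searches whose results are merged.
const first = search(has("phrase1"));
const second = search(and(has("phrase 2"), has("phrase3")));
const merged = pages.filter((p) => first.includes(p) || second.includes(p));

console.log(combined);
console.log(merged);
```

Both logs list the same two pages: the one with phrase1, and the one with both phrase 2 and phrase3.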
Thank you to Tejas, Chris, Ian, and Brandon for contributing edits!
Thanks for reading. Questions or comments? 👉🏻 alec@contextify.io