GitHub - antihackme/behind-this-website: Checklist for investigating the provenance and ownership of websites.

Who’s behind this website? A Checklist.

By Prianjana Bengani (@acookiecrumbles) and Jon Keegan (@jonkeegan)

Originally presented at the IRE NICAR Conference, March 4, 2022

Slides: English | Russian

Thank you to Svetlana Borodina at Harriman Institute for the Russian translation!

What is this?

This checklist is a reporting tool to help journalists and researchers find out who published a website. It is meant to be used alongside offline reporting techniques.

Following this checklist does not guarantee that you can unmask the owner of a website that does not want to be found, but it can help surface crucial clues and connections that can act as leads for further reporting.

🌟 Strong recommendation: while running through this checklist, create a data diary — it can be a TextEdit doc, a Google Doc, just the Notes app, whatever. It is important to be able to retrace your steps.

Site Content

Text

✍️ Are there any authors listed?
- If the site runs on WordPress, try this wildcard search on Google to reveal the author list: "https://yourwebsite.com/author/*/"
📫 Are there any e-mail addresses or contact information?
- If there are e-mail addresses, do those share the domain with the website?
- Does the email show up in haveibeenpwned.com?
- Check to see if there is a Gravatar associated with that address:
  - https://en.gravatar.com/site/check/<email address>
🕑 What’s the server’s local time?
- Look at the datetime attribute of <time> elements on WordPress sites. The offset from GMT at the end of the timestamp (here +06:00) can reveal the time zone the site is configured for: <time class="updated" datetime="2022-03-04T10:21:40+06:00">March 4, 2022</time>
🕶 Does the website have a privacy policy or terms and conditions that mention an LLC or state which regional laws apply?
📡 Does the website have an RSS feed?
- Does the RSS feed give any additional information about authors / stories that aren't visible on the site?
- You can pull RSS article links into Google Sheets using the IMPORTFEED function.
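
The server-time check above can be done by eye, but when you are collecting many timestamps it helps to parse them. The following is a minimal sketch using only the Python standard library. The function name `utc_offset_hours` is an illustrative assumption, not part of any tool mentioned in this checklist.

```python
from datetime import datetime, timedelta


def utc_offset_hours(datetime_attr: str) -> float:
    """Return the UTC offset, in hours, carried by an ISO 8601 timestamp."""
    # Older Python versions do not accept a trailing "Z", so normalize it.
    if datetime_attr.endswith("Z"):
        datetime_attr = datetime_attr[:-1] + "+00:00"
    parsed = datetime.fromisoformat(datetime_attr)
    offset = parsed.utcoffset()
    if offset is None:
        raise ValueError("timestamp carries no UTC offset")
    # Dividing one timedelta by another yields a plain float.
    return offset / timedelta(hours=1)


# The example timestamp from the checklist item above.
print(utc_offset_hours("2022-03-04T10:21:40+06:00"))  # 6.0
```

Treat the result as a lead, not proof: an offset is shared by many countries, and the time zone set in the site's configuration may differ from where the server or its owner is located.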

Features and functionality

🗞 Does the website have a newsletter?
- Check for the physical postal address — required by the CAN-SPAM Act in the US
💸 Does the website collect donations?
🛒 Does the website have an e-commerce store? Or, does it sell products?
- Try walking through the checkout process (without paying). Sometimes the real payee name is revealed just before you confirm the payment.

Links

🔗 What domains does the website link to most? (Requires scraping)
❤️ Who links to the domain most often?
- Google search operator: "link:yourwebsite.com" (Google no longer officially supports this operator, so treat the results as incomplete)
- Check backlinks on ahrefs.com for free 💵
Do the links have UTM codes?
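
Once you have a page's HTML, counting outbound domains and UTM codes takes only a few lines. The sketch below uses only the Python standard library and runs on an inline sample rather than a live site. The names `LinkCollector`, `summarize_links`, and `SAMPLE_HTML`, and all the domains in the sample, are hypothetical and used for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def summarize_links(html, own_domain):
    """Count external link domains and (utm key, value) pairs in a page."""
    collector = LinkCollector()
    collector.feed(html)
    domains = Counter()
    utm_values = Counter()
    for href in collector.hrefs:
        parts = urlparse(href)
        host = (parts.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Relative links have no host; links to the site itself are skipped.
        if host and host != own_domain:
            domains[host] += 1
        for key, value in parse_qsl(parts.query):
            if key.startswith("utm_"):
                utm_values[(key, value)] += 1
    return domains, utm_values


# Hypothetical page fragment standing in for scraped HTML.
SAMPLE_HTML = """
<a href="https://www.example.org/story?utm_source=newsletter&amp;utm_campaign=spring">A</a>
<a href="https://example.org/about">B</a>
<a href="https://partner.example.net/?utm_source=newsletter">C</a>
<a href="/contact">D</a>
"""

domains, utm_values = summarize_links(SAMPLE_HTML, own_domain="yourwebsite.com")
print(domains.most_common())
print(utm_values.most_common())
```

Repeated `utm_source` or `utm_campaign` values across otherwise unrelated sites can hint at a shared marketing operation, which is worth noting in your data diary.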

Photos, images and documents

📸 Are there author photos?
- Use reverse image search to see if the same images appear elsewhere
- Check sensity.ai to see if the image is GAN-generated
- Read more about spotting GAN-generated images here.
🔎 Do the images have EXIF data?
- Instructions here.
👀 Do the images have any other identifying information?
- Run through the list here
🪣 Where are the images hosted?
- If on AWS S3, the bucket name can be revealing — or you might find the bucket isn’t secure.
📄 Are there PDFs hosted on the site?
- On a search engine, search for "filetype:pdf site:yourwebsite.com"
- If you find some, check the metadata with "Get Info" in your PDF viewer.
🕛 Are there old archived images on Wayback Machine?
- The "URLs" page may surface deleted images. Filter on "image/" to narrow the search.
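
For the image-hosting check, the S3 bucket name is usually readable straight from the image URL. The sketch below assumes the two common Amazon S3 URL layouts: virtual-hosted style (`<bucket>.s3.<region>.amazonaws.com`) and path style (`s3.<region>.amazonaws.com/<bucket>/...`). The function name `s3_bucket_name` and the bucket names in the examples are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Path style: the host is the S3 endpoint and the bucket is the first path segment.
_PATH_STYLE_HOST = re.compile(r"^s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")
# Virtual-hosted style: the bucket is the leading part of the hostname.
_VIRTUAL_HOST = re.compile(r"^(?P<bucket>.+)\.s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")


def s3_bucket_name(url: str) -> Optional[str]:
    """Return the S3 bucket name embedded in a URL, or None if there is none."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if _PATH_STYLE_HOST.match(host):
        first_segment = parsed.path.lstrip("/").split("/", 1)[0]
        return first_segment or None
    match = _VIRTUAL_HOST.match(host)
    return match.group("bucket") if match else None


# Hypothetical image URLs for illustration.
print(s3_bucket_name("https://acme-newsroom-assets.s3.amazonaws.com/img/a.jpg"))
print(s3_bucket_name("https://s3.us-east-2.amazonaws.com/acme-uploads/photo.png"))
print(s3_bucket_name("https://yourwebsite.com/wp-content/uploads/a.jpg"))
```

A result of `None` does not rule out S3: images served through a CDN or a custom domain will not expose the bucket name in the URL.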

Social Media

If there are any social media profiles mentioned on the site, they are worth investigating.

👤 Are there any social media accounts listed in the <meta> tags of the HTML?
📅 When were the individual accounts created? Does it line up with the site history?
📊 What platform has the biggest reach?
📣 Is the messaging different across platforms?
📇 Do they have completely distinct account names across social media platforms or are they more-or-less the same?
- Note: just because you find the same account name across platforms doesn’t necessarily mean they belong to the same person!
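
Checking the <meta> tags by hand means viewing the page source. As a rough sketch, the standard-library parser below pulls out a handful of meta keys that often carry social handles or page URLs. The key list in `SOCIAL_META_KEYS` is an assumption for illustration and is not exhaustive. The class name and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Meta names/properties that often carry social accounts or IDs.
# Assumed for illustration; extend as needed for the site you are examining.
SOCIAL_META_KEYS = {
    "twitter:site",
    "twitter:creator",
    "article:publisher",
    "article:author",
    "fb:app_id",
    "fb:pages",
}


class SocialMetaParser(HTMLParser):
    """Collect the content of social-related <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Twitter cards use name=..., Open Graph tags use property=...
        key = (attr.get("name") or attr.get("property") or "").lower()
        if key in SOCIAL_META_KEYS and attr.get("content"):
            self.found.setdefault(key, []).append(attr["content"])


# Hypothetical page head standing in for a real site's source.
SAMPLE_HEAD = """
<head>
<meta name="twitter:site" content="@examplenews">
<meta property="article:publisher" content="https://www.facebook.com/examplenews" />
<meta name="description" content="Not a social tag">
</head>
"""

parser = SocialMetaParser()
parser.feed(SAMPLE_HEAD)
print(parser.found)
```

Accounts found this way are sometimes left over from a theme or template, so confirm that each one actually relates to the site before treating it as a connection.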

Facebook

On the Facebook profile, go to Page Transparency:

☎️ Is there an address and phone number for the page?
⏪ Does the page history reveal a different name?
- Has the page shifted topics?
🐣 When was the Facebook page created?
Is the page running any groups?
🗳 Has the page run any ads? Has the page run political ads?
🤖 Does Facebook flag any ‘related pages’ for the given page? Rely on Facebook’s algorithms to find connections!

Twitter

On Twitter, the account might be part of a pod or network of accounts that boost each other. Using en.whotwi.com, it’s worth checking:

👯‍♀️ Who is the account engaging with?
🐦 What are the account’s tweeting patterns?
#️⃣ What hashtags are associated with the account?
Who were the account's first follows / followers?
- Find this here: https://en.whotwi.com/

Other platforms

Don't forget to check whether the site has accounts on YouTube, Instagram, Reddit, or GitHub.

Infrastructure

Resources & Tools

Books

Open Source Intelligence Techniques - Michael Bazzell https://inteltechniques.com/book1.html

Verification Handbook - edited by Craig Silverman https://datajournalism.com/read/handbook/verification-3

Website Infrastructure

Blacklight: The Markup's real-time website privacy inspector.
builtwith.com: gives you the infrastructure of the site, including IP addresses, analytics codes, tech stack, etc. Freemium model.
DNSDBScout: lets you run standard and ‘flexible’ searches over passive DNS data, including IP <-> domain mapping.
Dnslytics: offers a range of tools including reverse Analytics and reverse DNS lookups, as well as WHOIS data. Freemium.
RiskIQ: a ‘threat intelligence’ tool that allows you to get reverse IP, reverse analytics, WHOIS, SSL, subdomains, etc.
Whoxy: a tool that lets you see historical WHOIS registrations. Free.
The Internet Archive browser extension.

Social Media Accounts

Sensity AI: check if an image is GAN-generated or not. Freemium.
whotwi.com: create a profile-at-a-glance for any account on Twitter. Free.
