# Expat Cinema

Expat Cinema lists foreign movies with English subtitles that are screened in cinemas in the Netherlands. It can be found at https://expatcinema.com.

## Deploy Cloud

### Deploy

A GitHub Action is used to deploy the Serverless application to AWS. The action is triggered by a push to the main branch.

### Scrapers

#### Scheduled

The scrapers run on a daily schedule defined in `cloud/serverless.yml`.
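As a rough sketch of what such a schedule looks like in the Serverless Framework (the function name, handler, and cron time here are assumptions, not the repo's actual values):

```yaml
# Hypothetical sketch of a scheduled function in cloud/serverless.yml;
# the actual function name, handler, and cron time differ.
functions:
  scrapers:
    handler: scrapers.handler
    events:
      - schedule: cron(0 4 * * ? *) # once a day at 04:00 UTC
```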

#### Manual

- `cd cloud; yarn scrapers` to run the scrapers on the dev stage
- `cd cloud; yarn scrapers:prod` to run the scrapers on the prod stage

## Deploy Web

### Scheduled

The web app is deployed on a daily schedule using GitHub Actions; the schedule is defined in `.github/workflows/web.yml`. The daily run is needed so the SSG (static site generator) picks up the latest data from the scrapers.
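For reference, a daily cron trigger in a GitHub Actions workflow looks roughly like this (the time of day is an assumption, not the actual schedule in `web.yml`):

```yaml
# Hypothetical sketch of the schedule trigger in .github/workflows/web.yml;
# the actual cron time may differ.
on:
  schedule:
    - cron: '0 6 * * *' # once a day at 06:00 UTC
```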

### Manual

The easiest way is to bump the version in `web/package.json` and push to master. This triggers a GitHub Action that deploys the web app to GitHub Pages. Note that there is only a prod stage for the web app.

## Running scrapers locally

```shell
yarn scrapers:local
```

Stores the output in `cloud/output` instead of S3 buckets and DynamoDB.

Use the `SCRAPERS` environment variable in `.env.local` to define a comma-separated list of scrapers to run locally, diverging from the default set of scrapers in `scrapers/index.js`.
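For example, a `.env.local` along these lines would limit a local run to two scrapers (the scraper names here are illustrative):

```shell
# .env.local — hypothetical example; scraper names are illustrative
SCRAPERS=kinorotterdam,ketelhuis
LOG_LEVEL=debug
```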

To run a single scraper, use e.g. `LOG_LEVEL=debug yarn tsx scrapers/kinorotterdam.ts`, with the scraper module ending in something like:

```js
if (require.main === module) {
  extractFromMoviePage(
    'https://kinorotterdam.nl/films/cameron-on-film-aliens-1986/',
  ).then(console.log)
}
```

Setting `LOG_LEVEL=debug` makes the scrapers' debug output show up in the console.

## CI/CD

GitHub Actions is used. `web/` uses JamesIves/github-pages-deploy-action to deploy to the `gh-pages` branch, and GitHub Pages is configured to take `gh-pages` as its source branch, which triggers GitHub's built-in pages-build-deployment workflow.

`.env.*` files are only used for the local stage; they are not used for running other stages locally, nor for CI/CD. For those, look at the GitHub secrets and variables (at repository and environment level).

## Quick local backup

```shell
aws s3 sync s3://expatcinema-scrapers-output expatcinema-scrapers-output --profile casper
aws s3 sync s3://expatcinema-public expatcinema-public --profile casper
aws dynamodb scan --table-name expatcinema-scrapers-analytics --profile casper > expatcinema-scrapers-analytics.json
```

## Favicon

## Chromium

Some scrapers need to run in a real browser, for which we use puppeteer and a Lambda layer with Chromium.

### Upgrading puppeteer and chromium

```shell
yarn add [email protected] @sparticuz/chromium@^123.0.1
yarn add -D [email protected]
```

After installing the new versions of puppeteer and chromium, update the Lambda layer in `serverless.yml` by doing a search and replace on `arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:` and bumping the layer version, e.g. from `44` to `45`.
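The search and replace can also be scripted; a minimal sketch using `sed` on a throwaway file (in the repo you would run it on `cloud/serverless.yml`, with the real layer versions):

```shell
# Demo on a throwaway file; in the repo, run the sed on cloud/serverless.yml.
printf 'Layer: arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:44\n' > /tmp/serverless-demo.yml
# Bump the layer version from 44 to 45 (GNU sed; on macOS use `sed -i ''`)
sed -i 's|chrome-aws-lambda:44|chrome-aws-lambda:45|g' /tmp/serverless-demo.yml
cat /tmp/serverless-demo.yml
```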

### Installing Chromium for use by puppeteer-core locally

See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for how

### Troubleshooting

When running a puppeteer-based scraper locally, e.g. `yarn tsx scrapers/ketelhuis.ts`, and getting an error like

```
Error: Failed to launch the browser process! spawn /tmp/localChromium/chromium/mac_arm-1205129/chrome-mac/Chromium.app/Contents/MacOS/Chromium ENOENT
```

you need to install Chromium locally: run `yarn install-chromium`, then update `LOCAL_CHROMIUM_EXECUTABLE_PATH` in `browser.ts` to point to the Chromium executable. See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for details.