ckuijjer/expatcinema.com

Expat Cinema

Expat Cinema shows foreign movies with English subtitles that are screened in cinemas in the Netherlands. It can be found at https://expatcinema.com.

Deploy Cloud

Deploy

A GitHub Action is used to deploy the Serverless application to AWS. The action is triggered by a push to the main branch.

Scrapers

Scheduled

The scrapers run on a daily schedule defined in cloud/serverless.yml.

Manual

  • cd cloud; yarn scrapers to run the scrapers on the dev stage
  • cd cloud; yarn scrapers:prod to run the scrapers on the prod stage

Deploy Web

Scheduled

The web app is deployed on a daily schedule using GitHub Actions. The schedule is defined in .github/workflows/web.yml. It is needed so that the SSG (static site generator) picks up the latest data from the scrapers.

Manual

The easiest way is to bump the version in web/package.json and push to master. This triggers a GitHub Action that deploys the web app to GitHub Pages. Note that there is only a prod stage for the web app.

Running scrapers locally

yarn scrapers:local

This stores the output in cloud/output instead of in S3 buckets and DynamoDB.

Use the SCRAPERS environment variable in .env.local to define a comma-separated list of scrapers to run locally, overriding the default set of scrapers in scrapers/index.js.
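
As an illustration of how such an override could work, here is a minimal sketch that parses a comma-separated SCRAPERS value and falls back to a default list. The function name selectScrapers and the default list below are hypothetical and not taken from scrapers/index.js.

```typescript
// Hypothetical sketch: pick which scrapers to run from a comma-separated
// SCRAPERS value, falling back to the defaults when it is unset or empty.
function selectScrapers(
  envValue: string | undefined,
  defaults: string[],
): string[] {
  const names = (envValue ?? '')
    .split(',')
    .map((name) => name.trim()) // tolerate spaces around the commas
    .filter((name) => name.length > 0) // drop empty entries like "a,,b"
  return names.length > 0 ? names : defaults
}

// Illustrative defaults only; the real default set lives in scrapers/index.js.
const defaults = ['kinorotterdam', 'ketelhuis']

// In practice the first argument would be the SCRAPERS environment variable.
console.log(selectScrapers('kinorotterdam', defaults))
console.log(selectScrapers(undefined, defaults))
```

Trimming and filtering keeps a value such as "kinorotterdam, ketelhuis," from producing empty scraper names.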

To call a single scraper, run e.g. LOG_LEVEL=debug yarn tsx scrapers/kinorotterdam.ts with a block like the following in the scraper file:

if (require.main === module) {
  // Only runs when this file is executed directly, not when it is imported
  extractFromMoviePage(
    'https://kinorotterdam.nl/films/cameron-on-film-aliens-1986/',
  ).then(console.log)
}

Setting LOG_LEVEL=debug makes the debug output from the scrapers show up in the console.

CI/CD

GitHub Actions is used for CI/CD. web/ uses JamesIves/github-pages-deploy-action to deploy to the gh-pages branch. In the GitHub settings, Pages uses gh-pages as its source branch, which triggers GitHub's built-in pages-build-deployment.

.env.* files are only used for the local stage, not for running other stages locally and not for CI/CD. For those, take a look at the GitHub secrets and variables (on repository and environment level).

Quick local backup

aws s3 sync s3://expatcinema-scrapers-output expatcinema-scrapers-output --profile casper
aws s3 sync s3://expatcinema-public expatcinema-public --profile casper
aws dynamodb scan --table-name expatcinema-scrapers-analytics --profile casper > expatcinema-scrapers-analytics.json

Favicon

Chromium

Some scrapers need to run in a real browser, for which we use puppeteer and a lambda layer with Chromium.

Upgrading puppeteer and chromium

yarn add [email protected] @sparticuz/chromium@^123.0.1
yarn add -D [email protected]

After installing the new versions of puppeteer and chromium, update the lambda layer in serverless.yml by doing a search and replace on arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda: and changing the version suffix, e.g. from 44 to 45.
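
The search and replace can also be scripted. The following is a rough sketch, assuming the layer ARN appears in serverless.yml with the version number as its last segment; bumpLayerVersion is a hypothetical helper and the YAML fragment is made up for illustration.

```typescript
// Hypothetical helper: rewrite the version suffix of the chrome-aws-lambda
// layer ARN everywhere it appears in the serverless.yml contents.
const LAYER_ARN_PREFIX =
  'arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:'

function bumpLayerVersion(yaml: string, from: number, to: number): string {
  // Escape regex metacharacters so the prefix is matched literally
  const escapedPrefix = LAYER_ARN_PREFIX.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  // (?!\d) keeps version 44 from also matching e.g. 440
  const pattern = new RegExp(`${escapedPrefix}${from}(?!\\d)`, 'g')
  return yaml.replace(pattern, `${LAYER_ARN_PREFIX}${to}`)
}

// Made-up fragment; the real file may structure its layers differently.
const before = `layers:\n  - ${LAYER_ARN_PREFIX}44\n`
console.log(bumpLayerVersion(before, 44, 45))
```

Matching on the full ARN prefix rather than the bare number avoids touching unrelated occurrences of 44 elsewhere in the file.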

Installing Chromium for use by puppeteer-core locally

See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for instructions.

Troubleshooting

If running a puppeteer-based scraper locally, e.g. yarn tsx scrapers/ketelhuis.ts, fails with an error like

Error: Failed to launch the browser process! spawn /tmp/localChromium/chromium/mac_arm-1205129/chrome-mac/Chromium.app/Contents/MacOS/Chromium ENOENT

you need to install Chromium locally. Run yarn install-chromium to do so, and update LOCAL_CHROMIUM_EXECUTABLE_PATH in browser.ts to point to the Chromium executable. See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for details.
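
A pre-flight check can turn the raw ENOENT into a more readable hint. This is only a sketch: assertChromiumInstalled is a hypothetical helper, not something that exists in browser.ts, and the path passed in is the one from the error message above.

```typescript
import { existsSync } from 'node:fs'

// Hypothetical pre-flight check: fail early with a readable hint instead of
// the raw ENOENT from the browser launch.
function assertChromiumInstalled(executablePath: string): void {
  if (!existsSync(executablePath)) {
    throw new Error(
      `Chromium not found at ${executablePath}. ` +
        'Run yarn install-chromium and update LOCAL_CHROMIUM_EXECUTABLE_PATH in browser.ts.',
    )
  }
}

// Example call using the path from the error message above.
try {
  assertChromiumInstalled(
    '/tmp/localChromium/chromium/mac_arm-1205129/chrome-mac/Chromium.app/Contents/MacOS/Chromium',
  )
} catch (error) {
  console.error((error as Error).message)
}
```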
