Siteshooter

Automate full website screen shots and PDF generation with multiple view port support

Features

  • Crawls specified host and generates a sitemap.xml on the fly
  • Generates entire website screen shots based on sitemap.xml
  • Define multiple view ports
  • Automated PDF generation
  • Includes crawled meta data in generated PDF
  • Reports on broken website links (404 http response)
  • Supports HTTP basic authentication
  • Supports Microsoft Online 3 step authentication
  • Supports Salesforce Visualforce 3 step authentication
  • Supports site maps with HTTP, HTTPS, and FTP protocol URLs
  • Follows HTTP 301 redirects
  • Custom JavaScript inject file - injects into page prior to screen shooting
  • Trigger page events by passing querystring values to custom inject.js file


In This Documentation

  1. Getting Started
  2. Siteshooter Configuration File
  3. CLI Options
  4. Tests
  5. Troubleshooting & FAQ

Getting Started

Dependencies

Install the following prerequisite on your development machine:

Notable npm Modules

Quick Start

$ npm install siteshooter --global

If siteshooter is already installed, update it to the latest version by running:

$ npm update siteshooter --global

  • You may need to run these commands with elevated privileges (e.g. sudo); you will be prompted to do so if needed.
  • Installing with the --global flag makes the siteshooter command available from any path on your machine's command line.
  • See the npm documentation for more about the --global flag.

Create a Siteshooter Configuration File

$ siteshooter --init

Update Siteshooter Configuration File

View the full siteshooter.yml example

Inside siteshooter.yml, add additional options.

  • All Simple Web Crawler options can be added to sitecrawler_options and are passed through to the crawler process.
  • Generated screenshot image files are optimized with the imagemin and imagemin-pngquant modules, which reduces the overall size of generated PDFs. To adjust the image quality, update the image_quality option under screenshot_options in your siteshooter.yml file.

domain:
  name: https://www.devopsgroup.io
  auth:
    user:
    pwd:

pdf_options:
  excludeMeta: true

screenshot_options:
  delay: 2000
  image_quality: '60-80'

sitecrawler_options:
  exclude:
    - "pdf"
  stripQuerystring: false
  ignoreInvalidSSL: true

viewports:
  - viewport: desktop-large
    width: 1600
    height: 1200
  - viewport: tablet-landscape
    width: 1024
    height: 768
  - viewport: iPhone5
    width: 320
    height: 568
  - viewport: iPhone6
    width: 375
    height: 667
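
The image_quality value in the example looks like a pngquant-style min-max range on a 0 to 100 scale. As a rough illustration of how such a string could be validated before a run, here is a minimal sketch. The parseQualityRange helper is hypothetical and is not part of Siteshooter; how Siteshooter itself interprets the value is not documented here.

```javascript
// Hypothetical helper (not Siteshooter's code): parse a 'min-max' quality string.
function parseQualityRange(value) {
    var match = /^(\d{1,3})-(\d{1,3})$/.exec(String(value).trim());
    if (!match) {
        throw new Error('Expected a "min-max" string such as "60-80", got: ' + value);
    }

    var min = parseInt(match[1], 10);
    var max = parseInt(match[2], 10);

    // Assumption: both bounds live on a 0-100 scale and min must not exceed max.
    if (min > max || max > 100) {
        throw new Error('Quality must satisfy 0 <= min <= max <= 100, got: ' + value);
    }
    return { min: min, max: max };
}
```

Calling parseQualityRange('60-80') returns an object with min 60 and max 80, while a reversed range such as '80-60' throws.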

CLI Options

$ siteshooter --help

Usage: siteshooter [options]

OPTIONS
_______________________________________________________________________________________
-c --config            Show configuration
-C --cwd               Set working directory, which will load a siteshooter.yml file in the specified path
-e --debug             Output exceptions
-h --help              Print this help
-i --init              Create siteshooter.yml template file in working directory
-p --pdf               Generate PDFs, by defined view ports, based on screen shots created via Siteshooter
-q --quiet             Only return final output
-s --screenshots       Generate screen shots, by view ports, based on sitemap.xml file
-S --sitemap           Crawl domain name specified in siteshooter.yml file and generate a local sitemap.xml file
-v --version           Print version number
-V --verbose           Verbose output
-w --website           Report on website information based on Siteshooter crawled results

When siteshooter is run without any options, the following tasks run by default, in this order:

  • --sitemap
  • --screenshots
  • --pdf
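
The default-order rule above can be sketched as a small function. This is an illustration only, not Siteshooter's actual source: the resolveTasks name and the flags object shape are hypothetical, and the sketch assumes that explicitly requested tasks also run in sitemap, screenshots, pdf order.

```javascript
// Hypothetical sketch of the default-task rule; not taken from Siteshooter's source.
var DEFAULT_TASKS = ['sitemap', 'screenshots', 'pdf'];

function resolveTasks(flags) {
    flags = flags || {};

    // Keep only the tasks that were explicitly requested, preserving pipeline order.
    var requested = DEFAULT_TASKS.filter(function (task) {
        return flags[task] === true;
    });

    // No task flags at all: fall back to the full pipeline.
    return requested.length > 0 ? requested : DEFAULT_TASKS.slice();
}
```

With no flags, resolveTasks({}) yields all three tasks; resolveTasks({ pdf: true }) yields only the pdf task.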

Custom JavaScript Inject File

To manipulate the DOM prior to the screen shot process, add an inject.js file to the same working directory as siteshooter.yml.

Example: inject.js file

/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot. 
 */

console.log('JavaScript injected into page.');

if (typeof jQuery !== 'undefined') {

    jQuery(document).ready(function() {
        console.log('jQuery loaded.');
    });
}
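
A common reason to manipulate the DOM before a screen shot is to hide elements that should not appear in the capture. The following is a minimal sketch for pages that do not load jQuery. The hideElements helper and both selectors are placeholders chosen for illustration, not part of Siteshooter.

```javascript
// Hypothetical inject.js helper: hide elements that should not appear in screen shots.
// Uses only plain DOM calls and ES5 syntax, so it does not depend on jQuery.
function hideElements(doc, selectors) {
    var hidden = 0;

    selectors.forEach(function (selector) {
        var nodes = doc.querySelectorAll(selector);
        for (var i = 0; i < nodes.length; i++) {
            nodes[i].style.display = 'none';
            hidden++;
        }
    });

    return hidden; // number of elements hidden
}

// Only runs in a page context; '#cookie-banner' and '.chat-widget' are placeholder selectors.
if (typeof document !== 'undefined') {
    hideElements(document, ['#cookie-banner', '.chat-widget']);
}
```

Passing the document in as an argument keeps the helper easy to exercise outside a browser with a stub object.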

Trigger JavaScript Events

When using the optional inject.js file, events can be triggered through the pevent querystring parameter:

<!-- Add a URL with the pevent querystring parameter to the generated sitemap.xml -->
<url>
    <loc>https://www.devopsgroup.io?pevent=open-privacy-overlay</loc>
    <changefreq>weekly</changefreq>
</url>
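
If several event URLs need to be added, the entries could be generated rather than typed by hand. The sketch below is meant to run in Node.js, not inside inject.js, and both helper names are hypothetical. It relies on the standard WHATWG URL class, which also normalizes a bare host to include a trailing slash.

```javascript
// Hypothetical Node.js helpers for preparing sitemap entries; not part of Siteshooter.
function withPageEvent(pageUrl, eventName) {
    var url = new URL(pageUrl); // WHATWG URL, available as a global in modern Node.js
    url.searchParams.set('pevent', eventName);
    return url.toString();
}

function toSitemapEntry(pageUrl, eventName) {
    // Escape '&' so a URL with several parameters stays valid inside XML.
    var loc = withPageEvent(pageUrl, eventName).replace(/&/g, '&amp;');
    return '<url>\n    <loc>' + loc + '</loc>\n</url>';
}

console.log(toSitemapEntry('https://www.devopsgroup.io', 'open-privacy-overlay'));
```

Existing querystring values are preserved, and a previous pevent value is overwritten rather than duplicated.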

Example: Event detection & triggering

/**
 * @file:            inject.js
 * @description:     used to inject custom JavaScript into a web page prior to a screen shot. 
 */


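// Return the decoded value of a querystring parameter, or undefined if it is absent.
function getQueryVariable(variable) {
    var query = window.location.search.substring(1);
    var vars = query.split('&');
    for (var i = 0; i < vars.length; i++) {
        var pair = vars[i].split('=');
        if (decodeURIComponent(pair[0]) === variable) {
            // A key without a value (e.g. "?pevent") yields an empty string.
            return decodeURIComponent(pair[1] || '');
        }
    }
}

if (typeof jQuery !== 'undefined') {

    jQuery(document).ready(function() {
        // Strip the leading slash, so the home page ('/') becomes an empty string.
        var pageName = window.location.pathname.replace('/', ''),
            pageEvent = getQueryVariable('pevent');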

        console.log('document ready.');
        console.log('userAgent', navigator.userAgent);
        console.log('Page: ', pageName);
        console.log('Event: ', pageEvent);

        switch (pageName) {

            // home
            case '':

                switch (pageEvent) {
                    case 'open-privacy-overlay':

                        jQuery('a[data-target~="#modal-privacy"]').trigger('click');
                        break;
                }

                break;
        }

    });
}
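
As the number of pages and events grows, the nested switch statements can become hard to follow. One possible alternative, sketched here under the assumption that each event maps to a single function, is a lookup table keyed by page name and event name. The dispatchPageEvent helper and the handlers object are hypothetical; the sketch sticks to ES5 syntax because the page runs inside PhantomJS.

```javascript
// Hypothetical alternative for inject.js: a lookup table instead of nested switches.
function dispatchPageEvent(handlers, pageName, pageEvent) {
    var has = Object.prototype.hasOwnProperty;

    if (!has.call(handlers, pageName) || !has.call(handlers[pageName], pageEvent)) {
        return false; // nothing registered for this page/event pair
    }

    handlers[pageName][pageEvent]();
    return true;
}

var handlers = {
    // '' is the home page, matching the pageName logic in the example above
    '': {
        'open-privacy-overlay': function () {
            jQuery('a[data-target~="#modal-privacy"]').trigger('click');
        }
    }
};
```

Inside the jQuery ready callback, the switch block could then be replaced with a single call such as dispatchPageEvent(handlers, pageName, pageEvent).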

Tests

Tests are written with Mocha and can be run with npm test.

Troubleshooting

If you're having issues with Siteshooter, submit a GitHub Issue.

  • Make sure you have a siteshooter.yml file in your working directory and that the YAML is well formed.
  • Experiencing font-loading issues? Try increasing the delay setting in your siteshooter.yml file:

screenshot_options:
  delay: 2000

  • Trying to take a screenshot of a page with a video? PhantomJS does not support video, so one workaround is to show the video's poster image instead:

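/**
 * @file:            inject.js
 * @description:     used to display a video's poster image
 */

// Replace each video with its own poster image; videos without a poster are just removed.
jQuery('video').each(function () {
    var video = jQuery(this),
        poster = video.attr('poster');

    if (poster) {
        video.replaceWith(jQuery('<img>').attr('src', poster));
    } else {
        video.remove();
    }
});
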
  • SimpleCrawler TypeError: The header content contains invalid characters
    • Try setting the acceptCookies option to false
sitecrawler_options:
  acceptCookies: false

Code of Conduct

Take a moment to read our Code of Conduct.

Contributing to the project

We are always looking for quality contributions! Please check CONTRIBUTING.md for contribution guidelines.