VIPnytt/RobotsTagParser

X-Robots-Tag HTTP header parser

PHP class to parse X-Robots-Tag HTTP headers according to Google X-Robots-Tag HTTP header specifications.

Requirements:

Note: HHVM support is planned once facebook/hhvm#4277 is fixed.

Installation

The library is available via Composer. Add this to your composer.json file:

{
    "require": {
        "vipnytt/robotstagparser": "~0.2"
    }
}

Then run composer update.

Getting Started

Basic example

Get all rules that apply to you. This includes:

  • All generic rules
  • Rules specific to your User-Agent (if there are any)

use vipnytt\XRobotsTagParser;

$headers = [
    'X-Robots-Tag: noindex, noodp',
    'X-Robots-Tag: googlebot: noindex, noarchive',
    'X-Robots-Tag: bingbot: noindex, noarchive, noimageindex'
];

$parser = new XRobotsTagParser('myUserAgent', $headers);
$rules = $parser->getRules(); // <-- returns an array of rules
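
To illustrate how generic and User-Agent-specific rules combine, here is a minimal standalone sketch. The function `selectRules` is hypothetical and is not the library's implementation; it assumes an optional `agent:` prefix before the directives, compares agent names case-insensitively, and does not handle `unavailable_after`.

```php
<?php
// Hypothetical helper, not part of the library: collects the directives
// that apply to one user agent from raw X-Robots-Tag header lines.
function selectRules(array $headers, string $userAgent): array
{
    $rules = [];
    foreach ($headers as $header) {
        // Split "X-Robots-Tag: value" into header name and value.
        $parts = explode(':', $header, 2);
        if (count($parts) !== 2 || strcasecmp(trim($parts[0]), 'X-Robots-Tag') !== 0) {
            continue;
        }
        $value = $parts[1];

        // Assumption: anything before a further colon is a user agent name.
        $pos = strpos($value, ':');
        if ($pos !== false) {
            $agent = strtolower(trim(substr($value, 0, $pos)));
            if ($agent !== strtolower($userAgent)) {
                continue; // Rule is scoped to a different user agent.
            }
            $value = substr($value, $pos + 1);
        }

        // Collect each comma-separated directive once, in order of appearance.
        foreach (explode(',', $value) as $directive) {
            $directive = strtolower(trim($directive));
            if ($directive !== '') {
                $rules[$directive] = true;
            }
        }
    }
    return array_keys($rules);
}
```

With the `$headers` array above, `selectRules($headers, 'googlebot')` combines the generic header with the `googlebot` header, while the `bingbot` header is ignored.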

Different approaches:

Get the HTTP headers by requesting a URL

use vipnytt\XRobotsTagParser;

$parser = new XRobotsTagParser\Adapters\Url('http://example.com/', 'myUserAgent');
$rules = $parser->getRules();

Use your existing GuzzleHttp request

use vipnytt\XRobotsTagParser;
use GuzzleHttp\Client;

$client = new Client(); // Client is imported by the use statement above
$response = $client->request('GET', 'http://example.com/');

$parser = new XRobotsTagParser\Adapters\GuzzleHttp($response, 'myUserAgent');
$array = $parser->getRules();

Provide HTTP headers as a string

use vipnytt\XRobotsTagParser;

$string = <<<STRING
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
X-Robots-Tag: nofollow
STRING;

$parser = new XRobotsTagParser\Adapters\TextString($string, 'myUserAgent');
$array = $parser->getRules();
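
As a rough picture of what reading headers from a string involves, the sketch below pulls the X-Robots-Tag values out of a raw header block. The function `extractRobotsTagValues` is hypothetical and only illustrates the idea; it is not how the `TextString` adapter is implemented.

```php
<?php
// Hypothetical helper, not part of the library: returns the value of every
// X-Robots-Tag line found in a raw HTTP header block.
function extractRobotsTagValues(string $rawHeaders): array
{
    $values = [];
    // Header lines may end in CRLF, CR, or LF depending on the source.
    foreach (preg_split('/\r\n|\r|\n/', $rawHeaders) as $line) {
        $parts = explode(':', $line, 2);
        // Header names are case-insensitive.
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), 'X-Robots-Tag') === 0) {
            $values[] = trim($parts[1]);
        }
    }
    return $values;
}
```

Applied to the `$string` above, this yields the two values `noindex` and `nofollow`, one per header line.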

Export all rules

Returns an array containing all rules for any User-Agent.

use vipnytt\XRobotsTagParser;

// $headers is the array defined in the basic example
$parser = new XRobotsTagParser('myUserAgent', $headers);
$array = $parser->export();

Directives:

  • all - There are no restrictions for indexing or serving.
  • none - Equivalent to noindex and nofollow.
  • noindex - Do not show this page in search results and do not show a "Cached" link in search results.
  • nofollow - Do not follow the links on this page.
  • noarchive - Do not show a "Cached" link in search results.
  • nosnippet - Do not show a snippet in the search results for this page.
  • noodp - Do not use metadata from the Open Directory Project for titles or snippets shown for this page.
  • notranslate - Do not offer translation of this page in search results.
  • noimageindex - Do not index images on this page.
  • unavailable_after - Do not show this page in search results after the specified date/time.

Source: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
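
The `none` and `unavailable_after` entries in the directives list can be illustrated with a short sketch. Both functions are hypothetical helpers, not part of the library, and `isUnavailable` assumes the date is written in a format that PHP's `strtotime()` understands.

```php
<?php
// Hypothetical helpers, not part of the library.

// Expands "none" into noindex and nofollow, and drops "all",
// which imposes no restrictions.
function expandDirectives(array $directives): array
{
    $expanded = [];
    foreach ($directives as $directive) {
        $directive = strtolower(trim($directive));
        if ($directive === 'none') {
            $expanded['noindex'] = true;
            $expanded['nofollow'] = true;
        } elseif ($directive !== '' && $directive !== 'all') {
            $expanded[$directive] = true;
        }
    }
    return array_keys($expanded);
}

// Returns true once the unavailable_after date/time has passed.
// Assumption: $dateTime can be parsed by strtotime(); $now is a Unix timestamp.
function isUnavailable(string $dateTime, int $now): bool
{
    $timestamp = strtotime($dateTime);
    return $timestamp !== false && $now > $timestamp;
}
```

For example, `expandDirectives(['none', 'noarchive'])` produces `noindex`, `nofollow` and `noarchive`, and `isUnavailable()` only reports true when the given timestamp is later than the parsed date.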