Add support for wildcard hosts #10

Open · wants to merge 3 commits into base: master

37 changes: 26 additions & 11 deletions README.md
@@ -6,34 +6,49 @@ django-simple-robots

Most web applications shouldn't be indexed by Google. This app just provides a view that serves a "deny all" robots.txt.

-In some cases, you do want your app to be indexed - but only in your production environment (not any staging environments). For this case, you can set `ROBOTS_ALLOW_HOST`. If the incoming hostname matches this setting, an "allow all" robots.txt will be served. Otherwise, the "deny all" will be served.
+In some cases, you do want your app to be indexed - but only in your production environment (not any staging environments). For this case, you can set `ROBOTS_ALLOW_HOSTS`. If the incoming hostname matches this setting, an "allow all" robots.txt will be served. Otherwise, the "deny all" will be served.
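
To make the behaviour concrete, here is a minimal sketch using Django's test client. The hostnames and settings values are examples only; the allow-all body is the one asserted in this package's tests, while the deny-all body is assumed to be the standard `Disallow: /` form.

```python
# Sketch: assumes ROBOTS_ALLOW_HOSTS = ["myproductionurl.com"] and that both
# hostnames below are covered by ALLOWED_HOSTS so Django accepts the requests.
from django.test import Client

client = Client()

# Hostname matches ROBOTS_ALLOW_HOSTS: the allow-all template is served.
client.get("/robots.txt", HTTP_HOST="myproductionurl.com").content
# b"User-agent: *\nAllow: /\n"

# Any other hostname: the deny-all template is served.
client.get("/robots.txt", HTTP_HOST="staging.myproductionurl.com").content
# b"User-agent: *\nDisallow: /\n"  (assumed deny-all body)
```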

Tested against Django 2.2, 3.2 and 4.0 on Python 3.6, 3.7, 3.8, 3.9 and 3.10

### Installation

Install from PIP

-    pip install django-simple-robots
+```bash
+pip install django-simple-robots
+```

In your root urlconf, add an entry as follows:

-    from django.conf.urls import url
-    from simple_robots.views import serve_robots
+```python
+from django.urls import path
+from simple_robots.views import serve_robots

-    urlpatterns = [
-        path("robots.txt", serve_robots),
-        # ..... other stuff
-    ]
+urlpatterns = [
+    path("robots.txt", serve_robots),
+    # ..... other stuff
+]
+```

-Then, add `simple_robots` to `INSTALLED_APPS` in your `settings.py`
+Then, add `simple_robots` to `INSTALLED_APPS` in your `settings.py`.

-Optionally, set `ROBOTS_ALLOW_HOST` settings variable.
+Optionally, set the `ROBOTS_ALLOW_HOSTS` settings variable.

-    ROBOTS_ALLOW_HOST = "myproductionurl.com"
+```python
+ROBOTS_ALLOW_HOSTS = ["myproductionurl.com"]
+```

`ROBOTS_ALLOW_HOSTS` also supports multiple options, similar to [`ALLOWED_HOSTS`](https://docs.djangoproject.com/en/stable/ref/settings/#allowed-hosts):

```python
# Allow all subdomains of `myproductionurl.com` (including the apex) and exactly `myotherproductionurl.com` (no subdomains)
ROBOTS_ALLOW_HOSTS = [".myproductionurl.com", "myotherproductionurl.com"]
```

That's it!

Note: Previous versions used `ROBOTS_ALLOW_HOST` to specify a single allowed host. This setting still exists for backwards compatibility.
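
For example, an existing single-host configuration such as the one below keeps working; the view simply treats it as a one-element `ROBOTS_ALLOW_HOSTS` list (see `get_allowed_hosts` in the `views.py` diff below).

```python
ROBOTS_ALLOW_HOST = "myproductionurl.com"  # legacy single-host setting, still honoured
```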

### Customization

The allow and disallow templates are stored at `robots.txt` and `robots-disallow.txt` respectively. You can override these in your project's templates directory to customize the responses.
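
For instance, to serve extra rules in the allow-all case, a project could shadow the bundled template by adding its own `templates/robots.txt`. The rules below are purely illustrative:

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://myproductionurl.com/sitemap.xml
```
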
2 changes: 1 addition & 1 deletion dev-requirements.txt
@@ -1,3 +1,3 @@
-black==21.11b1
+black==22.8.0
 flake8==4.0.1
 isort==5.10.1
13 changes: 13 additions & 0 deletions simple_robots/tests/tests.py
@@ -13,6 +13,19 @@ def test_allow_if_host_matches(self):
         response = self.client.get("/robots.txt", HTTP_HOST="test.com")
         self.assertEqual(response.content, b"User-agent: *\nAllow: /\n")

+    @override_settings(ROBOTS_ALLOW_HOST=".test.com", ALLOWED_HOSTS=[".test.com"])
+    def test_allow_if_host_matches_wildcard(self):
+        response = self.client.get("/robots.txt", HTTP_HOST="example.test.com")
+        self.assertEqual(response.content, b"User-agent: *\nAllow: /\n")
+
+    @override_settings(
+        ROBOTS_ALLOW_HOSTS=["example.test.com", "example2.test.com"],
+        ALLOWED_HOSTS=[".test.com"],
+    )
+    def test_allow_if_host_matches_multiple(self):
+        response = self.client.get("/robots.txt", HTTP_HOST="example2.test.com")
+        self.assertEqual(response.content, b"User-agent: *\nAllow: /\n")
+
     @override_settings(
         ROBOTS_ALLOW_HOST="test.com", ALLOWED_HOSTS=["test.com", "somethingelse.com"]
     )
14 changes: 9 additions & 5 deletions simple_robots/views.py
@@ -1,19 +1,23 @@
 from django.conf import settings
+from django.http.request import validate_host
 from django.views.generic import TemplateView

-ROBOTS_ALLOW_HOST_SETTING = "ROBOTS_ALLOW_HOST"
 ROBOTS_ALLOW_TEMPLATE = "robots.txt"
 ROBOTS_DISALLOW_TEMPLATE = "robots-disallow.txt"


 class ServeRobotsView(TemplateView):
     content_type = "text/plain"

+    def get_allowed_hosts(self):
+        # Maintain singular setting for backwards compatibility
+        if getattr(settings, "ROBOTS_ALLOW_HOST", ""):
+            return [settings.ROBOTS_ALLOW_HOST]
+
+        return getattr(settings, "ROBOTS_ALLOW_HOSTS", [])
+
     def get_template_names(self):
-        if (
-            getattr(settings, ROBOTS_ALLOW_HOST_SETTING, None)
-            == self.request.get_host()
-        ):
+        if validate_host(self.request.get_host(), self.get_allowed_hosts()):
             return ROBOTS_ALLOW_TEMPLATE
         return ROBOTS_DISALLOW_TEMPLATE

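For reference, `validate_host` is the same helper Django uses to enforce `ALLOWED_HOSTS`: an entry beginning with a dot matches the domain itself and any of its subdomains, while a plain entry matches only that exact host. A quick illustration (hostnames are examples):

```python
from django.http.request import validate_host

validate_host("example.test.com", [".test.com"])          # True: subdomain wildcard
validate_host("test.com", [".test.com"])                   # True: the bare domain also matches
validate_host("other.com", [".test.com"])                  # False
validate_host("example2.test.com", ["example.test.com"])   # False: plain entries are exact matches
```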