Clarify robots.txt generation #620
Comments
Thanks for the report. It's been so long since I added this to the project that I can't remember the reasoning behind doing it this way. I think it was just the logic that if you're working in development, you wouldn't want that project being scraped by search engines. I would imagine that most of the time that would never happen anyway, so perhaps it is a pointless "feature" that could be removed.
Hi, and thanks for the reply :)! In my opinion, behaviour that is environment-dependent in a way not explicitly requested by the user isn't optimal, even though it might just be trying to set sensible defaults. It can introduce unwanted behaviour that is sometimes hard to trace, as in my case. I'd also presume most users who run Hugo in `development` wouldn't expect their `robots.txt` output to change. What I'd imagine could be a better way of doing this is to have a single parameter somewhere that dictates what goes into the `robots.txt`. I'm happy to have a go at a PR over the weekend that you can then review. Please let me know if you prefer any specific approach 😊. Thank you!
The two config parameters serve different purposes. I think 99% of the time people are just going to allow all in the robots.txt, so that's probably the sensible default. If you understand what the file is and how it works, you're going to go and tweak it yourself anyway, so trying to get smart about that with parameters is probably only going to confuse people and wouldn't be broad enough to meet individual requirements anyway. I'm going to change the default template to allow all and remove the test for whether the site is in production or development.
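For anyone landing here before that change ships: a Hugo site can always override a theme's output by providing its own template at `layouts/robots.txt` (standard Hugo template lookup; the contents below are just an example, assuming `enableRobotsTXT = true` is set in the site config):

```go-html-template
{{/* layouts/robots.txt in your site (not the theme) takes precedence over Congo's version */}}
User-agent: *
Allow: /
Sitemap: {{ "sitemap.xml" | absURL }}
```

This sidesteps the environment check entirely, regardless of which environment the build runs in.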
Issue description
Hi and thanks a lot for all the work on Congo 👏 !
I've been a Congo user for quite some time, and one thing that kept bugging me for a while is that the `robots.txt` of my website kept having `Disallow: /`, thus rendering my website pretty useless for search engine indexing, despite me not explicitly disallowing this anywhere.

I've naturally had a read through the docs and issues, but apart from finding the `robots` and `enableRobotsTXT` params, nothing seemed relevant; only by looking into the code did I find the culprit: `congo/layouts/robots.txt`, lines 2 to 6 in 60fc10d.
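(For context, the conditional being referenced presumably has roughly this shape; this is a sketch reconstructed from the parameters named in this issue, not the exact file contents:)

```go-html-template
User-agent: *
{{- if or hugo.IsProduction (eq .Site.Params.env "production") }}
Allow: /
{{- else }}
Disallow: /
{{- end }}
```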
The Allow all vs. Disallow all option is set based on `hugo.IsProduction` and `.Site.Params.env == "production"`, which allowed me to find the bug in my case. My issue was that when running `hugo server` locally, Hugo runs in the `development` environment and so outputs `Disallow: /`. But after reading about it in this article, I found out that running `hugo --minify` should run in `production` by default, which was indeed the case: in the SSG output on my local machine, my `robots.txt` did have `Allow: /`. The real bug was that, for some reason, in the CI pipeline (woodpecker-ci) which deploys my site to Codeberg Pages, `hugo --minify` runs in `development` by default, so I had to change it to `hugo --minify --environment production` to finally fix it.

tl;dr
I'd like to ask two things:
I'm happy to help with changing this/improving the docs on it :).
Thanks a lot 💙 !
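For anyone hitting the same CI issue: the fix amounts to passing the environment flag explicitly in the pipeline's build command. A hypothetical Woodpecker `.woodpecker.yml` fragment (the image and step names are illustrative, not from my actual pipeline):

```yaml
steps:
  build:
    image: hugomods/hugo:latest  # illustrative; use whatever Hugo image your pipeline uses
    commands:
      # Without --environment, this build ran in "development" and emitted "Disallow: /"
      - hugo --minify --environment production
```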
Theme version
v2.6.1
Hugo version
hugo v0.110.0
Which browser rendering engines are you seeing the problem on?
Chromium (Google Chrome, Microsoft Edge, Brave, Vivaldi, Opera, etc.)
URL to sample repository or website
https://adam.sr
Hugo output or build error messages
No response