Option to drop fragments from URLs #215
Thanks for checking in before making the change. I think this would make the most sense as a per-site configuration checkbox, not as filters in the dashboard. That means the processing would be done at ingest time rather than at query time. I agree that this is less than ideal, since we'd like to collect data in as "raw" a form as possible. But perhaps we should also think of this as a security improvement: some websites (wrongly!) include sensitive data inside the URL fragment (e.g., auth tokens). This change would allow them to use Shynet without that sensitive information hitting the database. I say go ahead and create a PR. Thanks!
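A per-site checkbox applied at ingest time could look roughly like the minimal sketch below. `Site`, `drop_url_fragments`, `HitStore`, and `ingest_hit` are hypothetical names used only for illustration; they are not Shynet's actual models or API, and a real implementation would presumably use Django models instead of these in-memory stand-ins.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urldefrag


@dataclass
class Site:
    # Hypothetical stand-in for per-site settings; the field name
    # drop_url_fragments is an assumption, not an existing Shynet option.
    name: str
    drop_url_fragments: bool = False


@dataclass
class HitStore:
    # Hypothetical in-memory stand-in for the table where hits are stored.
    locations: List[str] = field(default_factory=list)


def ingest_hit(site: Site, location: str, store: HitStore) -> str:
    """Strip the fragment at ingest time (if the site enables it), then store."""
    if site.drop_url_fragments:
        # The fragment is discarded before storage, so anything sensitive
        # in it (e.g. a token) never reaches the database.
        location = urldefrag(location).url
    store.locations.append(location)
    return location
```

Because the fragment is removed before the hit is stored, this design gets the security benefit described above at the cost of never having the raw URL available later.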
Thank you for the quick response @milesmcc. 🙌 On one hand I appreciate that Shynet is simple, but on the other I'm concerned about losing the original data. What do you think if we add a
I think it's probably overkill. I also worry it might make things like querying and filtering more complicated, since there would be two table fields that end users might want to interact with. It would also remove any potential security or privacy benefit.
I would like Shynet to have a configuration option that makes it drop fragments (i.e. everything after `#`) from URLs. This way, URLs that really point to the same page will be collated together in the stats.

For example, on my blog people link to https://blog.fidelramos.net/photography/photography-workflow#5-replication-with-syncthing, but I would like Shynet to treat that link as https://blog.fidelramos.net/photography/photography-workflow so that all those hits are grouped together.
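Stripping the fragment itself is a small operation. As a sketch (the helper name `drop_fragment` is hypothetical, not part of Shynet), Python's standard `urllib.parse.urldefrag` splits a URL into the part before `#` and the fragment:

```python
from urllib.parse import urldefrag


def drop_fragment(url: str) -> str:
    """Return the URL with any '#...' fragment removed."""
    # urldefrag returns a (url, fragment) result; keep only the URL part.
    return urldefrag(url).url


if __name__ == "__main__":
    # Prints https://blog.fidelramos.net/photography/photography-workflow
    print(drop_fragment(
        "https://blog.fidelramos.net/photography/photography-workflow"
        "#5-replication-with-syncthing"
    ))
```

URLs without a fragment are returned unchanged, so the helper is safe to apply to every hit.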
One thing I'm not sure about is whether the URL sanitization should happen at collection time or when calculating the stats. I'm not yet familiar enough with Shynet's internals to know which approach makes the most sense. On one hand, it would be better to collect raw data without alteration, but that might put too heavy a burden on calculating the stats.
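For comparison, the query-time alternative keeps the stored URLs untouched and normalizes them only while aggregating. The sketch below is an illustration, not Shynet code: `page_counts` is a hypothetical name, and a real implementation would more likely push this into the database query rather than loop in Python. It shows where the extra cost comes from: every stats request re-processes every stored URL.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urldefrag


def page_counts(raw_locations: Iterable[str], drop_fragments: bool) -> Counter:
    """Count hits per page from raw, unmodified stored URLs.

    Normalization happens here, at query time, so the raw data is kept
    but each stats calculation pays the cost of re-parsing every URL.
    """
    counts: Counter = Counter()
    for location in raw_locations:
        key = urldefrag(location).url if drop_fragments else location
        counts[key] += 1
    return counts
```

With `drop_fragments=True`, hits on `https://blog.example/post#a`, `https://blog.example/post#b`, and `https://blog.example/post` collapse into a single entry with three hits; with `False` they stay as three separate pages.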
Another big question is where to offer this option. As a per-site configuration checkbox? As filters in the dashboard?
I'm willing to code this but would like some discussion and agreement on how to execute it so the effort is productive.