You're right. We just check for a canonical URL, then default to whatever we have. We didn't try anything more complicated than that because the possible variability is practically infinite. However, I agree that we should be able to ignore common tracking parameters pretty safely.
We could perhaps even ignore everything after the first occurrence of any known tracking parameter. The few cases where that is incorrect would probably be heavily outweighed by all of the accurately tracked links that would otherwise be ruined by infinitely variable URLs.
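To make the "ignore everything after" idea concrete, here is a minimal sketch. The parameter list `KNOWN_TRACKING` and the function name `truncateAtTracking` are hypothetical, not something that exists in the extension, and the sketch also drops the `#fragment`, which is an assumption on my part.

```javascript
// Hypothetical list of tracking parameter names (Google Analytics style).
const KNOWN_TRACKING = ["utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"];

// Keep query parameters only up to the first known tracking parameter.
// Everything from that parameter onward is discarded, including legitimate
// parameters that happen to come after it (the accepted trade-off).
function truncateAtTracking(rawUrl) {
  // Assumption: the fragment is not part of the page identity, so drop it.
  const noFragment = rawUrl.split("#")[0];
  const queryStart = noFragment.indexOf("?");
  if (queryStart === -1) return noFragment;

  const base = noFragment.slice(0, queryStart);
  const kept = [];
  for (const pair of noFragment.slice(queryStart + 1).split("&")) {
    const name = pair.split("=")[0].toLowerCase();
    if (KNOWN_TRACKING.includes(name)) break; // stop at the first tracking parameter
    if (pair !== "") kept.push(pair);
  }
  return kept.length > 0 ? base + "?" + kept.join("&") : base;
}
```

With this approach, `http://www.example.com/a?id=7&utm_source=x&page=2` loses `page=2` as well, which is exactly the kind of incorrect case mentioned above.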
Yeah, this is a painful and error-prone solution, but I don't think there is much that can be done about it, although I agree with the idea of stripping off known junk like the Google Analytics parameters. I can re-hash the DB fairly easily (I have a Python script to do that), so we could get the existing hashes to match. Good call @gagarine !
Ok, let's try to remove very well-known tracking parameters to mitigate this problem.
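A rough sketch of what that filtering could look like on the client, before the URL is hashed. The names `TRACKING_PARAMS` and `stripTrackingParams` are made up for illustration, and the list only covers the `utm_*` family; it relies on the standard `URL` and `URLSearchParams` APIs.

```javascript
// Hypothetical block list; extend it as more tracking parameters turn up.
const TRACKING_PARAMS = new Set([
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
]);

// Return the URL with every listed tracking parameter removed.
// Parameters that are not on the list are kept in their original order.
function stripTrackingParams(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first, because we delete entries while looping.
  for (const key of Array.from(url.searchParams.keys())) {
    if (TRACKING_PARAMS.has(key.toLowerCase())) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```

One caveat: editing `searchParams` re-serializes the query string, so the encoding of the remaining parameters may be normalized. The same function would need to run on every URL before hashing, and the existing hashes in the DB would have to be regenerated from URLs cleaned the same way.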
I think in the future we could compare pages' content and analyze whether they are similar to others. This could even be used to detect when a page has been moved or duplicated on other websites. In other words, a duplicate detection and consolidation system, but one that keeps all the URL variants so it can still match the hash sent by the plugin. This part is a server-side thing, though, and should be some kind of background task.
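Just to illustrate what "similar" could mean here, a small sketch using Jaccard similarity over word shingles. This is one common near-duplicate technique picked as an example, not something decided in this thread; the function names, the shingle size of 3, and the use of JavaScript for a server-side task are all assumptions.

```javascript
// Split text into overlapping groups of `size` consecutive words ("shingles").
function shingles(text, size = 3) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const result = new Set();
  for (let i = 0; i + size <= words.length; i++) {
    result.add(words.slice(i, i + size).join(" "));
  }
  return result;
}

// Jaccard similarity: shared shingles divided by all distinct shingles.
// Returns a value between 0 (nothing in common) and 1 (identical sets).
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const item of a) {
    if (b.has(item)) shared++;
  }
  return shared / (a.size + b.size - shared);
}
```

Two pages whose score is above some threshold (to be tuned on real data) could be grouped together while each URL variant keeps its own hash.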
As I understand it, we take the canonical URL from the page element (in HTML5, <link rel=canonical>) and use location.href as a fallback:
chrome.extension.sendRequest({'action': 'setCanonical', 'url': canonicalValue || location.href, 'title': title});
The thing is, URLs often carry tracking parameters, and you end up with a URL like: http://www.example.com/?utm_source=adsite&utm_campaign=adcampaign&utm_term=adkeyword
Those tracking parameters change a lot, sometimes even from one user to the next. So extra steps are needed to clean them out before URLs can be matched.
Are we taking care of that? I didn't see where in the JS (perhaps I just missed it). It can't be done on the server side because we hash the URL on the client side.
I think we should just create a list of parameters we filter out. Do you see a smarter way?