As mentioned in contao/contao#4213, subscribers might want to handle or react to redirected URLs. Or rather: the CrawlUri passed to the crawler subscribers is currently not the URL that was actually crawled if a redirect was involved.
When a URL responds with a redirect, the HTTP client automatically follows that redirect (up to the max_redirects setting). This means, however, that when subscribers are notified, the passed CrawlUri instance is not the URL that was actually crawled in the end. In contao/contao#4218 this is rectified by analysing the response info that the ResponseInterface of the Symfony HttpClient provides. But maybe there is a better way of doing this.
The Symfony HttpClient does not offer much utility when it comes to redirects. If you want to handle redirects more granularly, you can set max_redirects to 0, handle the RedirectionException, and then decide whether another request should be made or not. At the same time, you could also directly update the CrawlUri instance with the new URL (or even track each individual URL in a stack) before it is passed to subscribers.
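To make the "track each URL in a stack" idea more concrete, here is a rough sketch of following redirects by hand. It is only an illustration and not how Escargot implements it: `followRedirects()` and the `$fetch` callable are made-up names. `$fetch` stands in for a single request made with max_redirects set to 0 and is assumed to return the status code plus the already resolved redirect target.

```php
<?php

declare(strict_types=1);

/**
 * Follows redirects manually and records every URL visited along the way.
 *
 * $fetch is a hypothetical callable (not part of any library): it receives a
 * URL and returns ['status' => int, 'location' => ?string], where 'location'
 * is the absolute redirect target or null.
 *
 * @return array{chain: list<string>, status: int}
 */
function followRedirects(callable $fetch, string $url, int $maxRedirects = 20): array
{
    $chain = [$url];

    while (true) {
        $result = $fetch($url);
        $status = $result['status'];
        $location = $result['location'] ?? null;

        $isRedirect = $status >= 300 && $status < 400 && null !== $location;

        // Stop on a non-redirect response, when the redirect limit is
        // reached, or when the target was already visited (redirect loop).
        if (
            !$isRedirect
            || count($chain) - 1 >= $maxRedirects
            || in_array($location, $chain, true)
        ) {
            return ['chain' => $chain, 'status' => $status];
        }

        $url = $location;
        $chain[] = $url;
    }
}

// In-memory stand-in for real HTTP responses (assumed data for the demo).
$responses = [
    'https://example.com/old' => ['status' => 301, 'location' => 'https://example.com/new'],
    'https://example.com/new' => ['status' => 302, 'location' => 'https://example.com/final'],
    'https://example.com/final' => ['status' => 200, 'location' => null],
];

$fetch = static fn (string $url): array => $responses[$url] ?? ['status' => 404, 'location' => null];

$result = followRedirects($fetch, 'https://example.com/old');

// The last entry of the chain is the URL that was actually crawled.
echo implode(' -> ', $result['chain']), PHP_EOL;
```

The last element of `chain` is what would be reported to subscribers as the final URL, while the earlier elements keep the individual hops available if a subscriber cares about them.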
Implemented in 54f6e82 :)
You can now access it via $crawlUri->getRedirectedTo(), which might be null of course.
Note that for this to work, I had to add another column to the DoctrineQueue, so we cannot just update like that: afaik in Contao we define the table ourselves instead of using a schema listener that forwards to Escargot (which would now be possible as of b551530 too). Care to work on this in the Core? 😊
I've released the changes to the DoctrineQueue in 1.5.0 (https://github.com/terminal42/escargot/releases/tag/1.5.0).
So technically, you can now require ^1.5 and use a schema listener that adds the $queue->getTableSchema() to it.
Until then I probably cannot release a new version with this new feature, as otherwise tl_crawl_queue would fail because it lacks the new required column.