The documentation for how to scrape datasets shows that you can use either `collect-mail --url` or `collect-mail --file` when scraping IETF mailing lists, but only `collect-mail --file` when scraping W3C/3GPP/IEEE/etc mailing lists.
From my (admittedly limited) poking around in the code, it seems like `mailman.collect_archive_from_url` could be rewritten fairly simply, using the code already in the documentation (linked above), so that the `--url` option works for all of the different mailing list types. I imagine that would be useful for people coming to this package who don't necessarily want to download hundreds of mailing lists in one go.
(Please forgive me if there is an existing issue about this or if I've wildly misunderstood the code in mailman.py; I've just been getting acquainted with the package! 😅)
Thanks for this. It's right on.
It's related to an issue that's just come up, which is that it's much easier to download mbox files from the new IETF mailing list archive interface.
So we will need a mailman ingest from files very soon.
Streamlining the CLI so that it automatically recognizes whether something is a URL or a file name is a nice idea.
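To make the auto-detection idea concrete, here is a minimal sketch of how an argument could be classified as a URL or a file name. The function name `classify_source` is hypothetical and not part of the package; the sketch also assumes that only `http` and `https` locations should count as URLs.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for http(s) locations, otherwise "file".

    Hypothetical helper for illustration; not part of the existing CLI.
    """
    parsed = urlparse(source)
    # Require both a web scheme and a host, so that a Windows path such as
    # C:\lists.txt (which parses with scheme "c") is not mistaken for a URL.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


if __name__ == "__main__":
    for arg in ("https://www.ietf.org/mail-archive/text/ietf/", "lists.txt"):
        print(arg, "->", classify_source(arg))
```

With something like this, a single positional argument could be routed to the existing URL or file code path, and the `--url` / `--file` flags could stay as explicit overrides.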