Problems with pushing mementos into Internet Archive #43
Comments
Thanks for providing details about the problem. Do you have any suggestion for how the user can provide headers? For example:
The |
@maturban MemGator has some logic for allowing users to specify the user-agent through the command line. I think simply allowing a string with some semantic CLI flag (e.g., MemGator's
@ibnesayeed might have an opinion on this as well.
Here are my suggestions after thinking about it this morning.
For command-line users
For the command-line utility, something like this should suffice:
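A minimal sketch of what such an option could look like, using Python's argparse. The --user-agent flag name and the default string are hypothetical, not existing ArchiveNow options; the name was picked here only so that it does not clash with the existing --agent argument.

```python
import argparse

# Hypothetical default; ArchiveNow's real default user agent may differ.
DEFAULT_USER_AGENT = "ArchiveNow-sketch/0.1"


def build_parser():
    """Build a parser with a dedicated user-agent option (hypothetical flag name)."""
    parser = argparse.ArgumentParser(prog="archivenow")
    parser.add_argument("uri", help="URI of the page to archive")
    parser.add_argument(
        "--user-agent",
        dest="user_agent",
        default=DEFAULT_USER_AGENT,
        help="User-Agent string sent with every archiving request",
    )
    return parser


def headers_from_args(args):
    """Turn the parsed option into the one header the CLI lets users change."""
    return {"User-Agent": args.user_agent}


# Example invocation this parser would accept:
#   archivenow --user-agent "my-bot/1.0" http://example.com/
```

Only the user-agent is exposed, which matches the point that command-line users have no use case for changing other request headers.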
For comparison, wget has
We don't have a use case for allowing command-line users to change all request headers, just the user-agent.
For programmers (me, other WS-DL folks, and the world)
Programmers, on the other hand, may need to modify request headers. This is why I was suggesting that we alter
and have the
I have an even better idea. Because ArchiveNow employs the requests library, you could allow the programmer to set up a session object and send the session object as an argument. If no session object is specified, the argument can default to a new one. Like this:
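A minimal sketch of the session idea, under stated assumptions: push_to_ia is a simplified, hypothetical stand-in for an ArchiveNow handler call (not its actual signature), and the Save Page Now URL prefix is assumed. The requests library is imported only when no session is supplied.

```python
# Assumed endpoint prefix for the Internet Archive's Save Page Now service.
IA_SAVE_PREFIX = "https://web.archive.org/save/"


def push_to_ia(uri_org, session=None):
    """Ask the Internet Archive to capture uri_org using the caller's session.

    Simplified, hypothetical stand-in for an ArchiveNow handler call.
    """
    if session is None:
        # No session supplied: fall back to a fresh requests.Session.
        import requests

        session = requests.Session()
    # Whatever the caller configured on the session (headers, adapters,
    # caching) applies here without this function knowing about it.
    return session.get(IA_SAVE_PREFIX + uri_org)


# Usage (performs a real network request, so it is left commented out):
# import requests
# session = requests.Session()
# session.headers.update({"User-Agent": "my-research-crawler/1.0"})
# response = push_to_ia("http://example.com/", session=session)
```

Because the function only calls session.get, any object with a compatible get method works, which also makes the function easy to exercise in tests without network access.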
This way, the programmer can set up the session once in their own code and just pass it. They may have changed the session object to include caching, timeouts, user-agents, request headers, etc., and ArchiveNow does not need to care what changes were made. It just calls
You can even reuse this session object solution when changing the user-agent string while adding the user-agent argument for command-line users.
MemGator CLI's user agent works as follows:
I would not suggest supplying Python dictionaries from the CLI, as CLIs should be language independent. If you want to allow specifying generic request headers from the CLI (apart from a dedicated flag for the UA), you can use the
As far as the internal API is concerned, I would certainly suggest taking @shawnmjones' advice on supporting a custom session object. In addition to that, I would suggest you use wildcard keyword arguments (that start with **).
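To illustrate the keyword-argument suggestion, here is a minimal sketch in which extra keyword arguments are forwarded untouched to the session's get call. The function name push_with_options and the URL prefix are assumptions made for illustration, not ArchiveNow's actual API.

```python
def push_with_options(uri_org, session, **request_kwargs):
    """Forward any extra keyword arguments to session.get unchanged.

    Hypothetical helper: the caller decides which options to pass
    (headers, timeout, proxies, ...), and this function does not have
    to enumerate them in its own signature.
    """
    # Assumed Save Page Now prefix; a real handler would supply its own URL.
    return session.get("https://web.archive.org/save/" + uri_org, **request_kwargs)


# Usage with a real requests.Session (network call, left commented out):
# import requests
# push_with_options(
#     "http://example.com/",
#     requests.Session(),
#     headers={"User-Agent": "my-bot/1.0"},
#     timeout=30,
# )
```

The trade-off is that mistyped option names are not caught by this function; they surface only when the underlying get call rejects them.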
I noticed this when I was using ArchiveNow this morning.
If I add a user agent to the arguments to the requests.get on line 15 of archivenow/archivenow/handlers/ia_handler.py then it works.
archivenow/archivenow/handlers/ia_handler.py
Line 15 in cafcbdd
I'm uncertain as to how you want to handle the user specifying their own user agent. The existing --agent argument appears to be for specifying which tool the user desires to employ for creating WARCs. Also, there doesn't appear to be a way to submit changes to any of the request headers in archivenow/archivenow.py.
As I'm calling ArchiveNow within Python code, I would prefer an available parameter to the push function on line 129 of archivenow/archivenow.py.
archivenow/archivenow/archivenow.py
Lines 129 to 168 in cafcbdd
For example, we could have:
where the user can override any of the request headers by assigning them as a dictionary to the headers parameter. This dictionary would have to be re-submitted through the code on line 154 to the function executed via multithreading.
I haven't submitted a pull request yet because all handlers would need to be updated to receive and act on this parameter. I'm not sure of the implications of that.
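A minimal sketch of how a headers dictionary could travel from push to handlers that run in threads. Everything here is hypothetical: the EchoHandler class, its push(uri, headers) signature, and the default header values are stand-ins for illustration, not ArchiveNow's actual code.

```python
import threading

# Hypothetical defaults; a caller-supplied dictionary overrides them per key.
DEFAULT_HEADERS = {"User-Agent": "ArchiveNow-sketch/0.1"}


class EchoHandler:
    """Stand-in for an archive handler; a real one would issue HTTP requests."""

    def __init__(self, name):
        self.name = name

    def push(self, uri_org, headers):
        # Report what would have been sent instead of contacting an archive.
        return {"archive": self.name, "uri": uri_org, "headers": headers}


def push(uri_org, handlers, headers=None):
    """Run every handler in its own thread, handing each the merged headers."""
    merged = {**DEFAULT_HEADERS, **(headers or {})}
    results = [None] * len(handlers)

    def worker(index, handler):
        # Each thread gets its own copy so handlers cannot affect each other.
        results[index] = handler.push(uri_org, dict(merged))

    threads = [
        threading.Thread(target=worker, args=(i, h))
        for i, h in enumerate(handlers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Plain dictionary merging is case-sensitive, so the sketch assumes callers spell header names the same way as the defaults. It also shows the cost mentioned above: every handler has to accept the extra parameter.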