Skip to content
This repository has been archived by the owner on Feb 19, 2019. It is now read-only.

Add safe generator #386

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Add safe generator #386

wants to merge 6 commits into from

Conversation

NoahCarnahan
Copy link
Contributor

Add a SafeGenerator class, which can be used to wrap a normal generator. It will throw an exception if something attempts to iterate over the generator a second time.
This PR also adds a decorator that can be used with functions that return generators so that they will return SafeGenerators instead.
@czue @esoergel

Copy link
Member

@czue czue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! maybe add a test for the decorator too?

also it doesn't matter too much, but the couch.database module is a bit of a strange place for it. might be better to make this its own module?

Copy link
Contributor

@esoergel esoergel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, good solution! Leaving open to give you a chance to respond to Cory's feedback.

@millerdev
Copy link
Contributor

millerdev commented Dec 5, 2016

Summary of comments I made on slack, plus some new things.

FWIW, according to the Python docs SafeIterator would be "deemed broken":

The intention of the protocol is that once an iterator’s next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. (This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.) - https://docs.python.org/2/library/stdtypes.html#iterator-types

I'm not sure if that matters in practice, and I think the benefits of this probably outweigh the "brokenness." However, it is possible that other code could break if it depended on the documented property that an exhausted iterator will continue to raise StopIteration on subsequent next() calls. Personally I think it would have been better if the iterator protocol had been designed to raise something other than StopIteration on subsequent next calls, but it does make the implementation more complicated. I'm sure there's an in-depth discussion on this reasoning somewhere on the python-dev mailing list.

Can we avoid putting anything new in dimagi-utils? Seems like this repo should eventually be merged back into HQ since it's so tightly coupled and the submodule workflow only adds overhead.

Update: I see you mentioned the brokenness issue in the SafeIterator docstring, which is great.

This iterator-like object wraps an iterator.
The SafeIterator will raise a IteratorAlreadyConsumedException if a user attempts to iterate through it a
second time.
This is useful if the wrapped iterator is a one time user generator, which would simply raise StopIteration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should say "one time use" instead of "one time user"?

This decorator function wraps the return value of the given function in a SafeIterator.
You should only use this decorator on functions that return generators.
The SafeIterator will render the generator "safe" by raising a IteratorAlreadyConsumedException if it is
iterated a second time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more correct to s/generator/iterator/g on this docstring. Possibly rename this decorator to safe_iterator.

@NoahCarnahan
Copy link
Contributor Author

Thanks Daniel. I agree that its not ideal to break that protocol, but I think the practical advantages are worth it in this instance. I did consider an alternative implementation that would not break the iterator protocol:

class SafeIterable(object):
    def __init__(self, iterator):
        self._iterator = iterator
        self._has_been_iterated = False

    def __iter__(self):
        if self._has_been_iterated:
            raise IteratorAlreadyConsumedException
        self._has_been_iterated = True
        return self._iterator

The downside to this implementation is that there is nothing stopping someone from calling iter() on the object, then passing the iterator that it returns around.
A third option would be to put a soft assert into the original implementation so that we get an email instead of raising an exception, but of course that isn't quite as useful.

@millerdev
Copy link
Contributor

Obligatory link to Python-Dev post:
https://mail.python.org/pipermail/python-dev/2002-July/026459.html

The ensuing discussion is a fun read at first, but goes on for much too long and descends into rather time-wasting pedantry at times.

Of note, IteratorExhausted was proposed:
https://mail.python.org/pipermail/python-dev/2002-July/026469.html

And ultimately shot down by the BDFL. I'll leave it as an exercise for you, dear reader, to discover that bitter end.

@czue
Copy link
Member

czue commented Dec 6, 2016

not sure if it's the right mechanism, but would this be a good thing to take to architecture review to try and reach consensus? seems we are making certain tradeoffs either way, and maybe trying to be more inclusive/thoughtful about those in a wider group could help reach consensus.

@czue
Copy link
Member

czue commented Dec 6, 2016

also @millerdev re: this point:

Can we avoid putting anything new in dimagi-utils? Seems like this repo should eventually be merged back into HQ since it's so tightly coupled and the submodule workflow only adds overhead.

I agree with this - think we've said it a few times but maybe not documented formally anywhere.

I think it was put here because @NoahCarnahan wants to patch code in dimagi utils to use this. so seems the options would be a library that both dimagi-utils and HQ can depend on, or moving a lot of code from dimagi-utils to HQ, both of which could be a lot of overhead. So it's possible we might decide that this rule can be broken if other code in dimagi-utils needs to depend on new code.

@NoahCarnahan
Copy link
Contributor Author

Good call, I'll send an email to architecture review. Didn't realize that we were trying to avoid adding code to this repo. Once we decide on an approach, I'll look into how difficult it would be to move this code (and the code it needs to patch) into the main repo, but might be inclined to keep it here if it looks like a lot of work.

@millerdev
Copy link
Contributor

I think it was put here because @NoahCarnahan wants to patch code in dimagi utils to use this.

Ah, I didn't look closely enough at the usage context. Yes, that makes sense. So possible next steps could be (in order of preference):

  • use this as an opportunity to push the merger of dimagi-utils into commcare-hq and just be done with these annoyances already. Moving this repo to ex-submodules wouldn't be that hard would it?
  • leave it here and defer that for another rainy day 😞

@esoergel
Copy link
Contributor

esoergel commented Dec 6, 2016

Quick note on dimagi-utils as a submodule - it IS still referenced by other submodules:

$ ag --count --ignore dimagi-utils-src dimagi.utils submodules/
auditcare-src/auditcare/management/commands/generate_request_report.py:1
auditcare-src/auditcare/signals.py:1
auditcare-src/auditcare/views.py:1
auditcare-src/README.rst:1
formtranslate-src/formtranslate/api.py:1
touchforms-src/touchforms/formplayer/views.py:1
couchlog-src/setup.py:3
couchlog-src/couchlog/signals.py:1
couchlog-src/couchlog/views.py:3
couchlog-src/couchlog/tests.py:1
couchlog-src/couchlog/models.py:1

So we'd need to either merge in those modules as well, or copy/paste the utilities they depend on. There are likely a bunch of utils in dimagi-utils that are only used in commcare-hq, though, which could be moved to corehq/util without much ado.

I'd support most any such effort, but that does seem pretty out of scope for this PR, so my vote would be to merge SafeIterator now, then do the submodule clean-up separately.

@millerdev
Copy link
Contributor

Ah, didn't realize there were a that many other dependencies outside of commcare-hq. That definitely does increase the scope, so I'm fine with merging this PR and dealing with that issue separately.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants