Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Implement Dict.setdefault(key, default) #2803

Closed
wants to merge 19 commits into from

Conversation

msaelices
Copy link
Contributor

  • Implement Dict.setdefault(key, default)
  • Tests for Dict.setdefault(key, default)

@msaelices msaelices requested review from a team as code owners May 23, 2024 20:45
Signed-off-by: Manuel Saelices <[email protected]>
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the pull request! It's a very useful function to have ! I actually discoved it last week and it made my code much simpler. I think we can improve the performance by avoiding copies and working with references whenever possible.

@@ -855,6 +855,22 @@ struct Dict[K: KeyElement, V: CollectionElement](
self._index = _DictIndex(self._reserved)
self._entries = Self._new_entries(self._reserved)

fn setdefault(inout self, key: K, owned default: V) -> V:
Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn setdefault(inout self, key: K, owned default: V) -> V:
fn setdefault(inout self: Reference[Self, _, _], key: K, owned default: V) -> Reference[V, self.is_mutable, self.lifetime]:

It should be possible to return a reference here to avoid a move. It will make this method work with non-copyable types in the future (and also moar speeeeeed!!!!)

Comment on lines 871 to 872
self[key] = default
return default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line self[key] = default triggers a copy since there is another use of default afterwards.

What about replacing it with self[key] = default^?

We can then return self.__get_ref(key) to work with references :) No copy there.

I think in the future, for even better performance, we should return a reference to the object being inserted in the Dict._insert function. That can avoid a second lookup. It can be another pull request as it's more complicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions. Unfortunately, the self[][key] = default^ trick is not working:

image

❯ ./stdlib/scripts/run-tests.sh 
Creating build directory for building the stdlib running the tests in.
Packaging up the Standard Library.
Included from /home/msaelices/src/mojo/stdlib/src/collections/__init__.mojo:1:
/home/msaelices/src/mojo/stdlib/src/collections/dict.mojo:871:19: error: expression must be mutable in assignment
            self[][key] = default^
            ~~~~~~^~~~~
mojo: error: failed to parse the provided Mojo source module

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe self should be mutable for this to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the problem is that self needs to be declared as inout. The problem is that if I do that change I am now receiving this error:

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe fn setdefault(self: Reference[Self, True, _], ...

Also if that's possible, can you avoid using screenshots? If you use copy-paste of text, it's better, as other can copy paste your error messages, and can also search for them from the search bar of github.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed your suggestions in this commit, even if it's failing:
msaelices@7b3844f

I'm waiting for your expertise @gabrieldemarmiesse . Thanks in advance.

Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested locally and this is the right fix. It works. Notice the True inside the Reference. Also you need to remove the inout

EDIT: Locally in the unit tests, I had to dereference the result with [], which is expected

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See gabrieldemarmiesse@1144383 for the fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much. I've just merged it in this PR: msaelices#1

@@ -524,10 +524,21 @@ fn test_clear() raises:
assert_equal(len(some_dict), 0)


fn test_dict_setdefault() raises:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add another test that verifies that no copies are happening when calling setdefault ? We can use the CopyCounter struct for this purpose. There is an example here: https://github.com/modularml/mojo/blob/nightly/stdlib/test/collections/test_list.mojo#L568

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do it when I manage to fix the compiler issue above. Thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still this new test to add

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done here: msaelices@7e53b09

It's curious. I thought the case of the "b" key would have 0 copies but it's 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should use the transfer operator in the test ^? Maybe that's why there is a copy. Also I think calling __getitem__ triggers a copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it but no luck: msaelices@91d9ee5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a bug in the dict implementation. let's skip this problem for now. I think it needs to be investigated later on, but that would be out of scope for this PR.

@gabrieldemarmiesse
Copy link
Contributor

Also can you rename your pull request? By putting [stdlib] at the start, it will make the CI happy :)

@msaelices msaelices changed the title Implement Dict.setdefault(key, default) [stdlib] Implement Dict.setdefault(key, default) May 24, 2024
msaelices and others added 4 commits May 25, 2024 00:51
It will not copy while returning the value
Suggested by @gabrieldemarmiesse

Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: gabrieldemarmiesse <[email protected]>
@msaelices
Copy link
Contributor Author

@gabrieldemarmiesse I've merged your fix, and also synced the nightly branch fixing conflicts. Could you please TAL?

Thanks so much.

Comment on lines 873 to 874
self[][key] = default^
return self[].__get_ref(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self[][key] = default^
return self[].__get_ref(key)
self[][key] = default^
return self[].__get_ref(key)

There is no need to put it outside the except, no? Is it because it fails at compile time otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No special reason. Fixed here: msaelices@ab36941

@msaelices
Copy link
Contributor Author

@gabrieldemarmiesse could you please take another look? I've synced from nightly and fixed the issues because of breaking changes from that branch.

Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! There is still the problem with the copy to be investiguated, but I believe it's unrelated to your PR so we can skip it for now. At least it's documented in the unit tests :)

Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! There is still the problem with the copy to be investiguated, but I believe it's unrelated to your PR so we can skip it for now. At least it's documented in the unit tests :)

@ConnorGray
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jun 29, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Jul 18, 2024
modularbot pushed a commit that referenced this pull request Jul 19, 2024
[External] [stdlib] Implement `Dict.setdefault(key, default)`

Implement `Dict.setdefault` which returns a reference with the value of the item with the specified key.
If the key does not exist, it is inserted with the specified value into the `Dict`.

Co-authored-by: Manuel Saelices <[email protected]>
Closes #2803
MODULAR_ORIG_COMMIT_REV_ID: a5514f823c3b7dc978607435d8082f1efe14aa13
@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Jul 19, 2024
@modularbot
Copy link
Collaborator

Landed in a842307! Thank you for your contribution 🎉

@modularbot modularbot closed this Jul 19, 2024
modularbot pushed a commit that referenced this pull request Sep 13, 2024
[External] [stdlib] Implement `Dict.setdefault(key, default)`

Implement `Dict.setdefault` which returns a reference with the value of the item with the specified key.
If the key does not exist, it is inserted with the specified value into the `Dict`.

Co-authored-by: Manuel Saelices <[email protected]>
Closes #2803
MODULAR_ORIG_COMMIT_REV_ID: a5514f823c3b7dc978607435d8082f1efe14aa13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants