Fix Outer for SparseArray #940
according to Pylint(R1721:unnecessary-comprehension)
fix small typo
Update osx.yml
Not sure what the latest llvm that supports Python 3.9 is. Try one by one :(
Thanks for fixing up the OSX CI testing. I am okay with the |
mathics/builtin/tensors.py
@@ -329,7 +362,62 @@ def rec(item, rest_lists, current):
elements.append(rec(element, rest_lists, current))
return Expression(head, *elements)

return rec(lists[0], lists[1:], [])
def rec_sparse(item, rest_lists, current):
Maybe this code deserves to be moved to a mathics.eval module.
Could you tell me more about how the code is organized in Mathics3, so that I can determine which code should be moved to mathics.eval or other locations in future updates?
This aspect is described in https://mathics-development-guide.readthedocs.io/en/latest/extending/code-overview/python-modules.html and, for mathics.eval specifically, in https://mathics-development-guide.readthedocs.io/en/latest/extending/code-overview/python-modules.html
Thanks for the references. But I still have some specific questions left to figure out:
- Should mathics.eval be as general as possible, or is it just a place for eval() of functions in mathics.builtin?
- I don't feel any essential difference between rec() and rec_sparse(). So if rec_sparse() should be moved to mathics.eval.tensor, I'd prefer to treat rec() (in Outer and in Inner) the same way.
- If rec() is moved to mathics.eval.tensor, it will require as many as 6 parameters (item, rest_lists, current, f, head, evaluation), which seems a bit weird to me.
The question as it stands leads me to believe that I might not have conveyed the motivation. So let me start with that and wind up with some guidelines, which I hope you can then use to decide how to organize things.
Currently Mathics3 evaluates expressions by walking a tree (which is specifically a kind of M-expression).
If Mathics3 is to be sped up, we need to turn the tree evaluation into a sequence of instructions. The functions that start with eval_ you can think of as the instructions. They are top-level functions like Add, Subtract, Factor, etc., but with parameter checking done and objects like Integers and Strings turned into possibly lower-level objects. Basically there would be at least one for every Wolfram Alpha builtin function.
Of course, to support these high-level builtin functions, we need other functions. Those will not start with eval_.
With respect to rec() and rec_sparse() specifically, if you do not feel any essential difference between them, then, sure, they both can be moved somewhere under the mathics.eval module. Please, can we come up with more descriptive names than rec() and rec_sparse(), and add docstrings to these functions as well?
So finally, I have to describe or apologize for the crappiness of this code, since we are trying to encourage people not to follow the bad habits of what came before.
Originally, this was one huge monolith and we have not come close to digging ourselves out of the mess that is there. In the hierarchical OO monolith, there were a number of nested private functions such as rec() (or is it "wreck") that had to be duplicated in that organization because they were, or are, well, private functions. They probably should not have been private but instead put in a common module where the function can be used in both places.
Finally, as for the number of parameters: it probably means that we are missing a structure, object, concept, or organization that encapsulates those parts into a more comprehensive whole.
However, since I don't understand what this rec thing does, clearly I can't offer a suggestion. If you have an idea here and want to go with that, please suggest something or make that change.
Otherwise, for the first cut, I would not worry about this. We can come back to that. What I have found is that there is so much of a mess, we just can't fix everything all at once. And whatever the ultimate better solution is, having split off the common rec thing into a separate module will only help things.
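One way to read the remark about a missing structure: of the six parameters, three (f, head, evaluation) never change during the recursion, so they could be bundled into one object. The following is only a sketch under that assumption. OuterContext, its fields, and outer_piece() are hypothetical names; a Python type stands in for the Mathics3 list head, and nothing in this thread says such a class exists in Mathics3.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class OuterContext:
    """Hypothetical bundle of the values that stay fixed during the recursion."""

    f: Callable[..., Any]  # applied to each combination of atoms
    head: type = list  # Python type standing in for the Mathics3 list head
    evaluation: Any = None  # placeholder for the Mathics3 evaluation object


def outer_piece(
    item: Any, rest_lists: Sequence[Any], current: List[Any], ctx: OuterContext
) -> Any:
    """Recursive outer product with the three fixed parameters folded into `ctx`."""
    if isinstance(item, ctx.head):
        # Descend into the container, preserving its nesting in the result.
        return ctx.head(outer_piece(e, rest_lists, current, ctx) for e in item)
    # `item` is an atom: record it, then continue with the next list, if any.
    picked = current + [item]
    if rest_lists:
        return outer_piece(rest_lists[0], rest_lists[1:], picked, ctx)
    return ctx.f(*picked)
```

With this shape the recursive function takes four arguments instead of six, and anything else that needs the same fixed values could share the context object.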
I looked at the situation with rec() and rec_sparse() in more detail. Some of the things I said were a bit misleading, or just wrong (because I did not understand the true situation).
So let me clarify the situation here and correct some things I wrote.
First, yes indeed, this code goes back to the earliest times and that means the coding style is not good; names are misleading and there are neither docstrings nor documentation describing the code.
I suppose "rec" relates to the recursive implementation. The only way one could make this more vague would be to use f() or fn(). But thankfully, at least the wrong term has not been used, which was sometimes done.
In good programming practice, function names reflect what a function does more than how it is implemented. And here the name is vague too.
As a result, you find a number of internal functions called rec() in different Builtin classes that are really not related to one another. But the two functions we are interested in here, which are similar, are rec() and rec_sparse() from the builtin Outer.
A better name for these two functions would be create_outer_piece() and create_outer_piece_sparse(). If it is useful to mention that these work recursively, that would be done in the docstring or description of the function. For example:
"""Recursively computes the outer product of list `item` with `rest_lists`;
`current` contains the results computed so far.
"""
It might also be nice to add type annotations. (This code was written in Python 2.7 when type annotations did not exist. But comments, docstrings and good programming practice did exist in Python 2.7.)
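As a rough illustration of the renaming, docstring, and type annotations suggested above, here is a minimal sketch that works on plain nested Python lists instead of Mathics3 expressions. The signature, the extra f parameter, the outer() wrapper, and the treatment of every non-list value as an atom are assumptions made for the example; this is not the code in mathics/builtin/tensors.py.

```python
from typing import Any, Callable, List, Sequence


def create_outer_piece(
    item: Any,
    rest_lists: Sequence[Any],
    current: List[Any],
    f: Callable[..., Any],
) -> Any:
    """Recursively computes the outer product of list `item` with `rest_lists`;
    `current` contains the atoms collected so far.

    Hypothetical stand-in: plain Python lists play the role of Mathics3
    list expressions, and anything that is not a list counts as an atom.
    """
    if isinstance(item, list):
        # Descend into the list, keeping its nesting in the result.
        return [create_outer_piece(e, rest_lists, current, f) for e in item]
    # `item` is an atom: record it and move on to the next list, if any.
    picked = current + [item]
    if rest_lists:
        return create_outer_piece(rest_lists[0], rest_lists[1:], picked, f)
    return f(*picked)


def outer(f: Callable[..., Any], *lists: Any) -> Any:
    """Entry point (illustrative only): outer product of `lists` under `f`."""
    return create_outer_piece(lists[0], lists[1:], [], f)
```

For instance, `outer(lambda a, b: a * b, [1, 2], [3, 4])` gives `[[3, 4], [6, 8]]`.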
Here is a suggestion for reorganization...
Create a function called eval_Outer() in mathics.eval.tensors, and move the Python code of the current eval() method in Outer into it, renaming and improving the rec() and rec_sparse() functions along the way. They can still be internal functions, as they are here.
eval() then just calls eval_Outer().
Later, if we want to reorganize things further, we will be in a better position to do that.
I suspect that you, like most people, were expecting that you'd just make this little change to get the code to understand sparse arrays and be done with it. I know if I were contributing here, that's what I would expect.
The reality is there is a big mess that we picked up that we are still trying to dig ourselves out of. In the past, a great deal of our effort has been just clean up work of the kind mentioned here.
It has been a big mistake to just keep coding in the style and organization that came before. That's what led to this being a big mess rather than just a mess.
To the extent you can help us dig out of this, this is greatly appreciated.
I have to say, my main concern was about having a more or less complicated implementation of the eval_ methods inside the mathics.builtin.tensors module, not about the private, nested functions. Of course, it is better if these functions are properly named and documented, but at this stage, it would be enough for me to move the implementation to mathics.eval.tensors, something like
def eval_outer(f, lists, evaluation):
    """Evaluates recursively the outer product"""
    ...
and, in mathics.builtin.tensors,
def eval(self, f, lists, evaluation):
    "Outer[f_, lists__]"
    return eval_outer(f, lists, evaluation)
Sounds like the C/C++ style header-source separation?
Yes, I guess so. I suppose this would be a good way to describe it in the developer documentation.
And like C/C++ header-source separation it is not strictly mandatory to make code work, just accepted practice.
As @mmatera says, this is good. We can merge this as is, or do you want to do the other stuff in addition? That can be done in another PR if you prefer.
Let us know.
I believe that (at least in tensor-related evaluations) there are only two cases where recursion is needed, namely tensor contraction (i.e. Inner, TensorContract, etc.) and direct product (i.e. Outer, TensorProduct, etc.). So when I have time I'll try to combine rec() and rec_sparse() into a single public function called create_outer_piece() or something else, and do the same thing for Inner (obviously Inner shares the same bug as Outer).
However, that is a completely different thing than fixing Outer. So I would prefer this PR to be merged as is, then I will do the above things in another PR.
Apart from the code organization comment, it would be nice to add some doctests/pytests.
@Li-Xiang-Ideal Let's separate the OSX CI fix from the Outer part.
move almost all ``eval()`` to ``mathics.eval.tensors``
Awesome!
mathics/builtin/distance/clusters.py
@@ -139,7 +139,7 @@ def _cluster(self, p, k, mode, evaluation, options, expr):
options, "DistanceFunction", evaluation
)
if distance_function_string == "Automatic":
- from mathics.builtin.tensors import get_default_distance
+ from mathics.eval.tensors import get_default_distance
BTW - another advantage of moving interpreter evaluation code out of the builtin class is that we don't need to put imports inside functions like this, which is considered bad style.
So this import can be moved to the top now.
It is another sign that modularity has been improved.
Indeed. But when I saw so many distance functions in mathics.builtin.distance.clusters, my first thought was the "minimal substitution principle" :)
In a sense I agree, and I find this counterintuitive.
However, the majority of Pythonistas have decided that this is bad style. And Pythonistas have a philosophy that in Python there is only one way to do things - their way.
That said, I will try to justify this - weakly. There is this whole culture of lint checkers that work best when imports are at the top. I have been told by Python programmers that putting all imports at the top, like you have to do in Pascal, makes the code easier to read. Lastly, with an import inside a function there is a small amount of code that gets run every time that function gets called (although in some cases this can mean less code is run, namely when the number of calls is 0).
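To make the two styles concrete, here is a small sketch that uses the standard-library math module as a stand-in for mathics.eval.tensors; the function names are made up for illustration. Both functions return the same value. The difference is that the import statement inside the function is executed on every call (after the first import it amounts to a lookup of the already-loaded module plus a local name binding), while the module-level import runs once when the file is loaded.

```python
import math  # module-level import: executed once, when this file is loaded


def distance_top_level(x: float, y: float) -> float:
    """Uses the module-level import."""
    return math.hypot(x, y)


def distance_local_import(x: float, y: float) -> float:
    """Executes an import statement on every call."""
    import math as local_math  # cheap after the first import, but not free

    return local_math.hypot(x, y)
```

If `distance_local_import()` is never called, its import statement never runs at all, which is the zero-calls case mentioned above.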
Actually I'm new to Python and know little about such things as the philosophy of Pythonistas; the last time I used a language other than Mathematica and TeX was about three years ago. Certainly, when I write a project on my own, I will avoid ugly code such as naming a function rec() or importing a module inside a function. But when I'm involved in an open source project, I tend to subconsciously write things similar to the existing code.
So thanks for sharing this. I'll feel encouraged to write nicer-looking code.
LGTM.
@Li-Xiang-Ideal If you want to be able to commit directly to this repository let me know and I'll add you to the list of committers.
@rocky OK that would be more convenient for me. thx
You should now have access.
See issue #939