Fix Outer for SparseArray #940

Li-Xiang-Ideal · 2023-11-25T16:39:05Z

See issue #939

according to Pylint(R1721:unnecessary-comprehension)

fix small typo

Update osx.yml

Li-Xiang-Ideal · 2023-11-25T17:20:44Z

~~Update osx.yml since it's somewhat outdated: the latest llvm is llvm@17 instead of llvm@11~~
Dependency of llvm@17 on [email protected] would result in some problems in macOS check. Use llvm@14 instead.

Not sure what the latest llvm that supports Python 3.9 is. Try one by one :(

rocky · 2023-11-25T18:07:30Z

Thanks for fixing up the OSX CI testing.

I am okay with the Outer[] changes. @mmatera - your thoughts?

mmatera · 2023-11-25T19:26:17Z

mathics/builtin/tensors.py

@@ -329,7 +362,62 @@ def rec(item, rest_lists, current):
                    elements.append(rec(element, rest_lists, current))
                return Expression(head, *elements)

-        return rec(lists[0], lists[1:], [])
+        def rec_sparse(item, rest_lists, current):


Maybe this code deserves to be moved to a mathics.eval module.

Could you tell me more about how the codes are organized in Mathics3? So that I can determine which code should be moved to mathics.eval or other locations in future updates.

This aspect is described in https://mathics-development-guide.readthedocs.io/en/latest/extending/code-overview/python-modules.html and mathics.eval specifically in https://mathics-development-guide.readthedocs.io/en/latest/extending/code-overview/python-modules.html

Thanks for the references. But I still have some specific questions left to figure out

Should mathics.eval be as general as possible, or is it just a place for eval() of functions in mathics.builtin?

I don't feel any essential difference between rec() and rec_sparse(). So if rec_sparse() should be moved to mathics.eval.tensor, I'd prefer to treat rec() (in Outer and in Inner) the same way.

If rec() is moved to mathics.eval.tensor, it will require as many as 6 parameters (item, rest_lists, current, f, head, evaluation), which seems a bit weird to me.

The question as is leads me to believe that I might not have have conveyed the motivation. So let me start with that and wind up with some guidelines which I hope you can then decide on how to organize.

Currently Mathics3 evaluates expressions by walking a tree (which is specifically a kind of M-expression)

If Mathics3 is to be sped up, we need to turn the tree evaluation into a sequence of instructions. The functions that start eval_ you can think of as the instructions. They are top-level functions like Add, Subtract, Factor, etc. but with parameter checking done and objects like Integers and Strings turned into possibly lower, level objects. Basically there would be at least one for every Wolfram Alpha builtin function.

Of course, to support these high-level built in functions, we need other functions. Those will not start with eval_.

With respect to rec() and rec_sparse() specifically, if you do not feel any essential difference between them, then, sure, they both can be moved somewhere under the the mathics.eval module. Please, can we we come up with a more descriptive name than rec() and rec_sparse() and add docstrings to these functions to as well?

So finally, I have to describe or apologize for the crappiness of this code, since we are trying to encourage people not to follow the bad habits of what came before.

Originally, this was one huge monolith and we have not come close to digging ourselves out of the mess that is there. In the hierarchical OO monolith, there were a number of nested private functions such as rec() (or is it "wreck") that had to duplicated in that organization and because they were or are, well, private functions which probably should not have been private but instead put in a common module where the function can be used both places.

Finally as for the number of parameters. It probably means that there is a structure, object, concept or organization that we are missing that exists that encapsulates those parts into a more comprehensive whole.

However since I don't understand what this rec thing does, clearly I can't offer a suggestion. If you have an idea here and want to go with that, please suggest something or make that change.

Otherwise for the first cut, I would not worry about this initially. We can come back to that. What I have found is that there is so much of a mess, we just can't fix everything all at once. And whatever the ultimate better solution is, having splitting off the common rec thing into a separate module will only help things.

I looked at the situation with the rec() and rec_sparse() in more detail. Some of the things I said were a bit misleading, or just wrong (because I did not understand the true situation).

So let me clarify the situation here and correct some things I wrote.

First, yes indeed, this code goes back to the earliest times and that means the coding style is not good; names are misleading and there are neither docstrings nor documentation describing the code.

I suppose "rec" relates to the recursive implementation. The only way one could make this more vague would to use f() or fn(). But thankfully, at least the wrong term has not be used which was sometimes done.

In good programming practice, function names reflect what a function does more than how it is implemented. And here the name is vague too.

As a result, you find a number of internal functions called rec() in different Builtin classes that are really not related to one another. But the two functions we are interested in here which are similar are rec() and rec_sparse() from Builtin Outer.

A better name for these two functions would be create_outer_piece() and create_outer_piece_sparse(). If it is useful to mention these work recursively, that would be done in the docstring or description of the function. For example:

"""Recursively computes the outer product of list `item` with `rest_lists`; `current` contains the results computed so far. """

It might also be nice to add type annotations. (This code was written in Python 2.7 when type annotations did not exist. But there comments, docstrings and good programming practice did exist in Python 2.7.)

Here is a suggestion for reorganization...

Create a function called eval_Outer() in mathics.eval.tensors , and move the Python code in current method in Outer called eval() to that renaming and improving the rec() and rec_sparse() methods. They can still be an internal function as it is here.

eval() then just calls eval_Outer .

Later, if we want reorganized things further we are in a better position to do that.

I suspect that you like most people were expecting you'd just make this little change and to get the code to understand sparse arrays and be done with it. I know if I were contributing here, that's what I expect.

The reality is there is a big mess that we picked up that we are still trying to dig ourselves out of. In the past, a great deal of our effort has been just clean up work of the kind mentioned here.

It has been a big mistake to just keep coding in the style and organization that came before. That's what led to this being a big mess rather than just a mess.

To the extent you can help us dig out of this, this is greatly appreciated.

I have to say, my main concern was about having a more or less complicated implementation of the eval_ methods inside the mathics.builtin.tensor module, not about the private, nested functions. Of course, it is better if these functions are properly named and documented, but at this stage, it would be enough for me to move the implementation to mathics.eval.tensor, something like

def eval_outer(f, list, evaluation): """Evaluates recursively the outer product""" ...

and

In mathics.builtin.tensor

def eval(self, f, lists, evaluation): "Outerf[f_, lists__]" return eval_outer(f, lists, evaluation)

Sounds like the C/C++ style header-source separation?

Yes, I guess so. I suppose this would be a good way to describe it in the developer documentation.

And like C/C++ header-source separation it is not strictly mandatory to make code work, just accepted practice.

As @mmatera says, this is good. We can merge this as is, or do you want to do the other stuff in addition. That can be done in another PR if you prefer.

Let us know.

I believe that (at least in tensor-related evaluations) there are only two cases where recursion is needed, namely tensor contraction (i.e. Inner, TensorContract, etc.) and direct product (i.e. Outer, TensorProduct, etc.). So when I have time I'll try to combine rec() and rec_sparse() into a single public function called create_outer_piece() or something else, and do the same thing for Inner (obviously Inner shares the same bug as Outer).

However, that is a completely different thing than fixing Outer. So I would prefer this PR to be merged as is, then I will do the above things in another PR.

mmatera · 2023-11-25T19:29:46Z

Apart from the code organization comment, it would be nice to add some doctests/pytests.

rocky · 2023-11-25T21:53:54Z

@Li-Xiang-Ideal Let's separate the OSX CI fix from the Outer part.

Add some tests

mathics/builtin/tensors.py

move almost all ``eval()`` to ``mathics.eval.tensors``

mmatera · 2023-11-27T13:33:16Z

Awesome!

rocky · 2023-11-27T13:40:47Z

mathics/builtin/distance/clusters.py

@@ -139,7 +139,7 @@ def _cluster(self, p, k, mode, evaluation, options, expr):
            options, "DistanceFunction", evaluation
        )
        if distance_function_string == "Automatic":
-            from mathics.builtin.tensors import get_default_distance
+            from mathics.eval.tensors import get_default_distance


BTW - another advantage of moving interpreter evaluation code out of the builtin class is that we don't need to put imports inside functions like this, which considered bad style.

So this import can be moved to the top now.

It is another sign that modularity has been improved.

Indeed. But when I saw so many distance functions in mathics.builtin.distance.cluster, my first thought was the "minimal substitution principle" :)

In a sense I agree, and I find this counterintuitive.

However the majority of Pythonistas have decided that this is bad style though. And Pythonistas have a philosophy that in Python there is only one way to do things - their way.

That said, I will try to justify this - weakly. There is this whole culture of lint checkers that work best when imports are at the top. I have been told by Python programmers that putting all imports at the top like you have to do in Pascal makes the code easier to read. Lastly, in Python there is a small amount of code that gets run every time that function gets called (which in some cases can be less code run when the number of calls is 0)

Actually I'm new to python and know little about such things as philosophy of Pythonistas, the last time I used a language other than Mathematica and TeX was about three years ago. Certainly, when I write a project on my own, I will avoid ugly codes such as naming a function rec() or importing a module inside a function. But when I'm involved in an open source project, I tend to subconsciously write things similar to existing codes.

So thanks for sharing this. I'll feel encouraged to write codes looking nicer.

mmatera · 2023-11-27T16:33:08Z

LGTM.

rocky · 2023-11-27T17:56:28Z

@Li-Xiang-Ideal If you want to be able to commit directly to this repository let me know and I'll add you to the list of committers.

Li-Xiang-Ideal · 2023-11-28T02:50:30Z

@Li-Xiang-Ideal If you want to be able to commit directly to this repository let me know and I'll add you to the list of committers.

@rocky OK that would be more convenient for me. thx

rocky · 2023-11-28T03:43:56Z

@rocky OK that would be more convenient for me. thx

You should now have access.

Li-Xiang-Ideal added 17 commits November 21, 2023 14:23

Update tensors.py

c4b6752

according to Pylint(R1721:unnecessary-comprehension)

Merge branch 'Mathics3:master' into master

45149f3

Update SYMBOLS_MANIFEST.txt

983049d

Fix Outer for SparseArray

e171e0c

Update tensors.py

e7b68a3

fix small typo

Update tensors.py

0789bd6

Merge branch 'Mathics3:master' into master

4159b69

Update tensors.py

58dbb8b

Update tensors.py

9583f38

Update isort-and-black-checks.yml

f799576

Update tensors.py

f09b340

Update isort-and-black-checks.yml

6dddce8

Update osx.yml

12917a1

Update osx.yml

bbfc74c

Update osx.yml

9be0ca7

Update osx.yml

bb133b1

Merge pull request #1 from Li-Xiang-Ideal/OSX-test-patch-1

60611b4

Update osx.yml

Update osx.yml

8a0057f

Not sure what the latest llvm that supports Python 3.9 is. Try one by one :(

mmatera reviewed Nov 25, 2023

View reviewed changes

Li-Xiang-Ideal added 2 commits November 26, 2023 12:38

Update tensors.py

e235c04

Add some tests

Merge branch 'Mathics3:master' into master

3af6f9a

rocky reviewed Nov 27, 2023

View reviewed changes

mathics/builtin/tensors.py Outdated Show resolved Hide resolved

fix typo

4db5ea0

rocky reviewed Nov 27, 2023

View reviewed changes

mathics/builtin/tensors.py Outdated Show resolved Hide resolved

Li-Xiang-Ideal added 2 commits November 27, 2023 21:17

move eval from mathics.builtin.tensors to here

792d218

Update clusters.py

312eaff

Update tensors.py

088cbed

move almost all ``eval()`` to ``mathics.eval.tensors``

rocky reviewed Nov 27, 2023

View reviewed changes

Update clusters.py

daf7d6d

rocky merged commit 2843011 into Mathics3:master Nov 27, 2023
11 checks passed

Li-Xiang-Ideal linked an issue Dec 4, 2023 that may be closed by this pull request

Outer works improperly on SparseArray #939

Closed

Li-Xiang-Ideal mentioned this pull request Dec 11, 2023

Combine all rec() related to Outer #945

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Outer for SparseArray #940

Fix Outer for SparseArray #940

Li-Xiang-Ideal commented Nov 25, 2023

Li-Xiang-Ideal commented Nov 25, 2023 •

edited

Loading

rocky commented Nov 25, 2023

mmatera Nov 25, 2023

Li-Xiang-Ideal Nov 26, 2023

rocky Nov 26, 2023

Li-Xiang-Ideal Nov 26, 2023

rocky Nov 26, 2023 •

edited

Loading

rocky Nov 27, 2023 •

edited

Loading

mmatera Nov 27, 2023

Li-Xiang-Ideal Nov 27, 2023

rocky Nov 27, 2023 •

edited

Loading

Li-Xiang-Ideal Nov 27, 2023

mmatera commented Nov 25, 2023

rocky commented Nov 25, 2023

mmatera commented Nov 27, 2023

rocky Nov 27, 2023 •

edited

Loading

Li-Xiang-Ideal Nov 27, 2023

rocky Nov 27, 2023 •

edited

Loading

Li-Xiang-Ideal Nov 27, 2023

mmatera commented Nov 27, 2023

rocky commented Nov 27, 2023

Li-Xiang-Ideal commented Nov 28, 2023

rocky commented Nov 28, 2023

Fix Outer for SparseArray #940

Fix Outer for SparseArray #940

Conversation

Li-Xiang-Ideal commented Nov 25, 2023

Li-Xiang-Ideal commented Nov 25, 2023 • edited Loading

rocky commented Nov 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rocky Nov 26, 2023 • edited Loading

Choose a reason for hiding this comment

rocky Nov 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rocky Nov 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmatera commented Nov 25, 2023

rocky commented Nov 25, 2023

mmatera commented Nov 27, 2023

rocky Nov 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rocky Nov 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmatera commented Nov 27, 2023

rocky commented Nov 27, 2023

Li-Xiang-Ideal commented Nov 28, 2023

rocky commented Nov 28, 2023

Li-Xiang-Ideal commented Nov 25, 2023 •

edited

Loading

rocky Nov 26, 2023 •

edited

Loading

rocky Nov 27, 2023 •

edited

Loading

rocky Nov 27, 2023 •

edited

Loading

rocky Nov 27, 2023 •

edited

Loading

rocky Nov 27, 2023 •

edited

Loading