Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table: Add methods 'join' and 'with_column' #5251

Merged
merged 2 commits into from
Feb 19, 2021

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Feb 9, 2021

Issue

Resolves #5115.

Please comment the method names. 'join' reminds of SQL's join, but it's kind of correct.

Includes
  • Code changes
  • Tests
  • Documentation

@codecov
Copy link

codecov bot commented Feb 9, 2021

Codecov Report

Merging #5251 (5429e1c) into master (0568368) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5251      +/-   ##
==========================================
+ Coverage   85.18%   85.22%   +0.03%     
==========================================
  Files         300      300              
  Lines       60979    61037      +58     
==========================================
+ Hits        51946    52016      +70     
+ Misses       9033     9021      -12     

@ajdapretnar
Copy link
Contributor

Thanks for this one! 🎉

Few comments:

  1. I would prefer to name the method add_column. with_column really confused me.
  2. I would add docstrings. 🤷 (I can do it, if you want.)
  3. Why does this fail?
iris = Table(iris)
iris1 = iris[:,:2]
iris2 = iris[:,2:]
Table.join(iris1, iris2)

@janezd
Copy link
Contributor Author

janezd commented Feb 10, 2021

  1. I would prefer to name the method add_column. with_column really confused me.

Inspired by Kotlin's convention (e.g. withIndex). I'm not too happy with this name, but I didn't like _add_column and similar, because it suggests that the method adds a column to this table. I'll be happy to rename, but the name mustn't mislead.

  1. I would add docstrings. 🤷 (I can do it, if you want.)

Was planned. Now done.

  1. Why does this fail?
iris = Table(iris)
iris1 = iris[:,:2]
iris2 = iris[:,2:]
Table.join(iris1, iris2)

Wow. I was writing the tests and the code in parallel, and solved some problem by using an empty array of appropriate dimensions, but I accidentally just used the same dimension as the array in tests, soempty((5, 0)) was hardcoded. :)))

@lanzagar
Copy link
Contributor

lanzagar commented Feb 10, 2021

so empty((5, 0)) was hardcoded. :)))

Everybody knows that hardcoded things should be fitted to iris. Then nobody complains for a long time, because when they test the code it always works ;)

I also don't particularly like the name with_column. Maybe the effort to be so precise is a bit unneeded here, since unlike the differences between sort/sorted and similar we are talking about a method of the Table class which AFAIK we decided should be immutable in all such cases. Maybe that sacrifice at least allows us to use the most obvious name and not be misleading?

On the other hand join is a bit misleading for me, because I would definitely expect the option to join based on a unique identifier and not just sequentially. In connection to this: the method above (concatenate) used to allow axis=1 which we removed - do we remember the reason? And should we re-enable that now by calling join when axis=1? Or to avoid the name join just not have a separate method at all and use concatenate(..., axis=1) like in the olden days?
Sidenote: if we do keep two methods, I do not like the inconsistent signatures *tables vs tables

And finally: YAY, we are getting a method to add columns :)

@janezd
Copy link
Contributor Author

janezd commented Feb 10, 2021

add column

OK, I'll try one last appeal: doesn't this look nice?

amended = iris.with_column(label_var, label_col)

As opposed to

amended = iris.add_column(label_var, label_col)

In the second case, I'd expect amended to be None. (But it may be just me. Yesterday I thought for a moment that ndarray.flatten works in place!)

If I haven't convinced you, say so and I'm renaming it to add_column.

join

I wasn't sure about join, either. I still decided to have it as analogous to str.join.

But I agree on merging this with concatenate with axis=1 (and obviously using tables rather than *tables)

I forgot the reason for it's removal. I believe it was a mess. We probably thought it would be cleaner and didn't realize how much we'd miss it.

@ajdapretnar
Copy link
Contributor

I see your point, but I am somehow not used to methods describing the results, but rather the operation. For example, table.to_dense() returns a dense table. If it described the result, it would be table.as_dense(). concatenate() and join() both describe the action. If they described the result, they would be concatenated() and joined(). Or am I mistaken? Perhaps @markotoplak and @VesnaT can add their two cents.

As for join(), I was equally surprised it didn't accept the key. pandas' join() joins by key if given, else it joins by index. Is this the right moment to reiterate how pandas will solve everything?

@janezd janezd added the needs discussion Core developers need to discuss the issue label Feb 11, 2021
@janezd janezd mentioned this pull request Feb 11, 2021
3 tasks
@janezd
Copy link
Contributor Author

janezd commented Feb 12, 2021

Decision: concatenate with axis; add_column.

@janezd janezd changed the title [RFC] Table: Add methods 'join' and 'with_column' Table: Add methods 'join' and 'with_column' Feb 12, 2021
@janezd janezd removed the needs discussion Core developers need to discuss the issue label Feb 12, 2021
Copy link
Contributor

@lanzagar lanzagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably think about using the new functions (especially add_column) in the many places where it is done manually now.
We obviously don't need to wait and can merge this beforehand, after the minor changes mentioned (docstring and name).

Orange/data/table.py Outdated Show resolved Hide resolved
Orange/data/table.py Outdated Show resolved Hide resolved
table = cls.from_numpy(domain, Xs, Ys, Ms, W, ids=tables[0].ids)
for ta in reversed(table_attrss):
table.attributes.update(ta)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the new table should also be set, otherwise it is 'untitled'.
The same approach can be used as for axis=0:

names = [table.name for table in tables if table.name != "untitled"]
if names:
    table.name = names[0]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted the common code from _concatenate_vertical and _concatenate_horizontal to concatenate.

All changes are in the second commit.

@ajdapretnar ajdapretnar merged commit 5a1d61b into biolab:master Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: add function for adding new columns
3 participants