Multi-key-column joins #16

Open
adamConnerSax opened this issue May 14, 2021 · 2 comments
Labels
enhancement 🚀 New feature or request R&D : library

Comments

@adamConnerSax

It seems essential to me that the library be able to join using multiple columns as the join key. I don't know if the underlying Trie makes that simpler.

The simplest version might add a column holding a product (maybe [v]?) of the key columns to each Frame, join the new Frames on that column, and then remove it from the result. That's how I'd do it from outside the library, and it doesn't seem horribly inefficient. But I imagine there's a better way?
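For concreteness, here is a minimal sketch of that composite-key idea. It does not use Heidi's actual API: the `Row` type is a hypothetical stand-in (a plain `Map` from column name to value), and `compositeKey` and `innerJoinOn` are names invented for illustration. Instead of materialising the extra column, it computes the `[v]` key on the fly, which should be equivalent to adding the column, joining, and dropping it again.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a row: column name -> value.
-- This is NOT Heidi's Row type, only an illustration.
type Row v = M.Map String v

-- Collect the values of the key columns, in order, as one composite key.
-- Returns Nothing if any key column is missing from the row.
compositeKey :: [String] -> Row v -> Maybe [v]
compositeKey ks row = traverse (`M.lookup` row) ks

-- Inner join on several key columns: index the right-hand rows by their
-- composite key, then look up each left-hand row in that index.
-- Rows lacking any of the key columns are dropped.
innerJoinOn :: Ord v => [String] -> [Row v] -> [Row v] -> [Row v]
innerJoinOn ks ls rs =
  [ M.union l r                       -- left-biased merge of the two rows
  | l <- ls
  , Just k <- [compositeKey ks l]
  , r <- M.findWithDefault [] k index
  ]
  where
    -- flip (++) keeps right-hand rows in their original order per key
    index = M.fromListWith (flip (++))
              [ (k, [r]) | r <- rs, Just k <- [compositeKey ks r] ]
```

Building the index costs one pass over the right-hand rows, so the join is roughly O((n + m) log m) plus the size of the output, which matches the intuition that the approach is not horribly inefficient.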

@ocramz ocramz added enhancement 🚀 New feature or request R&D : library labels May 15, 2021
@ocramz
Owner

ocramz commented May 15, 2021

Trying to wrap my head around this; could you provide an example of the input and result?

@adamConnerSax
Author

I can give a more thorough example later, but here's a sketch: Suppose I have rows of data, each recording, e.g., votes by people with certain demographic characteristics (age, sex, education) in certain geographic areas of the US. And suppose that the geography is specified by two keys: US state and congressional district within the state. Now I want to join that data to similarly geographically specified data about, e.g., median income. The join is on the pair of geographic columns.

I can, of course, make a new column holding a tuple and then join on that. But it's convenient to skip those extra steps. I think a combinator could handle this fine.
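A rough sketch of what such a combinator could look like, using plain Haskell records rather than Heidi or Frames. Everything here is an assumption made for illustration: the record types, the `joinOnKey` name, and the sample values, which are made-up placeholders and not real election or income figures. The key extractors return a `(state, district)` tuple, so the tuple never has to be stored as a column.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical record types for the two data sets.
data Votes = Votes
  { vState :: String, vDistrict :: Int, vVotes :: Int }
  deriving (Eq, Show)

data Income = Income
  { iState :: String, iDistrict :: Int, iMedian :: Int }
  deriving (Eq, Show)

-- Inner join driven by key-extraction functions. The key type k can be a
-- tuple, so any number of columns can take part in the join.
joinOnKey :: Ord k => (a -> k) -> (b -> k) -> [a] -> [b] -> [(a, b)]
joinOnKey ka kb as bs =
  [ (a, b) | a <- as, b <- M.findWithDefault [] (ka a) index ]
  where
    -- index the right-hand side by key, preserving input order per key
    index = M.fromListWith (flip (++)) [ (kb b, [b]) | b <- bs ]

-- Made-up placeholder rows, for illustration only.
votes :: [Votes]
votes = [ Votes "NY" 1 100, Votes "NY" 2 200, Votes "TX" 1 300 ]

incomes :: [Income]
incomes = [ Income "NY" 1 60000, Income "TX" 1 55000, Income "TX" 2 50000 ]

-- Join on the pair (state, district).
joined :: [(Votes, Income)]
joined =
  joinOnKey (\v -> (vState v, vDistrict v))
            (\i -> (iState i, iDistrict i))
            votes
            incomes
```

With these placeholder rows, only NY district 1 and TX district 1 appear on both sides, so `joined` holds two pairs. NY district 2 and TX district 2 are dropped because this is an inner join.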

The joins in Frames are already like this. The key is specified as a type-list of (Symbol, Type).

In the Heidi case, I guess k arguments would become Set k?
