-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial draft for gallery of Seaborn Figure level to CanD axes level plots #13
base: master
Are you sure you want to change the base?
Conversation
TODO: sharey changes of font and graphics in CanD make smarter FacetGrid plots
I managed to also add the replot part. That brings me to
|
This is looking awesome! Responding to your questions:
A few more things:
Thanks for your work on this! |
Thank :), I'll work on the changes you proposed and update the file once it's done. I have now a short question -- why do you want me to remove the gitignore? Is there some reason why not to have it? I'm still relatively new to git(hub), mostly new to any sort of collaboration/contribution so I do not know the "good practise" if this is part of it. And one more question to 2) sharey, sharex. If I do a for loop, usually I do not have/save the axes separately hence I cannot do a simple
and alternatively then in a for loop I can write some |
Replace c.add_axis("axname", ...) ax = c.ax("axname") with ax = c.add_axis("axname", ...)
Sounds great! For question 1, .gitignore is to avoid git adding unneeded files when you automatically add lots of files, or when you list the status. For a project of this size, it isn't really needed, since there should never be mass additions to the repo or lots of temporary files. I may add one in the future, but this is a bigger decision related to core maintenance of the project. Additionally, these decisions shouldn't be coupled into unrelated PRs. You are welcome to use your own .gitignore file locally, but just don't add it to your commit. For question 2, it is better to do:
|
Will do/done. Thanks. I have been now playing with the sharing. It has a problem with sharing an axis more than one so chaining the previous won't work. I have made the following workaround: if i>0:
# sharey, take the before last axis and sharey with the last plotted one, cannot share 2 axis
c.ax(a[i]).sharey(c.ax(a[i-1]))
# need to rescale all axis
c.ax(a[i]).autoscale()
# remove ticks from shared y axes
plt.setp(ax.get_yticklabels(), visible=False)
# remove ylabel
ax.set_ylabel('') and I think it does the job but at the same time, I'm not 100 % sure that it won't break eventually, that the axis will have correct labels/ticks and scale. Could you please check? |
I have never done axis sharing myself, but regarding sharing it more than once, did you try putting the base axis (say ax0) in both positions, i.e., both:
and
I think one of these should work but not the other. If one of these work I think that is the more robust solution. You can always check any of the solutions by plotting an extra point on one of them and seeing if it updates the others too. |
You are right, this |
It sounded like, since you can't share an axis that is already shared, the function inserted some kind of link in one axis to keep it synced with another one. It makes sense to be able to set multiple things to the same source, but not multiple sources to the same thing. |
Hi @mwshinn, I have added one plot for Facetgrid, see plot 3, using the I also though about Seaborn's pairplot and I think one would need a function for this because the customisation is really hard. |
Looks good! A couple things you can do to improve the code in section 3:
What do you mean that resizing doesn't work nicely? It looks good to me. If it is easier, you can auto-size axes in add_grid with "spacing=" instead of "size=". I actually don't think pairplot would be as hard as you say. If you loop through i and j indices, you can plot a histogram if they are equal and a scatterplot if they are not. |
That was my original thought but then it starts breaking the other parts (or making them complicated), eg: Regarding Oh, I missed the Well, that's just half of the job. The strength comes with the fact that one can map anything to anywhere, not only having g = sns.pairplot(penguins, diag_kind="kde")
g.map_lower(sns.kdeplot, levels=4, color=".2") or mapping correlation values or any other function to it (this is |
Why doesn't the tuple work in the first case? Do you say it gets complicated because you need a string for the axis name? if you do list(product(b,a)), you can do c.add_grid([str(c) for c in conditions], ...). This makes things a lot simpler. As it stands now, you are creating your own homemade tuple! Or, if you don't want to name your axes like that, do c.add_grid([f"{a}|{b}" for a,b in conditions]). Or even simpler, my preference would be to just name your axes 0, 1, 2, ... based on the position of the tuple in the list. For query, currently you have
vs
I think the latter is much more readable. Aah, yes, you're right. Perhaps implementing one basic case of it would be good enough, since people could then make the modifications as needed? |
Sorry, it took so long, too much work... I fixed most of the things which you recommended, only the query I have not yet updated. There are now 3 figures of how to do what we discussed. I kept the two for illustration but also for one more issue. I actually have not yet figured out (no time :/) why it's happening but it you do what you suggest:
then it removes all the ticks, not just the ones one updates in the I also finished the Suggestions and comments are welcomed :). Also, the bug with the updated title-size is still there which I guess is related to another issue -- size of points in legend is not affected by seaborn's s parameter. |
TODO size of points in legend is not affected by seaborn's s parameter
Thanks a lot for your continued work on this! Looking great! Regarding the three, I don't think it's necessary to include the "Another way" part underneath "Preferred way". They only differ in how they are enumerating the conditions, which is more of a python-related difference than a CanD-related difference. Aah, yes, you're right about the axis labels. It looks like this is because of the shared axes and your way is the recommended way of doing this. You can remove the "original way" too since my suggestion didn't work. What do you mean by the following? "This is just playing with the same graph, one can do it with standard methods" I'm not sure I follow what the point of the last plot is. Yeah, the title size thing isn't fixed yet. It isn't a quick fix so I need to find the time to sort it out. |
Thanks :). Yes, I will remove them before the final version comes out, it was there mainly for you to see and maybe choose if you prefer another way in the examples than what I chose. There are still some other older matplotlib graphs too which I'm also gonna remove later. What I mean is that it's basically the same as the 8th figure, the differences are in: g = (g.set_axis_labels("Total bill (US Dollars)", "Tip")
.set(xlim=(0, 60), ylim=(0, 12),
xticks=[10, 30, 50], yticks=[2, 6, 10])
.fig.subplots_adjust(wspace=.02)) which is just setting limits and labels or spacing and one can do it easily with the previous code, there wasn't anything new so I didn't create one. I can do it thou, not a big problem. No worries, just I didn't want to forget it :). I also by mistake created those |
Oh, I see! Okay, yeah, we probably don't need that in the final version since it is more about seaborn than CanD. |
Oops, clicked the wrong button while responding |
I think, like you mention, the catplots would be great. I think a joint plot would be really nice in showing off what CanD is capable of, especially since people can easily adapt the different pieces of it, unlike in seaborn. I would prioritize that over a pairplot, since a pairplot seems to me more like an exploratory data analysis tool than something that would actually go in a paper. (Especially with how difficult it would be to support all of the functions.) In practice, though, I would be grateful for any of these that you end up implementing! |
Catplot done. Can I have a technical question? How do you orient yourself on the canvas? I find it really hard to figure out the coordinates and I basically do it by trial and error. But I guess there must be a much simpler way to do that. |
And I added jointplot too. Not fully done because there are some things I wanted to discuss and some I didn't have time for. I created a helper function for plotting the marginal distributions and left it quite open. There are many ways how that could be developed but I wanted to know your opinion first about this one. Should it have more parameters? Should it mimic seaborn closely or just provide a way how to solve this problem? Also, for the |
The catplots look really good! I agree it is difficult to orient yourself on the canvas. I usually use the "make_grid" function, which is designed to make this a bit easier. Create a grid over the whole figure with your desired grid spacing, and then remove the grid when you are done laying everything out. However, I would love it if there were a smoother way of orienting on the canvas. I considered making a gui at one point, but realized this would quickly become much more complicated than CanD itself. :P If you have any suggestions for better ways to do this I would be open to new ideas. I like the helper function you made, I'm certain that people will find this really useful! What kinds of parameters were you thinking about adding to it? If by mimicking seaborn more closely you mean by moving the legend, adjusting tick labels, etc., I would advise going for simplicity and mimicking the most important features, like you have done. If people want to move the legend, they can see your other examples where you do that. One thing that would be really useful is a textual description of the jointplot helper function, e.g., what does it do, how does it do it, etc. Looks like this is how seaborn does the colors: https://github.com/mwaskom/seaborn/blob/9425588d3498755abd93960df4ab05ec1a8de3ef/seaborn/axisgrid.py#L2212 |
Thank you :). Also thanks a lot for the code, I couldn't find it (also a thing which I'm still learning how to do effectively), I updated the colour. I added docs to the function but it already had some comments in it about what the lines do. How would you imagine it? I couldn't find the |
The docstring you added is good! In addition, I think it would be useful to have a paragraph in the notebook (outside the code) describing why we need a function to do this, just to orient the reader who might scroll down to here and not understand why they are looking at a function instead of plotting code. Something like "This is non-trivial to do, can be useful for a lot of things, easy to encapsulate in a function and reuse, all the following plots use it,...". Oops, sorry, I meant "debug_grid". This function does almost exactly what you describe in your first "nice to have" point. :P I fully agree with you on the "more examples people can copy" idea. :) The GUI you described sounds simple, but there are some factors which make it more complicated than it seems:
There might be some more, but these are the main ones which have stopped me from making a GUI. |
Found it, thanks, yes, that's almost as I thought of it :). I would still have a comment on the Hmm, I see, I didn't think of those problems. Some problems could be relatively easy to fix (eg. if you want a position relative to another graph, just add a checkbox or let the user type the name of the axis/plot it should relate to) but for most, I don't have a good answer. |
It is already possible to do every 5th/10th/n-th grid by calling the command twice, once with a spacing of the small unit and once (with thicker lines passed as an argument) for the larger unit. Numbers are a good idea, maybe I will try some things and see if I can get it to look good. |
Hi @jankaWIS ! I think this looks just about ready to go to add to the official documentation. I did a test run of what it will look like in the documentation and the results are attached as a png below. I think the following are the remaining things left to be done before it is ready to merge:
No rush of course, just wanted to leave this here for whenever you get around to looking at it again. Thanks! https://user-images.githubusercontent.com/951986/163253854-cd49d83f-f86c-4d97-bb8f-bd9f1e869be3.png |
Hi @mwshinn, yes, I totally agree.
The last two points I will do. Maybe I'll need some help with the before last point, I will try and let you know. I will go over it once again and see if I cleaned up everything. And btw., there is a watermark with my name at the very end. Thanks |
Hi @mwshinn,
as we discussed in this issue, here is the first draft of
Seaborn
's figure-level plots done inCanD
(and I left thematplotlib
there as well). I have a few comments/issues. I will try to work on some other gallery plots if time permits.Comments:
sharey
orsharex
in CanD? Relevant for some graphs (eg. 10)rst
.