-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update view_corr_mat in visualisation commands and Update the introduction page in docs #108
Conversation
…DataFrame object and PathToFile
I have built the sphinx documentation locally before submitting the pull request. And it shows the docs as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fab! Thanks @wingedRuslan! I'll ping @Islast and see if she wants to weigh in, and if you're up for adding a test for the corr_mat
file that would be pretty fab...but otherwise it all looks good to me 🚀
|
||
Parameters | ||
---------- | ||
corr_mat : :class:`pandas.DataFrame` or :class:`str` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic makes sense to me! Thanks for clearly explaining your thought process in the PR 😸
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree with your reasoning here, fewer arguments is great, and your changes to the docstrings are very clear.
|
||
# If cost is given then roughly threshold at that cost. | ||
# NOTE - this is not actually the EXACT network that you're analysing | ||
# because it doesn't include the minimum spanning tree. But it will give | ||
# you a good sense of the network structure. | ||
# #GoodEnough ;) | ||
|
||
if isinstance(corr_mat, str): | ||
M = np.loadtxt(corr_mat) # Read in the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we should add a test for this, what do you think @wingedRuslan?
If that feels a bit too much then we can just open an issue noting that we should check the dimensions of the file when we load it in. Up to you and @Islast ✨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave np.loadtxt
to do the heavy lifting on properly importing data. Or did you mean it's worth checking the file isn't too large before trying to import it? That's something I've never really considered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry folks! Massively unclear on my part.
The test that @wingedRuslan has added further down to check that the data is square is what I was thinking of, not anything about reading in the data properly 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!:hibiscus::tada: I'm very pleased with the changes to the docs, I think you've done a really good job of translating my slightly vague ideas about making it easier to navigate for a newcomer into reality.
The changes to the code are also really good. I agree wholeheartedly with the changes you've made, but I have a couple of additional requests
- I would like to support numpy arrays as well as pandas dataframes as input to this function.
- This is one I don't feel strongly about and am happy for you to go with your gut, but I have made a comment further down about how I usually to do type-checking, just in case you aren't familiar with raising errors in python. It's a useful tool to have.
I really appreciate the work you've put into this @wingedRuslan
elif isinstance(corr_mat, pd.DataFrame): | ||
M = corr_mat.to_numpy() # Convert the DataFrame to a NumPy array | ||
else: | ||
print("Please provide correlation matrix as pandas.DataFrame object or as a path to the file containing the matrix") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like type checking a whole lot. My go to for handling these cases is raising a type error. This is nice because printed output can often get lost. In this case raising an exception would look like:
else:
raise TypeError("corr_mat argument must be a pandas.DataFrame object or as a path to the file containing the matrix")
My reasoning for doing things this way is that if this command is run with a whole load of other code, it's easy for a printed message to get lost. Raising an error message aborts the execution of code (unless there is an exception for it specified) and the printed output will end with
File: "somefile", line somenumber,
TypeError corr_mat argument must be a pandas.DataFrame object or as a path to the file containing the matrix
which gives the user a lot of useful information to help debug
If you come across a case when you want to convey some information to the user about how the code is being executed, but you don't want to abort the code, raising a warning is another good trick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've got a point! 👍
I will adjust the type checking by raising a type error.
|
||
Parameters | ||
---------- | ||
corr_mat : :class:`pandas.DataFrame` or :class:`str` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree with your reasoning here, fewer arguments is great, and your changes to the docstrings are very clear.
|
||
# If cost is given then roughly threshold at that cost. | ||
# NOTE - this is not actually the EXACT network that you're analysing | ||
# because it doesn't include the minimum spanning tree. But it will give | ||
# you a good sense of the network structure. | ||
# #GoodEnough ;) | ||
|
||
if isinstance(corr_mat, str): | ||
M = np.loadtxt(corr_mat) # Read in the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can leave np.loadtxt
to do the heavy lifting on properly importing data. Or did you mean it's worth checking the file isn't too large before trying to import it? That's something I've never really considered
…ing a TypeError if unsupported type provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fab! Thanks @wingedRuslan!
Last question - do we check anywhere that all the values are floats? I feel like I can’t see that but I’m also doing this on my phone and might just have missed it. The biggest error is going to be if people read in a file that has column and row labels (in my best guess opinion) so a check for that would be great...
elif isinstance(corr_mat, np.ndarray): | ||
M = corr_mat # support numpy array as input to the function | ||
else: | ||
raise TypeError("corr_mat argument must be a 1)pandas.DataFrame object or 2) numpy.array or 3)a path to the file containing the matrix") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny typo - can you add a space after 1) and 3)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh - and I just thought, maybe re-order the text or the order of the arguments so that 1, 2 and 3 correspond to the order of the if statements above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I will re-order the text in TypeError message and fix typos in the next commit
else: | ||
raise TypeError("corr_mat argument must be a 1)pandas.DataFrame object or 2) numpy.array or 3)a path to the file containing the matrix") | ||
|
||
if M.shape[0] != M.shape[1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I was thinking of! Great stuff!!
|
||
# If cost is given then roughly threshold at that cost. | ||
# NOTE - this is not actually the EXACT network that you're analysing | ||
# because it doesn't include the minimum spanning tree. But it will give | ||
# you a good sense of the network structure. | ||
# #GoodEnough ;) | ||
|
||
if isinstance(corr_mat, str): | ||
M = np.loadtxt(corr_mat) # Read in the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry folks! Massively unclear on my part.
The test that @wingedRuslan has added further down to check that the data is square is what I was thinking of, not anything about reading in the data properly 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks great, merge when ready
What's the context for this pull request?
Solves issue update view_corr_mat in visualisation commands #83 and issue More explicit linking to autodoc content within documentation #106
What's new?
I did not want to add more arguments(corr_mat, output, corr_mat_file=None) to the function, because in this case if a user wants to use a filename, a function call would be
view_corr_mat(None, output-file, filename)
.Passing None as an argument could be unclear and intimidating for non-coders. Besides, calling the function would be different for 2 cases:
view_corr_mat(corr_mat-object, output-file)
- to plot correlation matrix (pandas DataFrame)view_corr_mat(output-file, filename)
- to plot correlation matrix (saved in file)But if you believe that including 2 arguments is better, please let me know, and I will make appropriate changes :)
What should a reviewer feedback on?
The decision regarding the
view_corr_mat function
Is the updated introduction page clear for users?
Does anything need to be updated after merge?
The project's website -> introduction page