Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update view_corr_mat in visualisation commands and Update the introduction page in docs #108

Merged
merged 8 commits into from
Apr 18, 2019

Conversation

wingedRuslan
Copy link
Collaborator

@wingedRuslan wingedRuslan commented Apr 10, 2019

  • I'm ready to merge
  1. The function view_corr_mat in visualisation_commands is redesigned so that one can pass a correlation matrix (pandas dataframe) and plot that, or a filename instead and plot the matrix saved in the filename. In the function body I check the type of the argument.
    I did not want to add more arguments(corr_mat, output, corr_mat_file=None) to the function, because in this case if a user wants to use a filename, a function call would be view_corr_mat(None, output-file, filename).
    Passing None as an argument could be unclear and intimidating for non-coders. Besides, calling the function would be different for 2 cases:
    view_corr_mat(corr_mat-object, output-file) - to plot correlation matrix (pandas DataFrame)
    view_corr_mat(output-file, filename) - to plot correlation matrix (saved in file)

But if you believe that including 2 arguments is better, please let me know, and I will make appropriate changes :)

  1. Getting started section is modified and updated in the introduction page.
  • highlighted that we have automatically generated docstring documentation
  • added links to the submodules page ,the index, the module index pages.
  • readers are informed that they can use the search bar to look up functions
  • What should a reviewer feedback on?
    The decision regarding the view_corr_mat function
    Is the updated introduction page clear for users?

  • Does anything need to be updated after merge?
    The project's website -> introduction page

@wingedRuslan
Copy link
Collaborator Author

wingedRuslan commented Apr 10, 2019

I have built the sphinx documentation locally before submitting the pull request. And it shows the docs as expected.

Copy link
Member

@KirstieJane KirstieJane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fab! Thanks @wingedRuslan! I'll ping @Islast and see if she wants to weigh in, and if you're up for adding a test for the corr_mat file that would be pretty fab...but otherwise it all looks good to me 🚀


Parameters
----------
corr_mat : :class:`pandas.DataFrame` or :class:`str`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic makes sense to me! Thanks for clearly explaining your thought process in the PR 😸

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with your reasoning here, fewer arguments is great, and your changes to the docstrings are very clear.


# If cost is given then roughly threshold at that cost.
# NOTE - this is not actually the EXACT network that you're analysing
# because it doesn't include the minimum spanning tree. But it will give
# you a good sense of the network structure.
# #GoodEnough ;)

if isinstance(corr_mat, str):
M = np.loadtxt(corr_mat) # Read in the data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we should add a test for this, what do you think @wingedRuslan?

If that feels a bit too much then we can just open an issue noting that we should check the dimensions of the file when we load it in. Up to you and @Islast

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave np.loadtxt to do the heavy lifting on properly importing data. Or did you mean it's worth checking the file isn't too large before trying to import it? That's something I've never really considered

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry folks! Massively unclear on my part.

The test that @wingedRuslan has added further down to check that the data is square is what I was thinking of, not anything about reading in the data properly 😬

docs/source/introduction.rst Show resolved Hide resolved
Copy link
Collaborator

@Islast Islast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!:hibiscus::tada: I'm very pleased with the changes to the docs, I think you've done a really good job of translating my slightly vague ideas about making it easier to navigate for a newcomer into reality.

The changes to the code are also really good. I agree wholeheartedly with the changes you've made, but I have a couple of additional requests

  1. I would like to support numpy arrays as well as pandas dataframes as input to this function.
  2. This is one I don't feel strongly about and am happy for you to go with your gut, but I have made a comment further down about how I usually to do type-checking, just in case you aren't familiar with raising errors in python. It's a useful tool to have.

I really appreciate the work you've put into this @wingedRuslan

scona/scripts/visualisation_commands.py Show resolved Hide resolved
elif isinstance(corr_mat, pd.DataFrame):
M = corr_mat.to_numpy() # Convert the DataFrame to a NumPy array
else:
print("Please provide correlation matrix as pandas.DataFrame object or as a path to the file containing the matrix")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like type checking a whole lot. My go to for handling these cases is raising a type error. This is nice because printed output can often get lost. In this case raising an exception would look like:

else:
    raise TypeError("corr_mat argument must be a pandas.DataFrame object or as a path to the file containing the matrix")

My reasoning for doing things this way is that if this command is run with a whole load of other code, it's easy for a printed message to get lost. Raising an error message aborts the execution of code (unless there is an exception for it specified) and the printed output will end with

File: "somefile", line somenumber, 
TypeError  corr_mat argument must be a  pandas.DataFrame object or as a path to the file containing the matrix

which gives the user a lot of useful information to help debug

If you come across a case when you want to convey some information to the user about how the code is being executed, but you don't want to abort the code, raising a warning is another good trick.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've got a point! 👍
I will adjust the type checking by raising a type error.


Parameters
----------
corr_mat : :class:`pandas.DataFrame` or :class:`str`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree with your reasoning here, fewer arguments is great, and your changes to the docstrings are very clear.

docs/source/introduction.rst Show resolved Hide resolved

# If cost is given then roughly threshold at that cost.
# NOTE - this is not actually the EXACT network that you're analysing
# because it doesn't include the minimum spanning tree. But it will give
# you a good sense of the network structure.
# #GoodEnough ;)

if isinstance(corr_mat, str):
M = np.loadtxt(corr_mat) # Read in the data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave np.loadtxt to do the heavy lifting on properly importing data. Or did you mean it's worth checking the file isn't too large before trying to import it? That's something I've never really considered

Copy link
Member

@KirstieJane KirstieJane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fab! Thanks @wingedRuslan!

Last question - do we check anywhere that all the values are floats? I feel like I can’t see that but I’m also doing this on my phone and might just have missed it. The biggest error is going to be if people read in a file that has column and row labels (in my best guess opinion) so a check for that would be great...

elif isinstance(corr_mat, np.ndarray):
M = corr_mat # support numpy array as input to the function
else:
raise TypeError("corr_mat argument must be a 1)pandas.DataFrame object or 2) numpy.array or 3)a path to the file containing the matrix")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny typo - can you add a space after 1) and 3)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh - and I just thought, maybe re-order the text or the order of the arguments so that 1, 2 and 3 correspond to the order of the if statements above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I will re-order the text in TypeError message and fix typos in the next commit

else:
raise TypeError("corr_mat argument must be a 1)pandas.DataFrame object or 2) numpy.array or 3)a path to the file containing the matrix")

if M.shape[0] != M.shape[1]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was thinking of! Great stuff!!


# If cost is given then roughly threshold at that cost.
# NOTE - this is not actually the EXACT network that you're analysing
# because it doesn't include the minimum spanning tree. But it will give
# you a good sense of the network structure.
# #GoodEnough ;)

if isinstance(corr_mat, str):
M = np.loadtxt(corr_mat) # Read in the data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry folks! Massively unclear on my part.

The test that @wingedRuslan has added further down to check that the data is square is what I was thinking of, not anything about reading in the data properly 😬

@wingedRuslan wingedRuslan requested a review from Islast April 18, 2019 08:40
Copy link
Collaborator

@Islast Islast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks great, merge when ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants