Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

Open
1 of 3 tasks
sschmidt23 opened this issue Aug 14, 2024 · 4 comments
Open
1 of 3 tasks
Assignees
Labels
enhancement New feature or request

Comments

@sschmidt23
Copy link
Collaborator

During Raphael Shirley's presentation on rail_lephare, Jeff mentioned that with very disparate mag distributions for training and test, the inclusion of a magnitude in the K nearest neighbor distance calculation could actually make things worse than just using colors without the magnitude. Adding an option to allow running with just colors would let people test this, should be very easy to implement.

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
@sschmidt23 sschmidt23 added the enhancement New feature or request label Aug 14, 2024
@sschmidt23 sschmidt23 self-assigned this Aug 14, 2024
sschmidt23 added a commit that referenced this issue Aug 15, 2024
add 'only_colors' option to KNearNeigh #20
@raphaelshirley
Copy link

Hi,

I am trying to run with this option set. I am getting a dimension error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[119], line 1
----> 1 knn_estimated = estimate_knn.estimate(test_data_handle)

File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:97, in CatEstimator.estimate(self, input_data)
     73 """The main interface method for the photo-z estimation
     74 
     75 This will attach the input data (defined in ``inputs`` as "input") to this
   (...)
     94     Handle providing access to QP ensemble with output data
     95 """
     96 self.set_data("input", input_data)
---> 97 self.run()
     98 self.finalize()
     99 return self.get_handle("output")

File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:110, in CatEstimator.run(self)
    108 for s, e, test_data in iterator:
    109     print(f"Process {self.rank} running estimator on chunk {s} - {e}")
--> 110     self._process_chunk(s, e, test_data, first)
    111     first = False
    112     # Running garbage collection manually seems to be needed
    113     # to avoid memory growth for some estimators

File ~/Documents/github/lincc/rail_sklearn/src/rail/estimation/algos/k_nearneigh.py:195, in KNearNeighEstimator._process_chunk(self, start, end, data, first)
    192         knn_df.loc[np.isclose(knn_df[col], self.config.nondetect_val), col] = np.float32(self.config.mag_limits[col])
    194 testcolordata = _computecolordata(knn_df, self.config.ref_band, self.config.bands, self.config.only_colors)
--> 195 dists, idxs = self.kdtree.query(testcolordata, k=self.numneigh)
    196 dists += TEENY
    197 test_ens = _makepdf(dists, idxs, self.trainszs, self.sigma)

File sklearn/neighbors/_binary_tree.pxi:1183, in sklearn.neighbors._kd_tree.BinaryTree64.query()

ValueError: query data dimension must match training data dimension

Do I need to change the input data handle after setting only_colors=True in the informer?

@sschmidt23
Copy link
Collaborator Author

I think I set things up so that you had to set the config parameter only_colors=True in both the inform and estimate stage, is it possible that you only set it in one of the two, and thus the model expects a different number of parameters? I could store the boolean for only_colors in the model to ensure that a user can't accidentally use differing values for inform and estimate, if that is less confusing.

If it's not that, then this error could also maybe be caused if you are using a non-default set of names for the magnitudes via bands and do not also supply the dictionary of mag_limits for those bands.

@raphaelshirley
Copy link

Ah yes thanks, that fixed it setting it True in the estimate stage too.

@sschmidt23
Copy link
Collaborator Author

It probably makes more sense to store the only_colors value in the model, since a user shouldn't really be able to set it in the estimate stage if the model was trained for a specific case. I'll make a PR to do just that and merge in so that you only specify in the inform.

sschmidt23 added a commit that referenced this issue Oct 2, 2024
move only_colors to model and remove from estimate #20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants