Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

sschmidt23 · 2024-08-14T21:56:29Z

During Raphael Shirley's presentation on rail_lephare, Jeff mentioned that with very disparate mag distributions for training and test, the inclusion of a magnitude in the K nearest neighbor distance calculation could actually make things worse than just using colors without the magnitude. Adding an option to allow running with just colors would let people test this, should be very easy to implement.

Before submitting
Please check the following:

I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.

add 'only_colors' option to KNearNeigh #20

raphaelshirley · 2024-09-30T13:44:06Z

Hi,

I am trying to run with this option set. I am getting a dimension error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[119], line 1
----> 1 knn_estimated = estimate_knn.estimate(test_data_handle)

File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:97, in CatEstimator.estimate(self, input_data)
     73 """The main interface method for the photo-z estimation
     74 
     75 This will attach the input data (defined in ``inputs`` as "input") to this
   (...)
     94     Handle providing access to QP ensemble with output data
     95 """
     96 self.set_data("input", input_data)
---> 97 self.run()
     98 self.finalize()
     99 return self.get_handle("output")

File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:110, in CatEstimator.run(self)
    108 for s, e, test_data in iterator:
    109     print(f"Process {self.rank} running estimator on chunk {s} - {e}")
--> 110     self._process_chunk(s, e, test_data, first)
    111     first = False
    112     # Running garbage collection manually seems to be needed
    113     # to avoid memory growth for some estimators

File ~/Documents/github/lincc/rail_sklearn/src/rail/estimation/algos/k_nearneigh.py:195, in KNearNeighEstimator._process_chunk(self, start, end, data, first)
    192         knn_df.loc[np.isclose(knn_df[col], self.config.nondetect_val), col] = np.float32(self.config.mag_limits[col])
    194 testcolordata = _computecolordata(knn_df, self.config.ref_band, self.config.bands, self.config.only_colors)
--> 195 dists, idxs = self.kdtree.query(testcolordata, k=self.numneigh)
    196 dists += TEENY
    197 test_ens = _makepdf(dists, idxs, self.trainszs, self.sigma)

File sklearn/neighbors/_binary_tree.pxi:1183, in sklearn.neighbors._kd_tree.BinaryTree64.query()

ValueError: query data dimension must match training data dimension

Do I need to change the input data handle after setting only_colors=True in the informer?

sschmidt23 · 2024-09-30T15:56:03Z

I think I set things up so that you had to set the config parameter only_colors=True in both the inform and estimate stage, is it possible that you only set it in one of the two, and thus the model expects a different number of parameters? I could store the boolean for only_colors in the model to ensure that a user can't accidentally use differing values for inform and estimate, if that is less confusing.

If it's not that, then this error could also maybe be caused if you are using a non-default set of names for the magnitudes via bands and do not also supply the dictionary of mag_limits for those bands.

raphaelshirley · 2024-09-30T16:04:26Z

Ah yes thanks, that fixed it setting it True in the estimate stage too.

sschmidt23 · 2024-09-30T18:35:39Z

It probably makes more sense to store the only_colors value in the model, since a user shouldn't really be able to set it in the estimate stage if the model was trained for a specific case. I'll make a PR to do just that and merge in so that you only specify in the inform.

move only_colors to model and remove from estimate #20

sschmidt23 added the enhancement New feature or request label Aug 14, 2024

sschmidt23 self-assigned this Aug 14, 2024

sschmidt23 mentioned this issue Aug 14, 2024

add 'only_colors' option to KNearNeigh #20 #21

Merged

4 tasks

sschmidt23 added a commit that referenced this issue Aug 15, 2024

Merge pull request #21 from LSSTDESC/issue/20/sklearncolor

85ff7c8

add 'only_colors' option to KNearNeigh #20

sschmidt23 added a commit that referenced this issue Oct 2, 2024

Merge pull request #24 from LSSTDESC/issue/20/mvbool

d290f52

move only_colors to model and remove from estimate #20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

sschmidt23 commented Aug 14, 2024

raphaelshirley commented Sep 30, 2024

sschmidt23 commented Sep 30, 2024

raphaelshirley commented Sep 30, 2024

sschmidt23 commented Sep 30, 2024

Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

Add option to run KNearNeigh with just colors rather than 1 mag + colors #20

Comments

sschmidt23 commented Aug 14, 2024

raphaelshirley commented Sep 30, 2024

sschmidt23 commented Sep 30, 2024

raphaelshirley commented Sep 30, 2024

sschmidt23 commented Sep 30, 2024