Scope out/write pseudocode for network-based sampling #9

Open
2 tasks
ChesterHuynh opened this issue Mar 2, 2021 · 4 comments
@ChesterHuynh
Collaborator

ChesterHuynh commented Mar 2, 2021

SUMMARY

Here, we scope out the code to implement any prior biases in multivariate data. Take X = (X_1, X_2, X_3, ...), where
each X_i has dimensionality d_i. We might then have a priori notions of how "correlated" the samples are along each axis.

For example:

TODO

Look into SPORF source code:

  • Sketch out where new methods would need to be added
  • Sketch out what these new methods would entail
@ChesterHuynh ChesterHuynh added the planning Exploration, research, or sketching pseudocode label Mar 2, 2021
@adam2392
Owner

adam2392 commented Mar 10, 2021

So, as what gets passed in from the Python side:

Inputs

sampling_graphs = (G1, G2, ...)
sampling_patch_dimensions = (...)  # "Analogous" to PATCH_HEIGHT/PATCH_WIDTH min/max

-> G1.shape = dimension of variable/axis 0
-> G2.shape = dimension of variable/axis 1
...

C++ pseudo algorithm

- pick arbitrary point in data matrix
- determine which index (i.e. row) in the sampling graphs (G1, G2, ...) this point falls in
- according to patch dimensions, perform sampling
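
The three steps above could be prototyped in Python before touching the C++ side. This is only a rough sketch, not the SPORF implementation: the function name `sample_graph_patch`, the per-axis `patch_min`/`patch_max` arguments, and above all the sampling rule (the seed's row of each graph is used as neighbor weights, and the seed itself is always kept) are assumptions made for illustration. The output is the set of flattened, row-major column indices that would make up one projection.

```python
import numpy as np


def sample_graph_patch(sampling_graphs, patch_min, patch_max, rng):
    """Sample one patch; return its flattened (row-major) column indices."""
    shape = tuple(g.shape[0] for g in sampling_graphs)

    # Step 1: pick an arbitrary point in the (unflattened) data.
    flat_seed = rng.integers(int(np.prod(shape)))

    # Step 2: find which row of each sampling graph the point falls in.
    seeds = np.unravel_index(flat_seed, shape)

    # Step 3: sample along each axis according to the patch dimensions.
    per_axis = []
    for graph, seed, lo, hi in zip(sampling_graphs, seeds, patch_min, patch_max):
        size = int(rng.integers(lo, hi + 1))  # patch extent, inclusive bounds
        # ASSUMPTION: the seed's row gives the weights for drawing neighbors.
        weights = np.asarray(graph[seed], dtype=float).copy()
        weights[seed] = 0.0  # the seed is always included, never redrawn
        candidates = np.flatnonzero(weights > 0)
        n_extra = min(size - 1, int(candidates.size))
        if n_extra > 0:
            p = weights[candidates] / weights[candidates].sum()
            extra = rng.choice(candidates, size=n_extra, replace=False, p=p)
        else:
            extra = np.empty(0, dtype=int)
        per_axis.append(np.sort(np.append(extra, seed)))

    # Combine the per-axis indices into flattened column indices.
    grids = np.meshgrid(*per_axis, indexing="ij")
    return np.ravel_multi_index(tuple(g.ravel() for g in grids), shape)
```

Under this rule, a node with no neighbors yields a patch of size one along that axis regardless of the requested patch size.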

Summary of Proposed Work

I think to make things backwards compatible for the sake of comparing, for now, we can assume we're trying to add a new projection_matrix variant: G-Rerf (i.e. graph sampling RERF?) heh.

List of files in Python:

  1. In Python/rerf/rerfClassifier.py: Alter the inputs to have the following: sampling_graphs, sampling_patch_min and sampling_patch_max. To keep it "backwards compatible" for now, we can keep the image width/height and patch height/width parameters but raise an error if both sets of parameters are passed in(?)

Errors and Python backend computations:

  • The sampling_graphs and sampling_patch_min/max arguments MUST have the same length, or raise error.
  • The graphs in sampling_graphs must all be of square and symmetric form.
  • Moreover, they must be row-wise stochastic (rows add up to 1); we can normalize the rows if they are not and raise a warning, or just error altogether and require the user to normalize them (I'm leaning towards erroring and making the user aware).
  • Moreover, one should be able to derive the "data shape" from the size of the sampling graphs:
shape_arr = []
for graph in sampling_graphs:
    shape_arr.append(graph.shape[0])

These shapes should be error-checked during fit(X, y) if the projection_matrix is G-Rerf: shape_arr should be summed to the number of columns in X. In addition, we should assume that the columns of X are flattened in the same way the images are currently flattened (note that NumPy's ndarray.flatten defaults to row-major, i.e. C order, not column-wise).
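
The checks listed above could look roughly like the following on the Python side. This is a sketch under assumptions: the function name `validate_sampling_graphs` and the `atol` tolerance are made up for illustration, and it takes the "error rather than normalize" route. It returns shape_arr so the caller can compare it against X during fit.

```python
import numpy as np


def validate_sampling_graphs(sampling_graphs, sampling_patch_min,
                             sampling_patch_max, atol=1e-8):
    """Check the graph inputs and return the derived shape_arr."""
    if not (len(sampling_graphs) == len(sampling_patch_min)
            == len(sampling_patch_max)):
        raise ValueError(
            "sampling_graphs, sampling_patch_min and sampling_patch_max "
            "must have the same length."
        )

    shape_arr = []
    for axis, graph in enumerate(sampling_graphs):
        graph = np.asarray(graph, dtype=float)
        if graph.ndim != 2 or graph.shape[0] != graph.shape[1]:
            raise ValueError(f"Graph for axis {axis} must be square.")
        if not np.allclose(graph, graph.T, atol=atol):
            raise ValueError(f"Graph for axis {axis} must be symmetric.")
        if not np.allclose(graph.sum(axis=1), 1.0, atol=atol):
            # Hypothetical choice: error out instead of silently normalizing.
            raise ValueError(
                f"Rows of the graph for axis {axis} must sum to 1; "
                "please normalize it first."
            )
        shape_arr.append(graph.shape[0])
    return shape_arr
```

For example, `validate_sampling_graphs([np.full((2, 2), 0.5), np.eye(3)], [1, 1], [2, 3])` would return `[2, 3]`.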

  2. Any data that then needs to be passed down to C++: in RerF.py, it seems one can do something like forestClass.setParameters("...", X); idk if it accepts numpy datatypes? Either way, we need to expose the ability to pass this through to the C++ side.

List of files in C++:

  1. Getting parameters to the fpForest class via setParameter in src/baseFunctions/fpForest.h:73-85:
    inline void setParameter(const std::string& parameterName, const std::string& parameterValue){
        fpSingleton::getSingleton().setParameter(parameterName, parameterValue);
    }

    inline void setParameter(const std::string& parameterName, const double parameterValue){
        fpSingleton::getSingleton().setParameter(parameterName, parameterValue);
    }

    inline void setParameter(const std::string& parameterName, const int parameterValue){
        fpSingleton::getSingleton().setParameter(parameterName, parameterValue);
    }

Once we have the "parameters" successfully set in the fpSingleton class, we can then hypothetically access them easily. Getting the arrays themselves across from Python can potentially be done via: https://stackoverflow.com/questions/30388170/sending-a-c-array-to-python-and-back-extending-c-with-numpy

  2. Possibly add checks at the C++ level, which should be easier once we get the numbers into a C++ array: packedForest/src/fpSingleton/fpSingleton.h. This might involve checking the formatting of the arrays, datatypes, etc.

  3. Finally, implement the new sampling procedure in packedForest/src/forestTypes/binnedTree/processingNodeBin.h, for G-RerF.

Test Scenario

Specify a two-node graph that is either connected or disconnected.
The signal is generated as 1's and -1's, where X = (X_1, X_2). The classes are then:

  • class 1: X_1 = 1, X_2 = -1
  • class 2: X_1 = -1, X_2 = 1

Expected accuracy for each configuration:

  1. connected, sampling_dim_min/max = 2/2: should be 50% accuracy
  2. connected, sampling_dim_min/max = 1/2: should get to 100% accuracy
  3. disconnected, sampling_dim_min/max = 1/2: should get to 100% accuracy
  4. disconnected, sampling_dim_min/max = 2/2: should get to 100% accuracy
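
A small sketch of how the test data and the two graphs might be generated, which also shows where the 50% figure comes from. The helper name `make_two_node_data` is hypothetical, the 0.5 edge weights are just one row-stochastic choice for the connected graph, and the reasoning assumes that a patch projection sums the selected features with equal weights.

```python
import numpy as np


def make_two_node_data(n_per_class=50):
    """Noise-free samples for the two classes described above."""
    class_1 = np.tile([1.0, -1.0], (n_per_class, 1))   # X_1 = 1,  X_2 = -1
    class_2 = np.tile([-1.0, 1.0], (n_per_class, 1))   # X_1 = -1, X_2 = 1
    X = np.vstack([class_1, class_2])
    y = np.repeat([1, 2], n_per_class)
    return X, y


# ASSUMPTION: one possible row-stochastic, symmetric encoding of each graph.
connected = np.array([[0.5, 0.5],
                      [0.5, 0.5]])
disconnected = np.eye(2)

X, y = make_two_node_data()

# A patch that always covers both nodes sums to 0 for every sample,
# so no split on it can separate the classes (chance-level accuracy).
both_nodes = X.sum(axis=1)

# A patch covering a single node separates the classes perfectly.
single_node = X[:, 0]
```

If the disconnected graph never lets a patch grow beyond its seed node, a 2/2 patch setting still ends up sampling a single feature, which would explain the expected 100% in the last case.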

Application Scenarios

  • thresholded Euclidean pairwise distance graphs in the channel axis
  • thresholded nearby points in the frequency axis
  • exponentially decaying weight functions on the Euclidean distance, e.g. A_ij = exp(-||x_i - x_j||)

These are all scenarios that "add" your prior belief about how the data might be correlated.
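
As a rough illustration of the first and third scenarios, the graphs could be built from per-node coordinates (e.g. channel positions) like this. The function names and the `radius` parameter are hypothetical, and the matrices returned here are raw affinities, not yet normalized.

```python
import numpy as np


def pairwise_euclidean(coords):
    """Matrix of Euclidean distances between all pairs of rows in coords."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def thresholded_graph(coords, radius):
    """A_ij = 1 if nodes i and j are within `radius` of each other, else 0."""
    return (pairwise_euclidean(coords) <= radius).astype(float)


def exp_decay_graph(coords):
    """A_ij = exp(-||x_i - x_j||), decaying with Euclidean distance."""
    return np.exp(-pairwise_euclidean(coords))
```

Both constructions are symmetric by design. Note that dividing each row by its sum to make it stochastic generally does not preserve symmetry, so these raw matrices would need some further treatment before they pass both the symmetry and the row-stochastic checks.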

@adam2392
Owner

More info on the topic of passing numpy arrays from Python -> C++ via pybind11:

@ChesterHuynh
Collaborator Author

Re: Last point of 1. Are we expecting a graph for each time step? I.e. will sampling graphs be a list/array of matrices?

shape_arr = []
for graph in sampling_graphs:
    shape_arr.append(graph.shape[0])

If so, then the above snippet makes sense. Otherwise, I need a little more clarification.

@adam2392
Owner

Yep, a list/array of matrices. So we should expect a graph for each time step, and possibly we can allow for the "default" SRerf functionality by passing in None? Idk, we can play around w/ it as long as it's simple. E.g.

sampling_graphs = (arr_1, None, arr_2) would impose sampling graph in axis 0 and 2 but use Srerf in axis 1 (i.e. nearest neighbors in a patch).
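
One wrinkle with None entries is that the size of that axis can no longer be read off the graph, so it would have to come from somewhere else. A minimal sketch of resolving the per-axis mode, assuming a hypothetical `data_shape` argument that carries the unflattened shape (neither `resolve_axes` nor `data_shape` exists in the codebase):

```python
import numpy as np


def resolve_axes(sampling_graphs, data_shape):
    """Return "graph" or "patch" for each axis, checking graph sizes."""
    if len(sampling_graphs) != len(data_shape):
        raise ValueError("Need one entry in sampling_graphs per data axis.")

    modes = []
    for axis, (graph, dim) in enumerate(zip(sampling_graphs, data_shape)):
        if graph is None:
            # Fall back to the default contiguous-patch sampling on this axis.
            modes.append("patch")
        else:
            if np.shape(graph) != (dim, dim):
                raise ValueError(
                    f"Graph for axis {axis} must have shape ({dim}, {dim})."
                )
            modes.append("graph")
    return modes
```

With `sampling_graphs = (arr_1, None, arr_2)` this would return `["graph", "patch", "graph"]`.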


2 participants