
Type declaration for indices and indptr might lead to dtype mismatch #43

Open
j-i-l opened this issue Sep 2, 2024 · 3 comments

@j-i-l
Collaborator

j-i-l commented Sep 2, 2024

When passing the arguments of a `scipy.sparse.csr_matrix` to `sparse_stoch_from_full_csr` we run into a `ValueError`:

```
long long[:] Tf_indices,
E ValueError: Buffer dtype mismatch, expected 'long long' but got 'int'
```

See e.g. here

We can address this by explicitly converting the indices to `np.int64` when passing them as arguments, or we might change the data type from `long long` to `int` in `sparse_stoch_from_full_csr` (and others).

Note that some functions in `_cython_sparse_stoch.pyx` already declare `int[:]` as the data type for indices passed as arguments.
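The first workaround can be sketched as follows (a minimal sketch; the call at the end is hypothetical and only mirrors the signature shown in the traceback):

```python
import numpy as np
from scipy.sparse import csr_matrix

# On most builds scipy creates the index arrays of a csr_matrix as
# int32 (C 'int'), while the Cython signature expects 'long long'.
T = csr_matrix(np.eye(3))
print(T.indices.dtype)  # int32 on most builds

# Explicitly upcast before handing the buffers to the Cython function:
indices = T.indices.astype(np.int64)
indptr = T.indptr.astype(np.int64)

# Hypothetical call, mirroring the signature from the traceback:
# sparse_stoch_from_full_csr(nz_rowcols, T.data, indices, indptr, diag_val)
```

The downside is an extra copy of both index arrays on every call.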


We have some issues with the type declarations for `indices` and `indptr`:

https://github.com/alexbovet/flow_stability/actions/runs/10513029536/job/29127746857#step:5:116

The declarations also aren't entirely consistent across the defs:

E.g. for `indptr` some functions require `int`:

```cython
def cython_sparse_stoch_from_full_csr(int[:] nz_rowcols,
                                      double[:] Tf_data,
                                      int[:] Tf_indices,
                                      int[:] Tf_indptr,
                                      double diag_val):
```

some `long long`:

```cython
def cython_rebuild_nnz_rowcol(double[:] T_data,
                              long long[:] T_indices,
                              long long[:] T_indptr,
                              long long[:] nonzero_indices,
                              long long size,
                              double diag_val):
```

I'd set all `indices` and `indptr` to `long` and declare them as `np.int64` on the Python side.
@alexbovet is that OK for you, or is there actually a use-case for `long long`?

Originally posted by @j-i-l in #35 (comment)
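On the `long` vs `long long` question: the widths involved can be checked with plain NumPy (nothing project specific here), which is worth keeping in mind because `np.int64` only matches C `long` on LP64 platforms:

```python
import numpy as np

# C 'long' (dtype code 'l') is platform dependent: 4 bytes on Windows,
# 8 bytes on most Unix builds. C 'long long' (dtype code 'q') is
# 8 bytes everywhere, so np.int64 always matches 'long long' in width.
print(np.dtype('l').itemsize)       # platform dependent: 4 or 8
print(np.dtype('q').itemsize)       # always 8
print(np.dtype(np.int64).itemsize)  # always 8
```

So a `long[:]` declaration paired with `np.int64` arrays would break on Windows, while `long long[:]` would not.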

@j-i-l
Collaborator Author

j-i-l commented Sep 2, 2024

#35 (comment)

We might want to add the possibility to install flowstab in "large" data mode where we consistently use long long for all index arrays.
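Such a mode could be driven by a small helper that picks the index dtype from the problem size, similar to what scipy does internally. A minimal sketch (`choose_index_dtype` is a hypothetical name, not from the code base):

```python
import numpy as np

def choose_index_dtype(max_value):
    """Pick the smallest index dtype that can address max_value.

    Hypothetical helper: int32 for 'normal' data, int64 for
    'large' data mode, mirroring scipy's internal index handling.
    """
    return np.int64 if max_value > np.iinfo(np.int32).max else np.int32

print(choose_index_dtype(10).__name__)             # 'int32'
print(choose_index_dtype(3_000_000_000).__name__)  # 'int64'
```

The cut-off is `np.iinfo(np.int32).max` (2**31 - 1), the largest value an int32 index array can address.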

@alexbovet
Owner

Yes, that could be a good option.

I remember there was an issue with the `connected_components` function from `scipy.sparse.csgraph`, which cannot use int64 (here). Something to keep in mind.

Also, I think this is a discussion they have at scipy and in other projects as well.
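If we went all-int64, call sites into `csgraph` would then need a downcast at the boundary. A minimal sketch of what that looks like (assuming the int32 requirement mentioned above):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# A 3-node graph with a single edge 0-1, built with int64 index arrays
# (as they would be in a hypothetical "large" data mode):
graph = csr_matrix((np.ones(2),
                    np.array([1, 0], dtype=np.int64),
                    np.array([0, 1, 2, 2], dtype=np.int64)),
                   shape=(3, 3))

# Downcast the index arrays to int32 at the csgraph boundary:
graph.indices = graph.indices.astype(np.int32)
graph.indptr = graph.indptr.astype(np.int32)

n_components, labels = connected_components(graph, directed=False)
print(n_components)  # 2: component {0, 1} and the isolated node {2}
```

The downcast is only safe below 2**31 - 1 nonzeros, which is exactly the limitation being discussed.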

@j-i-l
Collaborator Author

j-i-l commented Sep 16, 2024

In C++ we could simply use template functions, which would spare us from having to specify the type of `indices` and `indptr`.

I'm unsure if/how this can be done in Cython...

Actually, scipy has nice examples of this approach in its sparsetools:

https://github.com/scipy/scipy/blob/9e93f673c4021cc6dc38487dbe8fb2db8cadf131/scipy/sparse/sparsetools/csr.h#L103-L116

That might be a good approach for us: we would only need to set the type in the numpy representation, i.e. `np.int32` or `np.int64`, and the rest would follow... well, that would be the idea.
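The closest Cython analogue to those C++ templates is a fused type: Cython generates one specialization per concrete type and dispatches on the dtype of the buffer passed in. A minimal sketch (names are illustrative, not from the code base):

```cython
# cython: language_level=3

# One fused type covering both index widths; a def taking idx_t[:]
# accepts both int32 and int64 numpy arrays, so the dtype chosen on
# the Python side determines the specialization that runs.
ctypedef fused idx_t:
    int
    long long

def row_nnz(idx_t[:] indptr, Py_ssize_t row):
    """Number of stored entries in one CSR row."""
    return indptr[row + 1] - indptr[row]
```

One caveat: when several fused-typed arguments appear in one signature (e.g. `indices` and `indptr`), they must all resolve to the same specialization, so the arrays have to share a dtype.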

@j-i-l j-i-l added this to the JOSS paper submission milestone Nov 19, 2024
alexbovet pushed a commit that referenced this issue Dec 17, 2024
* adding code coverage

* started SSM init tests

* silence linter

* fixing wrong testing command

* auto-convert to float if needed

* corrected filename

* show memory usage and timing

* typing

* speed and memory checks

* adding details about the ssm

* using also __init__ docstring

* minor formatting correction

* added autodoc ext to sphinx

* fixing python version also for rtd

* configure autoapi instead of autodoc

* separate memory tracking tests with _memory in name

* better typing

* make cython and non-cython implementations testable

* separating inplace row norm

* try to satisfy long long vs int

* minor - avoid declaring unused params

* target src for coverage

* config for test coverage

* cleaning up

* allow coverage to trace lines

* reporting code coverage

* establish jobs order

* using correct secret

* always run coverage

* run coverage report

* remove plugin for codecov reports

* single folder several files

* specify package location

* using local install

* workaround for hidden files

* site-package install but discover package name

* fixing artifact option

* specify source in config to use relative files

* fixing coverage options

* minor edits

* micro

* started with more tests

* testing mkl

* float matrices

* matmul testing

* testing with larger matrices

* testing mkl and blas/lapack

* wip on profiling and testing csr ops

* completing tests for csr methods

* testing with comparison crs_ operations

* show all memory reports

* covering conversion from/to ssm

* formatting docstring - correcting description

* temp. fix for type inconsistency - see #43

* minor formatting

* tests for inplace ops with diag matrix

* minor correction

* better typing

* adding tests for sparse_autocov_mat inits

* adding tests for SAM creation (also from T)

* better docstring formatting

* drop the math mode

* ditch passing by position

* including more methods for SAM

* started with synthtempnetwork

* adding minimal tests for synth_temp_network subm

* started with flow_stability sub-m

* initiating FlowStability tests

* init tests for parallel_expm

* init tests for temporal_network

* init tests for parallel_clustering

* minor formatting

* minor notes/info

* minor edits; init cython testing

* simple test cases

* better basis for test data generation

* tests for temp networks

* minor fixes

* scipy upgrade - use sparse_dot_mk for mkl support

* minor formatting

* minor fix in assertion

* cleaning up old tests

* cleaning up all tests for discarded implementations

* fixing the data type for now to double

* adapting lifetime of modified objects

* adapting to new definitions