Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[r] Make create and open into non-instance factory methods #1008

Closed
johnkerl opened this issue Feb 27, 2023 · 1 comment
Closed

[r] Make create and open into non-instance factory methods #1008

johnkerl opened this issue Feb 27, 2023 · 1 comment

Comments

@johnkerl
Copy link
Member

johnkerl commented Feb 27, 2023

Context: R/Python parity as tracked by #839.

In the current Python implementation we have:

  • sdf = soma.DataFrame.create(...) and sdf = soma.DataFrame.open(...)
  • sdf.write(...)

The former two are Python class methods (in particular, non-instance methods): the thing before the . is the class name. The latter is a Python instance method: the thing before the . is the name of an instance of the class.

R6, by contrast, only has the latter and not the former.

The best option seems to be to use top-level function names like SOMADataFrameCreate and SOMADataFrameOpen, etc.. This (moreover) mimics Seurat's Load_10X (note the leading capital L).

See also #1009 and #1010 for follow-on work.


Some more wording worked out between @eddelbuettel and @johnkerl :

  • What
    • We're in need of free-standing helpers that put a shim on the existing functionality and return such R6 objects by hiding some 'details' we need but don't want to expose. These can be called "free-standing helpers" or "non-instance factory methods" -- whatever we call them, say SOMADataFrameCreate and SOMADataFrameOpen etc., they will (a) do the creating or the opening, (b) invoke $new(), and (c) return that to the caller.
    • What was done in the TileDB-SOMA Python API, and what we should probably do here for the TileDB-SOMA R API, is make sure $new() can't be called directly by users, and if they do, give them an informative message telling them to use the create/open methods. Example here: https://github.com/single-cell-data/TileDB-SOMA/blob/1.0.0/apis/python/src/tiledbsoma/_tiledb_object.py#L90-L95
  • Sequencing of code mods
    • Most important is user-impacting syntax:
      • SOMADataFrameCreate, SOMADataFrameOpen
      • SOMASparseNDArrayCreate, SOMASparseNDArrayOpen
      • SOMADenseNDArrayCreate, SOMADenseNDArrayOpen
      • SOMACollectionCreate, SOMACollectionOpen
      • SOMAMeasurementCreate, SOMAMeasurementOpen
      • SOMAExperimentCreate, SOMAExperimentOpen
      • SOMAOpen should check an object's metadata and do return SOMATypeNameGoesHereOpen for the corresponding type -- this is [r] Have a general-purpose opener #1009
      • $close() needs to be exposed -- this is issue [r] Make close methods public #1010 -- that work could be done on this issue if you prefer
      • $new() should be informatively blocked as described above
    • Something that can be done over time but needn't be done straightaway:
      • Do more caching for objects opened in read mode -- profiling/benchmarking can help guide us here
      • Benefits will be less obvious for working on small local-disk data but will become more salient for working on larger datasets which are stored remotely
  • Implementation: this means a tiledb handle (for writes) or a SOMAReader instance (for array reads) needs to be stored within a given object
    • This is what obj$close() must close
    • In Python the semantics are that writes aren't finalized until obj$close() and that should apply here as well
  • Why:
    • API parity with tiledbsoma-py
    • Once we have things explicitly opened for read or explicitly opened for write at the externally visible tiledbsoma API level, then at the internal tiledb/libtiledbsoma level we won't have to do open/close on arrays/group so often -- this will help performance. Another benefit of this is that if an object is opened for read, then we can confidently cache things about it -- e.g. an array's metadata, or a group's names-to-uris mapping, etc
@johnkerl johnkerl added the r-api label Feb 27, 2023
@johnkerl johnkerl changed the title [r] Make create and open non-instance factory methods [r] Make create and open into non-instance factory methods Feb 27, 2023
@eddelbuettel
Copy link
Contributor

Done in #1123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants