Commit 1edecda

Merge branch 'master' of https://github.com/datajoint/datajoint-python into hidden-attr-alt

A-Baji committed Nov 16, 2023
2 parents 4c4bac5 + b63900b

Showing 148 changed files with 6,655 additions and 1,880 deletions.
5 changes: 5 additions & 0 deletions .codespellrc
@@ -0,0 +1,5 @@
[codespell]
skip = .git,*.pdf,*.svg,*.csv,*.ipynb,*.drawio
# Rever -- nobody knows
# numer -- numerator variable
ignore-words-list = rever,numer
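The `.codespellrc` added above is plain INI, so its effect can be checked outside of CI; a minimal sketch using only the standard library (the config text is inlined here rather than read from disk):

```python
import configparser

# Parse a .codespellrc like the one added in this commit.
cfg = configparser.ConfigParser()
cfg.read_string("""\
[codespell]
skip = .git,*.pdf,*.svg,*.csv,*.ipynb,*.drawio
ignore-words-list = rever,numer
""")

# codespell treats both values as comma-separated lists.
skip_globs = cfg["codespell"]["skip"].split(",")
ignored = cfg["codespell"]["ignore-words-list"].split(",")
print(skip_globs)  # ['.git', '*.pdf', '*.svg', '*.csv', '*.ipynb', '*.drawio']
print(ignored)     # ['rever', 'numer']
```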
10 changes: 10 additions & 0 deletions .github/workflows/development.yaml
@@ -93,6 +93,16 @@ jobs:
black datajoint --check -v
black tests --check -v
black tests_old --check -v
codespell:
name: Check for spelling errors
permissions:
contents: read
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Codespell
uses: codespell-project/actions-codespell@v2
publish-docs:
if: |
github.event_name == 'push' &&
19 changes: 19 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,19 @@
name: Manual docs release
on:
workflow_dispatch:
jobs:
publish-docs:
runs-on: ubuntu-latest
env:
DOCKER_CLIENT_TIMEOUT: "120"
COMPOSE_HTTP_TIMEOUT: "120"
steps:
- uses: actions/checkout@v3
- name: Deploy docs
run: |
export MODE=BUILD
export PACKAGE=datajoint
export UPSTREAM_REPO=https://github.com/${GITHUB_REPOSITORY}.git
export HOST_UID=$(id -u)
docker compose -f docs/docker-compose.yaml up --exit-code-from docs --build
git push origin gh-pages
13 changes: 11 additions & 2 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,14 @@
## Release notes

### Upcoming
- Added - Codespell GitHub Actions workflow
- Added - GitHub Actions workflow to manually release docs
- Changed - Update `datajoint/nginx` to `v0.2.6`
- Changed - Migrate docs from `https://docs.datajoint.org/python` to `https://datajoint.com/docs/core/datajoint-python`
- Fixed - Updated `set_password` to work on MySQL 8 - PR [#1106](https://github.com/datajoint/datajoint-python/pull/1106)
- Added - Missing tests for `set_password` - PR [#1106](https://github.com/datajoint/datajoint-python/pull/1106)
- Changed - Return the success count after the `.populate()` call - PR [#1050](https://github.com/datajoint/datajoint-python/pull/1050)

### 0.14.1 -- Jun 02, 2023
- Fixed - Fix altering a part table that uses the "master" keyword - PR [#991](https://github.com/datajoint/datajoint-python/pull/991)
- Fixed - `.ipynb` output in tutorials is not visible in dark mode ([#1078](https://github.com/datajoint/datajoint-python/issues/1078)) PR [#1080](https://github.com/datajoint/datajoint-python/pull/1080)
@@ -31,7 +40,7 @@
- Fixed - Fix queries with backslashes ([#999](https://github.com/datajoint/datajoint-python/issues/999)) PR [#1052](https://github.com/datajoint/datajoint-python/pull/1052)

### 0.13.7 -- Jul 13, 2022
- Fixed - Fix networkx incompatable change by version pinning to 2.6.3 (#1035) PR #1036
- Fixed - Fix networkx incompatible change by version pinning to 2.6.3 (#1035) PR #1036
- Added - Support for serializing numpy datetime64 types (#1022) PR #1036
- Changed - Add traceback to default logging PR #1036

@@ -83,7 +92,7 @@
- Fixed - `schema.list_tables()` is not topologically sorted (#838) PR #893
- Fixed - Diagram part tables do not show proper class name (#882) PR #893
- Fixed - Error in complex restrictions (#892) PR #893
- Fixed - WHERE and GROUP BY clases are dropped on joins with aggregation (#898, #899) PR #893
- Fixed - WHERE and GROUP BY clauses are dropped on joins with aggregation (#898, #899) PR #893

### 0.13.0 -- Mar 24, 2021
- Re-implement query transpilation into SQL, fixing issues (#386, #449, #450, #484, #558). PR #754
31 changes: 20 additions & 11 deletions README.md
@@ -5,10 +5,18 @@

# Welcome to DataJoint for Python!

DataJoint for Python is a framework for scientific workflow management based on relational principles. DataJoint is built on the foundation of the relational data model and prescribes a consistent method for organizing, populating, computing, and querying data.

DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab at Baylor College of Medicine for the distributed processing and management of large volumes of data streaming from regular experiments. Starting in 2011, DataJoint has been available as an open-source project adopted by other labs and improved through contributions from several developers.
Presently, the primary developer of DataJoint open-source software is the company DataJoint (https://datajoint.com).
DataJoint for Python is a framework for scientific workflow management based on
relational principles. DataJoint is built on the foundation of the relational data
model and prescribes a consistent method for organizing, populating, computing, and
querying data.

DataJoint was initially developed in 2009 by Dimitri Yatsenko in Andreas Tolias' Lab at
Baylor College of Medicine for the distributed processing and management of large
volumes of data streaming from regular experiments. Starting in 2011, DataJoint has
been available as an open-source project adopted by other labs and improved through
contributions from several developers.
Presently, the primary developer of DataJoint open-source software is the company
DataJoint (https://datajoint.com).

## Data Pipeline Example

@@ -18,7 +26,13 @@ Presently, the primary developer of DataJoint open-source software is the compan

## Getting Started

- Install from PyPI
- Install with Conda

```bash
conda install -c conda-forge datajoint
```

- Install with pip

```bash
pip install datajoint
@@ -33,9 +47,4 @@ Presently, the primary developer of DataJoint open-source software is the compan
- Contribute
- [Development Environment](https://datajoint.com/docs/core/datajoint-python/latest/develop/)

- [Guidelines](https://datajoint.com/docs/community/contribute/)

- Legacy Resources (To be replaced by above)
- [Documentation](https://docs.datajoint.org)

- [Tutorials](https://tutorials.datajoint.org)
- [Guidelines](https://datajoint.com/docs/about/contribute/)
2 changes: 1 addition & 1 deletion datajoint/__init__.py
@@ -1,5 +1,5 @@
"""
DataJoint for Python is a framework for building data piplines using MySQL databases
DataJoint for Python is a framework for building data pipelines using MySQL databases
to represent pipeline structure and bulk storage systems for large objects.
DataJoint is built on the foundation of the relational data model and prescribes a
consistent method for organizing, populating, and querying data.
12 changes: 10 additions & 2 deletions datajoint/admin.py
@@ -1,5 +1,6 @@
import pymysql
from getpass import getpass
from packaging import version
from .connection import conn
from .settings import config
from .utils import user_choice
@@ -14,9 +15,16 @@ def set_password(new_password=None, connection=None, update_config=None):
new_password = getpass("New password: ")
confirm_password = getpass("Confirm password: ")
if new_password != confirm_password:
logger.warn("Failed to confirm the password! Aborting password change.")
logger.warning("Failed to confirm the password! Aborting password change.")
return
connection.query("SET PASSWORD = PASSWORD('%s')" % new_password)

if version.parse(
connection.query("select @@version;").fetchone()[0]
) >= version.parse("5.7"):
# SET PASSWORD is deprecated as of MySQL 5.7 and removed in 8+
connection.query("ALTER USER user() IDENTIFIED BY '%s';" % new_password)
else:
connection.query("SET PASSWORD = PASSWORD('%s')" % new_password)
logger.info("Password updated.")

if update_config or (
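The new branch in `set_password` keys off the server version because `SET PASSWORD = PASSWORD(...)` was deprecated in MySQL 5.7 and removed in 8.0. The selection logic can be sketched in isolation (assuming only that `packaging` is installed; `choose_password_statement` is a hypothetical helper for illustration, not part of DataJoint):

```python
from packaging import version

def choose_password_statement(server_version: str, new_password: str) -> str:
    # Hypothetical helper: MySQL >= 5.7 deprecates SET PASSWORD = PASSWORD(...)
    # and 8.0 removes it, so newer servers must use ALTER USER instead.
    if version.parse(server_version) >= version.parse("5.7"):
        return "ALTER USER user() IDENTIFIED BY '%s';" % new_password
    return "SET PASSWORD = PASSWORD('%s')" % new_password

print(choose_password_statement("8.0.34", "s3cret"))
print(choose_password_statement("5.6.51", "s3cret"))
```

Note that the real code feeds `select @@version;` output to `version.parse`, which assumes the server reports a plain version string.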
181 changes: 101 additions & 80 deletions datajoint/autopopulate.py
@@ -118,7 +118,7 @@ def _job_key(self, key):

def _jobs_to_do(self, restrictions):
"""
:return: the query yeilding the keys to be computed (derived from self.key_source)
:return: the query yielding the keys to be computed (derived from self.key_source)
"""
if self.restriction:
raise DataJointError(
@@ -180,6 +180,9 @@ def populate(
to be passed down to each ``make()`` call. Computation arguments should be
specified within the pipeline e.g. using a `dj.Lookup` table.
:type make_kwargs: dict, optional
:return: a dict with two keys
"success_count": the count of successful ``make()`` calls in this ``populate()`` call
"error_list": the error list that is filled if `suppress_errors` is True
"""
if self.connection.in_transaction:
raise DataJointError("Populate cannot be called during a transaction.")
@@ -222,49 +225,62 @@ def handler(signum, frame):

keys = keys[:max_calls]
nkeys = len(keys)
if not nkeys:
return

processes = min(_ for _ in (processes, nkeys, mp.cpu_count()) if _)

error_list = []
populate_kwargs = dict(
suppress_errors=suppress_errors,
return_exception_objects=return_exception_objects,
make_kwargs=make_kwargs,
)
success_list = []

if processes == 1:
for key in (
tqdm(keys, desc=self.__class__.__name__) if display_progress else keys
):
error = self._populate1(key, jobs, **populate_kwargs)
if error is not None:
error_list.append(error)
else:
# spawn multiple processes
self.connection.close() # disconnect parent process from MySQL server
del self.connection._conn.ctx # SSLContext is not pickleable
with mp.Pool(
processes, _initialize_populate, (self, jobs, populate_kwargs)
) as pool, (
tqdm(desc="Processes: ", total=nkeys)
if display_progress
else contextlib.nullcontext()
) as progress_bar:
for error in pool.imap(_call_populate1, keys, chunksize=1):
if error is not None:
error_list.append(error)
if display_progress:
progress_bar.update()
self.connection.connect() # reconnect parent process to MySQL server
if nkeys:
processes = min(_ for _ in (processes, nkeys, mp.cpu_count()) if _)

populate_kwargs = dict(
suppress_errors=suppress_errors,
return_exception_objects=return_exception_objects,
make_kwargs=make_kwargs,
)

if processes == 1:
for key in (
tqdm(keys, desc=self.__class__.__name__)
if display_progress
else keys
):
status = self._populate1(key, jobs, **populate_kwargs)
if status is True:
success_list.append(1)
elif isinstance(status, tuple):
error_list.append(status)
else:
assert status is False
else:
# spawn multiple processes
self.connection.close() # disconnect parent process from MySQL server
del self.connection._conn.ctx # SSLContext is not pickleable
with mp.Pool(
processes, _initialize_populate, (self, jobs, populate_kwargs)
) as pool, (
tqdm(desc="Processes: ", total=nkeys)
if display_progress
else contextlib.nullcontext()
) as progress_bar:
for status in pool.imap(_call_populate1, keys, chunksize=1):
if status is True:
success_list.append(1)
elif isinstance(status, tuple):
error_list.append(status)
else:
assert status is False
if display_progress:
progress_bar.update()
self.connection.connect() # reconnect parent process to MySQL server

# restore original signal handler:
if reserve_jobs:
signal.signal(signal.SIGTERM, old_handler)

if suppress_errors:
return error_list
return {
"success_count": sum(success_list),
"error_list": error_list,
}
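With this change each per-key call yields `True` (made), `False` (skipped or not reserved), or a `(key, error)` tuple, and `populate()` aggregates them into a dict instead of returning a bare error list. The aggregation can be mimicked standalone (a sketch; `summarize` is a hypothetical function, not DataJoint API):

```python
def summarize(statuses):
    """Collapse per-key statuses -- True (made), False (skipped), or a
    (key, error) tuple -- into the dict populate() now returns."""
    return {
        "success_count": sum(1 for s in statuses if s is True),
        "error_list": [s for s in statuses if isinstance(s, tuple)],
    }

result = summarize([True, False, ("key1", "KeyError: ..."), True])
print(result)  # {'success_count': 2, 'error_list': [('key1', 'KeyError: ...')]}
```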

def _populate1(
self, key, jobs, suppress_errors, return_exception_objects, make_kwargs=None
@@ -275,55 +291,60 @@ def _populate1(
:param key: dict specifying job to populate
:param suppress_errors: bool if errors should be suppressed and returned
:param return_exception_objects: if True, errors must be returned as objects
:return: (key, error) when suppress_errors=True, otherwise None
:return: (key, error) when suppress_errors=True;
True if one `make()` call succeeded, otherwise False
"""
make = self._make_tuples if hasattr(self, "_make_tuples") else self.make

if jobs is None or jobs.reserve(self.target.table_name, self._job_key(key)):
self.connection.start_transaction()
if key in self.target: # already populated
if jobs is not None and not jobs.reserve(
self.target.table_name, self._job_key(key)
):
return False

self.connection.start_transaction()
if key in self.target: # already populated
self.connection.cancel_transaction()
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
return False

logger.debug(f"Making {key} -> {self.target.full_table_name}")
self.__class__._allow_insert = True
try:
make(dict(key), **(make_kwargs or {}))
except (KeyboardInterrupt, SystemExit, Exception) as error:
try:
self.connection.cancel_transaction()
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
except LostConnectionError:
pass
error_message = "{exception}{msg}".format(
exception=error.__class__.__name__,
msg=": " + str(error) if str(error) else "",
)
logger.debug(
f"Error making {key} -> {self.target.full_table_name} - {error_message}"
)
if jobs is not None:
# show error name and error message (if any)
jobs.error(
self.target.table_name,
self._job_key(key),
error_message=error_message,
error_stack=traceback.format_exc(),
)
if not suppress_errors or isinstance(error, SystemExit):
raise
else:
logger.debug(f"Making {key} -> {self.target.full_table_name}")
self.__class__._allow_insert = True
try:
make(dict(key), **(make_kwargs or {}))
except (KeyboardInterrupt, SystemExit, Exception) as error:
try:
self.connection.cancel_transaction()
except LostConnectionError:
pass
error_message = "{exception}{msg}".format(
exception=error.__class__.__name__,
msg=": " + str(error) if str(error) else "",
)
logger.debug(
f"Error making {key} -> {self.target.full_table_name} - {error_message}"
)
if jobs is not None:
# show error name and error message (if any)
jobs.error(
self.target.table_name,
self._job_key(key),
error_message=error_message,
error_stack=traceback.format_exc(),
)
if not suppress_errors or isinstance(error, SystemExit):
raise
else:
logger.error(error)
return key, error if return_exception_objects else error_message
else:
self.connection.commit_transaction()
logger.debug(
f"Success making {key} -> {self.target.full_table_name}"
)
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
finally:
self.__class__._allow_insert = False
logger.error(error)
return key, error if return_exception_objects else error_message
else:
self.connection.commit_transaction()
logger.debug(f"Success making {key} -> {self.target.full_table_name}")
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
return True
finally:
self.__class__._allow_insert = False

def progress(self, *restrictions, display=False):
"""
2 changes: 1 addition & 1 deletion datajoint/blob.py
@@ -449,7 +449,7 @@ def pack_dict(self, d):
)

def read_struct(self):
"""deserialize matlab stuct"""
"""deserialize matlab struct"""
n_dims = self.read_value()
shape = self.read_value(count=n_dims)
n_elem = np.prod(shape, dtype=int)
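In `read_struct`, the element count is the product of the deserialized shape vector; the idiom looks like this in isolation (assuming numpy is available, with an illustrative shape):

```python
import numpy as np

shape = np.array([2, 3])            # e.g. a 2x3 MATLAB struct array
# dtype=int keeps the count an integer; without it, np.prod follows the
# input dtype, and an empty shape vector yields the identity 1.
n_elem = np.prod(shape, dtype=int)
print(n_elem)  # 6
```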