- Removed unnecessary deletion that did not affect outer scope variable
- Removed `ParquetDb_manager` due to an issue
- None identified
- Updated `_version.py` and `CHANGELOG.md` for the new release
- Merged updates from the main branch of the repository
- None identified
- Introduced data classes for handling normalization and loading configurations
- Updated `_version.py` and `CHANGELOG.md` for the latest release
- Reformatted code
- Removed the `ParquetDBManager` class
- Merged updates from the main branch of the repository https://github.com/lllangWV/ParquetDB
- Fixed bug where some `FixedListArrays` were null.
- Resolved issue where some rows were null in the method that enforces numeric and boolean list types.
- Introduced a new method for preprocessing incoming tables.
- Modified create and update methods to apply `table_column_callbacks`, allowing users to reconstruct ndarrays more easily.
- Updated `_version.py` and `CHANGELOG.md` for the new release.
- Enhanced comments for better code readability.
- Updated tests to ensure functionality.
- Moved data generation logic to `general_utils`.
- Cleaned up the codebase for improved quality.
- Fixed a bug related to `FixedListArrays` when they were null.
- Resolved issues with processing rows that contained null values.
- Introduced a new method for preprocessing the incoming table.
- Added a test for handling nested data across create, read, and update operations.
- Modified create and update functionalities to apply column callbacks for dynamic modifications.
- Enhanced code readability with additional comments.
- Cleaned up code for better organization.
- Updated the method to restrict list types to numeric and Boolean only.
- Moved data generation code to `general_utils`.
- Updated `_version.py` and `CHANGELOG.md` for the new release.
- Fixed an issue with normalization when handling batch updates.
- Updated the method to correctly handle updates to list fields.
- Reworked the modification process to keep main files untouched, generating and renaming new files as needed.
- Enhanced documentation for the project.
- Updated `_version.py` and `CHANGELOG.md` for new releases.
- Updated example files.
- Updated `config.yml`.
- Updated `.gitignore`.
- Merged the latest updates from the main branch.
- Fixed bug in normalization for batch updates.
- Reworked modification handling to preserve main files and create new versions.
- Improved update method to support list field updates.
- Updated documentation to reflect recent changes.
- Updated configuration file (`config.yml`).
- Updated `.gitignore` file.
- Updated `_version.py` and `CHANGELOG.md` for new release.
- Merged latest changes from the main branch of the repository.
- Fixed bug in batch updates resulting in incorrect chunked arrays from columns. Updated record batches to ensure casting to the incoming schema is applied correctly.
- Introduced capability to delete columns.
- Added new data generation methods.
- Implemented new benchmarks for performance evaluation.
- Added utility functions for matplotlib.
- Included benchmark scripts for SQLite, MongoDB, and ParquetDB.
- Enabled rebuilding of nested tables from a flattened structure in the read method.
- Updated the README to include a section for benchmark overview.
- Improved HTML for embedding PDFs in the README.
- Updated CHANGELOG.md and _version.py for the new release.
- Corrected default `normalize_kwargs`.
- Optimized the update table method for a 5x speed increase.
- Moved default normalization parameters to config.yml.
- Renamed directory to benchmarks and changed PDF files to PNG format.
- Updated development dependencies.
- Fixed bug in batch updates where generated column data did not produce chunked arrays. Ensured record batches are cast to the incoming schema properly.
- Introduced new benchmarks for performance evaluation.
- Added new data generation methods.
- Implemented column deletion functionality.
- Enhanced create and update methods to efficiently handle various input types (pylists, pydicts, pd.DataFrame, and pa.lib.Table).
- Added an option to rebuild nested tables from a flattened structure in the read method.
- Developed benchmark scripts for databases including SQLite, MongoDB, and ParquetDB.
- Added matplotlib utilities for visualization.
- Updated README.md to include a benchmark overview and embedded PDF section.
- Revised dev dependencies information.
- Improved update table method, enhancing performance by five times.
- Reorganized the directory structure for benchmarks.
- Moved default normalization parameters to config.yml.
- Updated _version.py and CHANGELOG.md for the latest release.
- None identified
- None identified
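The entries above mention that create and update accept several input shapes (pylists, pydicts, `pd.DataFrame`, and `pa.lib.Table`). As a rough illustration of what normalizing such inputs involves, here is a minimal pure-Python sketch that converts a list of row dicts or a dict of columns into one column-oriented form. The function names `pylist_to_pydict` and `to_columns` are hypothetical and are not ParquetDB's API; the library itself works with PyArrow tables rather than plain dicts.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]
Columns = Dict[str, List[Any]]


def pylist_to_pydict(rows: List[Row]) -> Columns:
    """Convert a list of row dicts into a dict of equal-length columns."""
    columns: Columns = {}
    # Collect the union of keys, in first-seen order.
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Fill each column, using None where a row lacks the key.
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))
    return columns


def to_columns(data: Union[List[Row], Dict[str, Any]]) -> Columns:
    """Normalize a pylist or pydict into column-oriented form (hypothetical helper)."""
    if isinstance(data, dict):
        return {key: list(values) for key, values in data.items()}
    if isinstance(data, list):
        return pylist_to_pydict(data)
    raise TypeError(f"Unsupported input type: {type(data).__name__}")


from_rows = to_columns([{"a": 1, "b": 2}, {"a": 3}])
from_cols = to_columns({"a": [1, 3], "b": [2, None]})
```

Both calls produce the same column layout, which is the point of normalizing early: the rest of a write path then only has to handle one shape.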
- Updated `_version.py` and `CHANGELOG.md` for the new release
- Improved workflow scripts
- Merged latest changes from the main branch of the repository
- None identified
- None identified
- Updated the workflow script
- Updated `_version.py` and `CHANGELOG.md` for the new release
- None identified
- Enhanced the logging mechanism in tests with separate loggers.
- Improved methods for creating, updating, and deleting schemas to support batch operations.
- Updated example for clarity.
- Merged changes from the main branch of the repository.
- Removed unnecessary development scripts.
- Deleted obsolete file.
- Excluded `dev_scripts` from `.gitignore`.
- Updated `_version.py` and `CHANGELOG.md` for the new release.
- None identified
- Introduced a new storage method that flattens all nested structures into a single table and sorts columns alphabetically, enhancing performance.
- Updated `_version.py` and `CHANGELOG.md` to reflect the new release.
- Improved `.gitignore` to exclude `dev_scripts`.
- Refined tests.
- Revised example script for clarity.
- None identified
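The storage change noted above (flattening all nested structures into a single table with alphabetically sorted columns) and the read option noted earlier (rebuilding nested tables from the flattened structure) can be pictured with a small pure-Python sketch. It operates on plain dicts rather than PyArrow tables, and the dotted column-name separator and the names `flatten` and `rebuild` are assumptions for illustration, not taken from the ParquetDB source.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, with keys sorted alphabetically."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested dicts.
            flat.update(flatten(value, name, sep))
        else:
            # Leaves (including empty dicts) are stored as-is.
            flat[name] = value
    return dict(sorted(flat.items()))


def rebuild(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Rebuild the nested structure from flattened, separator-joined keys."""
    nested: Dict[str, Any] = {}
    for name, value in flat.items():
        parts = name.split(sep)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


record = {
    "structure": {"species": ["Fe", "O"], "lattice": {"b": 3.9, "a": 3.9}},
    "id": 1,
}
flat = flatten(record)
restored = rebuild(flat)
```

Sorting the flattened names gives every record the same column order regardless of how its fields were nested or ordered on input. This sketch assumes field names do not themselves contain the separator.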
- Added `__version__` import to the Parquet module
- Updated `_version.py` and `CHANGELOG.md` for the new release
- Merged changes from the main branch of the repository
- Fixed a bug in external_utils that prevented automatic creation of source and destination directories.
- Added a development script to handle ordering of nested fields.
- Updated dependencies to include 'requests'.
- Updated _version.py and CHANGELOG.md for the new release.
- Improved the workflow script to include test building of the package before pushing the version and changelog.
- Updated the workflow script for better branch management after pushing changes.
- Fixed a bug related to the old method of aligning the table with the new schema, resulting in improved performance.
- Updated example functionality to enhance usability.
- Updated _version.py and CHANGELOG.md for the new release.
- Merged changes from the main branch to ensure project stays up-to-date.
- Fixed issue with database row order inconsistency in tests.
- Corrected incorrect input provided during schema alignment in the create statement.
- Introduced a new utility function for table manipulation and empty table generation.
- Added methods for merging schemas with built-in functions.
- Enhanced documentation with detailed docstrings for several functions.
- Updated the README with relevant information.
- Refactored multiple methods for optimization, including `create`, `update`, and `delete`.
- Improved developer scripts and added new scripts for schema merging.
- Updated `.gitignore` and revision files like `_version.py` and `CHANGELOG.md`.
- None identified
- Introduced a new example demonstrating the use of `ParquetDatasetDB` with 4 million structures and highlighted some reading capabilities.
- Added an option to create normalization methods that optimize performance by ensuring a consistent number of rows in dataset files.
- Implemented a script for running all tests in the test directory.
- Updated `_version.py` and `CHANGELOG.md` for the new release.
- Consolidated logging and configuration management into a single config object.
- Added BeautifulSoup as a dependency for examples.
- Moved old examples to the `dev_scripts/examples` directory.
- Rearranged and improved the structure of `dev_scripts`.
- Updated changelog script and configuration for the timing logger.
- Merged updates from the main branch of the repository.
- None identified
- Introduced `ParquetDB`, the core class, with `ParquetDBManager` to facilitate management of multiple independent datasets.
- Added `beautifulsoup` as a new dependency for example usage.
- Implemented an example demonstrating the usage of `ParquetDatasetDB` to write 4 million structures and associated read capabilities.
- Added the ability to create normalization methods that optimize performance by ensuring consistent row counts in dataset directories.
- Enhanced the `merge_schema` function with `timeit` to measure performance during execution.
- Updated the changelog script and revised `_version.py` and `CHANGELOG.md` for the new release.
- Added todos for future enhancement, specifically to remove raw table read and write operations for batch compatibility.
- Moved outdated examples to the `dev_scripts/examples` directory.
- Adjusted default values for normalization.
- Consolidated `logging_config` and `config` into a unified base object `LoggingConfig`.
- Added a script to run tests in the test directory.
- Rearranged and organized the `dev_scripts` directory.
- Merged changes from the main branch of the remote repository.
- Changed read so that it now always returns an empty table or batch generator if the filtering or column selection fails
- Removed print statement
- Added a config class. Users can now change settings after importing `parquetdb`, for example `parquetdb.config.root_dir='path/to/dir'`, or adjust logging with `parquetdb.logging_config.loggers.parquetdb.level='Debug'` followed by `parquetdb.logging_config.apply()`
- New dev scripts
- Updated _version.py and CHANGELOG.md due to new release
- Removed logging from tests
- Merge branch 'main' of https://github.com/lllangWV/ParquetDB into main
- Improved support for nested struct types to prevent issues with empty dictionaries.
- Enhanced `merge_tables` function and added more utility functions for manipulating `pa.struct` types.
- [No changes]
- Updated README.md for clarity.
- Changed `table_name` to `dataset_name` in the README.md.
- Made deployment workflow agnostic to repository and package name.
- Updated `env_dev.yml` and `env.yml`.
- Updated package directory structure by moving `parquetdb` and `parquet_datasetdb` to `core` and creating a `utils` folder with `general_utils`.
- Updated `_version.py` and `CHANGELOG.md` due to new release.
- Bug fix: the GitHub token and `repo_name` were not passed as environment variables
- Bug fix in workflow: the wrong git commits were retrieved
- None identified
- Updated _version.py and CHANGELOG.md due to new release (noted twice)
- Merge branch 'main' of https://github.com/lllangWV/ParquetDB into main (noted twice)
- None identified
- None identified
- None identified
- No changes