Releases: activeloopai/deeplake
v2.0.12 🌈
🧭 What's Changed
- Fix segmentation fault (#1248) @aliubimov
- added pull_request_target check (#1246) @gautamkrishnar
- removed repository name from pr checks (#1245) @gautamkrishnar
- Added reporting to ingest, ingest_kaggle, and list (#1242) @istranic
- PyTorch & Tensorflow aliases for datasets (#1238) @muhdhisham
- added manual benchmarking for hub baseline (#1224) @gautamkrishnar
- Added benchmarking workflow to hub pull requests (#1218) @gautamkrishnar
- Fix/sagemaker creds (#1204) @AbhinavTuli
- Fix Getting Started link (#1237) @aliubimov
- Fix VC test (#1233) @farizrahman4u
- Add flac and wav support (#1220) @FayazRahman
- Fix pickle5 issue (#1230) @AbhinavTuli
- Fix grayscale tests (#1222) @farizrahman4u
- Fixes issues with kaggle tests after audio merge (#1231) @AbhinavTuli
- Beautify meta .json files (#1229) @jakbin
- Update README.md (#1194) @istranic
- Audio (#1207) @farizrahman4u
- Bring back "Linux Timple Test" (#1203) @Diveafall
- AL-1368 Handle grayscale image after color image (#1206) @jraman
- Added reporting for version control (#1209) @istranic
- [tiny] Update error messages (#1211) @mccrearyd
- Distributed computation with ray (#1140) @AbhinavTuli
- Revert "Adding shared memory support for Pytorch 3.6/3.7, speeding up pytorch integration" (#1210) @AbhinavTuli
- AL-1433 List compression types for hub.compression (#1208) @jraman
- Adding shared memory support for Pytorch 3.6/3.7, speeding up pytorch integration (#1196) @AbhinavTuli
- lz4 backward compatibility (#1201) @FayazRahman
- Update docs for hub.ingest (#1199) @jraman
- Removing "Linux Simple Test" from all workflows (#1198) @Diveafall
- Change lz4 implementation to numcodecs (#1197) @FayazRahman
- Revert "PyTorch & Tensorflow aliases for datasets" (#1195) @mccrearyd
- PyTorch & Tensorflow aliases for datasets (#1183) @muhdhisham
⚙️ Who Contributes
@AbhinavTuli, @Diveafall, @FayazRahman, @aliubimov, @davidbuniat, @farizrahman4u, @gautamkrishnar, @istranic, @jakbin, @jraman, @mccrearyd, @mikayelh, @muhdhisham and @tatevikh
Google Cloud Support and Hierarchical Datasets 🌈
🩰 What's New
- Added hierarchical datasets (tensor groups)
- Added support for Google Cloud Storage (path = gcs://...)
- Added version control alpha
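To illustrate how a dataset path prefix such as gcs://... can select a storage backend, here is a minimal sketch using only the Python standard library. It is not Hub's implementation: the `storage_kind` function and the `SCHEMES` table are hypothetical, and the "s3" spelling and the mapping of "hub" to Activeloop Storage are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical lookup table, for illustration only (not Hub internals).
# "gcs" and "hub" prefixes appear in these release notes; "s3" is assumed.
SCHEMES = {
    "gcs": "Google Cloud Storage",
    "s3": "S3",
    "hub": "Activeloop Storage",
}


def storage_kind(path: str) -> str:
    """Return a human-readable label for the storage a dataset path points at."""
    scheme = urlparse(path).scheme
    # Anything without a recognized scheme is treated as a local path.
    return SCHEMES.get(scheme, "local")
```

For example, `storage_kind("gcs://my-bucket/my-dataset")` yields the Google Cloud Storage label, while a plain relative path such as `./my-dataset` falls through to "local". The bucket and dataset names here are placeholders.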
🧭 What's Changed
- Fix version control speed for first commit (#1192) @AbhinavTuli
- Missing compressions fix (#1191) @farizrahman4u
- Tensor group fixes (#1190) @farizrahman4u
- gcs path fix (#1188) @davidbuniat
- Version Control (#1152) @AbhinavTuli
- One more jpg fix (#1186) @farizrahman4u
- fix running tests fails when no gcs creds exist (#1187) @mccrearyd
- Istranic update readme (#1185) @istranic
- GCS support (#1125) @kristinagrig06
- [Small] Fixes logging (#1153) @AbhinavTuli
- Modify dataset privacy (#1141) @kristinagrig06
- Imagenet weird jpegs fix (#1155) @farizrahman4u
- Hierarchical Tensors (#1139) @farizrahman4u
- [small] client/utils.py: Dont sys.exit on Bad request (#1154) @farizrahman4u
- Add Tensor to API reference (#1146) @kristinagrig06
- [small-Please approve] fix typo (#1144) @thisiseshan
⚙️ Who Contributes
@AbhinavTuli, @davidbuniat, @farizrahman4u, @istranic, @kristinagrig06, @mccrearyd, @tatevikh and @thisiseshan
v2.0.9 🌈
🧭 What's Changed
- Multi - SOF jpeg fix (#1134) @farizrahman4u
- Pytorch shuffling (#1122) @AbhinavTuli
- DS locking: atexit release (#1138) @farizrahman4u
- Treat size 1 arrays as scalars when casting (#1135) @farizrahman4u
- Tensor inplace ops (#1136) @farizrahman4u
- Small fix (#1133) @farizrahman4u
- Faster deserialization (#1131) @farizrahman4u
- hub.read speedup (4x) (#1126) @farizrahman4u
- Delete .from_path() method (#1128) @kristinagrig06
- [small] better error messages + fix ingest_kaggle (#1132) @mccrearyd
- DS lock fix (#1129) @farizrahman4u
- auto refinement (#1124) @thisiseshan
- Chunk-wise compression (#1093) @farizrahman4u
- Ingestion summary (#1117) @thisiseshan
- hub.read speed up (>6X) (#1120) @farizrahman4u
- Dataset locking (#1119) @farizrahman4u
- Backward compat fix (#1121) @farizrahman4u
- backwards compatibility (and CI tests!) (#1110) @mccrearyd
- fix pytorch tests chunk sizes + max_chunk_size in create_tensor (#1114) @mccrearyd
- Auto compression (#1109) @thisiseshan
- add --kaggle functionality (#1108) @thisiseshan
- Fixes an issue with pytorch old read only (#1113) @AbhinavTuli
- Add Dataset functions to docs (#1098) @kristinagrig06
- [small] hub.htypes is a list of all htypes (#1104) @mccrearyd
⚙️ Who Contributes
@AbhinavTuli, @davidbuniat, @farizrahman4u, @kristinagrig06, @mccrearyd, @tatevikh and @thisiseshan
Fixes to parallel computing 🌈
🧭 What's Changed
- Fix/num workers 0 bug (#1112) @davidbuniat
- Move hub.compute reporting to pipeline eval (#1116) @istranic
- Updates transform docstrings/variables to remove usage of the word "transform" (#1115) @AbhinavTuli
- Fixed bugout reporting paths and overreporting during Dataset initialization. (#1111) @davidbuniat
- Release/2.0.6 (#1107) @davidbuniat
- Faster chunk serialization (#1106) @AbhinavTuli
- Remove default logging code in client (#1097) @benchislett
⚙️ Who Contributes
@AbhinavTuli, @benchislett, @davidbuniat, @istranic and @mccrearyd
Refactors and Minor Updates
🩰 What's New
- Added hub.ingest for automatic creation of datasets
- Added hub.list to help users find publicly available datasets
- Lots of refactors to help developers
🧭 What's Changed
- kaggle argument fix (#1101) @thisiseshan
- Polish top directory (#1054) @kristinagrig06
- Ban dataset attributes as tensor names (#1103) @kristinagrig06
- update tensors (#1089) @mccrearyd
- More sample compressions (#1087) @farizrahman4u
- [small] all scalars have shape (1,) instead of () (#1102) @mccrearyd
- refactor input pipeline for samples (#1099) @mccrearyd
- Integrate hub auto + kaggle (#1075) @thisiseshan
- List datasets (#1048) @kristinagrig06
⚙️ Who Contributes
@AbhinavTuli, @farizrahman4u, @kristinagrig06, @mccrearyd and @thisiseshan
Adding metadata and parallel computations
🎁 What's New
- You can add metadata to datasets and tensors
- You can run computations in parallel using hub.compute
- The dataset API is updated to be more intuitive
🧭 What's Changed
- Add static dataset delete (#1060) @benchislett
- 2.0.4 release Version update (#1094) @davidbuniat
- Adding back transforms for parallel dataset uploads (#1086) @AbhinavTuli
- Release 2.0.3 (#1084) @davidbuniat
- Update code snippets (#1088) @istranic
- [refactor] encoders base class (#1082) @mccrearyd
- Alias "jpg" to "jpeg" (#1073) @benchislett
- Updated readme (#1085) @istranic
- [small] implement hub.like in new api (#1083) @mccrearyd
- Bugout reporting update (#1081) @istranic
- Info fixes (#1080) @farizrahman4u
- BUGGER_OFF=true when running tests in Circle CI. (#1079) @zomglings
- [small] Remove dataset link during creation of Hub datasets (#1078) @dhiganthrao
- Enable search in pdoc (#1074) @benchislett
- Added "dataset" class for interacting with underlying "Dataset" class (#1063) @AbhinavTuli
- [small] turn off activeloop reporting during circleci tests (#1076) @mccrearyd
- dataset/tensor info alongside meta (#1066) @mccrearyd
- [small] fix hub cloud throttling for tests (#1077) @mccrearyd
- Add back the history of master into main (#1061) @benchislett
- [Small PR] Removes tfds tests (#1070) @AbhinavTuli
- Old pytorch multiprocessing bug fix (#1068) @AbhinavTuli
- [Small PR] Renames hub.load to hub.read (#1064) @AbhinavTuli
- Made Dataset and LRUCache objects pickleable (#1049) @AbhinavTuli
- Alternate fix for tensor creation bug (#1065) @AbhinavTuli
⚙️ Who Contributes
@AbhinavTuli, @benchislett, @davidbuniat, @dhiganthrao, @farizrahman4u, @istranic, @mccrearyd, @tatevikh and @zomglings
Bug fix for .pytorch DataLoaders 🌈
🎁 What's New
- We mostly focused on refactoring and minor bugs.
- .pytorch() now works with public datasets hosted by team Activeloop (e.g. hub://activeloop/mnist-train).
- Underlying data format is now better! Since the new format is incompatible with the prior release, you should update to the new release using pip3 install --upgrade hub.
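Because the new data format is incompatible with the prior release, it can help to confirm which version is installed before and after upgrading. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and it assumes the package is distributed under the name hub, as in the pip3 command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "hub" is the distribution name used in the upgrade command above.
    version = installed_version("hub")
    print(version if version is not None else "hub is not installed")
```

Running this before and after the upgrade shows whether the installed version actually changed.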
🧭 What's Changed
- version update (#1062) @davidbuniat
- Fixes an issue in which reporting configuration file was not being created if its parent directory didn't exist. (#1058) @zomglings
- Update PR template to new format (#1059) @benchislett
- Add back PR template from master (#1034) @benchislett
- Update htype docs (#1030) @benchislett
- Validate indexing when given, not at compute-time (#1033) @benchislett
- Update readme (#1057) @istranic
- fix meta non-persistence bug (adds test) (#1053) @mccrearyd
- Updating old pytorch warning message (#1055) @AbhinavTuli
- Changed from master to main (#1052) @Anselmoo
- [refactor] Tests/update fixtures (#1046) @mccrearyd
- NPZ replacement format (only) (#1047) @farizrahman4u
- Auto cast (#1041) @farizrahman4u
- Bring back tuple mode, this time serializable (#1028) @farizrahman4u
- Array interface for Tensor (#1042) @farizrahman4u
- Windows always uses old pytorch integration now (#1044) @AbhinavTuli
- [small] remove chunk sizes from htypes (#1037) @mccrearyd
- Small fix for Pytorch shared memory leak (#1040) @AbhinavTuli
- Fixes dataset creation bug with s3/hub cloud datasets having similar names (#1045) @AbhinavTuli
- [small] Update/2.0/hub cloud test (#1023) @mccrearyd
- Fix tensor creation bug (#1043) @farizrahman4u
- Refactor/fstrings (#1035) @dhiganthrao
- update sample compression API (#1038) @mccrearyd
- [small] Silence tensorflow logs in tests (#1029) @benchislett
- [small] update scalar test (#1022) @mccrearyd
🐛 Bug Fixes
- [small] pytorch readonly error bug fix (#1026) @mccrearyd
- [small] Fix/2.0/readonly (#1024) @mccrearyd
🔗 Dependency Updates
- Bump pillow from 7.2.0 to 8.2.0 in /requirements (#1018) @dependabot
⚙️ Who Contributes
@AbhinavTuli, @Anselmoo, @benchislett, @davidbuniat, @dependabot, @dhiganthrao, @farizrahman4u, @istranic, @mccrearyd, @tatevikh and @zomglings
Hub is in Beta!
What's New
- Hub core was redesigned to enable blazing-fast dataset creation. You can create a Hub dataset faster than copy/pasting files on your local machine.
Features
- Super simple API
- Easy creation of datasets and hosting on Activeloop Storage or S3
- Rapid dataset streaming to any machine
- Simple dataset integration to pytorch with no boilerplate code (Windows support will be added in the next release)
Pre-Release 2.0.1-alpha
Pre-release for Hub 2.0-alpha
2.0 Early Alpha
Merge pull request #916 from activeloopai/task/2.0/append-api-updates [2.0] Various API changes