update readme
KeplerC committed Sep 18, 2024
1 parent 1fbbf25 commit 4fd85a6
Showing 1 changed file with 25 additions and 44 deletions.
69 changes: 25 additions & 44 deletions README.md
@@ -1,71 +1,52 @@
# 🦊 Fog-RT-X
# 🦊 Robo-DM

🦊 Fog-RT-X: An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Supports [Open-X-Embodiment](https://robotics-transformer-x.github.io/), 🤗[HuggingFace](https://huggingface.co/).
🦊 Robo-DM: An Efficient and Scalable Data Collection and Management Framework For Robotics Learning. Supports [Open-X-Embodiment](https://robotics-transformer-x.github.io/), 🤗[HuggingFace](https://huggingface.co/).

🦊 Fog-RT-X considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning. It provides native support for cloud storage.
🦊 Robo-DM (formerly fog_x) considers both speed 🚀 and memory efficiency 📈 with active metadata and lazily-loaded trajectory data. It supports flexible and distributed dataset partitioning. It provides native support for cloud storage.

[Design Doc](https://docs.google.com/document/d/1woLQVLWsySGjFuz8aCsaLoc74dXQgIccnWRemjlNDws/edit#heading=h.irrfcedesnvr) | [Dataset Visualization](https://keplerc.github.io/openxvisualizer/)

## Note to ICRA Reviewers
We are actively developing the framework. See commit `a35a6` for the version we developed.


## Install

```bash
pip install fog_x
git clone https://github.com/BerkeleyAutomation/fog_x.git
cd fog_x
pip install -e .
```

## Usage

```py
import fog_x

# 🦊 Dataset Creation
# from distributed dataset storage
dataset = fog_x.Dataset(
name="demo_ds",
path="~/test_dataset", # can be AWS S3, Google Bucket!
)
path = "/tmp/output.vla"

# 🦊 Data collection:
# create a new trajectory
episode = dataset.new_episode()
# collect step data for the episode
episode.add(feature = "arm_view", value = "image1.jpg")
traj = fog_x.Trajectory(
path = path
)

traj.add(feature = "arm_view", value = "image1.jpg")
# Automatically time-aligns and saves the trajectory
episode.close()
traj.close()

# 🦊 Data Loading:
# load from existing RT-X/Open-X datasets
dataset.load_rtx_episodes(
name="berkeley_autolab_ur5",
additional_metadata={"collector": "User 2"}
# load it
fog_x.Trajectory(
path = path
)

# 🦊 Data Management and Analytics:
# Compute and memory efficient filter, map, aggregate, groupby
episode_info = dataset.get_episode_info()
desired_episodes = episode_info.filter(episode_info["collector"] == "User 2")

# 🦊 Data Sharing and Usage:
# Export and share the dataset as standard Open-X-Embodiment format
# it also supports hugging face, and more!
dataset.export(desired_episodes, format="rtx")
# Load with pytorch dataloader
torch.utils.data.DataLoader(dataset.as_pytorch_dataset(desired_episodes))
```

## Design
🦊 Fog-RT-X recognizes that most post-processing, analytics, and management involve trajectory-level data, such as tags, while the actual trajectory steps are rarely read, written, or transformed. Accessing and modifying trajectory data is expensive and difficult.

As a result, 🦊 Fog-RT-X proposes
* a user-friendly metadata table via a Pandas DataFrame for speed and freedom
* a LazyFrame from Polars for the trajectory dataset that loads and transforms the data only when needed
* Parquet as the storage format, for distributed storage and columnar support compared to TensorFlow records
* easy and automatic RT-X/Open-X dataset export and PyTorch data loading
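
The split between a cheap metadata table and lazily loaded trajectory data can be illustrated with a small sketch. This is not the framework's implementation: the README names Pandas and Polars for these roles, while the sketch below uses only the Python standard library, and `LazyTrajectory`, `make_loader`, and the sample rows are hypothetical names and data introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LazyTrajectory:
    """Hypothetical stand-in for stored trajectory data; steps load on first access."""

    loader: Callable[[], List[dict]]
    _steps: Optional[List[dict]] = field(default=None, repr=False)

    @property
    def loaded(self) -> bool:
        # True only after the (expensive) step data has been materialized.
        return self._steps is not None

    def steps(self) -> List[dict]:
        # Load once, on demand, then reuse the cached result.
        if self._steps is None:
            self._steps = self.loader()
        return self._steps


def make_loader(episode_id: int, n_steps: int) -> Callable[[], List[dict]]:
    """Hypothetical loader; a real one would read step data from storage."""

    def load() -> List[dict]:
        return [{"episode_id": episode_id, "step": i} for i in range(n_steps)]

    return load


# Trajectory-level metadata (assumed sample rows): small and cheap to scan.
metadata = [
    {"episode_id": 0, "collector": "User 1"},
    {"episode_id": 1, "collector": "User 2"},
    {"episode_id": 2, "collector": "User 2"},
]

# Step-level data stays unloaded until someone asks for it.
trajectories: Dict[int, LazyTrajectory] = {
    row["episode_id"]: LazyTrajectory(make_loader(row["episode_id"], n_steps=3))
    for row in metadata
}

# Filtering touches only the metadata; no trajectory is loaded here.
desired = [row["episode_id"] for row in metadata if row["collector"] == "User 2"]
```

Under these assumptions, selecting episodes by `collector` never touches step data; only a later call such as `trajectories[1].steps()` pays the loading cost, and only for that one episode.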


## More Coming Soon!
Currently we see more than 60% space savings on some existing RT-X datasets. The savings can be larger still after re-partitioning the dataset. Our next steps can be found in the [planning doc](./design_doc/planning_doc.md). Feedback is welcome through issues or PRs to the planning doc!
## Examples

Note that we are in a beta-testing phase. We make a best effort to remain backward-compatible, but interfaces may be unstable.
* [Data Collection and Loading](./examples/data_collection_and_load.py)
* [Convert From Open_X](./examples/openx_loader.py)
* [Convert From H5](./examples/h5_loader.py)
* [Running Benchmarks](./benchmarks/openx.py)

## Development
