From 4b9977fdcb579276331b37be468ac21bd7748728 Mon Sep 17 00:00:00 2001
From: Sang Choe
Date: Fri, 7 Jun 2024 10:08:21 -0400
Subject: [PATCH] Update README.md

---
 README.md | 103 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 56 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index d18df7b..1b6d0d6 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ AI/ML, with a similar logging interface? Try out LogIX that is built upon our cu
 [Huggingface Transformers](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#huggingface-integration) and
 [PyTorch Lightning](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#pytorch-lightning-integration) integrations)!
 
-- **PyPI** (Default)
+- **PyPI**
 ```bash
 pip install logix-ai
 ```
@@ -42,52 +42,14 @@ pip install -e .
 ```
 
-## Usage
-### Logging
-Training log extraction with LogIX is as simple as adding one `with` statement to the existing
-training code. LogIX automatically extracts user-specified logs using PyTorch hooks, and stores
-it as a tuple of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
-to disk efficiently with memory-mapped files.
-
-```python
-import logix
-
-# Initialze LogIX
-run = logix.init(project="my_project")
-
-# Specify modules to be tracked for logging
-run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
-
-# Specify plugins to be used in logging
-run.setup({"grad": ["log", "covariance"]})
-run.save(True)
-
-for batch in data_loader:
-    # Set `data_id` (and optionally `mask`) for the current batch
-    with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
-        model.zero_grad()
-        loss = model(batch)
-        loss.backward()
-# Synchronize statistics (e.g. covariance) and write logs to disk
-run.finalize()
-```
-
-### Training Data Attribution
-As a part of our initial research, we implemented influence functions using LogIX. We plan to provide more
-pre-implemented interpretability algorithms if there is a demand.
-
-```python
-# Build PyTorch DataLoader from saved log data
-log_loader = run.build_log_dataloader()
-
-with run(data_id=test_batch["input_ids"]):
-    test_loss = model(test_batch)
-    test_loss.backward()
-
-test_log = run.get_log()
-run.influence.compute_influence_all(test_log, log_loader) # Data attribution
-run.influence.compute_self_influence(test_log) # Uncertainty estimation
-```
+## Easy to Integrate
 
+Our software design allows for seamless integration with popular high-level frameworks, including
+[HuggingFace Transformers](https://github.com/huggingface/transformers/tree/main) and
+[PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), which conveniently handle
+distributed training, data loading, and more. Advanced users who don't use high-level frameworks can
+still integrate LogIX into their existing training code, much like any traditional logging library
+(see our Tutorial).
 
 ### HuggingFace Integration
 Our software design allows for the seamless integration with HuggingFace's
@@ -122,7 +84,7 @@ trainer.self_influence()
 ```
 
 ### PyTorch Lightning Integration
-Similarly, we also support the LogIX + PyTorch Lightning integration. The code example
+Similarly, we also support seamless integration with PyTorch Lightning. The code example
 is provided below.
 
 ```python
@@ -157,6 +119,53 @@ trainer.extract_log(module, train_loader)
 trainer.influence(module, train_loader)
 ```
 
+## Getting Started
+### Logging
+Training log extraction with LogIX is as simple as adding one `with` statement to your existing
+training code. LogIX automatically extracts user-specified logs using PyTorch hooks and stores
+them as tuples of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
+to disk efficiently with memory-mapped files.
+
+```python
+import logix
+
+# Initialize LogIX
+run = logix.init(project="my_project")
+
+# Specify modules to be tracked for logging
+run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
+
+# Specify plugins to be used in logging
+run.setup({"grad": ["log", "covariance"]})
+run.save(True)
+
+for batch in data_loader:
+    # Set `data_id` (and optionally `mask`) for the current batch
+    with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
+        model.zero_grad()
+        loss = model(batch)
+        loss.backward()
+# Synchronize statistics (e.g., covariance) and write logs to disk
+run.finalize()
+```
+
+### Training Data Attribution
+As part of our initial research, we implemented influence functions using LogIX. We plan to provide
+more pre-implemented interpretability algorithms if there is demand.
+
+```python
+# Build a PyTorch DataLoader from the saved log data
+log_loader = run.build_log_dataloader()
+
+with run(data_id=test_batch["input_ids"]):
+    test_loss = model(test_batch)
+    test_loss.backward()
+
+test_log = run.get_log()
+run.influence.compute_influence_all(test_log, log_loader) # Data attribution
+run.influence.compute_self_influence(test_log) # Uncertainty estimation
+```
+
 Please check out [Examples](/examples) for more detailed examples!
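+
+As a rough sketch of how the influence scores from the attribution example above might be consumed,
+assuming they are materialized as an `(n_test, n_train)` tensor (the exact return type of
+`compute_influence_all` may differ across versions, so please consult the LogIX documentation), you
+can rank training examples by their influence on each test example:
+
+```python
+import torch
+
+# Illustrative stand-in for the (n_test, n_train) influence-score matrix
+# produced by the attribution step above.
+scores = torch.randn(4, 1000)
+
+# Indices of the 10 most positively influential training examples
+# for the first test example.
+top_scores, top_idx = torch.topk(scores[0], k=10)
+print(top_idx.tolist())
+```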