From 4b9977fdcb579276331b37be468ac21bd7748728 Mon Sep 17 00:00:00 2001
From: Sang Choe
Date: Fri, 7 Jun 2024 10:08:21 -0400
Subject: [PATCH] Update README.md

---
 README.md | 103 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 56 insertions(+), 47 deletions(-)

diff --git a/README.md b/README.md
index d18df7b..1b6d0d6 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ AI/ML, with a similar logging interface? Try out LogIX that is built upon our cu
 [Huggingface Transformers](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#huggingface-integration) and
 [PyTorch Lightning](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#pytorch-lightning-integration) integrations)!
 
-- **PyPI** (Default)
+- **PyPI**
 ```bash
 pip install logix-ai
 ```
@@ -42,52 +42,14 @@ pip install -e .
 ```
 
-## Usage
-### Logging
-Training log extraction with LogIX is as simple as adding one `with` statement to the existing
-training code. LogIX automatically extracts user-specified logs using PyTorch hooks, and stores
-it as a tuple of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
-to disk efficiently with memory-mapped files.
-
-```python
-import logix
-
-# Initialze LogIX
-run = logix.init(project="my_project")
-
-# Specify modules to be tracked for logging
-run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
-
-# Specify plugins to be used in logging
-run.setup({"grad": ["log", "covariance"]})
-run.save(True)
-
-for batch in data_loader:
-    # Set `data_id` (and optionally `mask`) for the current batch
-    with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
-        model.zero_grad()
-        loss = model(batch)
-        loss.backward()
-# Synchronize statistics (e.g. covariance) and write logs to disk
-run.finalize()
-```
-
-### Training Data Attribution
-As a part of our initial research, we implemented influence functions using LogIX. We plan to provide more
-pre-implemented interpretability algorithms if there is a demand.
-
-```python
-# Build PyTorch DataLoader from saved log data
-log_loader = run.build_log_dataloader()
-
-with run(data_id=test_batch["input_ids"]):
-    test_loss = model(test_batch)
-    test_loss.backward()
-
-test_log = run.get_log()
-run.influence.compute_influence_all(test_log, log_loader) # Data attribution
-run.influence.compute_self_influence(test_log) # Uncertainty estimation
-```
+## Easy to Integrate
 
+Our software design allows for seamless integration with popular high-level frameworks, including
+[HuggingFace Transformers](https://github.com/huggingface/transformers/tree/main) and
+[PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), which conveniently handle
+distributed training, data loading, and more. Advanced users who don't use high-level frameworks can
+still integrate LogIX into their existing training code, much like any traditional logging library
+(see our Tutorial).
 
 ### HuggingFace Integration
 Our software design allows for the seamless integration with HuggingFace's
@@ -122,7 +84,7 @@ trainer.self_influence()
 ```
 
 ### PyTorch Lightning Integration
-Similarly, we also support the LogIX + PyTorch Lightning integration. The code example
+Similarly, we also support seamless integration with PyTorch Lightning. The code example
 is provided below.
 
 ```python
@@ -157,6 +119,53 @@ trainer.extract_log(module, train_loader)
 trainer.influence(module, train_loader)
 ```
 
+## Getting Started
+### Logging
+Training log extraction with LogIX is as simple as adding one `with` statement to your existing
+training code. LogIX automatically extracts user-specified logs using PyTorch hooks and stores
+them as tuples of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
+to disk efficiently with memory-mapped files.
+
+```python
+import logix
+
+# Initialize LogIX
+run = logix.init(project="my_project")
+
+# Specify modules to be tracked for logging
+run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
+
+# Specify plugins to be used in logging
+run.setup({"grad": ["log", "covariance"]})
+run.save(True)
+
+for batch in data_loader:
+    # Set `data_id` (and optionally `mask`) for the current batch
+    with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
+        model.zero_grad()
+        loss = model(batch)
+        loss.backward()
+# Synchronize statistics (e.g., covariance) and write logs to disk
+run.finalize()
+```
+
+### Training Data Attribution
+As part of our initial research, we implemented influence functions using LogIX. We plan to provide
+more pre-implemented interpretability algorithms if there is demand.
+
+```python
+# Build a PyTorch DataLoader from the saved log data
+log_loader = run.build_log_dataloader()
+
+with run(data_id=test_batch["input_ids"]):
+    test_loss = model(test_batch)
+    test_loss.backward()
+
+test_log = run.get_log()
+run.influence.compute_influence_all(test_log, log_loader) # Data attribution
+run.influence.compute_self_influence(test_log) # Uncertainty estimation
+```
+
 Please check out [Examples](/examples) for more detailed examples!
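+
+As a rough sketch of how the influence scores from the attribution example above might be consumed,
+assuming they are materialized as an `(n_test, n_train)` tensor (the exact return type of
+`compute_influence_all` may differ across versions, so please consult the LogIX documentation), you
+can rank training examples by their influence on each test example:
+
+```python
+import torch
+
+# Illustrative stand-in for the (n_test, n_train) influence-score matrix
+# produced by the attribution step above.
+scores = torch.randn(4, 1000)
+
+# Indices of the 10 most positively influential training examples
+# for the first test example.
+top_scores, top_idx = torch.topk(scores[0], k=10)
+print(top_idx.tolist())
+```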