
Commit 4b9977f: Update README.md
1 parent 58a16aa

1 file changed: README.md (+56, -47 lines)
…AI/ML, with a similar logging interface? Try out LogIX that is built upon our cu…
[Huggingface Transformers](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#huggingface-integration) and
[PyTorch Lightning](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#pytorch-lightning-integration) integrations)!

- **PyPI**
```bash
pip install logix-ai
```
…
```bash
pip install -e .
```
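Either way, a quick sanity check that the install worked (a hedged one-liner; it assumes nothing beyond the `import logix` used throughout this README):

```python
# Quick import sanity check; assumes only that the package imports as `logix`.
import logix
print("LogIX imported successfully")
```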
## Easy to Integrate

Our software design allows for seamless integration with popular high-level frameworks, including
[HuggingFace Transformers](https://github.com/huggingface/transformers/tree/main) and
[PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), which conveniently handle
distributed training, data loading, etc. Advanced users who don't use high-level frameworks can
still integrate LogIX into their existing training code, much like any traditional logging software
(see our Tutorial and the minimal sketch below).
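As a quick illustration, here is a minimal sketch of that plain-PyTorch route, condensed from the Getting Started walkthrough below. It uses only the calls shown there; `model` and `data_loader` are assumed to be an existing `nn.Module` and `DataLoader`, and the trimmed-down `watch`/`setup` arguments are an assumption, not the full documented configuration:

```python
import logix
from torch import nn

run = logix.init(project="my_project")     # one-time setup
run.watch(model, type_filter=[nn.Linear])  # choose which modules to log
run.setup({"grad": ["log"]})               # log raw gradients (see Getting Started)

for batch in data_loader:
    # The only change to a traditional training loop is this one `with` statement.
    with run(data_id=batch["input_ids"]):
        model.zero_grad()
        loss = model(batch)
        loss.backward()

run.finalize()  # flush logs
```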
### HuggingFace Integration
Our software design allows for the seamless integration with HuggingFace's …

```python
...
trainer.self_influence()
```
### PyTorch Lightning Integration
Similarly, we also support the seamless integration with PyTorch Lightning. The code example
is provided below.

```python
...
trainer.extract_log(module, train_loader)
trainer.influence(module, train_loader)
```
## Getting Started
### Logging
Training log extraction with LogIX is as simple as adding one `with` statement to the existing
training code. LogIX automatically extracts user-specified logs using PyTorch hooks, and stores
them as a tuple of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
to disk efficiently with memory-mapped files.
```python
import logix
from torch import nn

# Initialize LogIX
run = logix.init(project="my_project")

# Specify modules to be tracked for logging
run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])

# Specify plugins to be used in logging
run.setup({"grad": ["log", "covariance"]})
run.save(True)

for batch in data_loader:
    # Set `data_id` (and optionally `mask`) for the current batch
    with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
        model.zero_grad()
        loss = model(batch)
        loss.backward()

# Synchronize statistics (e.g., covariance) and write logs to disk
run.finalize()
```
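To see what was captured, a small inspection sketch may help. This is hedged: the unpacking below assumes `run.get_log()` returns the `([data_ids], log[module_name][log_type])` tuple described above, and the module key `"mlp"` is a placeholder for whatever `run.watch` matched:

```python
# Hypothetical inspection right after one logging step (see assumptions above).
data_ids, log = run.get_log()  # assumed to match the ([data_ids], log[...]) layout
grad_log = log["mlp"]["grad"]  # "mlp" is a placeholder module name; "grad" was set in run.setup
print(len(data_ids), type(grad_log))
```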
### Training Data Attribution
As part of our initial research, we implemented influence functions using LogIX. We plan to provide more
pre-implemented interpretability algorithms if there is demand.
```python
# Build PyTorch DataLoader from saved log data
log_loader = run.build_log_dataloader()

with run(data_id=test_batch["input_ids"]):
    test_loss = model(test_batch)
    test_loss.backward()

test_log = run.get_log()
run.influence.compute_influence_all(test_log, log_loader)  # Data attribution
run.influence.compute_self_influence(test_log)             # Uncertainty estimation
```
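If `compute_influence_all` returns the scores it computes (an assumption; this README does not show its return value), ranking the training data becomes a one-liner with `torch.topk`:

```python
import torch

# Assumption: `scores` holds one influence score per training example for the
# test batch; the actual return type and shape should be checked against the LogIX docs.
scores = run.influence.compute_influence_all(test_log, log_loader)
top_scores, top_idx = torch.topk(torch.as_tensor(scores).flatten(), k=5)
print(top_idx)  # indices of the most influential training examples
```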
Please check out [Examples](/examples) for more detailed examples!