@@ -29,7 +29,7 @@ AI/ML, with a similar logging interface? Try out LogIX that is built upon our cu
[Huggingface Transformers](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#huggingface-integration) and
[PyTorch Lightning](https://github.com/logix-project/logix/tree/main?tab=readme-ov-file#pytorch-lightning-integration) integrations)!

- - **PyPI** (Default)
+ - **PyPI**
```bash
pip install logix-ai
```
@@ -42,52 +42,14 @@ pip install -e .
```
- ## Usage
- ### Logging
- Training log extraction with LogIX is as simple as adding one `with` statement to the existing
- training code. LogIX automatically extracts user-specified logs using PyTorch hooks, and stores
- each log as a tuple of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
- to disk efficiently with memory-mapped files.
-
- ```python
- import logix
+ ## Easy to Integrate
-
- # Initialize LogIX
- run = logix.init(project="my_project")
-
- # Specify modules to be tracked for logging
- run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
-
- # Specify plugins to be used in logging
- run.setup({"grad": ["log", "covariance"]})
- run.save(True)
-
- for batch in data_loader:
-     # Set `data_id` (and optionally `mask`) for the current batch
-     with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
-         model.zero_grad()
-         loss = model(batch)
-         loss.backward()
- # Synchronize statistics (e.g. covariance) and write logs to disk
- run.finalize()
- ```
-
- ### Training Data Attribution
- As part of our initial research, we implemented influence functions using LogIX. We plan to provide more
- pre-implemented interpretability algorithms if there is demand.
-
- ```python
- # Build PyTorch DataLoader from saved log data
- log_loader = run.build_log_dataloader()
-
- with run(data_id=test_batch["input_ids"]):
-     test_loss = model(test_batch)
-     test_loss.backward()
-
- test_log = run.get_log()
- run.influence.compute_influence_all(test_log, log_loader)  # Data attribution
- run.influence.compute_self_influence(test_log)  # Uncertainty estimation
- ```
+ Our software design allows for seamless integration with popular high-level frameworks, including
+ [HuggingFace Transformers](https://github.com/huggingface/transformers/tree/main) and
+ [PyTorch Lightning](https://github.com/Lightning-AI/pytorch-lightning), which conveniently handle
+ distributed training, data loading, etc. Advanced users who don't use high-level frameworks can
+ still integrate LogIX into their existing training code similarly to any traditional logging software
+ (see our Tutorial).
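For a quick look at the plain-PyTorch path, here is a minimal sketch that reuses only the calls documented in the Getting Started section below; `model`, `data_loader`, and the `input_ids` batch field are assumed to come from your existing training script.

```python
import torch.nn as nn

import logix

# `model` and `data_loader` are assumed to exist in your training script.
# Start a LogIX run, register the modules to track, and choose what to log.
run = logix.init(project="my_project")
run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
run.setup({"grad": ["log", "covariance"]})

for batch in data_loader:
    # Wrapping the existing forward/backward pass is the only change to the loop
    with run(data_id=batch["input_ids"]):
        model.zero_grad()
        loss = model(batch)
        loss.backward()

# Synchronize statistics (e.g. covariance) and finish the run
run.finalize()
```

The framework integrations described below expose the same logging machinery through their respective trainer objects.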
### HuggingFace Integration
Our software design allows for the seamless integration with HuggingFace's
@@ -122,7 +84,7 @@ trainer.self_influence()
```
### PyTorch Lightning Integration
- Similarly, we also support the LogIX + PyTorch Lightning integration. The code example
+ Similarly, we also support the seamless integration with PyTorch Lightning. The code example
is provided below.
```python
@@ -157,6 +119,53 @@ trainer.extract_log(module, train_loader)
trainer.influence(module, train_loader)
```
+ ## Getting Started
+ ### Logging
+ Training log extraction with LogIX is as simple as adding one `with` statement to the existing
+ training code. LogIX automatically extracts user-specified logs using PyTorch hooks, and stores
+ each log as a tuple of `([data_ids], log[module_name][log_type])`. If needed, LogIX writes these logs
+ to disk efficiently with memory-mapped files.
+
+ ```python
+ import logix
+
+ # Initialize LogIX
+ run = logix.init(project="my_project")
+
+ # Specify modules to be tracked for logging
+ run.watch(model, name_filter=["mlp"], type_filter=[nn.Linear])
+
+ # Specify plugins to be used in logging
+ run.setup({"grad": ["log", "covariance"]})
+ run.save(True)
+
+ for batch in data_loader:
+     # Set `data_id` (and optionally `mask`) for the current batch
+     with run(data_id=batch["input_ids"], mask=batch["attention_mask"]):
+         model.zero_grad()
+         loss = model(batch)
+         loss.backward()
+ # Synchronize statistics (e.g. covariance) and write logs to disk
+ run.finalize()
+ ```
+
+ ### Training Data Attribution
+ As part of our initial research, we implemented influence functions using LogIX. We plan to provide more
+ pre-implemented interpretability algorithms if there is demand.
+
+ ```python
+ # Build PyTorch DataLoader from saved log data
+ log_loader = run.build_log_dataloader()
+
+ with run(data_id=test_batch["input_ids"]):
+     test_loss = model(test_batch)
+     test_loss.backward()
+
+ test_log = run.get_log()
+ run.influence.compute_influence_all(test_log, log_loader)  # Data attribution
+ run.influence.compute_self_influence(test_log)  # Uncertainty estimation
+ ```
+
Please check out [Examples](/examples) for more detailed examples!