Commit 0e318ce

Add rescale and readme changes
gsarti committed Jun 27, 2024
1 parent 067351a commit 0e318ce
Showing 5 changed files with 23 additions and 4 deletions.
CHANGELOG.md (8 changes: 6 additions & 2 deletions)
@@ -9,15 +9,19 @@
## 🔧 Fixes and Refactoring

- Fix the issue in the attention implementation from [#268](https://github.com/inseq-team/inseq/issues/268) where non-terminal positions in the tensor were set to nan if they were zeros ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix the pad token in cases where it is not specified by default in the loaded model (e.g. for Qwen models) ([#269](https://github.com/inseq-team/inseq/pull/269)).

- Fix bug reported in [#266](https://github.com/inseq-team/inseq/issues/266) making `value_zeroing` unusable for SDPA attention. This enables using the method on models using SDPA attention as default (e.g. `GemmaForCausalLM`) without passing `model_kwargs={'attn_implementation': 'eager'}` ([#267](https://github.com/inseq-team/inseq/pull/267)).

+ - Fix multi-device support and duplicate BOS for chat template models ([#280](https://github.com/inseq-team/inseq/pull/280)).

+ - Add a `rescale_attributions` option to Inseq CLI commands to enable `rescale=True` during aggregation ([#280](https://github.com/inseq-team/inseq/pull/280)).

## 📝 Documentation and Tutorials

*No changes*

## 💥 Breaking Changes

*No changes*
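As a practical illustration of the `value_zeroing` fix noted in the changelog above ([#267](https://github.com/inseq-team/inseq/pull/267)), here is a minimal Python sketch of the workaround that is no longer required. The `google/gemma-2b` checkpoint and the prompt are illustrative choices; only the `inseq.load_model` API and the `model_kwargs` workaround are taken from the changelog entry.

```python
import inseq

# Before the fix in #267, SDPA-default models (e.g. GemmaForCausalLM) had to be
# forced to eager attention for value_zeroing to work:
#   model = inseq.load_model(
#       "google/gemma-2b", "value_zeroing",
#       model_kwargs={"attn_implementation": "eager"},
#   )
# After the fix, the default SDPA attention implementation works directly:
model = inseq.load_model("google/gemma-2b", "value_zeroing")  # checkpoint is illustrative
out = model.attribute("The capital of France is")
out.show()
```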
README.md (7 changes: 5 additions & 2 deletions)
@@ -280,7 +280,7 @@ Our vision for Inseq is to create a centralized, comprehensive and robust set of

## Citing Inseq

- If you use Inseq in your research we suggest to include a mention to the specific release (e.g. v0.4.0) and we kindly ask you to cite our reference paper as:
+ If you use Inseq in your research we suggest to include a mention to the specific release (e.g. v0.6.0) and we kindly ask you to cite our reference paper as:

```bibtex
@inproceedings{sarti-etal-2023-inseq,
@@ -308,7 +308,7 @@ If you use Inseq in your research we suggest to include a mention to the specifi
Inseq has been used in various research projects. A list of known publications that use Inseq to conduct interpretability analyses of generative models is shown below.

> [!TIP]
- > Last update: May 2024. Please open a pull request to add your publication to the list.
+ > Last update: June 2024. Please open a pull request to add your publication to the list.
<details>
<summary><b>2023</b></summary>
@@ -331,6 +331,9 @@ Inseq has been used in various research projects. A list of known publications t
<li><a href="https://arxiv.org/abs/2402.00794">ReAGent: A Model-agnostic Feature Attribution Method for Generative Language Models</a> (Zhao et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2404.02421">Revisiting subword tokenization: A case study on affixal negation in large language models</a> (Truong et al., 2024)</li>
<li><a href="https://hal.science/hal-04581586">Exploring NMT Explainability for Translators Using NMT Visualising Tools</a> (Gonzalez-Saez et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2405.14899">DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning</a> (Zhou et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2406.06399">Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue</a> (Alghisi et al., 2024)</li>
<li><a href="https://arxiv.org/abs/2406.13663">Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation</a> (Qi, Sarti et al., 2024)</li>
</ol>

</details>
inseq/commands/attribute/attribute.py (3 changes: 3 additions & 0 deletions)
@@ -11,12 +11,14 @@ def aggregate_attribution_scores(
    selectors: Optional[list[int]] = None,
    aggregators: Optional[list[str]] = None,
    normalize_attributions: bool = False,
+   rescale_attributions: bool = False,
) -> FeatureAttributionOutput:
    if selectors is not None and aggregators is not None:
        for select_idx, aggregator_fn in zip(selectors, aggregators):
            out = out.aggregate(
                aggregator=aggregator_fn,
                normalize=normalize_attributions,
+               rescale=rescale_attributions,
                select_idx=select_idx,
                do_post_aggregation_checks=False,
            )
@@ -79,6 +81,7 @@ def attribute(input_texts, generated_texts, args: AttributeExtendedArgs):
        selectors=args.attribution_selectors,
        aggregators=args.attribution_aggregators,
        normalize_attributions=args.normalize_attributions,
+       rescale_attributions=args.rescale_attributions,
    )
    print(f"Saving {'aggregated ' if args.aggregate_output else ''}attributions to {args.save_path}")
    out.save(args.save_path, overwrite=True)
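For reference, a minimal sketch of how the `rescale` flag threaded through `aggregate_attribution_scores` can be exercised from the Python API. The saved file name and the `"scores"` aggregator name are assumptions for illustration; the `normalize`/`rescale` keyword arguments mirror the call in the diff above.

```python
import inseq

# Load a previously saved attribution output (file name is illustrative).
out = inseq.FeatureAttributionOutput.load("attributions.json")

# Mirrors one selector/aggregator step of aggregate_attribution_scores:
aggregated = out.aggregate(
    aggregator="scores",  # assumption: the basic score-level aggregator name
    normalize=False,      # normalize and rescale are alternative scalings
    rescale=True,         # what the new rescale_attributions option enables
)
aggregated.show()
```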
inseq/commands/attribute/attribute_args.py (8 changes: 8 additions & 0 deletions)
@@ -61,6 +61,14 @@ class AttributeBaseArgs:
"for each context are normalized to sum up to 1, providing a relative notion of input salience."
),
)
rescale_attributions: bool = cli_arg(
default=False,
help=(
"Whether to rescale the attribution scores for each context. If ``True``, the attribution scores "
"for each context are rescaled to sum up to the number of tokens in the input, providing an absolute"
" notion of input salience."
),
)
model_kwargs: dict = cli_arg(
default_factory=dict,
help="Additional keyword arguments passed to the model constructor in JSON format.",
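The two help texts describe complementary scalings: normalization makes the scores sum to 1, while rescaling makes them sum to the number of input tokens, so a score of 1.0 marks average importance. A toy sketch of that arithmetic on plain tensors (not Inseq internals; the values are made up):

```python
import torch

# Raw attribution scores, one per input token (toy values).
scores = torch.tensor([1.0, 3.0, 0.5, 0.5])

# normalize_attributions: scores sum to 1 (relative salience).
normalized = scores / scores.sum()      # tensor([0.2000, 0.6000, 0.1000, 0.1000])

# rescale_attributions: scores sum to the number of tokens
# (absolute salience, where 1.0 corresponds to average importance).
rescaled = normalized * scores.numel()  # tensor([0.8000, 2.4000, 0.4000, 0.4000])

print(normalized.sum().item())  # ~1.0
print(rescaled.sum().item())    # ~4.0
```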
inseq/commands/attribute_context/attribute_context.py (1 change: 1 addition & 0 deletions)
@@ -211,6 +211,7 @@ def attribute_context_with_model(args: AttributeContextArgs, model: HuggingfaceM
        selectors=args.attribution_selectors,
        aggregators=args.attribution_aggregators,
        normalize_attributions=args.normalize_attributions,
+       rescale_attributions=args.rescale_attributions,
    )[0]
    if args.show_intermediate_outputs:
        cci_attrib_out.show(do_aggregation=False)
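The same flag also reaches the context attribution entry point above. A heavily hedged sketch of a programmatic call: the import path and the signature of `attribute_context_with_model` come from this diff, while the `AttributeContextArgs` fields shown, their defaults, and the model and prompt choices are assumptions.

```python
import inseq

# Import path inferred from the file touched in this diff; treat as illustrative.
from inseq.commands.attribute_context.attribute_context import (
    AttributeContextArgs,
    attribute_context_with_model,
)

model = inseq.load_model("gpt2", "saliency")  # model and method are illustrative
args = AttributeContextArgs(
    model_name_or_path="gpt2",
    input_context_text="George was sick yesterday.",   # assumed field names
    input_current_text="His colleagues asked him how",
    rescale_attributions=True,  # the option introduced in this commit
)
out = attribute_context_with_model(args, model)
```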
