From 621d2b55f844022e98397bdfe566cb4bbd230985 Mon Sep 17 00:00:00 2001
From: Yaroslav Golubev
Date: Thu, 6 Jun 2024 13:47:43 +0200
Subject: [PATCH] Update README.md

---
 module_summarization/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/module_summarization/README.md b/module_summarization/README.md
index 3e0cbdd..9f23ec7 100644
--- a/module_summarization/README.md
+++ b/module_summarization/README.md
@@ -1,10 +1,10 @@
 # 🏟️ Long Code Arena Baselines
 ## Module summarization
 
-This directory contains code for running baselines for the Module summarization task in the Long Code Arena benchmark.
+This directory contains the code for running baselines for the Module summarization task in the Long Code Arena benchmark.
 
-We provide implementation of baselines running inference via [OpenAI](https://platform.openai.com/docs/overview) and [Together.AI](https://www.together.ai/).
-We generate documentation based on an intent (one sentence description of documentation content), target documentation name and relevant code context.
+We provide the implementation of baselines running inference via [OpenAI](https://platform.openai.com/docs/overview) and [Together.AI](https://www.together.ai/).
+We generate documentation based on an intent (a one-sentence description of the documentation content), the target documentation name, and the relevant code context.
 
 # How-to
 
@@ -27,7 +27,7 @@ The script will generate predictions and put them into the `save_dir` directory
 
 #### Metrics
 
-To compare predicted and ground truth metrics we introduce the new metric based on LLM as an assessor. Our approach involves feeding the LLM with relevant code and two versions of documentation: the ground truth and the model-generated text. To mitigate variance and potential ordering effects in model responses, we calculate the probability that the generated documentation is superior by averaging the results of two queries:
+To compare predicted and ground truth texts, we introduce a new metric that uses an LLM as an assessor. Our approach involves feeding the LLM with relevant code and two versions of documentation: the ground truth and the model-generated text. To mitigate variance and potential ordering effects in model responses, we calculate the probability that the generated documentation is superior by averaging the results of two queries:
 ```math
 CompScore = \frac{ P(pred | LLM(code, pred, gold)) + P(pred | LLM(code, gold, pred))}{2}
 ```
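For readers who want to see the CompScore idea from the patched README in code, below is a minimal sketch. It assumes the OpenAI Python SDK with token `logprobs` enabled; the prompt wording, the `gpt-4o` model choice, and the helper names (`p_first_is_better`, `comp_score`) are illustrative assumptions, not the exact implementation used in this repository.

```python
# Hypothetical sketch of a CompScore-style comparison: ask an assessor LLM
# which of two documentation texts (A or B) better fits the given code, read
# P("A") from the token log-probabilities, and average over both orderings of
# (pred, gold) to cancel position bias, as in the formula above.
import math

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Here is a code module and two candidate documentation texts.\n\n"
    "Code:\n{code}\n\nDocumentation A:\n{doc_a}\n\nDocumentation B:\n{doc_b}\n\n"
    "Answer with a single letter, A or B: which documentation is better?"
)


def p_first_is_better(code: str, doc_a: str, doc_b: str) -> float:
    """Probability mass the assessor LLM puts on 'A' (i.e., doc_a is better)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative assessor model, not prescribed by the repo
        messages=[{"role": "user",
                   "content": PROMPT.format(code=code, doc_a=doc_a, doc_b=doc_b)}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    top = response.choices[0].logprobs.content[0].top_logprobs
    probs = {t.token.strip(): math.exp(t.logprob) for t in top}
    p_a, p_b = probs.get("A", 0.0), probs.get("B", 0.0)
    # Renormalize over the two valid answers; fall back to 0.5 if neither appears.
    return p_a / (p_a + p_b) if (p_a + p_b) > 0 else 0.5


def comp_score(code: str, pred: str, gold: str) -> float:
    """CompScore = (P(pred | code, pred, gold) + P(pred | code, gold, pred)) / 2."""
    pred_first = p_first_is_better(code, pred, gold)          # pred shown as A
    pred_second = 1.0 - p_first_is_better(code, gold, pred)   # pred shown as B
    return (pred_first + pred_second) / 2.0
```

Querying both orderings and averaging, exactly as the formula prescribes, means a judge that always prefers whichever text appears first still yields a neutral score of 0.5.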