Code Review and Project Workflow Analysis for Git Data

Introduction

This project uses Large Language Models (LLMs) to analyze Git data, providing insights into commit messages, code diffs, and commit categorization. The goal is to enhance the code review process and project workflow analysis by automating the interpretation and summarization of changes in the codebase. This project is part of the final work for the Large Language Models course at Politecnico di Torino (PoliTO).

Proposed Implementation

The language model used in this project is Llama 3.2-1B-Instruct. The framework is tested and evaluated on the MuJS repository, a lightweight JavaScript interpreter.

Commits Extractor

Extracts Git commits and preprocesses them to remove irrelevant information. Trivial commits (e.g., minor changes, merges, README updates) are filtered out, and commit messages are normalized for consistency.
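
A minimal sketch of the filtering and normalization step, in Python. It assumes the commits have already been read (for example with GitPython's `Repo.iter_commits()`) into a hypothetical `CommitRecord` structure. The minor-change threshold, the README pattern and the normalization rules are illustrative assumptions, not the project's actual criteria.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CommitRecord:
    """Hypothetical minimal view of a commit. With GitPython these fields could be
    filled from Commit.hexsha, Commit.message, Commit.parents and Commit.stats."""
    sha: str
    message: str
    n_parents: int
    changed_files: list[str] = field(default_factory=list)
    lines_changed: int = 0


# Assumed heuristics, not taken from the project.
MIN_LINES_CHANGED = 3
README_FILE = re.compile(r"(^|/)readme(\.[a-z]+)?$", re.IGNORECASE)


def is_trivial(c: CommitRecord) -> bool:
    if c.n_parents > 1:  # merge commit
        return True
    if c.changed_files and all(README_FILE.search(f) for f in c.changed_files):
        return True  # README-only update
    return c.lines_changed < MIN_LINES_CHANGED  # minor change


def normalize_message(msg: str) -> str:
    # Keep only the subject line, collapse whitespace, drop the trailing period
    # and capitalize the first letter.
    subject = msg.strip().splitlines()[0] if msg.strip() else ""
    subject = re.sub(r"\s+", " ", subject).rstrip(".")
    return subject[:1].upper() + subject[1:]


def preprocess(commits: list[CommitRecord]) -> list[tuple[str, str]]:
    """Return (sha, normalized message) for every non-trivial commit."""
    return [(c.sha, normalize_message(c.message)) for c in commits if not is_trivial(c)]
```

Filtering on structural properties (parent count, touched files, diff size) before any LLM call keeps these cheap decisions out of the model's context.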

Categorization Chain

Predicts a category for each commit from a fixed list. The model sees all relevant commit information, including author, message, changed files, and code changes. Tested in zero-shot and few-shot settings.

Summarization Chain

Generates a summary for each commit from all of its relevant information, at two levels: a high-level description ("summary") and a description of the detailed code changes ("technical summary"). Only a few-shot setup is used.

Quality Assurance Framework

An iterative approach inspired by MAGIS: one LLM agent generates summaries, and another evaluates and scores them from 0 to 10. Summaries scoring below 8 are rejected, which is meant to improve their accuracy and reliability.

Story Generation

Generates stories describing the project's evolution from its commits, capturing the essence of the changes and their impact on the project.
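
One plausible way to feed the earlier stages into story generation is to order the per-commit results chronologically, group them by period, and ask the model for a narrative. The sketch below assumes each entry is a hypothetical (date, category, summary) tuple and groups by month; both choices are illustrative only.

```python
from datetime import date
from itertools import groupby


def build_story_prompt(entries, project: str = "MuJS") -> str:
    """entries: iterable of (commit_date, category, summary) tuples."""
    ordered = sorted(entries, key=lambda e: e[0])  # chronological order
    lines = [
        f"Write a short story describing how the {project} project evolved, "
        "based on the commit summaries below, and explain the impact of the changes.\n"
    ]
    # Group commits by month so the story can follow the project period by period.
    for (year, month), group in groupby(ordered, key=lambda e: (e[0].year, e[0].month)):
        lines.append(f"{year}-{month:02d}:")
        lines.extend(f"- [{category}] {summary}" for _, category, summary in group)
    lines.append("\nStory:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_story_prompt([
        (date(2020, 3, 2), "feature", "Adds RegExp flags."),
        (date(2020, 1, 5), "bug fix", "Fixes a crash."),
    ]))
```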

Requirements

Python 3.12+
Torch
Transformers
GitPython
