A Survey of Direct Preference Optimization (DPO)
Official PyTorch implementation of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning" (AAAI 2025)
Notebooks to create an instruction-following version of Microsoft's Phi-2 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO)
Notebooks to create an instruction-following version of Microsoft's Phi-1.5 LLM with Supervised Fine-Tuning and Direct Preference Optimization (DPO); a minimal sketch of the DPO objective these notebooks optimize appears after this list
Experiments and a how-to guide for the lecture "Large language models for Scientometrics"
The Rap Music Generator project is an LLM-based tool for generating rap lyrics. It offers multiple fine-tuning approaches to support different rap generation techniques, giving users a versatile platform for producing stylistically varied content.
SciLLM-Opt (Scientific Large Language Model Optimization): SFT and DPO.
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements advanced techniques like SFT, DPO, and quantization.
RankPO: Rank Preference Optimization
[Cog. Comp. 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Enhancing paraphrase-type generation with Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), supported by large-scale HPC. This project aligns model outputs with human-ranked preference data for robust, safety-focused NLP.
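Every repository above optimizes some variant of the same core objective. As a point of reference, below is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) in PyTorch. The function and variable names are illustrative rather than taken from any repository listed here, and the inputs are assumed to be summed per-response log-probabilities under the trainable policy and a frozen reference model.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    # Log-ratio of chosen over rejected responses under the trainable policy.
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    # The same log-ratio under the frozen reference model (no gradient flows here).
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Implicit reward margin between the preferred and dispreferred response.
    logits = beta * (policy_logratio - ref_logratio)
    # Maximize the probability that the chosen response is ranked above the rejected one.
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
policy_w = torch.randn(4, requires_grad=True)
policy_l = torch.randn(4, requires_grad=True)
ref_w, ref_l = torch.randn(4), torch.randn(4)
loss = dpo_loss(policy_w, policy_l, ref_w, ref_l)
loss.backward()
print(loss.item())

The beta hyperparameter controls how far the policy is allowed to drift from the reference model; the original DPO paper commonly uses values around 0.1.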