openrlhf-training

Name: openrlhf-training
Author: orchestra-research

High-performance RLHF framework with Ray+vLLM acceleration. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3. 2x faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.

16 installs · 1 trending · @orchestra-research