Today at TransformX, we announced a huge step forward for the open source ML community: we are partnering with @StabilityAI to release the first large language model trained with human feedback. carper.ai/instruct-gpt-a… 1/4
Reinforcement learning from human feedback (RLHF) is what powers the highest-performing language models. arxiv.org/abs/2009.01325
Scale partnered with OpenAI on InstructGPT (openai.com/blog/instructi…), and we’re excited to make these techniques available to everyone.
We’ll release our first trained model with Stability AI soon. If you want to start tinkering with RLHF now, we’re also helping develop trlX (github.com/CarperAI/trlx), the open source library for reinforcement learning with transformers.