December 7 news: OpenAI has launched a 12-day "shipmas" event featuring a series of new features, products, and demos. On the second day of the event, OpenAI unveiled Reinforcement Fine-Tuning (RFT), which it describes as the first technique of its kind, intended to help developers and machine learning engineers build expert models for complex, domain-specific tasks.
The technique improves a model's reasoning ability and accuracy on domain-specific tasks through a new model customization approach: developers fine-tune a model on a high-quality set of tasks and grade the model's responses against reference answers.
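As a rough illustration of that workflow, the sketch below shows what such a task set might look like as JSONL records that pair a prompt with a reference answer. The field names (`messages`, `reference_answer`) and file layout are assumptions for illustration only, not a confirmed OpenAI schema.

```python
import json

# Hypothetical RFT-style task set: each record pairs a domain-specific
# prompt with a reference answer used later for grading.
# The field names here are illustrative assumptions, not OpenAI's schema.
tasks = [
    {
        "messages": [
            {"role": "user", "content": "A 45-year-old patient presents with ..."}
        ],
        "reference_answer": "FOXE1",
    },
    {
        "messages": [
            {"role": "user", "content": "Which clause of this policy excludes ..."}
        ],
        "reference_answer": "Section 4(b)",
    },
]

# Write the task set to a JSONL file, one task per line.
with open("rft_tasks.jsonl", "w", encoding="utf-8") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```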
Introduction to Reinforcement Fine-Tuning
1AI notes OpenAI's official description: developers can customize OpenAI's models using anywhere from dozens to thousands of high-quality tasks and grade the model's responses against provided reference answers. According to OpenAI, this reinforces how the model reasons through similar problems and improves its accuracy on tasks specific to that domain.
Unlike standard fine-tuning, RFT uses reinforcement learning algorithms, which OpenAI says can lift model performance from a high-school level to expert, PhD-level ability on these tasks.
RFT differs from supervised fine-tuning in that, instead of having the model imitate its inputs, it teaches the model to reason in entirely new ways: by scoring the model's answers and reinforcing correct lines of reasoning, RFT can significantly improve performance with as few as a handful of examples.
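To make the "score the answer, reinforce the reasoning" idea concrete, here is a minimal sketch of a grader that compares a model's final answer with the reference answer and returns a reward between 0 and 1. The function name and the exact grading rule are assumptions for illustration; OpenAI has not published the graders it uses internally.

```python
def grade_response(model_answer: str, reference_answer: str) -> float:
    """Return a reward in [0, 1] for a single task.

    Illustrative grader, not OpenAI's: an exact match scores 1.0,
    an answer that merely contains the reference scores 0.5,
    everything else scores 0.0.
    """
    predicted = model_answer.strip().lower()
    reference = reference_answer.strip().lower()
    if predicted == reference:
        return 1.0
    if reference in predicted:
        return 0.5
    return 0.0


# During RFT, rewards like these would feed a reinforcement learning update
# that strengthens the chains of reasoning leading to high-scoring answers.
print(grade_response("The gene is FOXE1.", "FOXE1"))  # 0.5
print(grade_response("FOXE1", "FOXE1"))               # 1.0
```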
RFT lets users create purpose-built models from their own golden datasets and apply them to areas that require specialized knowledge, such as law, finance, engineering, and insurance.
Who Reinforcement Fine-Tuning Is Aimed At
OpenAI is inviting applications from research institutions, universities, and enterprises, particularly organizations where experts currently perform a narrow set of complex tasks and would benefit from AI assistance.
OpenAI says reinforcement fine-tuning works well on tasks whose outcomes have an objectively "correct" answer that most experts would agree on, so it expects the technique to do best in fields such as law, insurance, healthcare, finance, and engineering.
Participants will get early access to the alpha version of the reinforcement fine-tuning API and will be able to test it on their domain-specific tasks; OpenAI also encourages participants to share datasets and work together to improve OpenAI's models.
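For context, the sketch below shows how a fine-tuning job is created today with the OpenAI Python SDK. The alpha reinforcement fine-tuning API was not public at the time of this announcement, so any RFT-specific parameters (such as a grader or reference-answer configuration) are omitted here; the illustrative task-set schema above would also need adapting to whatever format the final API expects.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL task set prepared earlier.
training_file = client.files.create(
    file=open("rft_tasks.jsonl", "rb"),
    purpose="fine-tune",
)

# Create a fine-tuning job. This is the standard (supervised) endpoint;
# the alpha RFT API is expected to take additional grading configuration
# that OpenAI had not published at the time of writing.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model for illustration
)

print(job.id, job.status)
```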
OpenAI expects to release reinforcement fine-tuning publicly in early 2025.
OpenAI CEO Sam Altman said that reinforcement fine-tuning "works surprisingly well; it's one of my biggest surprises of 2024."