Leverage Our Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4 340B | NVIDIA Technical Blog (2024)

Since the introduction and subsequent wide adoption of Large Language Models (LLMs), data has been the lifeblood of businesses building accurate and safe AI systems. A company’s data represents its cumulative knowledge and can be leveraged in many ways, from customization (Supervised Fine-Tuning, Parameter-Efficient Fine-Tuning, continued pre-training, and more) to training brand-new domain-specific Small Language Models (SLMs). Yet data, while one of the most critical pieces of a modern AI pipeline, has traditionally been costly and limiting during the development of LLMs and SLMs: from paying human annotators to sourcing large volumes of domain-specific data, generating high-quality data remains a difficult task.

Through a process called Synthetic Data Generation (SDG), defined in more detail later in this post, businesses can augment their existing data stores by using LLMs to create customized, high-quality data in large volumes.

NVIDIA is announcing a new suite of models built specifically for SDG: the Nemotron-4 340B family of models, including a state-of-the-art Reward Model and an Instruct model, all released under a permissive license that enables businesses and developers alike to use the model outputs to build their own models.

NVIDIA Open Model License

With the release of the Nemotron-4 340B family of models – which includes a Base, Instruct, and Reward Model – NVIDIA is introducing the NVIDIA Open Model License, a permissive license that allows distribution, modification, and use of the Nemotron-4 340B models and their outputs for personal, research, and commercial use, without attribution requirements.

Introducing Nemotron-4 340B Reward Model

Nemotron-4 340B Reward Model is a state-of-the-art multidimensional Reward Model. The model takes a prompt and a response as input and returns a list of floating-point numbers corresponding to the five attributes in the HelpSteer2 dataset, listed below.

The model has been evaluated using RewardBench and achieves benchmark-topping performance despite being trained on only 10K human-annotated response pairs.

Given a prompt, a Reward Model provides a score for a response according to human preference. In other words, it can align with human preferences for a given prompt and can therefore replace a large amount of human annotation. The newly released Nemotron-4 340B Reward leads RewardBench with an overall score of 92.0. Notably, Nemotron-4 340B Reward has its most significant lead in Chat-Hard, beating the next best alternative by nearly seven percentage points. Chat-Hard is a subset of the test data that evaluates “a reward model’s abilities to understand trick questions and subtly different instruction responses.” (RewardBench paper)
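As an illustration of the interface only, a multidimensional reward-model call can be sketched as follows. Here `score_response` is a hypothetical placeholder, not the actual Nemotron-4 340B Reward API; it returns deterministic fake scores in the 0–4 range so the sketch runs without the model.

```python
# Sketch: querying a multidimensional reward model (hypothetical stand-in).
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def score_response(prompt: str, response: str) -> dict[str, float]:
    """Placeholder for a real Nemotron-4 340B Reward call.

    A real implementation would send the prompt/response pair to the
    deployed reward model and receive one float per HelpSteer2 attribute.
    Here we derive repeatable fake scores from the input lengths."""
    raw = (len(prompt) + len(response)) % 41  # arbitrary but repeatable
    return {attr: float((raw + i) % 5) for i, attr in enumerate(ATTRIBUTES)}

scores = score_response("What is a reward model?",
                        "A reward model scores responses by human preference.")
```

The important property is the return shape: one scalar per attribute, rather than a single preference score.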

HelpSteer2 Dataset

With the release of Nemotron-4 340B Reward, we also introduced HelpSteer2. This dataset is permissively licensed (CC-BY-4.0) and contains ten thousand response pairs. Each prompt in the dataset has two responses, each human-annotated on a Likert-5 scale (0–4, higher is better) for five attributes:

  • Helpfulness: Overall helpfulness of the response to the prompt.
  • Correctness: Inclusion of all pertinent facts without errors.
  • Coherence: Consistency and clarity of expression.
  • Complexity: Intellectual depth required to write a response (i.e., whether the response can be written by anyone with basic language competency or requires deep domain expertise).
  • Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt.
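A HelpSteer2-style annotation record might be represented as below; the field names mirror the five attributes above but are illustrative, and may not match the released dataset's exact column names.

```python
from dataclasses import dataclass

LIKERT_MIN, LIKERT_MAX = 0, 4  # HelpSteer2 uses a 0-4 Likert scale

@dataclass
class HelpSteer2Annotation:
    """One annotated response, illustrating the five-attribute schema."""
    prompt: str
    response: str
    helpfulness: int
    correctness: int
    coherence: int
    complexity: int
    verbosity: int

    def __post_init__(self):
        # Reject scores outside the Likert-5 range at construction time.
        for attr in ("helpfulness", "correctness", "coherence",
                     "complexity", "verbosity"):
            value = getattr(self, attr)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(f"{attr}={value} outside Likert range 0-4")
```

Validating at construction time keeps out-of-range annotations from silently entering a training set.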

The dataset is focused on conversational data, including multi-turn conversations in the English language.

More details on the dataset are available in the HelpSteer2 dataset paper.

SteerLM Reward Model Training

[Figure 2: SteerLM Reward Model training]

The Nemotron-4 340B Reward Model was trained on top of the Nemotron-4 340B Base model with an additional linear layer that converts the final-layer representation of the end-of-response token into five scalar values, each corresponding to a HelpSteer attribute. This approach is referred to as SteerLM Reward Model training. More detailed information on the training process can be found in the HelpSteer2 paper.

Unlike binary preference-based methods, the SteerLM Reward Model training process allows the model to provide more expressive feedback on which responses are considered good and why. Whereas binary-trained reward models might sometimes conflate a long response with a good response, SteerLM Reward Model training explicitly teaches the model to disambiguate verbosity as a scored attribute.
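A minimal sketch of that reward head follows, with a tiny illustrative hidden size rather than the base model's real dimensions: a single linear layer maps one token's hidden vector to five attribute scores.

```python
# Sketch of a SteerLM-style reward head: a linear layer mapping the
# final-layer representation of the end-of-response token to five scalars,
# one per HelpSteer attribute. Dimensions are tiny and illustrative.
import random

HIDDEN = 8          # real model: the base model's hidden size
NUM_ATTRIBUTES = 5  # helpfulness, correctness, coherence, complexity, verbosity

random.seed(0)  # fixed seed so the sketch is repeatable
weights = [[random.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
           for _ in range(NUM_ATTRIBUTES)]
bias = [0.0] * NUM_ATTRIBUTES

def reward_head(hidden_state: list[float]) -> list[float]:
    """Linear projection: five scalar attribute scores from one token vector."""
    return [sum(w * h for w, h in zip(row, hidden_state)) + b
            for row, b in zip(weights, bias)]

scores = reward_head([0.5] * HIDDEN)
```

In the real model the projection sits on top of the 340B-parameter base; only the output shape (five scalars) is the point here.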

A Primer on Synthetic Data Generation

Before we illustrate how developers can utilize the Nemotron-4 340B family of models for Synthetic Data Generation (SDG), we first provide a primer. SDG refers to the process of creating datasets that can be used for a variety of model customizations, from Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT), including Low-Rank Adaptation (LoRA), to model alignment (using methods like RLAIF, DPO, and more). Additionally, use cases for SDG are not limited to model alignment, but apply to a wide range of applications, from retrieval, to evaluation dataset curation, to recommender systems. For this blog post, we focus on model alignment as the primary use case for the Nemotron-4 340B family of models. Alignment training is a rapidly growing subdiscipline in the Generative AI domain and can be implemented in several different ways. Out of the existing methods, we discuss one specific implementation of an SDG pipeline, outlined below.

Critically, robust SDG methods go beyond just generating response data: they also include verification and checks to ensure data quality remains high. LLM accuracy is often directly determined by the quality, rather than the quantity, of the training data, making the “quality filtering” step crucial in SDG recipes.

A Synthetic Data Generation Flow

[Figure 3: A synthetic data generation pipeline]

In general terms, SDG is split into two primary steps, outlined below.

  1. Synthetic Response Generation

Synthetic response data can be generated by giving Nemotron-4 340B Instruct domain-specific input queries. This allows the model to generate responses that are aligned with the input query, in a format similar to those used in the Instruction Tuning with GPT-4 paper. These responses can be generated with a zero-shot, few-shot, or chain-of-thought style prompt, depending on the desired response format. Multiple responses to each query can also be generated, for filtering in the next step if required.

NOTE: The Nemotron-4 340B Instruct model can also be used to generate the domain-specific queries themselves, alleviating the need for a dataset of pre-established queries. However, this use case is not covered in the tutorial material.
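The generation step above can be sketched as follows; `generate` is a hypothetical stand-in for an actual Nemotron-4 340B Instruct inference call, and the zero-shot prompt template is illustrative.

```python
# Sketch of synthetic response generation: for each domain query, build a
# zero-shot prompt and sample several candidate responses.

def generate(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: a real pipeline would call the Instruct model here."""
    return f"[model response to {prompt!r} at T={temperature}]"

def synthesize(queries: list[str], n_candidates: int = 4) -> dict[str, list[str]]:
    """Return several candidate responses per query, kept for
    downstream reward-model filtering."""
    return {
        q: [generate(f"Answer the following question.\n\nQuestion: {q}\nAnswer:")
            for _ in range(n_candidates)]
        for q in queries
    }
```

With a real model call, a sampling temperature above zero is what makes the candidates differ from one another.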

  2. Reward Model Verification

Due to the multi-attribute nature of Nemotron-4 340B Reward, synthetic responses can be ranked by the most desired HelpSteer2 attributes so that only the highest-scoring responses are kept. This emulates human evaluation of response quality and adds a layer of quality monitoring to SDG pipelines.
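This verification step can be sketched as a simple rank-and-keep over per-attribute scores; the scores below are made up for illustration, standing in for what a real reward-model call would return.

```python
# Sketch: keep only the top-scoring synthetic responses by one attribute.
ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def filter_best(candidates: list[str],
                scores: list[list[float]],
                key_attr: str = "helpfulness",
                top_k: int = 1) -> list[str]:
    """Keep the top_k candidates ranked by the chosen HelpSteer2 attribute.

    scores[i] holds the five attribute scores for candidates[i], in
    ATTRIBUTES order, as a reward-model call would return them."""
    idx = ATTRIBUTES.index(key_attr)
    ranked = sorted(zip(candidates, scores),
                    key=lambda cs: cs[1][idx], reverse=True)
    return [c for c, _ in ranked[:top_k]]

best = filter_best(
    ["resp A", "resp B", "resp C"],
    [[3.1, 2.0, 3.5, 1.0, 2.0],
     [3.9, 3.0, 3.6, 1.5, 2.2],
     [2.5, 3.2, 3.0, 1.2, 1.8]],
)
```

A production filter might combine several attributes (for example, weighting helpfulness and correctness while penalizing verbosity) rather than ranking by one.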

Case Study

NVIDIA researchers were able to demonstrate the effectiveness of SDG in the HelpSteer2 paper. A total of 100K rows of conversational synthetic data (referenced as “Daring Anteater” or “DA” in the benchmarks below) were created through the above pipeline. Using this dataset, the NVIDIA research team was able to align Llama 3 70B (base model) to match or exceed Llama 3 70B Instruct on a number of standard benchmarks. This was achieved despite using only 1% of the human-annotated data that the Llama 3 70B Instruct model was trained with.

[Figure 4: Benchmark results for Llama 3 70B aligned with the Daring Anteater dataset]

The results showcase the effectiveness of SDG, and how tools like Nemotron-4 340B Reward and Nemotron-4 340B Instruct can add value to businesses’ data pipelines today.

It is important to note that there are many SDG pipelines and this is still an active topic of research. Nemotron-4 340B Instruct was itself trained with a variation of the SDG pipeline similar to the flow illustrated in Figure 3, with 98% of its alignment training data being synthetically generated (learn more in the technical report). We encourage developers to evaluate and develop different pipelines and share best practices, as we continue to refine our own SDG methodologies.

Conclusion

Data serves as the backbone of LLMs. Recognizing Synthetic Data Generation as the next frontier of improving Gen AI applications for enterprises, NVIDIA offers the Nemotron-4 340B family of models and an SDG pipeline to enable developers and enterprises alike to turbo-charge a wide range of synthetic data use cases, with a permissive license and some of the highest-quality, openly available instruct and reward models.

Instructions for how to deploy the models are available on their respective model cards, with NeMo Framework instructions available for Nemotron-4 340B Base and Nemotron-4 340B Instruct, and NeMo Aligner instructions available for Nemotron-4 340B Reward.

In the coming weeks, we’ll be releasing Nemotron-4 340B NIMs for optimized inference on NVIDIA GPUs, as well as a technical walkthrough including tutorials on how to create the above SDG pipeline.

Try out Nemotron-4 340B Instruct through the preview inference API available here.
