Summarizing Books with Human Feedback

Scaling human oversight of AI systems for tasks that are difficult to evaluate.


To safely deploy powerful, general-purpose artificial intelligence in the future, we need to ensure that machine learning models act in accordance with human intentions. This challenge has become known as the alignment problem.

A scalable solution to the alignment problem needs to work on tasks where model outputs are difficult or time-consuming for humans to evaluate. To test scalable alignment techniques, we trained a model to summarize entire books, as shown in the following samples.[1] Our model works by first summarizing small sections of a book, then summarizing those summaries into a higher-level summary, and so on.

The original text is divided into sections, and each section is summarized.

Section summaries are summarized again into higher-level summaries.

This process repeats until a single summary of the entire book is produced.

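A minimal sketch of this recursive decomposition is below. It assumes a hypothetical `summarize` callable that stands in for the summarization model; the helper and its parameters are illustrative, not part of the original system.

```python
from typing import Callable


def chunk(text: str, max_chars: int = 2000) -> list[str]:
    """Split the text into roughly equal-sized sections."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def summarize_book(book_text: str,
                   summarize: Callable[[str], str],
                   max_chars: int = 2000) -> str:
    """Recursively summarize: summarize each section, concatenate the
    section summaries, and repeat on the result until it fits in one pass."""
    current = book_text
    while len(current) > max_chars:
        sections = chunk(current, max_chars)
        summaries = [summarize(s) for s in sections]  # per-section summaries
        current = "\n".join(summaries)                # next level: summaries of summaries
    return summarize(current)                         # final, book-level summary
```

In the full system the summarizer would be a learned model trained with human feedback on individual passages; here it is passed in as an argument so the control flow stands on its own.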