Publications
2021
- WebGPT: Browser-assisted question-answering with human feedback
- Training Verifiers to Solve Math Word Problems
- Recursively Summarizing Books with Human Feedback
- Evaluating Large Language Models Trained on Code
- Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
- Multimodal Neurons in Artificial Neural Networks
- Learning Transferable Visual Models From Natural Language Supervision
- Zero-Shot Text-to-Image Generation
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
2020
- Generative Language Modeling for Automated Theorem Proving
- Learning to Summarize from Human Feedback
- Generative Pretraining from Pixels
- Language Models are Few-Shot Learners
- Measuring the Algorithmic Efficiency of Neural Networks
- Jukebox: A Generative Model for Music
- Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
- Scaling Laws for Neural Language Models
2019
- Deep Double Descent: Where Bigger Models and More Data Hurt
- Leveraging Procedural Generation to Benchmark Reinforcement Learning
- Benchmarking Safe Exploration in Deep Reinforcement Learning
- Release Strategies and the Social Impacts of Language Models
- Solving Rubik's Cube with a Robot Hand
- Fine-Tuning Language Models from Human Preferences
- Emergent Tool Use From Multi-Agent Autocurricula
- Testing Robustness Against Unforeseen Adversaries
- The Role of Cooperation in Responsible AI Development
- SGD on Neural Networks Learns Functions of Increasing Complexity
- Transfer of Adversarial Robustness Between Perturbation Types
- Generating Long Sequences with Sparse Transformers
- Implicit Generation and Generalization in Energy-Based Models
- Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents
- Language Models are Unsupervised Multitask Learners
- Computational Limitations in Robust Classification and Win-Win Results
2018
- An Empirical Model of Large-Batch Training
- Quantifying Generalization in Reinforcement Learning
- Concept Learning with Energy-Based Models
- Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control
- Exploration by Random Network Distillation
- Supervising Strong Learners by Amplifying Weak Experts
- FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models
- Domain Randomization and Generative Models for Robotic Grasping
- Constant Arboricity Spectral Sparsifiers
- Large-Scale Study of Curiosity-Driven Learning
- Learning Dexterous In-Hand Manipulation
- Learning Policy Representations in Multiagent Systems
- Variational Option Discovery Algorithms
- Learning with Opponent-Learning Awareness
- Glow: Generative Flow with Invertible 1x1 Convolutions
- GamePad: A Learning Environment for Theorem Proving
- AI Safety via Debate
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- Gotta Learn Fast: A New Benchmark for Generalization in RL
- On First-Order Meta-Learning Algorithms
- Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
- Improving GANs Using Optimal Transport
- Reptile: a Scalable Metalearning Algorithm
- Sim-to-real Transfer of Robotic Control with Dynamics Randomization
- Some Considerations on Learning to Explore via Meta-Reinforcement Learning
- Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research
- Backpropagation through the Void: Optimizing Control Variates for Black-Box Gradient Estimation
- The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation
- Evolved Policy Gradients
- DeepType: Multilingual Entity Linking by Neural Type System Evolution
2017
- Learning Sparse Neural Networks through L0 Regularization
- Interpretable and Pedagogical Examples
- Meta Learning Shared Hierarchies
- Domain Randomization and Generative Models for Robotic Grasping
- Asymmetric Actor Critic for Image-Based Robot Learning
- Emergent Complexity via Multi-Agent Competition
- Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
- Learning with Opponent-Learning Awareness
- Proximal Policy Optimization Algorithms
- Hindsight Experience Replay
- Teacher-Student Curriculum Learning
- Deep reinforcement learning from human preferences
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Parameter Space Noise for Exploration
- UCB Exploration via Q-Ensembles
- Equivalence Between Policy Gradients and Soft Q-Learning
- Stochastic Neural Networks for Hierarchical Reinforcement Learning
- Learning to Generate Reviews and Discovering Sentiment
- One-shot Imitation Learning
- Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
- Emergence of Grounded Compositional Language in Multi-Agent Populations
- Prediction and Control with Temporal Segment Models
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning
- Third Person Imitation Learning
- Adversarial Attacks on Neural Network Policies
- PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
2016
- Universe
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
- On the Quantitative Analysis of Decoder-Based Generative Models
- A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- RL2: Fast Reinforcement Learning via Slow Reinforcement Learning
- Variational Lossy Autoencoder
- Adversarial Training Methods for Semi-Supervised Text Classification
- Extensions and Limitations of the Neural GPU
- Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
- Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- Infrastructure for Deep Learning
- Concrete Problems in AI Safety
- Improving Variational Inference with Inverse Autoregressive Flow
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
- Improved Techniques for Training GANs
- OpenAI Gym
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- VIME: Variational Information Maximizing Exploration