GPT-3 is a neural network trained by OpenAI with significantly more parameters than previous-generation models.
There are several variations of GPT-3, which range from 125 million to 175 billion parameters. The different variations allow the model to better respond to different types of input, such as question-and-answer formats, long-form writing, and human-language translation (e.g. English to French). The large number of parameters makes GPT-3 significantly better at natural language processing and text generation than its predecessor, GPT-2, which had only 1.5 billion parameters.
GPT-3 can currently only be accessed through an API provided by OpenAI, which is in private beta.
The GPT-3 model can generate texts of up to 50,000 characters, with no supervision. It can even generate creative Shakespearean-style fiction in addition to fact-based writing. This is the first time that a neural network model has been able to generate text of high enough quality that it is difficult, if not impossible, for a typical person to tell whether the output was written by a human or by GPT-3.
To generate output, GPT-3 has a very large vocabulary of words, which it combines into sentences. These words are sorted into different categories (nouns, verbs, adjectives, etc.), and for each category there is a “production rule” that can be used to generate a sentence. The production rules can be modified with different parameters.
For example, GPT-3 is able to understand negations, as well as the use of tenses, which allows the model to generate sentences in the past, present and future.
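As a rough illustration of the production-rule idea described above, the toy sketch below groups a handful of words into categories and applies a simple rule to combine one word from each category into a sentence. The categories, words and rule are made up for illustration and are not GPT-3's actual internals.

```python
import random

# Toy illustration of the "production rule" idea described above: words are
# grouped into categories and a rule picks one word from each category, in
# order, to build a sentence. The vocabulary and rule here are illustrative.
vocabulary = {
    "adjective": ["large", "creative", "fluent"],
    "noun": ["model", "network", "story"],
    "verb": ["generates", "predicts", "completes"],
}

def produce_sentence(vocab, rule=("adjective", "noun", "verb", "noun")):
    # Apply the production rule by choosing one word per category, in order.
    words = [random.choice(vocab[category]) for category in rule]
    return "The " + " ".join(words) + "."

print(produce_sentence(vocabulary))  # e.g. "The creative model generates story."
```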
GPT-3 is not that useful right now for programmers other than as an experiment. If you get access to OpenAI's API, Python is an easy language for interacting with it, and you could feed its text generation into your own applications. Although there have been some impressive initial experiments that generated code for the layout of the Google homepage, JSX output, and other technical demos, the model is not (yet) going to put developers who build real-world applications out of a job.
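If you do have beta access, the sketch below shows roughly what a call to the API from Python looks like using the openai package. The engine name, prompt and parameter values are placeholders, and it assumes an API key is exported in your environment.

```python
import os

import openai

# Assumes beta API access and an API key exported as OPENAI_API_KEY.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Ask the completion endpoint to continue a prompt. The engine name and
# parameter values below are illustrative; adjust them for your use case.
response = openai.Completion.create(
    engine="davinci",
    prompt="Explain what GPT-3 is in one sentence:",
    max_tokens=60,
    temperature=0.7,
)

print(response.choices[0].text.strip())
```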
At a high level, training the GPT-3 neural network consists of two steps.
The first step requires creating the vocabulary, the different categories and the production rules. This is done by feeding GPT-3 books. For each word, the model must predict the category to which the word belongs, and then a production rule must be created.
The second step consists of creating a vocabulary and production rules for each category. This is done by feeding the model sentences. For each sentence, the model must predict the category to which each word belongs, and then a production rule must be created.
The result of the training is a vocabulary and a set of production rules for each category.
The model also has a few tricks that improve its ability to generate text. For example, it can guess the beginning of a word by observing the word's context, predict the next word by looking at the last word in the sentence, and predict how long a sentence should be.
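The next-word trick can be illustrated with a much simpler stand-in: the hypothetical sketch below counts which word tends to follow which in a small text, then "predicts" the most common follower. This is only a bigram toy, not how GPT-3 actually works, but it conveys the idea of predicting the next word from what came before.

```python
from collections import Counter, defaultdict

# Toy stand-in for "predict the next word": count word pairs (bigrams) in a
# tiny corpus, then return the most frequent follower of a given word.
def build_bigram_counts(text):
    counts = defaultdict(Counter)
    words = text.split()
    for previous, following in zip(words, words[1:]):
        counts[previous][following] += 1
    return counts

def predict_next_word(counts, word):
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = build_bigram_counts("the cat sat on the mat and the cat slept")
print(predict_next_word(counts, "the"))  # -> "cat", the most common follower
```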
While those two steps and the related tricks may sound simple in theory, in practice they require massive amounts of computation. Training the 175 billion parameter model in mid-2020 cost in the ballpark of $4.6 million, although other estimates put the cost as high as $12 million depending on how the hardware was provisioned.
The following resources range from the broad philosophy of what GPT-3 means for machine learning to specific technical details of how the model is trained.
The Ultimate Guide to OpenAI's GPT-3 Language Model is a detailed tutorial on how to use OpenAI's playground user interface, what the parameters do, and how to convert what you have done in the playground into a Python script that calls their API.
OpenAI's GPT-3 Language Model: A Technical Overview and GPT-3: A Hitchhiker's Guide are two long-form guides that analyze how GPT-3's technical specifications fit into the larger machine learning ecosystem, collect quotes from researchers on its usage, and point to some initial resources for better understanding what this model is capable of.
What Is GPT-3: How It Works and Why You Should Care presents a high-level accessible overview of GPT-3, how it compares to other language models, and resources to learn more.
How GPT3 Works - Visualizations and Animations contains some wonderful animated visuals to show how the model is trained and what happens in various scenarios such as text output and code generation.
GPT 3 Demo and Explanation is a video that gives a brief overview of GPT-3 and shows a bunch of live demos for what has so far been created with this technology.
Tempering expectations for GPT-3 points out that many of the good examples on social media have been cherry picked to impress readers.
Why GPT-3 matters compares and contrasts this model with similar models that have been developed and tries to give an overview of where each one stands with its strengths and weaknesses.
Building a Chatbot with OpenAI's GPT-3 engine, Twilio SMS and Python is a step-by-step tutorial for using GPT-3 as a smart backend for an SMS-based chatbot powered by the Twilio API.
Automating my job by using GPT-3 to generate database-ready SQL to answer business questions walks through how the author created a bridge to translate between plain English-language questions and relational database SQL. The post provides both a story for why someone would want to use GPT-3 for this purpose and the incremental steps the author took to get started and improve the results. In the end it does not quite work in all scenarios, but the proof of concept is impressive and the story is a fun read.
gpt-3-experiments contains Python code open sourced under the MIT license that shows how to interact with the API.
Twilio put out a series of fun GPT-3 tutorials that show the range of creative outputs the model can generate: