The Dangerous Territory of Pre-GPT4

With Llama, Alpaca, ChatGLM, and Dolly now broadly accessible on ordinary consumer-level hardware, we have entered a new phase in which every technology-literate person can run language models with GPT-3 level perplexity, models that also inherit all the shortcomings of GPT-3 level models. These models hallucinate a lot, lack the knowledge of their more parameter-heavy counterparts, and are not nearly as reliable. However, with the right prompt, they are just effective enough at generating marketing copy, manipulating public opinion, and arguing convincingly for whichever side we prompt them to take.

If GPT-4 is more aligned, trained on more data, hallucinates less, and is more knowledgeable, I am perplexed why we would not want to provide access as broadly and as soon as possible, exploring every avenue to get there, including releasing weights and model details to many partners or to the general public, so that people have timely and broad access to a fairer and more knowledgeable model.

The problem is that we are no longer in the Pre-GPT3 world, where only select parties had access to models as powerful as GPT-3 and we could make theoretical arguments weighing the downsides against the upsides. Everything that stayed in theory a few years ago is happening now: GPT-generated content is affecting what we consume, influencing our opinions and search rankings, and being weaponized against individuals to induce anxiety and stress. This calls for releasing updated technology as soon as possible and making it as widely available as possible.

There could be some credible objections to this line of thinking. We will discuss them one by one.

A More Potent Threat

There could be new threats enabled by GPT-4 because it is more potent. We have made strides in the areas we knew to be problematic: hallucination, bias, and emotional manipulation. But there could be threat surfaces that were previously unknown and only become possible with more potent models. The challenge here is to identify these cases rather than theorize about them. While we cannot know the unknown, the known harms introduced by Pre-GPT4 models, particularly knowledge hallucination and content pollution, are real and ongoing. Weighing ongoing harms against unknown threat surfaces is difficult. It is a hard moral choice, because even deliberate inaction could cause more harm.

Unmanageable Releases

Once a model is out in the public, it is difficult to “unpublish” or “upgrade” it. Having one entity manage the release and access would ensure that, at critical times, we have the option to roll back. However, let’s take a step back and think about the kind of threats or harms we are trying to protect against by rolling back. Are we suddenly going to have a much less aligned model that can cause greater harm than the existing one? In other words, when we update the model, are we navigating a rough surface with hidden pockets the model can fall into and come out behaving worse, or are we navigating a smoother surface where the model becomes progressively more aligned?

The Mix-and-Match of the Models

While there might be only one base model, there could be several LEGO pieces (or guardrails) that keep the model aligned with our expectations: a step that inserts watermarks so the content is traceable, or a post-filter that blocks bigoted content at the last step. Broader, distributed access would mean that some parties can turn off these guardrails if they choose. An advanced model such as GPT-4, run without guardrails, could mean greater harm because the threat it poses is harder to detect and neutralize.
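To make the LEGO-piece picture concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: `base_model` is a stand-in that just echoes the prompt, `watermark` appends a visible tag, and `post_filter` checks a placeholder blocklist. It is not how any real model or provider implements guardrails; it only shows that the guardrails are separate steps wrapped around the model, so whoever runs the pipeline decides which ones stay on.

```python
from typing import Callable, List

# A guardrail is any step that takes model output and returns modified output.
Guardrail = Callable[[str], str]


def base_model(prompt: str) -> str:
    """Hypothetical stand-in for a language model; it only echoes the prompt."""
    return f"response to: {prompt}"


def watermark(text: str) -> str:
    """Toy watermark: append a visible tag so the output is traceable."""
    return text + " [wm]"


# Placeholder blocklist; a real post-filter would be far more involved.
BLOCKLIST = {"slur"}


def post_filter(text: str) -> str:
    """Toy post-filter: withhold any output containing a blocklisted word."""
    if any(word in text.lower() for word in BLOCKLIST):
        return "[blocked]"
    return text


def generate(prompt: str, guardrails: List[Guardrail]) -> str:
    """Run the base model, then apply each guardrail in order."""
    output = base_model(prompt)
    for guardrail in guardrails:
        output = guardrail(output)
    return output


# A managed deployment keeps every piece attached, with the filter last.
managed = generate("hello", [watermark, post_filter])

# Someone holding the weights can simply pass an empty list.
unguarded = generate("hello", [])
```

In this toy setup, removing the guardrails is a one-line change for whoever runs the model, which is exactly the concern behind this objection.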

When discussing these objections, we must be clear about the types of societal harms we aim to prevent. Are we trying to stop bad actors from using LLMs to augment their abilities, or are we trying to minimize harms induced by misuse? Failing to distinguish between these cases will only muddy the waters without providing any benefit. It is important to remember that quite a few GPT-3 level models have already been released: they hallucinate more readily, are indistinguishable from humans in short, concise conversations, and can certainly be used by someone determined to cause greater harm.

For bad actors, Pre-GPT4 models are already capable. These models can be fine-tuned easily, have source code available in various forms, and can be embedded almost anywhere. When they are deployed at scale for influence campaigns or scams, it is unclear how their effectiveness compares to existing methods. At this point, it is also speculative how GPT-4 models compare to Pre-GPT4 models. My educated guess is that there would be an order-of-magnitude improvement in the first comparison and a much smaller one (a double-digit percentage) in the second.

For misuse, the harm is immediate and present. An unsuspecting student who asks a Pre-GPT4 model to clarify a question could receive a misleading yet confident answer. In these cases, having the updated, more powerful model readily available would prevent more harm.

The Unmitigable Risks

Given the prevalence of open-source GPT-3 level models, I would argue that we are already deep in the valley of unmitigable risks. Counterintuitively, taking slow, measured steps might shorten the distance traveled in the valley, but sprinting in a roughly right direction would minimize the time spent in it. If we treat the current situation as one of unmitigable risks, then for the betterment of humanity we should seek the broadest access to the latest aligned model at any cost.
