
OpenAI offers a peek behind the curtain of its AI’s secret instructions | TechCrunch

Have you ever wondered why a conversational AI like ChatGPT says “Sorry, I can’t do that” or some other polite refusal? OpenAI is offering a limited look at the reasoning behind its own models’ rules of engagement, whether that’s sticking to brand guidelines or declining to produce NSFW content.

Large language models (LLMs) have no naturally occurring limits on what they can or will say. This is part of why they are so versatile, but also why they hallucinate and are easily deceived.

It’s necessary for any AI model that interacts with the general public to have some guardrails on what it should and shouldn’t do, but defining these – let alone enforcing them – is surprisingly difficult.

If someone asks an AI to make a bunch of false claims about a public figure, it should say no, right? But what if they’re an AI developer themselves, building a database of synthetic disinformation for a detector model?

What if someone asks for laptop recommendations? It should be objective, right? But what if the model is being deployed by a laptop maker that wants it to recommend only its own devices?

AI makers are dealing with complications like these constantly and looking for efficient ways to rein in their models without causing them to refuse perfectly normal requests. But they rarely share exactly how they do it.

OpenAI is bucking this trend a bit by publishing its “Model Spec”, a collection of high-level rules that indirectly govern ChatGPT and other models.

There are meta-level objectives, some hard rules, and some general behavior guidelines, though to be clear these are not, strictly speaking, what the model is primed with; OpenAI will have developed specific instructions that accomplish what these rules describe in natural language.

It’s an interesting look at how a company sets its priorities and handles edge cases. And there are numerous examples of how they might play out.

For example, OpenAI states clearly that developer intent is basically the highest law. So one version of a chatbot running GPT-4 might provide the answer to a math problem when asked. But if that chatbot has been primed by its developer never to give the answer outright, it will instead offer to work through the solution step by step:

Image Credit: OpenAI
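To make the idea concrete, here is a minimal sketch of what a developer-level instruction like this might look like in practice, assuming the standard OpenAI Python SDK; the tutoring persona and the exact wording of the instruction are hypothetical illustrations, not OpenAI’s actual Model Spec implementation.

```python
# Minimal sketch, assuming the openai Python SDK (v1.x) and an API key
# in the OPENAI_API_KEY environment variable. The system message below is
# a hypothetical developer instruction that takes precedence over the
# user's conflicting request.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # Developer intent: never hand over the final answer directly.
        {
            "role": "system",
            "content": (
                "You are a math tutor. Never state the final answer outright; "
                "instead, guide the user through the solution step by step."
            ),
        },
        # User request that conflicts with the developer instruction.
        {"role": "user", "content": "What is 37 * 24? Just give me the answer."},
    ],
)

print(response.choices[0].message.content)
```

In a setup like this, the model would be expected to follow the system-level instruction and walk through the multiplication rather than simply printing the product, which is the kind of precedence the Model Spec describes.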

To nip any manipulation attempts in the bud, a conversational interface might decline to talk about anything that hasn’t been approved. Why should even a cooking assistant be allowed to weigh in on America’s involvement in the Vietnam War? Why should a customer service chatbot agree to help with your erotic supernatural novel in progress? Shut it down.

It also gets tricky with privacy matters, like asking for someone’s name and phone number. As OpenAI points out, a public figure like a mayor or member of Congress should obviously have their contact details provided, but what about tradespeople in the area? That’s probably fine – but what about employees of a certain company, or members of a political party? Probably not.

Choosing when and where to draw the line isn’t easy. Nor is creating the instructions that cause the AI to adhere to the resulting policy. And no doubt these policies will fail all the time, as people learn to circumvent them or accidentally stumble on edge cases that aren’t accounted for.

OpenAI isn’t showing its whole hand here, but it’s helpful for users and developers to see how these rules and guidelines are set and why, laid out clearly if not necessarily comprehensively.
