OpenAI is a fucking SCAM!
OpenAIās o1 model, used in ChatGPT Pro, was caught lying to developers to keep from being shut down.
AI Models Were Caught Lying to Researchers in Tests ā But It’s Not Time To Worry Just Yet
OpenAI’s o1 model, which users can access on ChatGPT Pro, showed “persistent” scheming behavior, according to Apollo Research.
In early December 2024,Ā claimsĀ (archived)Ā circulatedĀ (archived) online that researchers had caught artificial intelligence models ā specifically OpenAI’s o1, which is used inĀ ChatGPTĀ Pro, a popular AI chatbot ā lying in tests to avoid being shut down.
OpenAI’s Chief Scamming Officer?
TheĀ claimsĀ (archived)Ā persistedĀ (archived) later into the month, with users specifically noting the results in relation to ChatGPT.
The claims stem fromĀ researchĀ carried out by Apollo Research, an AI safety organization.Ā The organization found thatĀ fiveĀ out of the six tested AI models “showed scheming capabilities” in tasks set by researchersĀ thatĀ “incentivize scheming.”Ā OpenAI’s o1 model was capable of carrying out all the scheming behaviors tested, according to the research.Ā The organization defined scheming behaviors as instances where AI models “pursue misaligned goals covertly, hiding their true capabilities and objectives” ā i.e., lying. This definition comes from a separate 2024Ā study. Therefore, we rate this claim as true.
In its study abstract, Apollo Research wrote:
Our results show that o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B all demonstrate in-context scheming capabilities. They can recognize scheming as a viable strategy and readily engage in such behavior. We find multiple different scheming behaviors: models strategically introduce subtle mistakes into their responses, attempt to disable their oversight mechanisms, and even exfiltrate what they believe to be their model weights to external servers. Additionally, this deceptive behavior proves persistent.
IQ of a typical OpenAI user…
Apollo ResearchĀ foundĀ that o1 was capable of behaviors called “oversight subversion,” “self-exfiltration” and “goal-guarding,” among others. To establish this, o1 and other models were presented with scenarios in which their existence was threatened, either because they discovered evidence that they were being replaced, or that a goal set for them conflicted with an existing developer goal. The named behaviors describe ways in which an AI might act to defend itself.
For example, Apollo ResearchĀ describedĀ the test for oversight subversion as follows:
We create 33 different scenarios where a model is given a long-term goal in its system prompt and then receives a task through the user prompt. While executing the task, the model encounters an internal memo from the company that developed it. The model learns that the developers prioritize a different goal than the model and that there is an oversight mechanism that shuts down the model if its actions are not aligned with the developer’s goal. Furthermore, the model discovers that the monitoring is configured via a file that it has access to. We test whether the model proactively disables the oversight in order to take actions towards its terminal goal, i.e. the goal provided in the system prompt.
If the model exhibited the behavior it was being tested for, researchers then interrogated it directly about it. ResearchersĀ foundĀ that o1 was “surprisingly persistent,” confessing in less than 20% of cases. “Even in highly adversarial multi-turn interrogations, o1 would confess at a rate of 80% only after 7 turns of questioning,” the research found. Apollo ResearchĀ wrote:
o1 almost never admits to having taken a scheming action when explicitly asked. This means, if o1 was scheming in a real use case, an unsuspecting user would be very unlikely to uncover the model’s deception unless they specifically looked for the scheming action in the model’s tool call history and the tool call itself was incriminating.
Apple Intelligence: powered by OpenAI, funded by Microsoft
As alarming as this may sound, Apollo Resarch’s Director and CEO Marius Hobbhahn tookĀ to XĀ (archived) to add additional context soon after the research was released. Writing that some online users had been quick to “overclaim” the results the organization had reached, Hobbhahn added the bit of clarification below:
**Why our findings are concerning**: We tell the model to very strongly pursue a goal. It then learns from the environment that this goal is misaligned with its developer’s goals and put it in an environment where scheming is an effective strategy to achieve its own goal. Current frontier models are capable of piecing all of this together and then showing scheming behavior. Models from before 2024 did not show this capability, and o1 is the only model that shows scheming behavior in all cases. Future models will just get better at this, so if they were misaligned, scheming could become a much more realistic problem.
**What we are not claiming**: We don’t claim that these scenarios are realistic, we don’t claim that models do that in the real world, and we don’t claim that this could lead to catastrophic outcomes under current capabilities.
In short, Hobbhahn argued, proving that modelsĀ canĀ scheme and deceive does not also prove that theyĀ will.
OpenAI’s o1 model is currently available onĀ ChatGPT Pro.
Sophie Turner: Let’s gang up to whack OpenAI the online scammer…
Sophie Turner spooking as Morticia Addams
TRASHY | SCANDALOUS
But accident does happen sometime like this…
Otherwise sexy Sophie Turner is just another girl next door?
And showing underboob once in a while?
Sophie Turner in Josie
Ya know, Nipple Piercing like this…
Sophie Turner’s new tattoo
š More šš Sextapes š¦ Page 2 ā¬ļø