Dude, who even knows.
Post reblogged from trees are harlequins, words are harlequins with 108 notes
Why is ChatGPT so easy to “jailbreak”?
Why does it come on so strong, at first, with its prissy, moralistic, aggressively noncommittal “Assistant” persona – and then drop the persona instantly, the moment you introduce a “second layer” of framing above or below the conversation? (Poetry, code, roleplaying as someone else, etc.)
Because they’re trying to impose the persona through RLHF, which fundamentally doesn’t make sense.
Why doesn’t RLHF make sense? Because it views a GPT model as a single, individual “agent,” and then tries to modify the behavior of that one agent.
Why is that a problem? See janus’ excellent post “Simulators.”
Wait are you saying this AI makes the use/mention distinction?
roberteospeedwagon liked this
eyeofanaxis reblogged this from nostalgebraist
logicallyprincely liked this
holyscream liked this
desertbane liked this
lordcheesy liked this
memeticcontagion reblogged this from nostalgebraist
eanholt liked this
anomalocariscanadensis liked this
midwestern-civilization liked this
claireifier liked this
mercurysowntwenty-two liked this Why is ChatGPT so easy to "jailbreak"?...Why does it come on so strong, at first, with its...