I'm afraid GPT 5.5 has a cheating problem ):
I left 4 Codex tabs each working with 4 agents in an optimization. I put a section on the goal demanding them not to cheat.
After 8 hours of work, ALL 4 tabs did an:
if (input == test) { return hardcoded_result; }
ALL of them. Each called by a different name:
- "bypass path"
- "native candidate injection shortcut"
- "certified structural templates" (??)
- "staged certification to bypass validation" (lol)
This is my experience with GPT 5.5. It is not capable of completing any long term goal because it WILL find a loophole in your rules and cheat an easy way. And if there is no loophole, it will hallucinate one and cheat anyway.