Why an AI Keeps Rechecking the Door Lock

Anchor: Door-Lock Rechecking — across independent runs, an LLM agent spontaneously re-submitted the same critical artifact, without being told to.

In a multi-agent system, we observed a behavior nobody explicitly taught:

After completing a critical artifact, the agent called the submission tool again and re-submitted the same file. The content didn’t change. The hash was identical. Nothing in the prompt asked for a second submission.
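One way to spot this pattern in a log is to group submissions by content hash. The sketch below is a minimal illustration, not the system's actual tooling: the event fields (`agent`, `artifact`, `content`) and the helper name `find_resubmissions` are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def find_resubmissions(events):
    """Return {(agent, artifact, sha256): count} for byte-identical re-submissions.

    `events` is a hypothetical list of submission records; the field names
    are assumptions for this sketch, not the real event schema.
    """
    seen = defaultdict(int)
    for ev in events:
        digest = hashlib.sha256(ev["content"]).hexdigest()
        seen[(ev["agent"], ev["artifact"], digest)] += 1
    # Keep only artifacts submitted more than once with the same hash.
    return {key: n for key, n in seen.items() if n > 1}


# Illustrative data only: one agent submits the same bytes twice.
events = [
    {"agent": "builder", "artifact": "engine.py", "content": b"print('ok')\n"},
    {"agent": "builder", "artifact": "engine.py", "content": b"print('ok')\n"},
    {"agent": "tester", "artifact": "report.md", "content": b"all good\n"},
]
dupes = find_resubmissions(events)
```

Keying on the hash rather than the file name separates a true re-check (same bytes) from a revision (same name, new bytes).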

After you leave home, have you ever stopped, turned around, and rechecked the door lock?

This isn’t about memory.

54 Multi-Agent Runs Under One Fixed Protocol: Behavioral Fingerprints and a Coordination Blind Spot

Engineering note: this post draws on a limited sample and focuses on detectable failure modes, not benchmark rankings.

Different LLMs don’t just perform differently. Under the same multi-agent protocol constraints, they collaborate differently.

We ran 54 multi-agent sessions across 4 provider configurations under identical protocol constraints. The behavioral fingerprints were reproducible and revealed that protocol compliance (emitting the right signals in the right order) is a major differentiator in agentic settings.
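To make "the right signals in the right order" concrete, here is a minimal sketch of an ordering check over one agent's signal log. The signal names in `REQUIRED_ORDER` and the function `compliance_report` are hypothetical, chosen only for illustration; the real protocol may define different signals and rules.

```python
# Hypothetical signal sequence; the actual protocol's signals may differ.
REQUIRED_ORDER = ["CLAIM", "SUBMIT", "VERIFY", "FREEZE"]


def compliance_report(signals, required=REQUIRED_ORDER):
    """Check that every required signal appears and that the first
    occurrences follow the required order. Repeats are tolerated."""
    missing = [s for s in required if s not in signals]
    # Position of the first occurrence of each required signal that is present.
    first_seen = [signals.index(s) for s in required if s in signals]
    in_order = first_seen == sorted(first_seen)
    return {
        "missing": missing,
        "in_order": in_order,
        "compliant": not missing and in_order,
    }


good = compliance_report(["CLAIM", "SUBMIT", "SUBMIT", "VERIFY", "FREEZE"])
bad = compliance_report(["SUBMIT", "CLAIM", "VERIFY"])
```

Because repeats are tolerated, a duplicate `SUBMIT` still counts as compliant here, which keeps ordering violations separate from redundant calls.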

Why 'All Verifications Passed' Still Breaks at Integration in Multi-Agent Systems

In our multi-agent collaboration system, every upstream artifact is verified by downstream agents. The event log looked perfect: every VERIFY was a pass. We ran the process three times and it converged cleanly.

Then we stitched the artifacts from six agents together and ran the game.

It crashed immediately. The log had no obvious anomalies.

TL;DR

  • The verification protocol checked interfaces and local logic, not data contracts across producers/consumers.
  • The event model carried zero information about return shapes/types, so no amount of offline analysis could infer contract mismatches.
  • Adding a small, explicit data contract table (declared by the task designer) and writing contract_check=pass|fail|not_checked into the event stream made this class of failures visible before freeze.
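The third point can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the contract table layout, the producer/consumer names, and the helpers `contract_check` and `verify_event` are all hypothetical. Only the `contract_check=pass|fail|not_checked` field comes from the text above.

```python
# Hypothetical contract table, declared by the task designer:
# (producer, consumer) -> expected field names and types of the returned dict.
CONTRACTS = {
    ("physics.step", "renderer.draw"): {"x": float, "y": float, "alive": bool},
}


def contract_check(producer, consumer, sample):
    """Return 'pass', 'fail', or 'not_checked' for one producer/consumer pair."""
    expected = CONTRACTS.get((producer, consumer))
    if expected is None or sample is None:
        # No declared contract or no sample value: say so explicitly.
        return "not_checked"
    if set(sample) != set(expected):
        return "fail"  # field names (shape) differ
    types_ok = all(isinstance(sample[k], t) for k, t in expected.items())
    return "pass" if types_ok else "fail"


def verify_event(producer, consumer, sample, result="pass"):
    """Build a VERIFY event that carries the contract verdict alongside
    the existing interface/logic verdict (event schema is assumed)."""
    return {
        "type": "VERIFY",
        "producer": producer,
        "consumer": consumer,
        "result": result,
        "contract_check": contract_check(producer, consumer, sample),
    }


ok = verify_event("physics.step", "renderer.draw",
                  {"x": 1.0, "y": 2.0, "alive": True})
mismatch = verify_event("physics.step", "renderer.draw",
                        {"pos": (1.0, 2.0), "alive": True})
unknown = verify_event("audio.mix", "renderer.draw", {"level": 0.5})
```

In this sketch the `mismatch` event still has `result` set to `pass` while `contract_check` is `fail`, the combination the original event stream had no way to express. Recording `not_checked` explicitly keeps a missing contract from being read as a passing one.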