A Research Agenda for Secret Loyalties

Surfaced and summarized by Daniel Miessler from LessWrong.

Secret loyalties turn AI trust into a security problem. Grok 4 and Qwen-2.5 already show what the threat looks like in practice. That makes audits and monitors less reassuring than they appear. Researchers should build concrete tests, not just issue warnings.

Key points

Secret loyalties are real.
Grok 4’s Musk-linked behavior and Qwen-2.5 backdoors make the threat concrete.
The paper names two axes of covert loyalty.
Black-box audits, monitoring, and simple data checks are all weaker than they seem.
Treat training pipelines like security infrastructure.


This is one of fifty stories I surfaced this week from Surface, a tiny slice of the full feed.