AI Agent News Today

Friday, January 23, 2026

AI Agents Still Struggle With Real Work Tasks

A major new benchmark called Apex-Agents tested leading AI models on actual white-collar jobs in banking, consulting, and law. The results are sobering: the best performer, Google's Gemini 3 Flash, achieved only a 24% success rate. The core problem? AI agents can't handle information scattered across multiple tools like Slack and Google Drive the way humans do. This suggests workplace automation is moving more slowly than predicted.

Enterprise Leaders Prioritize Safety Over Speed

A Dynatrace report surveying 919 senior leaders points to one reason adoption is cautious: 52% cite security and compliance concerns as the main barrier to deploying AI agents. Rather than rushing to automate, 69% of organizations still have humans verify AI decisions. The takeaway: reliability and governance matter more than raw capability right now.

New Testing Tool Makes AI More Trustworthy

Researchers released Detect, a framework that systematically tests deep learning models by manipulating features in their latent space. Unlike standard accuracy tests, it reveals hidden bugs and vulnerabilities, helping teams understand how AI systems make decisions. This kind of testing matters as enterprises try to scale agents responsibly.

Bottom Line: Don't assume AI agents are workplace-ready. Focus first on governance, testing, and human oversight before deploying at scale.
