A multimodal AI model for enhanced understanding and interaction with mobile user interfaces.
Apple's Ferret-UI is a multimodal large language model (MLLM) designed to comprehend and interact with mobile user interface (UI) screens. It combines referring (describing a screen region the user points to), grounding (locating the element that matches a description), and reasoning, which lets it identify UI elements such as icons and text, understand their spatial relationships, and carry out tasks based on that understanding. Ferret-UI aims to improve user interaction by enabling device control through natural language commands, with potential benefits for accessibility and automation in mobile applications.