Our research direction: Designing for accessibility
In our early research, we discovered that a significant barrier to digital equity is the “accessibility gap”, i.e., the delay between the release of a new feature and the creation of an assistive layer for it. To close this gap, we’re shifting from reactive tools to agentic systems that are native to the interface.
Research pillar: Using multi-system agents to improve accessibility
Multimodal AI tools provide one of the most promising paths to building accessible interfaces. In specific prototypes, such as our work with web readability, we’ve tested a model where a central Orchestrator acts as a strategic reading manager.
Instead of a user navigating a complex maze of menus, the Orchestrator maintains shared context, understanding the document and making it more accessible by delegating tasks to expert sub-agents.
- The Summarization Agent: Masters complex documents by breaking down information and delegating key tasks to expert sub-agents, making even the deepest insights clear and accessible.
- The Settings Agent: Handles UI adjustments, such as scaling text, dynamically.
By testing this modular approach, our research shows users can interact with systems more intuitively, ensuring that specialized tasks are always handled by the right expert without the user needing to hunt for the “correct” button.
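To make the orchestrator pattern concrete, here is a minimal sketch of how a central Orchestrator might hold shared context and route a request to the first sub-agent that claims it. This illustrates the general delegation idea only, not the prototype's actual implementation. Every name in it (`SharedContext`, `SubAgent`, `can_handle`, `dispatch`) is a hypothetical chosen for this example, and simple keyword matching stands in for the model-driven routing and summarization a real system would use.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SharedContext:
    """Hypothetical state the orchestrator shares with every sub-agent."""
    document: str
    font_scale: float = 1.0
    log: list[str] = field(default_factory=list)  # names of agents that ran


class SubAgent(Protocol):
    """Hypothetical interface each expert sub-agent implements."""
    name: str

    def can_handle(self, request: str) -> bool: ...
    def handle(self, request: str, ctx: SharedContext) -> str: ...


class SummarizationAgent:
    name = "summarization"

    def can_handle(self, request: str) -> bool:
        return "summar" in request.lower()

    def handle(self, request: str, ctx: SharedContext) -> str:
        # Placeholder logic: a real agent would call a language model here.
        first_sentence = ctx.document.split(".")[0].strip()
        return f"Summary: {first_sentence}."


class SettingsAgent:
    name = "settings"

    def can_handle(self, request: str) -> bool:
        text = request.lower()
        return "text" in text and ("larger" in text or "bigger" in text)

    def handle(self, request: str, ctx: SharedContext) -> str:
        # Adjust the UI state held in the shared context.
        ctx.font_scale = round(ctx.font_scale * 1.25, 2)
        return f"Text scaled to {ctx.font_scale}x."


class Orchestrator:
    """Routes each request to the first sub-agent that accepts it."""

    def __init__(self, ctx: SharedContext, agents: list[SubAgent]) -> None:
        self.ctx = ctx
        self.agents = agents

    def dispatch(self, request: str) -> str:
        for agent in self.agents:
            if agent.can_handle(request):
                self.ctx.log.append(agent.name)
                return agent.handle(request, self.ctx)
        return "Sorry, I can't help with that yet."


if __name__ == "__main__":
    ctx = SharedContext(document="Cats sleep a lot. They also purr.")
    orchestrator = Orchestrator(ctx, [SummarizationAgent(), SettingsAgent()])
    print(orchestrator.dispatch("Summarize this page"))
    print(orchestrator.dispatch("Make the text larger"))
```

The user never picks an agent or a menu. They state a goal, and the orchestrator decides which expert handles it while recording what ran in the shared context.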
Towards multimodal fluency
Our research also focuses on moving beyond basic text-to-speech toward multimodal fluency. By leveraging Gemini’s ability to process voice, vision, and text simultaneously, we’ve built prototypes that can turn live video into immediate, interactive audio descriptions.
This isn't just about describing a scene; it’s about situational awareness. In our co-design sessions, we’ve observed how allowing users to interactively query their environment, asking for specific visual details as they happen, can reduce cognitive load and transform a passive experience into an active, conversational exploration.
