Browsing by Author "Ahmed, Mohamed"
Leveraging large language models for latent intention recognition and next action prediction (2024)
Ahmed, Mohamed
Open Access

Autonomous agents that operate within graphical user interfaces (GUIs) have significant potential to improve user experience. To realize it, such agents must be customized and proactive. Understanding user intentions from their interactions with GUIs enables these agents to better fulfill user needs. This work introduces a novel LLM-based framework, Mistral-Intention, that accurately recognizes latent user intentions from their interactions. A key innovation is a sub-goal generation step that uses prompt engineering to decompose user tasks into actionable steps, enhancing the model's interpretative capabilities and extensibility. In addition, a keyword extraction-based loss sharpens the model's focus on critical information in user actions, such as typed values, so that the recognized intentions are comprehensive and relevant. We evaluate Mistral-Intention with a range of metrics, including manual metrics and automatic methods based on GPT-4o, against a modified version of the state-of-the-art task automation framework SYNAPSE. Results from extensive testing on the MIND2WEB and MoTIF datasets show Mistral-Intention's superior performance in intention recognition across various GUI environments. We also implement an LLM-based computer agent capable of predicting the user's next action, and we address challenges that arise when developing such agents, including the limited context window and understanding the current GUI environment. Our LLM-based agent improves element accuracy by 15.30% and operation F1 by 13.20% over MindAct, the previous state-of-the-art method, on MIND2WEB.
Our work not only pushes the boundaries of computational HCI but also opens new pathways for developing more intuitive and effective user-centered interaction solutions.