LLMs for AI planning: a study on error detection and correction in PDDL domain models

Date

2024

Abstract

Domain modeling is fundamental to Artificial Intelligence (AI) planning, since it allows complex planning problems to be characterized and organized. A central difficulty in domain modeling is ensuring that models are accurate and complete, satisfying not only syntactic rules but also intricate semantic interdependencies. Deficiencies in domain models - such as duplicate effects, unmet preconditions, or missing dependencies - can significantly degrade planning results, underscoring the need for effective error detection and repair mechanisms to improve the effectiveness of AI planning. This research examines the capability of Large Language Models (LLMs) to improve the accuracy and robustness of domain models by detecting and correcting flaws inherent in these planning specifications. Given the complexity of domain models in AI planning, LLMs offer a promising approach to automating error detection and refinement, addressing specific error classes such as complementary effects, invalid parameters, and immutable predicates, which are essential to precise planning domain specifications. The study involved the systematic evaluation and iterative refinement of Planning Domain Definition Language (PDDL) models using GPT-4, in order to assess how effectively LLMs reduce syntactic and semantic error rates and thereby improve the reliability and quality of PDDL domain models in AI planning. The results indicate that GPT-4 can identify and correct certain error types, but its performance is not consistently reliable. Semantic errors, particularly those involving intricate logical dependencies and complex interrelations among model components, remain difficult for LLMs to resolve. The LLM-based approach shows promise for minor faults but struggles with complex domain-specific problems, yielding only marginal improvements in model accuracy. These findings suggest that, despite their capabilities, LLMs do not yet achieve full automation of PDDL domain model refinement and still require considerable manual supervision. Although LLMs offer a promising foundation for improving PDDL models, their current capabilities are limited when tackling intricate semantic problems that demand sophisticated reasoning. This work highlights what LLMs can accomplish in error identification and repair, while also identifying directions for future research to extend their application in AI planning.
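
For illustration only (this sketch does not appear in the thesis), one of the error classes named above, a complementary effect, can be shown as a minimal PDDL action whose effect both adds and deletes the same literal; the domain, predicate, and parameter names are invented for the sketch:

    (define (domain logistics-sketch)
      (:requirements :strips)
      (:predicates (at ?obj ?loc)
                   (loaded ?obj ?truck))
      (:action load
        :parameters (?obj ?truck ?loc)
        :precondition (and (at ?obj ?loc) (at ?truck ?loc))
        ; complementary effect: the same literal is both added and deleted,
        ; which is syntactically legal but semantically contradictory
        :effect (and (loaded ?obj ?truck)
                     (at ?obj ?loc)
                     (not (at ?obj ?loc)))))

A flaw of this kind is syntactically valid and passes a parser, which is why errors in this family fall on the semantic side of the distinction drawn above and are harder to detect automatically than purely syntactic mistakes.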
