Backwards Planning from Onward Task Demonstrations via Vision-Language Models
In this paper, we propose a novel method of backward planning using vision-language models (VLMs). Previous work on backward planning applied traditional methods that ignore the semantic meaning of manipulation tasks. Our proposed framework leverages VLMs’ semantic understanding and physical reasoning capabilities to infer backward plans by analyzing onward task executions. Our method explores a barebones usage of these models and provides a comprehensive ablation study comparing the planning capabilities of common closed-source VLMs. We demonstrate that our system reaches an 80% success rate on two robotic manipulation tasks. We also observe that several state-of-the-art VLMs struggle significantly with visual understanding, a limitation that still necessitates external embodiments for robust execution. However, the observed planning capabilities suggest that effective backward planning may not require highly complex architectures.
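The sketch below illustrates the general idea of querying a closed-source VLM with frames from a forward ("onward") demonstration and asking for a backward plan. It is not the paper's implementation; the model name, prompt wording, helper functions, and frame paths are placeholders chosen for illustration.

```python
# Minimal sketch (assumptions, not the authors' code): prompt a VLM with ordered
# frames of a forward demonstration and request a step-by-step backward plan.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def encode_frame(path: str) -> str:
    """Read an image file and return a base64 data URL accepted by the API."""
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()


def backward_plan_from_frames(frame_paths: list[str]) -> str:
    """Ask the VLM to reverse the manipulation shown in the given demo frames."""
    content = [{
        "type": "text",
        "text": ("These images show a robot manipulation task executed from start "
                 "to finish. List, step by step, the actions needed to return the "
                 "scene to its initial state (a backward plan)."),
    }]
    content += [{"type": "image_url", "image_url": {"url": encode_frame(p)}}
                for p in frame_paths]
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: substitute any VLM under evaluation
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content


# Example usage (hypothetical frame files):
# plan = backward_plan_from_frames(["frame_00.png", "frame_10.png", "frame_20.png"])
```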
Recommended citation: Gamsız, A. F., Akkoç, D. B., Yıldırım, Y., & Uğur, E. (2025). Backwards planning from onward task demonstrations via vision-language models [Manuscript submitted for review].
Download Paper