Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Abstract
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
Community
Introduces Graph of thought (GoT) prompting (engineering) framework: model information generated by LLM as an arbitrary graph where units of LLM thoughts are vertices (closer to human thought); beyond few-shot prompting/in-context learning (ICL), Chain of Thought (CoT) and Tree of Thought (ToT) promoting methods; can aggregate, redefine, and back-track over chains. Contains a graph model (reasoning model with context thoughts and relationships), thought transformation model, evaluator (to score thoughts), and ranking model (relevant thoughts). GoT has a heterogeneous graph model (vertex, edges, and class map for vertices); each vertex is a part of solution; edges are created by explicitly promoting with vertices (to form new ones in relation to the used ones). Transformation of thoughts alters the graph (adds and removes vertices and edges); action of merging/aggregation and splitting/generation of vertices (thoughts) are domain dependent (splitting an article into keyword summaries, merging sorted numbers, etc.); refining transforms add a loop (to same vertex). Scoring function evaluates a vertex/node, ranking function returns highest-ranking nodes. Architecture has prompter (preparing messages for LLM), scoring module (verify and scoring), and controller (control reasoning process, progress, graphing); controller has Graph of Operations (GoO - execution plan - graph decomposition, transforms to apply to LLM thoughts) and Graph Reasoning State (GRS - history of thoughts and states). User interacts with controller, controller gives graph state (with scores) to prompter (which gives it as prompt to LLM), parser takes LLM thought (output) and generates unscored graph state; controller then scores (and ranks) thought states through scoring and validation (human or LLM in the loop). See figure 3 for example graph reasoning states (with API functions) and prompts. Has a sorting task as example: scoring function includes checking ascending or descending, and preserving the frequency of numbers (to prevent hallucinating or dropping numbers); also shows custom graph of operations (to do the sorting of 64 numbers); also has set operations, keyword counting, and document merging. Logarithmic latency and linear volume (best tradeoff compared to CoT, ToT, and CoT-SC - self-consistency). Direct IO (no prompt) and CoT are cheap (used GPT-3.5), but GoT gives better output (sorting, set intersection, keyword counting, and document merging). Related works are self-taught reasoner (STaR), self-reflection and self-evaluation of LLM outputs. From ETHz, Cledar, Warsaw University of Technology.
Links: arxiv, PapersWithCode, GitHub
Models citing this paper 0
No model linking this paper