https://arxiv.org/pdf/2305.10601
- Intuition: cognitive science suggests humans have 2 reasoning modes: a fast, unconscious, heuristic mode and a slow, deliberate mode; ToT tries to make LLMs do the latter
- Improves upon CoT and self-consistency
- Generalizes the chain-of-thought technique: enables exploration of different reasoning paths and self-evaluation of choices
- Enables ‘looking ahead’ and ‘backtracking’

- Hmmmm, the paper has too much mathematical formalization for my liking, so I’m gonna rewrite it in English
- ToT needs to do 4 things:
- Thought decomposition: varies between tasks, but the idea is to break the problem into fundamental units of thought. Each thought should be small enough that LMs can generate promising and diverse samples, yet big enough to be evaluated for progress towards the solution
- Thought generator: given any step (and the steps preceding it), generate possible next steps. Two strategies (see the sketch after this list):
- Sample thoughts from multiple i.i.d. forward passes; good for rich thought spaces, where the model naturally provides diverse responses
- Use a prompt that asks the LM to propose several approaches in parallel (in one forward pass); good for more constrained thought spaces, where repeated forward passes would mostly return the same 1 or 2 steps
- State/thought evaluator: scores states to decide which to pursue next. Two strategies (see the sketch after this list):
- Value each state independently with an LM using a human-defined rubric + some guidance; used when progress/goodness is easy to assess
- Vote across states with a prompt (or I think we can use consistency here!!); good for when states are hard to value directly
- Search algorithm: really any can be used; the paper looks at breadth-first and depth-first search (see the sketch after this list)
- BFS: at each level, expand the current states, score all the resulting candidates, and keep only the b most promising for the next level
- DFS: follow the most promising thought at each node until the problem is solved or the state is judged a dead end, then backtrack one step, try the next alternative, etc.
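
A minimal sketch of the two thought-generation strategies, to make this concrete. `llm(prompt, n)` is a hypothetical stand-in for whatever API returns n sampled completions; the prompt wording is my own illustration, not the paper’s:

```python
from typing import Callable

# Hypothetical stand-in: (prompt, n) -> n sampled completions as strings
LLM = Callable[[str, int], list[str]]

def sample_thoughts(llm: LLM, state: str, k: int) -> list[str]:
    """Sample strategy: k i.i.d. forward passes, each continuing the partial
    solution independently. Suits rich thought spaces (e.g. free-form prose)."""
    return llm(f"{state}\nNext step:", k)

def propose_thoughts(llm: LLM, state: str, k: int) -> list[str]:
    """Propose strategy: one forward pass asked to list k distinct next steps.
    Suits constrained thought spaces, where sampling mostly repeats itself."""
    completion = llm(f"{state}\nPropose {k} different next steps, one per line:", 1)[0]
    return [line.strip() for line in completion.splitlines() if line.strip()][:k]
```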
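And the two evaluation strategies, reusing the hypothetical `LLM` alias above. The sure/maybe/impossible rubric here is my illustrative example; the paper’s actual value prompts vary by task:

```python
def value_state(llm: LLM, state: str) -> float:
    """Value strategy: score each state independently against a small rubric."""
    verdict = llm(f"{state}\nCan this partial solution still reach the goal?"
                  " Answer sure/maybe/impossible:", 1)[0]
    return {"sure": 1.0, "maybe": 0.5, "impossible": 0.0}.get(verdict.strip().lower(), 0.0)

def vote_states(llm: LLM, states: list[str], n_votes: int = 5) -> list[float]:
    """Vote strategy: show all candidates at once and ask the LM to pick the
    most promising one, n_votes times; a state's score is its vote count."""
    listing = "\n".join(f"Choice {i}: {s}" for i, s in enumerate(states))
    votes = [0.0] * len(states)
    for answer in llm(f"{listing}\nWhich choice is most promising? Reply with its number:", n_votes):
        try:
            votes[int(answer.strip())] += 1.0
        except (ValueError, IndexError):
            pass  # discard malformed votes
    return votes
```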
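Finally, sketches of the two searches, parameterized over a generator and evaluator like the ones above. `b` is the beam width from the BFS bullet; the DFS pruning threshold is my own placeholder, not something the paper fixes:

```python
from typing import Callable, Optional

Generate = Callable[[str, int], list[str]]  # (state, k) -> k candidate thoughts
Evaluate = Callable[[str], float]           # state -> heuristic value

def tot_bfs(root: str, steps: int, k: int, b: int,
            generate: Generate, evaluate: Evaluate) -> str:
    """BFS as beam search: expand every kept state into k thoughts, score all
    candidates, keep only the b most promising, and repeat at each level."""
    frontier = [root]
    for _ in range(steps):
        candidates = [s + "\n" + t for s in frontier for t in generate(s, k)]
        frontier = sorted(candidates, key=evaluate, reverse=True)[:b]
    return frontier[0]  # best final state

def tot_dfs(state: str, depth: int, k: int,
            generate: Generate, evaluate: Evaluate,
            is_solved: Callable[[str], bool], threshold: float = 0.5) -> Optional[str]:
    """DFS: follow the best-valued thought first; prune branches whose value
    drops below the threshold, and backtrack to try the next alternative."""
    if is_solved(state):
        return state
    if depth == 0:
        return None  # budget exhausted on this branch -> backtrack
    scored = [(evaluate(state + "\n" + t), state + "\n" + t) for t in generate(state, k)]
    for value, child in sorted(scored, reverse=True):
        if value < threshold:
            break  # everything left scores even lower -> backtrack
        result = tot_dfs(child, depth - 1, k, generate, evaluate, is_solved, threshold)
        if result is not None:
            return result
    return None
```

Hooking these together looks like `tot_bfs(puzzle, steps=3, k=5, b=5, generate=lambda s, n: sample_thoughts(llm, s, n), evaluate=lambda s: value_state(llm, s))` (parameter values just for illustration).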
- Results: they used 3 tasks: Game of 24, creative writing, and (mini) crosswords
- quite a big jump in capabilities across all 3, though the gains depend significantly on the ToT setup (the decisions made for the 4 components above)
Questions/comments:
- The proposed method seems to go against the paradigm of ‘provide info + build mechanisms’ and let the model learn. It doesn’t generalize across problems without human input, although it is effective for the problems studied
- Like, we need to engineer prompts/methods separately for each problem
- I wonder if we can use the fact that consistency between samples indicates accuracy (from the self-consistency paper) between steps
- e.g. take the top 2 most consistent answers, look at the next step for each, pick the one whose next step is most consistent, and so on (rough sketch below)
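
A rough sketch of this idea, with the same hypothetical `llm(prompt, n)` stand-in as above: score each candidate state by how strongly its sampled next steps agree, and advance the one with the most self-consistent continuation:

```python
from collections import Counter
from typing import Callable

LLM = Callable[[str, int], list[str]]  # hypothetical (prompt, n) -> n completions

def pick_by_next_step_consistency(llm: LLM, states: list[str], k: int = 10) -> str:
    """For each candidate state, sample k next steps and measure how dominant
    the modal step is; pick the state whose continuations agree the most."""
    def consistency(state: str) -> float:
        samples = llm(f"{state}\nNext step:", k)
        top_count = Counter(s.strip() for s in samples).most_common(1)[0][1]
        return top_count / k
    return max(states, key=consistency)
```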