http://arxiv.org/pdf/2203.11171
- Leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking that all lead to the same unique correct answer.
- CoT with greedy decoding is in some sense a ‘greedy’ reasoning algorithm, where the single most likely next step is always taken
- Self-consistency instead samples a diverse set of reasoning paths and then selects the most consistent final answer
- Hypothesis: incorrect reasoning paths are unlikely to converge on the same answer, so correct reasoning paths, however diverse, show greater agreement in their final answer than incorrect ones
- clearly not always true, but probably true most of the time
- something feels wrong about this but can’t provide sound reason lmao
- Significantly improves over CoT on arithmetic and commonsense reasoning tasks, often SOTA
- Method: one forward pass through the encoder stack, then multiple sampled decodes from the decoder stack to get different outputs
- Uses existing sampling techniques like temperature or top-k
- Aggregate by choosing the answer that is most consistent among the samples, basically a majority vote over final answers
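The aggregation step above is simple enough to sketch; a minimal version of the majority vote (assuming the final answers have already been parsed out of each sampled reasoning path):

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Pick the most frequent final answer among sampled reasoning
    paths (unweighted majority vote) and return it with its vote
    share over the samples."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# answers parsed from, say, five sampled reasoning paths
best, agreement = self_consistency(["18", "18", "26", "18", "42"])
# best == "18", agreement == 0.6
```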


- I’m surprised that it works basically just as well for decoder-only models…
- Since the model has no explicit mechanism or extra compute to ‘digest’ the input and use that information to produce different reasoning paths
- Diminishing returns in performance with the number of sampled reasoning paths; ~20 seems to be enough

- CoT could hurt performance on some few-shot tasks, but self-consistency fixes this
- Scales the same as CoT and is robust across different sampling setups
- Robust to imperfect prompts
- Consistency within sampled outputs is a good indicator of performance, makes sense heuristically
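One way to operationalize that heuristic (my sketch, not something the paper prescribes; the `threshold` knob is illustrative): treat the majority answer’s vote share as a confidence score and abstain when it’s low.

```python
from collections import Counter

def answer_with_confidence(sampled_answers, threshold=0.5):
    """Return (answer, confidence), where confidence is the majority
    answer's vote share; abstain (answer=None) below `threshold`.
    The threshold value is an illustrative choice, not from the paper."""
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    confidence = votes / len(sampled_answers)
    return (answer if confidence >= threshold else None), confidence
```

High agreement → likely correct; a fragmented vote is a cheap signal that the model is unsure about this question.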

- Questions
- How do we determine which reasoning process to trust when votes are split lol, I don’t think it was mentioned