http://arxiv.org/pdf/2203.11171
- Leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking that all lead to the same unique correct answer.
- CoT with greedy decoding is in some sense a ‘greedy’ reasoning algorithm, where the single most likely next step is always taken
- Self-consistency instead samples a diverse set of reasoning paths and then selects the most consistent final answer
- Hypothesis: incorrect reasoning paths are unlikely to converge on the same answer, so correct reasoning paths, however diverse, show greater agreement in their final answer than incorrect ones
- clearly not always true, but probably true most of the time
- something feels wrong about this but can’t provide sound reason lmao
- Significantly improves over CoT on arithmetic and commonsense reasoning tasks, often SOTA
- Method: one forward pass through the encoder stack, then multiple sampled decodes from the decoder stack to get different outputs
- Uses existing sampling techniques like temperature or top-k
- Aggregate by choosing the answer that is most consistent among the samples, basically a majority vote over final answers
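The aggregation step above is simple enough to sketch; a minimal version of the majority vote (assuming the final answers have already been parsed out of each sampled reasoning path):

```python
from collections import Counter

def self_consistency(sampled_answers):
    """Pick the most frequent final answer among sampled reasoning
    paths (unweighted majority vote) and return it with its vote
    share over the samples."""
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)

# answers parsed from, say, five sampled reasoning paths
best, agreement = self_consistency(["18", "18", "26", "18", "42"])
# best == "18", agreement == 0.6
```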


- I’m surprised that it works basically just as well for decoder-only models…
- Since the model has no explicit mechanism or extra compute to ‘digest’ the input and use that information to produce different reasoning paths
- Diminishing returns in performance with the number of sampled reasoning paths; ~20 seems to be enough

- CoT could hurt performance on some few-shot tasks, but self-consistency fixes this
- Scales the same as CoT and is robust across different sampling setups
- Robust to imperfect prompts
- Consistency within sampled outputs is a good indicator of performance, makes sense heuristically
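One way to operationalize that heuristic (my sketch, not something the paper prescribes; the `threshold` knob is illustrative): treat the majority answer’s vote share as a confidence score and abstain when it’s low.

```python
from collections import Counter

def answer_with_confidence(sampled_answers, threshold=0.5):
    """Return (answer, confidence), where confidence is the majority
    answer's vote share; abstain (answer=None) below `threshold`.
    The threshold value is an illustrative choice, not from the paper."""
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    confidence = votes / len(sampled_answers)
    return (answer if confidence >= threshold else None), confidence
```

High agreement → likely correct; a fragmented vote is a cheap signal that the model is unsure about this question.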

- Questions
- How do we determine which reasoning process to trust when votes are split lol, I don’t think it was mentioned