https://www.apolloresearch.ai/blog/we-need-a-science-of-evals

Questions that a robust eval system should be able to answer (taken directly from the article):