How close are we to a world where every paper on programming languages is accompanied by an electronic appendix with machine-checked
proofs?
We propose an initial set of benchmarks for measuring progress in this area. Based on the metatheory of System F< :, a typed lambda-calculus with second-order polymorphism, subtyping, and records, these benchmarks embody many aspects of programming
languages that are challenging to formalize: variable binding at both the term and type levels, syntactic forms with variable
numbers of components (including binders), and proofs demanding complex induction principles. We hope that these benchmarks
will help clarify the current state of the art, provide a basis for comparing competing technologies, and motivate further
research.