DeepSeek R1 ties o1 for first place on the Generalization Benchmark.