Reasoning models have limits. Here's what you can and can't expect from them, according to Apple's tests.