Fara-7B appears to underperform the SoM agents o3-mini and GPT-4o-0513, and mostly underperforms the computer-use model OpenAI computer-use-preview as well. This doesn't seem to match the claim that Fara achieves state-of-the-art results [...] outperforming both comparable-sized models and larger systems. Am I missing something?
