SimpleQA vs PersonQA: Why Do Newer Models Sometimes Get Worse?
In the enterprise search and RAG space, we have developed an unhealthy obsession with the "up and to the right" trajectory of model benchmarks.