Is a FACTS score of 68.8 actually better than a low Vectara hallucination rate?
https://blogfreely.net/landonwalsh07/mit-january-2025-why-do-models-say-definitely-more-when-they-are-wrong
As of March 2026, the landscape for evaluating large language models has become remarkably fragmented. I remember back in early 2023, when we were all just using simple string matching or basic ROUGE scores to guess whether a model was lying.