Is a FACTS score of 68.8 actually better than a low Vectara hallucination rate?

https://blogfreely.net/landonwalsh07/mit-january-2025-why-do-models-say-definitely-more-when-they-are-wrong

As of March 2026, the landscape for evaluating large language models has become remarkably fragmented. I remember that back in early 2023 we were all just using simple string matching or basic ROUGE scores to guess whether a model was lying.

Submitted on 2026-06-18 11:57:08