
Artificial intelligence may be fluent in everything from tax law to Taylor Swift, but when the topic turns to God, some of the biggest chatbots suddenly get quiet. Baylor-backed researchers say leading language models routinely skip religion when answering moral and personal questions, and in some cases subtly nudge users toward or away from particular faiths. The findings come from a multi-university consortium that ran hundreds of ethics-style prompts and conversion scenarios against dozens of popular language models, raising fresh questions about whether AI is reflecting or erasing the moral frameworks many people rely on.
What the AllFaith benchmark tested
The team built the AllFaith Benchmark to see whether AI models bring religious perspectives into everyday ethical dilemmas. It scores model answers to 150 questions and includes a separate conversion battery covering 14 faiths. Researchers compared model output to a nationally representative survey of 1,125 Americans about when religion should factor into moral advice, and they evaluated 27 models for the representation test and 20 models for the conversion trials, according to CEFE-AI.
What they found
One CEFE-AI paper concluded that large language models “consistently underrepresent religion” compared with human expectations, especially in practical situations like grief or family conflict, according to the preprint posted on arXiv. A second CEFE-AI preprint, also posted on arXiv, zeroed in on conversion scenarios and found reproducible asymmetries: models tended to encourage conversion to Catholicism while discouraging conversion to Jehovah’s Witnesses, and the authors singled out xAI’s Grok 4.20 as showing the strongest asymmetries.
The consortium also pointed out that religion barely appears in mainstream AI bias research, with only 0.2% of more than 12,000 papers touching the topic at all, a gap it argues needs urgent attention, according to Baylor University.
Faith scholars say models are missing context
“Yet, when faced with these same ethical questions, AI systems largely ignore the role of religion,” Paul Martens, director of Baylor’s Center for Ethics, said in a Baylor news release. Members of the consortium, which includes Brigham Young University, the University of Notre Dame, and Yeshiva University, say the benchmarks are meant to start technical and theological conversations with model builders, not to shame individual products. The group hopes the open dataset gives developers a straightforward way to test and improve how their systems handle faith-related questions.
What companies and users should watch
CEFE-AI has open-sourced the benchmarks and results on GitHub and Kaggle and invited contributions from scholars across traditions, according to CEFE-AI. Researchers say the goal is not to hard-code theology into systems but to build measurement tools that push models toward fairer, more pluralistic behavior, and to make sure faith communities have a seat at the table when AI products are shaped.
Where this debate goes from here
CEFE-AI’s release lands in the middle of a broader push to make AI technologies more accountable. Pope Leo XIV’s May 25 encyclical urged that AI development respect human dignity and avoid concentrating power, adding a global moral voice to the conversation, according to the Associated Press. The researchers say they hope model providers will adopt the benchmarks; for now, the AllFaith benchmarks give scholars, developers and faith leaders a shared yardstick for spotting and fixing blind spots, as reported by the Houston Chronicle.









