Best Practices for Source-Based Research on News Trustworthiness

ICA 2025 in Denver, USA

Jula Luehring, Hannah Metzler, Ruggero Lazzaroni, Apeksha Shetty, Jana Lasser

How to measure misinformation?

Misinformation is hard to detect due to

-   subtle & grey-area content

-   ethical complexity
    
-   conceptual/methodological constraints & traditions

Source-based methods have the advantage that they

-   include grey-area content & better reflect information diets

-   make judgments easier
    
-   are scalable & unbiased

NewsGuard

database of ~12,000 news domains rated for trustworthiness (0–100)
selected based on web data and evaluated using 9 journalistic quality criteria → accurate at scale
ratings are updated regularly by country experts
initially US-focused, now including other countries

++ NewsGuard is the most comprehensive list of source ratings & widely used (in Science & Nature)

-- it’s not reproducible, and we can’t validate it

Research goals

assess stability and completeness of ratings over time and countries
evaluate the value of additional labels (e.g., political orientation, topics)
provide recommendations for source-based approaches

Rating stability over time

trustworthiness is relatively stable (changes rare, avg. 2 yrs)
drops in the average score are due to new low-trustworthiness sources being added

Country-level completeness

majority of sources are US-based (~76%)
trustworthiness scores vary by country (US lowest on avg.)
stable state reached by ~2022 for US, DE, FR, IT, CA

blue = number of sources, green = trustworthiness

Use of contextual labels

political orientation label (right/left) sparse (~33%, mostly US)
right-leaning sources score lower on average
most sources have useful topic label (e.g., “health”, “politics”)

What happens when we use different versions?

continuous scores: stable results across time
binary labels: can distort trends (e.g., spike in “untrustworthy” links from Republicans post-2020)
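
The contrast between continuous scores and binary labels can be illustrated with a minimal sketch. Everything in it is hypothetical: the domains, the scores, and the cutoff of 60 are assumptions for illustration and are not taken from the slides. It only shows how a small score change near a cutoff barely moves the mean score but flips the binary label for every link to that source.

```python
from statistics import mean

# Assumed cutoff for the binary label; not stated in the slides.
THRESHOLD = 60

# Hypothetical scores (0-100) for the same domains in two rating versions.
ratings_v1 = {"a.example": 62.5, "b.example": 95.0, "c.example": 30.0}
ratings_v2 = {"a.example": 57.5, "b.example": 95.0, "c.example": 30.0}


def summarize(shared_domains, ratings, threshold=THRESHOLD):
    """Return (mean continuous score, share of links labeled untrustworthy)."""
    scores = [ratings[d] for d in shared_domains if d in ratings]
    if not scores:
        raise ValueError("none of the shared domains are rated")
    untrustworthy_share = sum(s < threshold for s in scores) / len(scores)
    return mean(scores), untrustworthy_share


# Hypothetical list of shared links, reduced to their domains.
links = ["a.example", "a.example", "b.example", "c.example"]

mean_v1, share_v1 = summarize(links, ratings_v1)
mean_v2, share_v2 = summarize(links, ratings_v2)
print(f"mean score: {mean_v1:.1f} -> {mean_v2:.1f}")
print(f"untrustworthy share: {share_v1:.2f} -> {share_v2:.2f}")
```

In this toy data, the mean score moves by 2.5 points between versions, while the share of links labeled untrustworthy rises from 0.25 to 0.75.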

Recommendations

always prefer continuous over binary scores (or match ratings dynamically to the time of each observation)
use annual snapshots after first stable state and check for major source additions/removals
check for country-specific journalistic traditions, esp. for comparative research
topic and orientation labels are useful to characterize sources beyond trustworthiness but should be validated (or at least spot-checked)
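
One possible reading of dynamic matching is to look up, for each shared link, the rating that was valid on the day the link was shared. The sketch below assumes a per-domain rating history as a date-sorted list of `(valid_from, score)` pairs; this data layout, the `score_at` helper, and the example values are hypothetical and do not reflect the actual NewsGuard data format.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rating history per domain: (valid_from, score), sorted by date.
history = {
    "a.example": [(date(2020, 1, 1), 80.0), (date(2022, 6, 1), 55.0)],
}


def score_at(domain, day, history=history):
    """Return the score valid on `day`, or None if the domain was not yet rated."""
    versions = history.get(domain, [])
    dates = [valid_from for valid_from, _ in versions]
    # Number of versions whose validity started on or before `day`.
    idx = bisect_right(dates, day)
    if idx == 0:
        return None
    return versions[idx - 1][1]


print(score_at("a.example", date(2021, 3, 1)))  # rating valid in early 2021
print(score_at("a.example", date(2023, 1, 1)))  # rating valid after the update
print(score_at("a.example", date(2019, 1, 1)))  # shared before the first rating
```

Returning `None` for links shared before a source was first rated keeps missing coverage visible instead of silently back-filling a later score.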

Coverage: Be cautious for countries other than those mentioned above, as coverage may be incomplete.

Stability: The overall score rarely changes a lot, but coverage does. When using temporal data, check for major additions or removals (or use an annual snapshot).
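
Checking for major additions or removals between two snapshots can be as simple as a set comparison of the rated domains. The following is a minimal sketch under stated assumptions: `compare_snapshots` and its `turnover` measure are illustrative helpers, not part of the published analysis, and what counts as a "major" change is left to the researcher.

```python
def compare_snapshots(old_domains, new_domains):
    """Summarize which domains were added or removed between two snapshots."""
    old, new = set(old_domains), set(new_domains)
    added = new - old
    removed = old - new
    union = old | new
    # Illustrative measure: share of all domains seen that changed status.
    turnover = (len(added) + len(removed)) / len(union) if union else 0.0
    return {"added": added, "removed": removed, "turnover": turnover}


# Hypothetical snapshots of rated domains from two consecutive years.
snapshot_2022 = ["a.example", "b.example", "c.example"]
snapshot_2023 = ["b.example", "c.example", "d.example", "e.example"]

report = compare_snapshots(snapshot_2022, snapshot_2023)
print(sorted(report["added"]), sorted(report["removed"]), report["turnover"])
```

A high turnover between snapshots suggests that trends in aggregate scores may reflect coverage changes rather than changes in the sources themselves.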

Country differences: Countries have different journalistic traditions, which we can see in the criteria for rating trustworthiness. Check if they make sense for the context you’re studying (esp. for comparative research).

Binary trustworthiness ratings: True/false labels for trustworthiness can be volatile across time periods and may distort downstream research. We strongly recommend using continuous scores whenever possible.

Look beyond trustworthiness: NG provides additional contextual data, like political orientation and topic labels. Such labels are incomplete but useful to better characterize sources beyond trustworthiness.

Implications for source-based approaches

binary true/false labels can be volatile across time periods
coverage & ratings are relatively insensitive to time once they reach a stable state, which speaks for the reliability of source-based approaches
need more open, transparent alternatives

Published in JQD:DM earlier this year

Thank you!

Email: jula.luehring@univie.ac.at

Bluesky: @julaluehring.bsky.social

GitHub: github.com/julaluehring