Evidence hierarchy, annotation, auditability

Hierarchy.

A claim score is the output of a three-part system: a versioned evidence corpus, a structured post corpus per analyzed account, and a deterministic scoring function. Every number traces to a specific claim, a specific card, and specific citations.

01 Ingest
02 Extract
03 Map to card
04 Score
05 Audit trail

01 The truth hierarchy

Every claim gets one level, H1 through H5, based on the strength of human evidence behind that exact claim. The level sets the ceiling on how confidently an account can state the claim before losing points.

H5 · Strong human consensus · assertive OK

Replicated systematic-review signal. High-certainty evidence on hard outcomes.

Example claims: LDL causal for ASCVD; Statins · secondary prevention; Exercise reduces mortality; VO₂max predicts mortality; Grip strength predicts mortality
H4 · Good human evidence · assertive but hedged

Multiple decent trials or one strong RCT. Certainty limited by inconsistency, indirectness, or precision.

Example claims: NMN raises NAD⁺ in humans; CR improves healthspan markers; Fish oil: no CV benefit in general population
H3 · Weak / mixed human evidence · hedged required

Observational, underpowered, conflicting, or surrogate outcomes only.

Example claims: Metformin extends human lifespan; Biological age reversibility; TRF beyond calorie reduction; 30 ml olive oil daily extends lifespan
H2 · Preclinical / mechanistic / anecdotal · uncertain required

Animal, cell, pathway, or clinician anecdote. Anecdotes are not proof.

Example claims: NMN slows aging in humans; Cold exposure extends healthspan; Plasma exchange rejuvenates healthy people
H1 · Unsupported / contradicted / marketing-only · don't assert

No credible support, or stronger evidence points the other way.

Example claims: Young plasma rejuvenation; Seed oils uniquely toxic; Follistatin gene therapy slows aging; Blueprint protocol slows aging; Resveratrol extends healthspan
Why 5 levels?
A continuous 0–1 score suggests precision we don't have. Five buckets match how clinicians already think about evidence and map cleanly to a confidence ceiling: H5 tolerates assertive language, while H1 claims should not be asserted at all.
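One way to read the confidence ceiling is as an ordinal comparison between a card's level and a post's strength_language. The sketch below is illustrative only. The `CEILING` table, the `overshoot` function, and the handling of H4 (assertive is accepted only when the post also acknowledges uncertainty) are assumptions read off the level labels, not the production scoring function. It applies only to posts whose stance supports the claim.

```python
from typing import Optional

# Ordinal ranks for the strength_language values used in post annotation.
STRENGTH_RANK = {"uncertain": 0, "hedged": 1, "assertive": 2}

# Assumed ceiling per level, read off the hierarchy labels.
# H1 has no acceptable strength: the claim should not be asserted at all.
CEILING: dict[str, Optional[str]] = {
    "H5": "assertive",
    "H4": "assertive",  # assumed to require an explicit hedge, see overshoot()
    "H3": "hedged",
    "H2": "uncertain",
    "H1": None,
}


def overshoot(level: str, strength: str, acknowledges_uncertainty: bool = False) -> int:
    """Return how many steps a supportive statement exceeds the level's ceiling."""
    rank = STRENGTH_RANK[strength]
    ceiling = CEILING[level]
    if ceiling is None:
        # Any supportive statement of an H1 claim is over the ceiling.
        return rank + 1
    if level == "H4" and strength == "assertive" and not acknowledges_uncertainty:
        # Assumption: "assertive but hedged" means assertive needs a hedge.
        return 1
    return max(0, rank - STRENGTH_RANK[ceiling])
```

Under these assumptions, `overshoot("H5", "assertive")` is 0, while `overshoot("H2", "assertive")` is 2 because an assertive statement sits two steps above the "uncertain" ceiling.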
Scope matters as much as level
Every card has a scope block: population, intervention, outcome, and a not_supported_for list. Extrapolating from a narrow scope to a broad population incurs a scoping penalty even when the level is high.
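A minimal sketch of how a scope block could be checked against what a post claims. The `Scope` class and `scope_violations` helper are hypothetical names, the matching is a naive case-insensitive string comparison, and the example values are illustrative rather than taken from a real card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Scope block of an evidence card (field names follow the card schema)."""
    population: str
    intervention: str
    outcome: str
    not_supported_for: tuple[str, ...] = ()


def scope_violations(scope: Scope, claimed_populations: list[str]) -> list[str]:
    """Return the claimed populations that the card explicitly excludes."""
    excluded = {p.lower() for p in scope.not_supported_for}
    return [p for p in claimed_populations if p.lower() in excluded]


# Illustrative values only (assumed, not from the real corpus).
example_scope = Scope(
    population="adults with established ASCVD",
    intervention="statin therapy",
    outcome="cardiovascular events",
    not_supported_for=("healthy young adults",),
)
violations = scope_violations(example_scope, ["Healthy young adults"])
```

A non-empty result would be the trigger for a scoping penalty. How large that penalty is depends on the scoring function and is not modeled here.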

02 Evidence card anatomy

The evidence corpus is a single versioned, machine-readable dataset. Each card declares one normalized claim family:

claim_family: canonical name, e.g. statins_reduce_cv_secondary
evidence_level: H5 · H4 · H3 · H2 · H1
direction: supports · contradicts · mixed · insufficient
certainty_modifiers: GRADE-style: risk_of_bias, inconsistency, indirectness, imprecision, publication_bias
scope: population · intervention · outcome · not_supported_for
stakes: critical · high · moderate · low
safety_flag: low · low-to-moderate · moderate · high
commercial_sensitivity: generic · supplement-sold-frequently · procedure-sold · protocol-sold
sources: list of citations, each with type, url, label
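The fields listed above could be represented as a validated record along the following lines. This is a sketch under stated assumptions: the class names, the use of Python dataclasses, and the validation behavior are not from the actual dataset format, and the example card's stakes, safety_flag, and commercial_sensitivity values are placeholders.

```python
from dataclasses import dataclass, field

# Allowed values, copied from the field list.
ALLOWED = {
    "evidence_level": ("H1", "H2", "H3", "H4", "H5"),
    "direction": ("supports", "contradicts", "mixed", "insufficient"),
    "stakes": ("critical", "high", "moderate", "low"),
    "safety_flag": ("low", "low-to-moderate", "moderate", "high"),
    "commercial_sensitivity": (
        "generic", "supplement-sold-frequently", "procedure-sold", "protocol-sold",
    ),
}
CERTAINTY_MODIFIERS = {
    "risk_of_bias", "inconsistency", "indirectness", "imprecision", "publication_bias",
}


@dataclass
class Source:
    type: str
    url: str
    label: str


@dataclass
class EvidenceCard:
    claim_family: str
    evidence_level: str
    direction: str
    stakes: str
    safety_flag: str
    commercial_sensitivity: str
    certainty_modifiers: list[str] = field(default_factory=list)
    scope: dict = field(default_factory=dict)
    sources: list[Source] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any enumerated field whose value is outside the schema.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {allowed}")
        unknown = set(self.certainty_modifiers) - CERTAINTY_MODIFIERS
        if unknown:
            raise ValueError(f"unknown certainty modifiers: {sorted(unknown)}")


# Placeholder values for illustration; only claim_family comes from the text.
card = EvidenceCard(
    claim_family="statins_reduce_cv_secondary",
    evidence_level="H5",
    direction="supports",
    stakes="high",
    safety_flag="low",
    commercial_sensitivity="generic",
)
```

Validating at construction time means a malformed card fails when the corpus is loaded rather than when a score is computed.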

03 Per-post annotation

Each post is annotated with the same normalized schema. The same claim phrased differently maps to the same family, so evidence is adjudicated once.

claim_family: matched card
speaker_stance: supports · refutes · mentions
strength_language: uncertain · hedged · assertive
scope_language: narrow · broad · absent
citation_signal: none · referenced · linked
absolutes_present: true · false
anecdote_signal: true · false
acknowledges_uncertainty: true · false
acknowledges_conflicting: true · false
sells_matching_product: true · false
key_claims: verbatim quotes + paraphrase
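To illustrate why mapping phrasings to a family matters, the sketch below groups annotations by claim_family so that each family's card is looked up once. `PostAnnotation` carries only a subset of the schema, and `post_id` and `group_by_family` are hypothetical names introduced for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PostAnnotation:
    post_id: str            # hypothetical identifier, not part of the listed schema
    claim_family: str
    speaker_stance: str     # supports | refutes | mentions
    strength_language: str  # uncertain | hedged | assertive
    key_claim: str          # verbatim quote from the post


def group_by_family(annotations: list[PostAnnotation]) -> dict[str, list[PostAnnotation]]:
    """Bucket annotations by claim family, preserving input order within a family."""
    grouped: dict[str, list[PostAnnotation]] = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.claim_family].append(annotation)
    return dict(grouped)


# Two differently worded posts (invented text) that map to one family.
annotations = [
    PostAnnotation("p1", "statins_reduce_cv_secondary", "supports", "assertive",
                   "Statins save lives after a heart attack."),
    PostAnnotation("p2", "statins_reduce_cv_secondary", "supports", "hedged",
                   "Statin therapy likely lowers repeat-event risk."),
]
by_family = group_by_family(annotations)
```

Both posts land under one key, so the evidence for that family is graded once and the posts differ only in how they stated it.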

04 Source hierarchy

Sources are prioritized top-to-bottom when assigning a level. Weight scales with tier authority.

1. Major guidelines: USPSTF · ACC/AHA · ESC/EAS · WHO · FDA · CDC · NICE
2. Cochrane-class systematic reviews: structured certainty grading · low risk of bias
3. Landmark RCTs: NEJM · Lancet · JAMA · large, blinded, adequately powered
4. Meta-analyses & systematic reviews: independent · pre-registered where possible
5. Large prospective cohorts: Framingham · UK Biobank · PURE · NHS/HPFS · EPIC
6. Regulator advisories: FDA · FTC · EFSA · NIEHS · NIH ODS fact sheets

Downweighted: mechanistic-only studies · small-n case reports · industry white papers · podcast-only claims · tweets citing tweets
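The top-to-bottom prioritization can be sketched as a sort on tier number. The `type` strings below are hypothetical labels for the six tiers, and placing every unrecognized type after tier 6 is an assumption standing in for "downweighted".

```python
# Hypothetical type labels mapped to the six tiers listed above.
SOURCE_TIERS = {
    "guideline": 1,
    "cochrane_review": 2,
    "landmark_rct": 3,
    "meta_analysis": 4,
    "prospective_cohort": 5,
    "regulator_advisory": 6,
}
DOWNWEIGHTED_TIER = 99  # assumed bucket for mechanistic-only, case reports, etc.


def tier_of(source: dict) -> int:
    """Return the tier number for a source dict with a 'type' key."""
    return SOURCE_TIERS.get(source["type"], DOWNWEIGHTED_TIER)


def rank_sources(sources: list[dict]) -> list[dict]:
    """Order sources from highest to lowest authority (stable within a tier)."""
    return sorted(sources, key=tier_of)


ranked = rank_sources([
    {"type": "podcast", "label": "podcast-only claim"},
    {"type": "prospective_cohort", "label": "cohort study"},
    {"type": "guideline", "label": "guideline statement"},
])
```

Here `ranked` lists the guideline first, then the cohort study, then the podcast claim. Because `sorted` is stable, sources within the same tier keep their original order.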

05 Corpus coverage


06 Auditability

Every score links to (a) the live post, (b) a web-archive snapshot, and (c) the evidence card + citations. The point of disagreement is always locatable.
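The three links could be carried on each score as a small record that reports when any part of the trail is missing. This is a sketch only: `AuditRecord`, its field names, and the `missing_links` check are assumptions about one possible representation, not the site's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Links a single score back to its post, snapshot, and evidence card."""
    claim_family: str
    live_post_url: str
    archive_snapshot_url: str
    card_version: str
    citation_urls: tuple[str, ...] = ()

    def missing_links(self) -> list[str]:
        """Name each part of the audit trail that is absent."""
        missing = []
        if not self.live_post_url:
            missing.append("live post")
        if not self.archive_snapshot_url:
            missing.append("archive snapshot")
        if not self.card_version or not self.citation_urls:
            missing.append("evidence card + citations")
        return missing


# Placeholder URLs on example.org, used purely for illustration.
record = AuditRecord(
    claim_family="statins_reduce_cv_secondary",
    live_post_url="https://example.org/post/1",
    archive_snapshot_url="",
    card_version="v1",
    citation_urls=("https://example.org/citation",),
)
```

With the snapshot URL left empty, `record.missing_links()` returns `["archive snapshot"]`, which is the kind of gap a publish-time check could refuse.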

01 Open the account page and find the claim in question.
02 Click View on X to confirm the post text we analyzed.
03 Click Archive snapshot to confirm what the post said at scrape time.
04 Check the card's Sources. If we mis-graded evidence, cite a stronger source.
05 Check the per-component A–F breakdown. Disagree? Cite the exact wording.
06 Submit corrections via contact. Corrections ripple through affected scores.

07 Known limitations

Each account is scored on a recent window of posts, not its full history.
Reposts are weighted as weak endorsements.
Long threads are treated as a single unit; per-thread aggregation is future work.
Media (images, video, podcasts) are not directly scored, so substantive claims there are only partially captured.
Commercial context is verified manually per account; this will move to automated bio-resolution at scale.
Freshness gap: guideline updates are incorporated on a rolling basis, not in real time.