Evidence hierarchy, annotation, auditability
Hierarchy.
A claim score is the output of a three-part system: a versioned evidence
corpus, a structured post corpus per analyzed account, and a deterministic
scoring function. Every number traces to a specific claim, a specific
card, and specific citations.
01 Ingest → 02 Extract → 03 Map to card → 04 Score → 05 Audit trail
01 The truth hierarchy
Every claim gets one level, H1 through H5, based on the strength of human
evidence behind that exact claim. The level is the ceiling on how
confidently an account can state it before losing points.
H5 · Strong human consensus · assertive OK
Replicated systematic-review signal. High-certainty evidence on hard outcomes.
Example claims: LDL causal for ASCVD; Statins · secondary prevention; Exercise reduces mortality; VO₂max predicts mortality; Grip strength predicts mortality

H4 · Good human evidence · assertive but hedged
Multiple decent trials or one strong RCT. Certainty limited by inconsistency, indirectness, or precision.
Example claims: NMN raises NAD⁺ in humans; CR improves healthspan markers; Fish oil: no CV benefit in the general population

H3 · Weak / mixed human evidence · hedged required
Observational, underpowered, conflicting, or surrogate outcomes only.
Example claims: Metformin extends human lifespan; Biological age reversibility; TRF beyond calorie reduction; 30 ml olive oil daily extends lifespan

H2 · Preclinical / mechanistic / anecdotal · uncertain required
Animal, cell, pathway, or clinician anecdote. Anecdotes are not proof.
Example claims: NMN slows aging in humans; Cold exposure extends healthspan; Plasma exchange rejuvenates healthy people

H1 · Unsupported / contradicted / marketing-only · don't assert
No credible support, or stronger evidence points the other way.
Example claims: Young plasma rejuvenation; Seed oils uniquely toxic; Follistatin gene therapy slows aging; Blueprint protocol slows aging; Resveratrol extends healthspan
Why 5 levels?
A continuous 0–1 score suggests precision we don't have. Five buckets
match how clinicians already think about evidence and map cleanly to a
confidence ceiling: H5 tolerates assertive language, while H1 tolerates
essentially none.
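The ceiling idea can be sketched as a lookup from evidence level to the strongest tolerated strength_language value. This is a minimal illustration, not the scoring function itself: the `Strength` ordering, the `CEILING` table, and `exceeds_ceiling` are hypothetical names, and mapping H4 ("assertive but hedged") to hedged and H1 ("don't assert") to no tolerated strength is one possible reading of the level labels.

```python
from enum import IntEnum
from typing import Optional


class Strength(IntEnum):
    """strength_language values, ordered from least to most confident."""
    UNCERTAIN = 1
    HEDGED = 2
    ASSERTIVE = 3


# Assumed mapping, read off the level labels. None means that no supporting
# statement is tolerated at that level, however it is worded.
CEILING: dict[str, Optional[Strength]] = {
    "H5": Strength.ASSERTIVE,
    "H4": Strength.HEDGED,     # interpretation of "assertive but hedged"
    "H3": Strength.HEDGED,
    "H2": Strength.UNCERTAIN,
    "H1": None,                # interpretation of "don't assert"
}


def exceeds_ceiling(level: str, strength: Strength) -> bool:
    """Return True when a post states a claim more confidently than its level allows."""
    ceiling = CEILING[level]
    if ceiling is None:
        return True
    return strength > ceiling
```

Under these assumptions, `exceeds_ceiling("H3", Strength.ASSERTIVE)` is true while `exceeds_ceiling("H5", Strength.ASSERTIVE)` is false. How many points an overshoot costs is not specified here and is left out of the sketch.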
Scope matters as much as level
Every card has a scope block (population, intervention, outcome, and a
not_supported_for list). Extrapolating from a narrow scope to a broad
population incurs a scoping penalty even when the level is high.
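A scope check might look like the sketch below. The `Scope` class, the `scoping_penalty_applies` helper, and every field value in the example are hypothetical; exact string matching stands in for whatever matching the real pipeline uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    population: str
    intervention: str
    outcome: str
    not_supported_for: tuple[str, ...] = ()


def scoping_penalty_applies(scope: Scope, claimed_population: str) -> bool:
    """True when a post applies the claim to a population the card excludes."""
    return claimed_population in scope.not_supported_for


# Illustrative values only; these are not taken from a real card.
statin_scope = Scope(
    population="adults with prior cardiovascular event",
    intervention="statin therapy",
    outcome="cardiovascular events",
    not_supported_for=("general population",),
)
```

With this example scope, a post that extends the claim to the "general population" would trigger the penalty even though the underlying card could sit at a high level.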
02 Evidence card anatomy
The evidence corpus is a single versioned, machine-readable dataset. Each
card declares one normalized claim family:
claim_family: canonical name, e.g. statins_reduce_cv_secondary
evidence_level: H5 · H4 · H3 · H2 · H1
direction: supports · contradicts · mixed · insufficient
certainty_modifiers: GRADE-style: risk_of_bias, inconsistency, indirectness, imprecision, publication_bias
scope: population · intervention · outcome · not_supported_for
stakes: critical · high · moderate · low
safety_flag: low · low-to-moderate · moderate · high
commercial_sensitivity: generic · supplement-sold-frequently · procedure-sold · protocol-sold
sources: list of citations, each with type, url, label
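The card fields can be sketched as a validated record. The field names and allowed values come from the list above; the `EvidenceCard` and `Source` class names, the Python types, and all example values other than the claim family and its H5 level are assumptions, and the URL is a placeholder.

```python
from dataclasses import dataclass

# Allowed values for the enumerated fields, copied from the field list.
ALLOWED = {
    "evidence_level": {"H5", "H4", "H3", "H2", "H1"},
    "direction": {"supports", "contradicts", "mixed", "insufficient"},
    "stakes": {"critical", "high", "moderate", "low"},
    "safety_flag": {"low", "low-to-moderate", "moderate", "high"},
    "commercial_sensitivity": {
        "generic", "supplement-sold-frequently", "procedure-sold", "protocol-sold",
    },
}


@dataclass(frozen=True)
class Source:
    type: str
    url: str
    label: str


@dataclass(frozen=True)
class EvidenceCard:
    claim_family: str
    evidence_level: str
    direction: str
    certainty_modifiers: dict
    scope: dict
    stakes: str
    safety_flag: str
    commercial_sensitivity: str
    sources: tuple

    def __post_init__(self):
        # Reject any enumerated field whose value is outside its allowed set.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# Illustrative card; every value except claim_family and evidence_level is a placeholder.
card = EvidenceCard(
    claim_family="statins_reduce_cv_secondary",
    evidence_level="H5",
    direction="supports",
    certainty_modifiers={},
    scope={"population": "placeholder", "intervention": "placeholder",
           "outcome": "placeholder", "not_supported_for": []},
    stakes="high",
    safety_flag="low",
    commercial_sensitivity="generic",
    sources=(Source(type="guideline", url="https://example.org/placeholder",
                    label="placeholder citation"),),
)
```

Validating at construction time is one way to keep a versioned dataset free of values outside the documented vocabularies.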
03 Per-post annotation
Each post gets the same normalized schema. The same claim phrased in
different ways maps to the same family, so evidence is adjudicated once.
claim_family: matched card
speaker_stance: supports · refutes · mentions
strength_language: uncertain · hedged · assertive
scope_language: narrow · broad · absent
citation_signal: none · referenced · linked
absolutes_present: true · false
anecdote_signal: true · false
acknowledges_uncertainty: true · false
acknowledges_conflicting: true · false
sells_matching_product: true · false
key_claims: verbatim quotes + paraphrase
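Because annotations share a claim_family key, posts can be grouped so that each family is looked up against its card once. The sketch below assumes annotations are plain dictionaries; the post IDs, the second family name, and the `group_by_family` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotations using a subset of the schema fields.
annotations = [
    {"post_id": "post-1", "claim_family": "statins_reduce_cv_secondary",
     "speaker_stance": "supports", "strength_language": "assertive"},
    {"post_id": "post-2", "claim_family": "statins_reduce_cv_secondary",
     "speaker_stance": "supports", "strength_language": "hedged"},
    {"post_id": "post-3", "claim_family": "hypothetical_other_family",
     "speaker_stance": "mentions", "strength_language": "uncertain"},
]


def group_by_family(rows):
    """Collect post IDs under their claim family, preserving input order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["claim_family"]].append(row["post_id"])
    return dict(grouped)


by_family = group_by_family(annotations)
```

Here two differently worded posts land under the same family, so a single card lookup would serve both.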
04 Source hierarchy
Sources are prioritized top-to-bottom when assigning a level. Weight dots scale with tier authority.
1. Major guidelines: USPSTF · ACC/AHA · ESC/EAS · WHO · FDA · CDC · NICE
2. Cochrane-class systematic reviews: Structured certainty grading · low risk of bias
3. Landmark RCTs: NEJM · Lancet · JAMA · large, blinded, adequately powered
4. Meta-analyses & systematic reviews: Independent · pre-registered where possible
5. Large prospective cohorts: Framingham · UK Biobank · PURE · NHS/HPFS · EPIC
6. Regulator advisories: FDA · FTC · EFSA · NIEHS · NIH ODS fact sheets
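Top-to-bottom prioritization can be sketched as a sort on tier number. The tier numbers follow the list above; the type keys in `TIER_BY_TYPE`, the `rank_sources` helper, and the example sources are hypothetical labels chosen for illustration.

```python
# Tier numbers follow the source hierarchy; the type keys are assumed names.
TIER_BY_TYPE = {
    "guideline": 1,
    "cochrane_review": 2,
    "landmark_rct": 3,
    "meta_analysis": 4,
    "prospective_cohort": 5,
    "regulator_advisory": 6,
}


def rank_sources(sources):
    """Order sources from highest-priority tier (1) to lowest (6)."""
    return sorted(sources, key=lambda s: TIER_BY_TYPE[s["type"]])


# Placeholder sources; the labels are not real citations.
example_sources = [
    {"type": "prospective_cohort", "label": "placeholder cohort"},
    {"type": "guideline", "label": "placeholder guideline"},
    {"type": "landmark_rct", "label": "placeholder trial"},
]

ranked = rank_sources(example_sources)
```

Python's sort is stable, so sources within the same tier keep their original order.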
05 Corpus coverage
06 Auditability
Every score links to (a) the live post, (b) a web-archive snapshot, and
(c) the evidence card + citations. The point of disagreement is always
locatable.
01 Open the account page and find the claim in question.
02 Click "View on X" to confirm the post text we analyzed.
03 Click "Archive snapshot" to confirm what the post said at scrape time.
04 Check the card's Sources. If we mis-graded evidence, cite a stronger source.
05 Check the per-component A–F breakdown. Disagree? Cite the exact wording.
06 Submit corrections via contact. Corrections ripple through affected scores.
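The three links behind every score could be carried in a record like the one below. This is a sketch under assumptions: the `AuditRecord` class, its field names, and the `is_complete` check are hypothetical, and the URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    post_url: str                # (a) the live post
    archive_url: str             # (b) the web-archive snapshot
    claim_family: str            # (c) the evidence card the claim was mapped to
    citations: tuple[str, ...]   # (c) the card's citations

    def is_complete(self) -> bool:
        """A score is auditable only when every link in the trail is present."""
        return all([self.post_url, self.archive_url, self.claim_family, self.citations])


# Placeholder values for illustration.
record = AuditRecord(
    post_url="https://example.org/post/placeholder",
    archive_url="https://example.org/archive/placeholder",
    claim_family="statins_reduce_cv_secondary",
    citations=("https://example.org/citation/placeholder",),
)
```

A completeness check of this kind would flag any score whose disagreement point could not be located.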
07 Known limitations
Each account is scored on a recent window of posts, not its full history.
Reposts are weighted as weak endorsements.
Long threads are treated as a single unit; per-thread aggregation is future work.
Media (images, video, podcasts) is not directly scored, so substantive claims made there are only partially captured.
Commercial context is verified manually per account; this will move to automated bio-resolution at scale.
Freshness gap: guideline updates are incorporated on a rolling basis, not in real time.