Organic Score — Calibration Data

Experimental

Empirical validation of the 4 signals used to compute the Organic Score, on a corpus of 17 repos (11 healthy, 4 suspicious, 2 controls). Weights were rebalanced on 2026-05-06 to add a releases cadence signal at 20%; to make room, the fork/star weight dropped from 40% to 30% and the zero-follower weight from 55% to 45%.

Last updated: May 2026

Signal Normalisation (0 → 100)

| Signal | Gate | Thresholds | Weight |
|---|---|---|---|
| Fork / Star ratio | stars ≥ 5 000 | ≥ 10% → 100 · 7% → 50 · ≤ 2% → 0 | 30% |
| Watcher / Star ratio | always | ≥ 0.5% → 100 · 0.1% → 50 · ≤ 0.01% → 0 | 5% |
| % zero-follower stargazers | sample ≥ 30 | ≤ 10% → 100 · 30% → 50 · ≥ 60% → 0 | 45% |
| Releases cadence | always | ≥ 100 → 100 · 20 → 60 · 5 → 30 · 0 → 0 | 20% |
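
The thresholds above only give anchor points, so how values between anchors are scored is not stated on this page. The following is a minimal sketch that assumes straight-line interpolation between neighbouring anchors and clamping outside them. The names `ANCHORS` and `normalise` are hypothetical, and the sketch is not guaranteed to reproduce the published scores.

```python
from bisect import bisect_right

# Anchor points (raw value, normalised score) copied from the thresholds table.
# Ratios are in percent; "releases" is a plain count of GitHub releases.
ANCHORS = {
    "fork_star": [(2.0, 0.0), (7.0, 50.0), (10.0, 100.0)],
    "watcher_star": [(0.01, 0.0), (0.1, 50.0), (0.5, 100.0)],
    "zero_follower": [(10.0, 100.0), (30.0, 50.0), (60.0, 0.0)],
    "releases": [(0.0, 0.0), (5.0, 30.0), (20.0, 60.0), (100.0, 100.0)],
}


def normalise(signal: str, value: float) -> float:
    """Map a raw signal value to 0..100.

    Assumption: linear interpolation between adjacent anchors, clamped to the
    first/last anchor score outside the anchor range.
    """
    points = ANCHORS[signal]
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return points[0][1]
    if value >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, value)  # index of the first anchor strictly above value
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


if __name__ == "__main__":
    print(normalise("fork_star", 7.0))       # middle anchor of the fork/star row
    print(normalise("zero_follower", 45.0))  # halfway between the 30% and 60% anchors
    print(normalise("releases", 10))         # between the 5 and 20 release anchors
```

Note that the zero-follower curve runs in the opposite direction to the others: a higher raw percentage gives a lower score, which the anchor list encodes simply by listing decreasing scores.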

Best fit: 92% of repos correctly classified (healthy ≥ 70, suspicious ≤ 45). Calibrated 2026-05-06.

Corpus Results

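| Repo | Expected | Stars | Fork/★ | Watch/★ | Zero-follower | Releases | Score | Tier |
|---|---|---|---|---|---|---|---|---|
| pallets/flask | healthy | 71 432 | 23.5% | 2.9% | | ~140 | 100 | Healthy |
| langchain-ai/langchain | healthy | 134 373 | 16.5% | 0.6% | 3.4% | ~210 | 93 | Healthy |
| Significant-Gravitas/AutoGPT | healthy | 183 636 | 25.2% | 0.8% | | ~70 | 92 | Healthy |
| crewAIInc/crewAI | healthy | 49 425 | 13.7% | 0.7% | | ~55 | 92 | Healthy |
| langgenius/dify | healthy | 138 645 | 15.7% | 0.6% | 3.9% | ~320 | 92 | Healthy |
| agno-agi/agno | healthy | 39 573 | 13.4% | 0.6% | | ~100 | 91 | Healthy |
| mem0ai/mem0 | healthy | 53 711 | 11.2% | 0.4% | | ~85 | 90 | Healthy |
| browser-use/browser-use | healthy | 89 197 | 11.4% | 0.5% | 3.7% | ~30 | 92 | Healthy |
| rtk-ai/rtk | healthy | 32 308 | 5.8% | 0.26% | 7.4% | 147 | 74 | Healthy |
| NousResearch/hermes-function-calling | healthy | 1 292 | | 1.4% | | ~5 | 68 | Moderate |
| yargs/yargs | healthy | 11 471 | 8.9% | 0.7% | | ~110 | 75 | Moderate |
| unionlabs/union | suspicious | 74 134 | 5.2% | 2.2% | | ~30 | 41 | Suspicious |
| shardeum/shardeum | suspicious | 31 497 | 2.2% | 0.9% | | ~10 | 8 | Suspicious |
| Anoma/anoma | suspicious | 33 916 | 12.1% | 0.6% | | ~20 | 91 | Healthy |
| langflow-ai/langflow | suspicious | 147 213 | 6.0% | 0.3% | | ~340 | 44 | Suspicious |
| sindresorhus/awesome | control | 457 552 | 7.5% | 1.8% | 3.5% | | 70 | Moderate |
| facebook/react | control | 244 629 | 20.8% | 2.7% | 3.5% | ~50 | 100 | Healthy |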

Controls (awesome, react) are excluded from the fit calculation. Anoma/anoma is an anomaly: its fork/star ratio looks healthy (12%), but external sources report it as fraudulent.
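
The fit rule (healthy ≥ 70, suspicious ≤ 45, controls excluded) can be checked directly against the Score column. The sketch below is illustrative only: the names `CORPUS`, `is_correct` and `fit` are hypothetical, and the scores are transcribed from the corpus table rather than recomputed.

```python
HEALTHY_MIN = 70     # a repo labelled healthy counts as correct at or above this score
SUSPICIOUS_MAX = 45  # a repo labelled suspicious counts as correct at or below this score

# (expected label, published score) for the 15 labelled repos; controls are left out.
CORPUS = {
    "pallets/flask": ("healthy", 100),
    "langchain-ai/langchain": ("healthy", 93),
    "Significant-Gravitas/AutoGPT": ("healthy", 92),
    "crewAIInc/crewAI": ("healthy", 92),
    "langgenius/dify": ("healthy", 92),
    "agno-agi/agno": ("healthy", 91),
    "mem0ai/mem0": ("healthy", 90),
    "browser-use/browser-use": ("healthy", 92),
    "rtk-ai/rtk": ("healthy", 74),
    "NousResearch/hermes-function-calling": ("healthy", 68),
    "yargs/yargs": ("healthy", 75),
    "unionlabs/union": ("suspicious", 41),
    "shardeum/shardeum": ("suspicious", 8),
    "Anoma/anoma": ("suspicious", 91),
    "langflow-ai/langflow": ("suspicious", 44),
}


def is_correct(expected: str, score: int) -> bool:
    """Apply the fit rule to one labelled repo."""
    if expected == "healthy":
        return score >= HEALTHY_MIN
    return score <= SUSPICIOUS_MAX


def fit(corpus: dict) -> tuple:
    """Return (share of repos classified correctly, list of misclassified repos)."""
    misses = [repo for repo, (exp, score) in corpus.items() if not is_correct(exp, score)]
    return (len(corpus) - len(misses)) / len(corpus), misses


if __name__ == "__main__":
    accuracy, misses = fit(CORPUS)
    print(f"{accuracy:.1%} correct; misses: {misses}")
```

On the rows as transcribed here, the rule counts 13 of 15 repos as correct (about 87%), with NousResearch/hermes-function-calling and Anoma/anoma as the misses. The 92% figure quoted earlier may rest on a different corpus snapshot or counting rule that this page does not specify.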

Methodology vs. StarScout

StarMapper uses the 4 most accessible public signals. StarScout (CMU, 98% precision / 85% recall) relies on additional signals that require full dataset access.

| Signal | StarMapper | StarScout | Notes |
|---|---|---|---|
| Fork / star ratio | ✓ 30% | | Reduced from 40%: fork/star penalises CLI tools with low fork rates by nature |
| % zero-follower stargazers | ✓ 45% | partial | Strongest discriminator when sample size ≥ 30. Reduced slightly to make room for releases signal |
| Watcher / star ratio | ✓ 5% | | Weakly discriminating in practice; weight kept low |
| Releases cadence | ✓ 20% | | New signal: total GitHub releases as proxy for active, maintained project |
| Clustering (account overlap across repos) | | | Key signal in StarScout; requires full graph analysis |
| Temporal burst (stars in short window) | | | Requires star timestamp history at scale |
| Account age + activity pattern | | | Detects sophisticated fakes; not available from public API alone |

StarMapper reaches ~92% accuracy on the labelled corpus (weights: fork 30%, zero-follower 45%, watcher 5%, releases 20%; calibrated 2026-05-06). StarScout reaches 98% precision using the full signal set. The gap is structural, not a calibration issue.

Caveats

  • Fork/star signal is gated at ≥ 5 000 stars: below this threshold, the ratio is too noisy to be informative.
  • CLI and developer tools (installed via a package manager, rarely forked) may have a lower fork/star ratio despite being organic. To account for this, the weight of the fork signal has been reduced (70%→40%, then 40%→30% in the 2026-05-06 rebalance).
  • Zero-follower signal requires ≥ 30 enriched users (users StarMapper has seen as stargazers). It is unavailable for repos not scanned on StarMapper.
  • Viral repos or niche communities (CLI tools, curated lists) may score lower despite being organic. The score reflects signals, not intent.
  • The score is not an accusation. Repos can score poorly due to community structure (e.g., crypto projects have high watcher counts but also many bot accounts).
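
The two gates described in the caveats can be sketched as a filter applied before the weighted sum. This page does not say how weights are handled when a gated signal drops out. The sketch below assumes the remaining weights are renormalised to sum to 100%, which is one plausible choice and not a confirmed description of StarMapper. The function `organic_score` and its parameters are hypothetical; it takes sub-scores that are already normalised to 0..100.

```python
from typing import Dict, Optional

# Weights in percent, as listed for the 2026-05-06 calibration.
WEIGHTS = {"fork_star": 30, "watcher_star": 5, "zero_follower": 45, "releases": 20}

FORK_GATE_MIN_STARS = 5_000  # fork/star is ignored below this star count
ZF_GATE_MIN_SAMPLE = 30      # zero-follower needs at least this many enriched users


def organic_score(sub_scores: Dict[str, float], stars: int, zf_sample: int) -> Optional[float]:
    """Combine normalised sub-scores (0..100) into a single 0..100 score.

    Assumption: gated-out or missing signals are dropped and the remaining
    weights are renormalised. Returns None if no signal is usable.
    """
    active = dict(sub_scores)
    if stars < FORK_GATE_MIN_STARS:
        active.pop("fork_star", None)      # gate: ratio too noisy on small repos
    if zf_sample < ZF_GATE_MIN_SAMPLE:
        active.pop("zero_follower", None)  # gate: sample too small
    total_weight = sum(WEIGHTS[name] for name in active)
    if total_weight == 0:
        return None
    return sum(WEIGHTS[name] * score for name, score in active.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical small repo: both gated signals drop out, leaving watcher and releases.
    small = {"fork_star": 0.0, "watcher_star": 100.0, "zero_follower": 80.0, "releases": 30.0}
    print(organic_score(small, stars=1_200, zf_sample=10))
```

Under this assumption, a small repo is scored on watcher/star and releases alone, so its score rests on only 25% of the nominal weight, which is one way to read the caveat that small or unscanned repos are scored less reliably.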

References