Organic Score — Calibration Data

Experimental

Empirical validation of the 4 signals used to compute the Organic Score, on a corpus of 17 repos (11 healthy, 4 suspicious, 2 controls). Weights were rebalanced on 2026-05-06 to add the releases cadence signal: fork 40%→30%, zero-follower 55%→45%, releases new at 20%.

Last updated: May 2026

Signal Normalisation (0 → 100)

| Signal | Gate | Thresholds | Weight |
|---|---|---|---|
| Fork / Star ratio | stars ≥ 5 000 | ≥ 10% → 100 · 7% → 50 · ≤ 2% → 0 | 30% |
| Watcher / Star ratio | always | ≥ 0.5% → 100 · 0.1% → 50 · ≤ 0.01% → 0 | 5% |
| % zero-follower stargazers | sample ≥ 30 | ≤ 10% → 100 · 30% → 50 · ≥ 60% → 0 | 45% |
| Releases cadence | always | ≥ 100 → 100 · 20 → 60 · 5 → 30 · 0 → 0 | 20% |
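
The thresholds above can be read as anchor points on a 0 → 100 scale. The sketch below shows one way to turn a raw value into a normalised signal. It assumes linear interpolation between anchors and clamping outside them; the page does not state the interpolation actually used, and the names `ANCHORS` and `normalise` are illustrative only.

```python
from bisect import bisect_right

# Anchor points (raw value, normalised score) transcribed from the table above.
# Ratios are expressed in percent; releases is a raw count.
ANCHORS = {
    "fork_star": [(2.0, 0.0), (7.0, 50.0), (10.0, 100.0)],
    "watcher_star": [(0.01, 0.0), (0.1, 50.0), (0.5, 100.0)],
    "zero_follower": [(10.0, 100.0), (30.0, 50.0), (60.0, 0.0)],
    "releases": [(0.0, 0.0), (5.0, 30.0), (20.0, 60.0), (100.0, 100.0)],
}


def normalise(signal: str, value: float) -> float:
    """Map a raw signal value to 0..100.

    ASSUMPTION: linear interpolation between anchors, clamped at both ends.
    """
    points = ANCHORS[signal]
    xs = [x for x, _ in points]
    if value <= xs[0]:
        return points[0][1]
    if value >= xs[-1]:
        return points[-1][1]
    # Index of the first anchor strictly greater than value.
    i = bisect_right(xs, value)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


print(normalise("fork_star", 7.0))       # on an anchor
print(normalise("zero_follower", 20.0))  # halfway between two anchors
```

Note that the zero-follower anchors run downward: a higher share of zero-follower stargazers yields a lower normalised value.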

Best fit: 92% of repos correctly classified (healthy ≥ 70, suspicious ≤ 45). Calibrated 2026-05-06.

Corpus Results

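| Repo | Expected | Stars | Fork/★ | Watch/★ | Zero-fol. | Releases | Score | Tier |
|---|---|---|---|---|---|---|---|---|
| pallets/flask | healthy | 71 432 | 23.5% | 2.9% | | ~140 | 100 | Healthy |
| langchain-ai/langchain | healthy | 134 373 | 16.5% | 0.6% | 3.4% | ~210 | 93 | Healthy |
| Significant-Gravitas/AutoGPT | healthy | 183 636 | 25.2% | 0.8% | | ~70 | 92 | Healthy |
| crewAIInc/crewAI | healthy | 49 425 | 13.7% | 0.7% | | ~55 | 92 | Healthy |
| langgenius/dify | healthy | 138 645 | 15.7% | 0.6% | 3.9% | ~320 | 92 | Healthy |
| agno-agi/agno | healthy | 39 573 | 13.4% | 0.6% | | ~100 | 91 | Healthy |
| mem0ai/mem0 | healthy | 53 711 | 11.2% | 0.4% | | ~85 | 90 | Healthy |
| browser-use/browser-use | healthy | 89 197 | 11.4% | 0.5% | 3.7% | ~30 | 92 | Healthy |
| rtk-ai/rtk | healthy | 32 308 | 5.8% | 0.26% | 7.4% | 147 | 74 | Healthy |
| NousResearch/hermes-function-calling | healthy | 1 292 | | 1.4% | | ~5 | 68 | Moderate |
| yargs/yargs | healthy | 11 471 | 8.9% | 0.7% | | ~110 | 75 | Moderate |
| unionlabs/union | suspicious | 74 134 | 5.2% | 2.2% | | ~30 | 41 | Suspicious |
| shardeum/shardeum | suspicious | 31 497 | 2.2% | 0.9% | | ~10 | 8 | Suspicious |
| Anoma/anoma | suspicious | 33 916 | 12.1% | 0.6% | | ~20 | 91 | Healthy |
| langflow-ai/langflow | suspicious | 147 213 | 6.0% | 0.3% | | ~340 | 44 | Suspicious |
| sindresorhus/awesome | control | 457 552 | 7.5% | 1.8% | 3.5% | | 70 | Moderate |
| facebook/react | control | 244 629 | 20.8% | 2.7% | 3.5% | ~50 | 100 | Healthy |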

Controls (awesome, react) are excluded from the fit calculation. Anoma/anoma is an anomaly: its fork/star ratio looks healthy (12%), yet external sources report it as fraudulent.
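
As an illustration of the fit rule (healthy ≥ 70, suspicious ≤ 45, controls excluded), the sketch below checks a few rows taken from the corpus table. The function name `is_correct` and the choice of rows are illustrative; this is not StarMapper's own evaluation code.

```python
from typing import Optional

HEALTHY_MIN = 70     # a healthy repo counts as correct at or above this score
SUSPICIOUS_MAX = 45  # a suspicious repo counts as correct at or below this score


def is_correct(expected: str, score: int) -> Optional[bool]:
    """Return True/False for labelled repos, None for controls (excluded from the fit)."""
    if expected == "healthy":
        return score >= HEALTHY_MIN
    if expected == "suspicious":
        return score <= SUSPICIOUS_MAX
    return None


# (repo, expected label, score) copied from the corpus table.
rows = [
    ("pallets/flask", "healthy", 100),
    ("unionlabs/union", "suspicious", 41),
    ("Anoma/anoma", "suspicious", 91),
    ("facebook/react", "control", 100),
]

for repo, expected, score in rows:
    print(repo, is_correct(expected, score))
```

Here Anoma/anoma is the one labelled row that fails the rule, and facebook/react returns `None` because controls do not count toward the fit.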

Methodology vs. StarScout

StarMapper uses the 4 most accessible public signals. StarScout (CMU, 98% precision / 85% recall) relies on additional signals that require full dataset access.

| Signal | StarMapper | StarScout | Notes |
|---|---|---|---|
| Fork / star ratio | ✓ 30% | | Reduced from 40%; fork/star penalises CLI tools with low fork rates by nature |
| % zero-follower stargazers | ✓ 45% | partial | Strongest discriminator when sample size ≥ 30. Reduced slightly to make room for releases signal |
| Watcher / star ratio | ✓ 5% | | Weakly discriminating in practice, weight kept low |
| Releases cadence | ✓ 20% | | New signal: total GitHub releases as proxy for active, maintained project |
| Clustering (account overlap across repos) | | | Key signal in StarScout, requires full graph analysis |
| Temporal burst (stars in short window) | | | Requires star timestamp history at scale |
| Account age + activity pattern | | | Detects sophisticated fakes, not available from public API alone |

StarMapper reaches ~92% accuracy on the labelled corpus (weights as of 2026-05-06: fork 30%, ZF 45%, watcher 5%, releases 20%). StarScout reaches 98% precision using the full signal set. The gap is structural, not a calibration issue.
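
A minimal sketch of how the four weights might combine normalised signals into a single score. It assumes that a missing or gated signal is dropped and the remaining weights are rescaled to sum to 1; the page does not say how unavailable signals are handled, so this is not guaranteed to reproduce the published scores. `WEIGHTS` and `organic_score` are hypothetical names.

```python
from typing import Dict

# Weights as stated on this page (2026-05-06).
WEIGHTS = {
    "fork_star": 0.30,
    "zero_follower": 0.45,
    "watcher_star": 0.05,
    "releases": 0.20,
}


def organic_score(normalised: Dict[str, float]) -> float:
    """Weighted mean of the available normalised signals (each in 0..100).

    ASSUMPTION: signals absent from `normalised` are skipped and the
    remaining weights are rescaled so that they sum to 1.
    """
    available = {k: v for k, v in normalised.items() if k in WEIGHTS}
    total_weight = sum(WEIGHTS[k] for k in available)
    if total_weight == 0:
        raise ValueError("no usable signals")
    return sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight


# Zero-follower signal unavailable: only three signals contribute.
print(organic_score({"fork_star": 100.0, "watcher_star": 100.0, "releases": 100.0}))
```

Rescaling keeps the result on the 0 → 100 scale whichever signals are present, at the cost of giving the surviving signals more influence than their nominal weight.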

Caveats

  • The fork/star signal is gated at ≥ 5 000 stars. Below this threshold the ratio is too noisy to be useful.
  • CLI and developer tools (installed via a package manager, rarely forked) may have a lower fork/star ratio despite being organic. To account for this, the fork signal's weight was reduced (40%→30%), in addition to the ≥ 5 000 stars gate.
  • Zero-follower signal requires ≥ 30 enriched users (users StarMapper has seen as stargazers). It is unavailable for repos not scanned on StarMapper.
  • Viral repos or niche communities (CLI tools, curated lists) may score lower despite being organic. The score reflects signals, not intent.
  • The score is not an accusation. Repos can score poorly due to community structure (e.g., crypto projects have high watcher counts but also many bot accounts).
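
The two gates mentioned in the caveats can be sketched as a small availability check. The thresholds come from this page; the function and parameter names are hypothetical.

```python
from typing import List

FORK_MIN_STARS = 5_000         # fork/star is only used at or above this star count
ZERO_FOLLOWER_MIN_SAMPLE = 30  # minimum number of enriched stargazers


def available_signals(stars: int, enriched_stargazers: int) -> List[str]:
    """List the signals that pass their gate for a given repo."""
    signals = ["watcher_star", "releases"]  # gate: always
    if stars >= FORK_MIN_STARS:
        signals.append("fork_star")
    if enriched_stargazers >= ZERO_FOLLOWER_MIN_SAMPLE:
        signals.append("zero_follower")
    return signals


# A small repo that has not been scanned: only the ungated signals remain.
print(available_signals(stars=1_292, enriched_stargazers=0))
```

A repo with few stars and no enriched stargazers is therefore scored on watcher ratio and releases alone, which is one reason small repos can land in a lower tier.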

References