How We Score Every MCP Server

Every server in our index is evaluated across multiple quality dimensions. The scoring methodology is designed to be fair, consistent, and actionable.

Local MCP Servers: Definition Quality 50%, Protocol 20%, Support 30%
Remote MCP Servers: Protocol 40%, Security 30%, Support 30%

Grade Thresholds

A+ 90 – 100
A 80 – 89
B 70 – 79
C 60 – 69
D 50 – 59
F Below 50

The weighted average of all evaluated dimensions determines the grade.

Two Scoring Models

Local MCP: GitHub repositories with source code available
Definition Quality 50%
Protocol Compliance 20%
Supportability 30%

Evaluated on tool design quality, MCP protocol adherence, and project health signals from GitHub.

Remote MCP: hosted endpoints with no public source code
Protocol Compliance 40%
Security Checks 30%
Supportability 30%

Evaluated on protocol compliance, authentication security, and enterprise supportability.

Both models use a weighted average of their respective dimensions. The overall score determines the grade.
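The weighted average and grade mapping can be sketched as follows. The weights and grade thresholds come from the methodology above; the function and variable names are illustrative, not ToolBench's actual implementation.

```python
# Dimension weights from the two scoring models described above.
LOCAL_WEIGHTS = {"definition_quality": 0.50, "protocol": 0.20, "supportability": 0.30}
REMOTE_WEIGHTS = {"protocol": 0.40, "security": 0.30, "supportability": 0.30}

# Grade thresholds from the table above; anything below 50 is an F.
GRADE_THRESHOLDS = [(90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D")]

def overall_score(dimension_scores: dict, weights: dict) -> float:
    """Weighted average of 0-100 dimension scores."""
    return sum(dimension_scores[dim] * w for dim, w in weights.items())

def grade(score: float) -> str:
    """Map an overall score to its letter grade."""
    for threshold, letter in GRADE_THRESHOLDS:
        if score >= threshold:
            return letter
    return "F"

# Example local server: 92 * 0.5 + 80 * 0.2 + 70 * 0.3 = 83.0, an "A".
score = overall_score(
    {"definition_quality": 92, "protocol": 80, "supportability": 70},
    LOCAL_WEIGHTS,
)
```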

Scoring Dimensions

Dimension Local Remote What it measures
Definition Quality 50% N/A Tool naming, descriptions, parameter schemas, and composability
Protocol Compliance 20% 40% Transport type, tool registration correctness, MCP spec adherence
Security Checks N/A 30% OAuth 2.0, PKCE, transport security, authentication flows
Supportability 30% 30% Maintenance health, community adoption, organizational backing

Definition Quality

Local only

Measures how well tools are designed for AI agent consumption. Each tool is scored individually on naming clarity (verb-first, unambiguous intent), description quality (explains when, why, and what the tool returns), and parameter schema completeness (typed inputs with constraints and documentation). The overall score is the average of all per-tool evaluations. Tools without visible input schemas score zero for that sub-dimension. Informed by Arcade's 54 Agentic Tool Patterns and production experience building enterprise-grade agentic tools.
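The per-tool averaging described above can be sketched as a scorer over MCP tool definitions. The naming/description/schema sub-scores, the verb list, and the equal weighting of sub-dimensions are assumptions for illustration; only the zero-score rule for missing input schemas and the averaging across tools come from the text.

```python
# Assumed set of verb-first prefixes for the naming check (illustrative).
ACTION_VERBS = {"get", "list", "create", "update", "delete", "search", "send"}

def score_tool(tool: dict) -> float:
    """Score one tool on naming, description, and schema (0-100 each)."""
    name = tool.get("name", "")
    # Naming clarity: verb-first names like "create_issue" pass.
    naming = 100.0 if name.split("_", 1)[0] in ACTION_VERBS else 40.0

    # Description quality: crude length proxy for "explains when/why/what".
    desc = tool.get("description", "")
    description = 100.0 if len(desc) >= 40 else (40.0 if desc else 0.0)

    # Schema completeness: tools without a visible input schema score zero
    # on this sub-dimension (per the methodology above).
    schema = tool.get("inputSchema")
    if not schema:
        schema_score = 0.0
    else:
        props = schema.get("properties", {})
        documented = [p for p in props.values() if "type" in p and "description" in p]
        schema_score = 100.0 * len(documented) / len(props) if props else 60.0

    return (naming + description + schema_score) / 3  # equal weights assumed

def definition_quality(tools: list) -> float:
    """Dimension score is the average of all per-tool evaluations."""
    return sum(score_tool(t) for t in tools) / len(tools) if tools else 0.0
```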

Protocol Compliance

Local + Remote

Assesses whether the server correctly implements the MCP protocol. Transport type is the primary signal - HTTP servers can score up to 100, while STDIO-only servers are capped at 50 since they cannot be accessed by hosted MCP clients. Tool registration correctness and MCP error handling are also evaluated. Optional MCP capabilities (prompts, resources, logging, sampling, etc.) are detected and displayed but do not affect the score - a server with only Tools support is fully MCP compliant per the specification.
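The transport cap above can be expressed directly. The 50-point STDIO ceiling and the HTTP maximum come from the text; combining them with a single raw checks score is an assumption about how the sub-checks are aggregated.

```python
def protocol_compliance(transport: str, raw_checks_score: float) -> float:
    """Apply the transport cap to a 0-100 score from tool-registration
    and error-handling checks.

    Optional capabilities (prompts, resources, logging, sampling) are
    detected separately and never change this score: a Tools-only
    server is fully MCP compliant.
    """
    if transport == "stdio":
        # STDIO-only servers cannot be reached by hosted MCP clients.
        return min(raw_checks_score, 50.0)
    return raw_checks_score  # HTTP transports can score up to 100
```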

Security Checks

Remote only

Evaluates authentication and security for remote MCP servers. Covers OAuth 2.0 flow correctness, PKCE support (S256), client registration, protected resource metadata, authorization server discovery (RFC 8414), token endpoint authentication methods, and 401 challenge handling.
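One way to picture these checks is a weighted checklist applied to results already gathered from the endpoint (e.g. whether S256 appeared in the discovered RFC 8414 metadata, whether 401 responses carried a challenge). The check names and point weights below are illustrative assumptions, not ToolBench's actual rubric.

```python
# Assumed point weights per security check (sums to 100).
SECURITY_CHECKS = {
    "pkce_s256": 25,                    # "S256" in code_challenge_methods_supported
    "dynamic_client_registration": 15,  # RFC 7591-style registration available
    "rfc8414_discovery": 20,            # authorization server metadata discoverable
    "protected_resource_metadata": 15,  # resource metadata published
    "token_endpoint_auth": 15,          # sane token endpoint auth methods
    "www_authenticate_401": 10,         # 401 responses carry a challenge header
}

def security_score(results: dict) -> float:
    """Fraction of checklist points earned, scaled to 0-100."""
    earned = sum(w for check, w in SECURITY_CHECKS.items() if results.get(check))
    return 100.0 * earned / sum(SECURITY_CHECKS.values())
```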

Supportability

Local + Remote

Measures adoption risk and maintenance health. For local servers: GitHub stars, open-source license, last push date, organization vs individual ownership, contributor count, release history, fork status, documentation, and commercial support indicators. For remote servers: SLA tier, enterprise support, deployment model, compliance certifications (SOC 2, GDPR), encryption, and multi-region availability.
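As one concrete example of a maintenance-health signal, the last push date can be turned into a freshness sub-score that decays with staleness. The tier boundaries and point values here are assumptions for illustration; the source only lists "last push date" as one of several supportability inputs.

```python
from datetime import datetime, timezone

def freshness_score(last_push, now=None):
    """Sub-score for repository freshness; thresholds are assumed tiers."""
    now = now or datetime.now(timezone.utc)
    days_stale = (now - last_push).days
    if days_stale <= 30:      # pushed within the last month
        return 100.0
    if days_stale <= 180:     # active within six months
        return 70.0
    if days_stale <= 365:     # active within a year
        return 40.0
    return 10.0               # likely unmaintained
```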

Across these dimensions, ToolBench evaluates every MCP server with a single goal: help everyone build better tools.

Proudly built by Arcade.dev

At Arcade, we've spent months building high-quality agentic tools for enterprises and have seen firsthand what separates tools that work in production from tools that don't. Our customers and the broader community have been a tremendous source of learning along the way.

As the MCP ecosystem grew, we realized that many of the hard-earned lessons we've accumulated - from tool design patterns to protocol best practices - could help the entire community raise the bar. ToolBench is our way of sharing that knowledge, and the 54 Agentic Tool Patterns are the foundation it builds on.

We hope this inspires everyone building MCP tools to keep pushing the quality bar higher. Better tools mean better agents, and better agents mean a better experience for everyone.