# Legibility Scorecard

> Score how legible your GitHub repo is for coding agents

The Legibility Scorecard analyzes a GitHub repository and scores how easy it is for coding agents to navigate, understand, and contribute to the codebase. It is based on [OpenAI's Agentic Legibility demo](https://openai.com/index/practice-of-agentic-legibility/) and adapted for Twill.

## How it works

1. **You submit a repo.** Paste a public GitHub URL, or select a private repo connected to your workspace.
2. **A hosted shell runs the audit.** Twill sends the request to the OpenAI Responses API, which provisions a sandboxed Linux container. The container clones your repo and runs a deterministic Python scoring script.
3. **The scorer analyzes file patterns.** The script (`score_repo.py`) walks the repo tree and checks for specific files, patterns, and conventions across seven metrics. No dependencies are installed and none of the repo's code is executed: the audit is purely static analysis.
4. **Results stream back live.** Shell commands, stdout/stderr, and the final scorecard stream to the UI in real time as NDJSON (newline-delimited JSON, one object per line).
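
To make the NDJSON step concrete, here is a minimal sketch of how a client might consume such a stream. The event fields (`type`, `text`, `total`, `max`) are hypothetical and chosen only for illustration; this page does not document Twill's actual event schema.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Hypothetical event shapes, for illustration only.
sample_stream = [
    '{"type": "command", "text": "python score_repo.py"}',
    '{"type": "stdout", "text": "scanning repository"}',
    "",
    '{"type": "scorecard", "total": 15, "max": 21}',
]

events = list(iter_ndjson(sample_stream))
for event in events:
    print(event["type"], event)
```

Because each line is a complete JSON document, a consumer can render every event as soon as its line arrives instead of waiting for the whole response.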

## Seven metrics

Each metric is scored 0–3 based on the presence and quality of specific signals:

| Metric                         | What it measures                                                                                                                 |
| ------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- |
| **Bootstrap self-sufficiency** | Can an agent set up the project from a cold clone? Looks for setup scripts, dependency lockfiles, Docker configs, and Makefiles. |
| **Task entrypoints**           | Are there clear starting points for work? Checks for issue templates, TODO files, CONTRIBUTING guides, and task runners.         |
| **Validation harness**         | Can an agent verify its own changes? Looks for test suites, CI configs, and test scripts.                                        |
| **Lint & format gates**        | Are style checks automated? Checks for linter configs, pre-commit hooks, and format scripts.                                     |
| **Agent repo map**             | Is there explicit guidance for AI agents? Looks for `AGENTS.md`, `CLAUDE.md`, Cursor rules, and similar files.                   |
| **Structured docs**            | Is the project well-documented? Checks for READMEs, architecture docs, API docs, and changelogs.                                 |
| **Decision records**           | Are past decisions recorded? Looks for ADRs, RFCs, and design documents.                                                         |
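
As a rough illustration of what a static, file-presence check for one metric could look like, the sketch below scores the "Agent repo map" metric. It is not the actual `score_repo.py`: the signal list, the function name `score_agent_repo_map`, and the rule "one point per signal found, capped at 3" are all assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Hypothetical signal list; the real scorer may check different paths.
AGENT_MAP_SIGNALS = ["AGENTS.md", "CLAUDE.md", ".cursorrules", ".cursor/rules"]


def score_agent_repo_map(repo_root: Path) -> int:
    """Return a 0-3 score from the number of signal paths that exist.

    Assumed rule: one point per signal found, capped at 3.
    Only the file system is inspected; nothing in the repo is executed.
    """
    hits = sum(1 for rel in AGENT_MAP_SIGNALS if (repo_root / rel).exists())
    return min(hits, 3)


# Demo on a throwaway directory containing a single signal file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "AGENTS.md").write_text("# Agent guide\n")
    demo_score = score_agent_repo_map(root)

print(demo_score)
```

The metric descriptions mention quality as well as presence, so a real rubric presumably weighs more than a simple count.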

## Scoring

* Each metric produces a score from 0 (no signals) to 3 (strong signals).
* The overall score is the sum across all seven metrics (max 21).
* A letter grade is assigned from the overall score as a percentage of the 21-point maximum: A (85%+), B (70%+), C (50%+), D (below 50%).
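
The arithmetic above can be sketched in a few lines. This assumes the thresholds are inclusive lower bounds and that the percentage is compared without rounding; neither detail is stated on this page, and the function name `overall_grade` is made up for the example.

```python
MAX_PER_METRIC = 3
METRIC_COUNT = 7


def overall_grade(scores: list[int]) -> tuple[int, float, str]:
    """Return (total, percentage, letter grade) for seven 0-3 metric scores."""
    if len(scores) != METRIC_COUNT:
        raise ValueError(f"expected {METRIC_COUNT} metric scores")
    if any(s < 0 or s > MAX_PER_METRIC for s in scores):
        raise ValueError("each metric score must be between 0 and 3")

    total = sum(scores)
    pct = 100 * total / (MAX_PER_METRIC * METRIC_COUNT)

    # Assumed: thresholds are inclusive lower bounds on the unrounded percentage.
    if pct >= 85:
        grade = "A"
    elif pct >= 70:
        grade = "B"
    elif pct >= 50:
        grade = "C"
    else:
        grade = "D"
    return total, pct, grade


total, pct, grade = overall_grade([3, 2, 3, 2, 1, 3, 1])
print(total, round(pct, 1), grade)  # 15 of 21 is about 71.4%, so grade B
```

Under these assumptions, an A requires at least 18 points, a B at least 15, and a C at least 11.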

## Quick wins

The scorer also flags low-effort changes that would raise your score the most, for example adding a `CLAUDE.md` file or a setup script.

## Public vs. private repos

* **Public repos** can be analyzed by anyone with a Twill account — just paste the GitHub URL.
* **Private repos** require connecting your GitHub App installation to your Twill workspace. Twill uses a short-lived token scoped to the repos you grant access to.

## Attribution

The scoring rubric and methodology are based on OpenAI's [Agentic Legibility demo](https://openai.com/index/practice-of-agentic-legibility/) and its agentic legibility skill demo. The scoring script runs inside OpenAI's hosted shell environment via the Responses API.
