GitHub’s traffic API gives you 14 days of view and clone data per repository. After that, it’s gone. If you want a longer historical record of how your open source projects are performing, you have to collect the data yourself before the window closes.

ghtraffic is a small Go tool that solves this problem. It queries the GitHub traffic API for every repository you have push access to – including organization repos – and writes newline-delimited JSON to stdout, one record per repo per day. Run it hourly via cron and append to a local file, and you accumulate a permanent historical record.

ghtraffic -seen ~/.local/share/ghtraffic/traffic.jsonl \
  >> ~/.local/share/ghtraffic/traffic.jsonl

The -seen flag handles deduplication: records already in the file are skipped, except for today’s data, which is always re-fetched because GitHub’s daily counts update throughout the day. Each record captures views, clones, referrers, and popular paths for a given repo and date.
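The skip-unless-today rule is easy to picture in code. What follows is a minimal sketch of the idea, not ghtraffic's actual implementation: the Record type, its JSON field names, and the loadSeen and filterNew helpers are all made up for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Record is a hypothetical, trimmed-down shape for one NDJSON line.
// The field names are assumptions, not ghtraffic's real schema.
type Record struct {
	Repo  string `json:"repo"`
	Date  string `json:"date"` // YYYY-MM-DD
	Views int    `json:"views"`
}

// loadSeen reads existing NDJSON and returns the set of repo/date keys in it.
func loadSeen(r io.Reader) (map[string]bool, error) {
	seen := make(map[string]bool)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec Record
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, err
		}
		seen[rec.Repo+"|"+rec.Date] = true
	}
	return seen, sc.Err()
}

// filterNew keeps records that are not in seen, plus anything dated today,
// since today's counts may still change.
func filterNew(fetched []Record, seen map[string]bool, today string) []Record {
	var out []Record
	for _, rec := range fetched {
		if rec.Date == today || !seen[rec.Repo+"|"+rec.Date] {
			out = append(out, rec)
		}
	}
	return out
}

func main() {
	existing := `{"repo":"me/tool","date":"2024-05-01","views":3}
{"repo":"me/tool","date":"2024-05-02","views":1}`
	seen, err := loadSeen(strings.NewReader(existing))
	if err != nil {
		panic(err)
	}
	fetched := []Record{
		{"me/tool", "2024-05-01", 3}, // already stored, not today: skipped
		{"me/tool", "2024-05-02", 4}, // today: always emitted again
		{"me/tool", "2024-04-30", 2}, // never stored: emitted
	}
	for _, rec := range filterNew(fetched, seen, "2024-05-02") {
		fmt.Println(rec.Repo, rec.Date, rec.Views)
	}
}
```

With "today" fixed at 2024-05-02, the sketch emits the refreshed record for that day and the previously unseen 2024-04-30 record, and drops the one for 2024-05-01.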

Pushing to Umami

The companion tool ghpush reads this NDJSON and sends the records to an Umami instance as historical pageview events. This is where Umami’s API earns its keep: it accepts arbitrary historical timestamps, so you can backfill months of data and have it appear correctly in the dashboard timeline. Plausible offers no equivalent.

The event mapping is pragmatic:

GitHub metric    Umami representation
Page views       Pageviews to /<owner>/<repo>
Clones           Pageviews to /clone/<owner>/<repo>
Referrers        Pageviews with the Referrer field set
Popular paths    Pageviews to the actual GitHub subpath

One deliberate choice sits behind the clone numbers. GitHub reports both a total clone count and a count of unique cloners, and the total is dominated by automation – CI runners, Dependabot, mirrors – routinely running an order of magnitude above the human figure. ghpush pushes the unique cloners as the better proxy for real interest. Views use the raw count, which isn’t inflated the same way.
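To make the mapping concrete, here is a rough sketch of how one day of stats could turn into pageview batches. The DayStats and Event types and the mapEvents function are hypothetical names for illustration; only the path layout and the views-raw, clones-unique choice come from the description above.

```go
package main

import "fmt"

// DayStats is a hypothetical record shape; the field names are assumptions.
type DayStats struct {
	Owner, Repo   string
	Views         int // raw view count
	Clones        int // total clones, including automation
	UniqueCloners int // distinct cloners
}

// Event is one batch of identical pageviews to send.
type Event struct {
	Path  string
	Count int
}

// mapEvents applies the mapping: raw views go to the repo path, and
// unique cloners (not the inflated total) go to the /clone/ path.
func mapEvents(d DayStats) []Event {
	slug := d.Owner + "/" + d.Repo
	return []Event{
		{Path: "/" + slug, Count: d.Views},
		{Path: "/clone/" + slug, Count: d.UniqueCloners},
	}
}

func main() {
	day := DayStats{Owner: "me", Repo: "tool", Views: 40, Clones: 310, UniqueCloners: 12}
	for _, ev := range mapEvents(day) {
		fmt.Println(ev.Path, ev.Count)
	}
}
```

The total clone count is carried in the struct but deliberately never turned into events.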

A few things worth knowing. Umami’s /api/send endpoint requires a browser user-agent – send anything else and it returns {"beep":"boop"} with HTTP 200, silently dropping your data. The visitor count metric is also not meaningful for server-side pushes: Umami deduplicates visitors by IP, so all events pushed from the same server look like one visitor. Use the pageview count and the Pages breakdown filtered to /clone/ for actual numbers.
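Because the failure mode is an HTTP 200, it pays to check the response body and not just the status. The sketch below shows one way to do both halves: set a browser-like User-Agent on the request and detect the placeholder reply. It is not ghpush's code. The payload struct is a guess at a minimal shape, and the timestamp field in particular (name and unit) is an assumption to verify against the Umami version you run. The user-agent string is just an example value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// browserUA is an example browser-style user-agent; any realistic one should do.
const browserUA = "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"

// payload is an assumed minimal event body. Check the field names,
// especially Timestamp, against your Umami version.
type payload struct {
	Website   string `json:"website"`
	Hostname  string `json:"hostname"`
	URL       string `json:"url"`
	Referrer  string `json:"referrer,omitempty"`
	Timestamp int64  `json:"timestamp,omitempty"` // assumed: Unix seconds
}

type envelope struct {
	Type    string  `json:"type"`
	Payload payload `json:"payload"`
}

// newSendRequest builds a POST to /api/send with a browser user-agent.
func newSendRequest(baseURL string, p payload) (*http.Request, error) {
	body, err := json.Marshal(envelope{Type: "event", Payload: p})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/send", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("User-Agent", browserUA)
	return req, nil
}

// rejected reports whether a response body is the {"beep":"boop"} placeholder,
// which means the event was dropped despite the 200 status.
func rejected(body []byte) bool {
	var v struct {
		Beep string `json:"beep"`
	}
	return json.Unmarshal(body, &v) == nil && v.Beep == "boop"
}

func main() {
	req, err := newSendRequest("https://umami.example.com", payload{
		Website:  "website-id-placeholder",
		Hostname: "github.com",
		URL:      "/clone/me/tool",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("User-Agent") != "")
	fmt.Println(rejected([]byte(`{"beep":"boop"}`)))
}
```

The sketch only builds the request; sending it with an http.Client and feeding the response body to rejected is left to the caller.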

ghpush tracks what it has already sent so re-runs never double-count – Umami does no deduplication of its own. That state lives in a SQLite file by default (-pushed), or in Postgres (-pg, or the GHPUSH_DATABASE_URL environment variable) for a shared or containerized setup. -migrate-sqlite copies an existing SQLite state into Postgres without re-pushing, so you can switch backends without losing your place. A -init flag bypasses the state and pushes everything – useful for bootstrapping a fresh Umami website or recovering from a corrupted state file.
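The state-tracking logic boils down to "skip what is recorded, record what succeeds". Here is a small sketch of that loop with an in-memory stand-in for the SQLite or Postgres backend. The Store interface, memStore, the push function, and its force parameter (playing the role of -init) are illustrative names, not ghpush's internals.

```go
package main

import "fmt"

// Store is a hypothetical interface over the "already pushed" state.
type Store interface {
	Pushed(key string) bool
	Mark(key string)
}

// memStore is an in-memory stand-in for a real database backend.
type memStore map[string]bool

func (m memStore) Pushed(k string) bool { return m[k] }
func (m memStore) Mark(k string)        { m[k] = true }

// push sends every key not yet recorded. With force set it ignores the
// existing state, as a bootstrap run would. A key is marked only after
// its send succeeds, so a failed run can be retried safely.
func push(keys []string, st Store, force bool, send func(string) error) (int, error) {
	sent := 0
	for _, k := range keys {
		if !force && st.Pushed(k) {
			continue
		}
		if err := send(k); err != nil {
			return sent, err
		}
		st.Mark(k)
		sent++
	}
	return sent, nil
}

func main() {
	st := memStore{"me/tool|2024-05-01": true}
	keys := []string{"me/tool|2024-05-01", "me/tool|2024-05-02"}
	send := func(k string) error { fmt.Println("sending", k); return nil }

	n, _ := push(keys, st, false, send)
	fmt.Println("sent on normal run:", n)

	n, _ = push(keys, st, true, send)
	fmt.Println("sent with force:", n)
}
```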

Usage

Authentication for the GitHub API uses GITHUB_TOKEN, falling back to gh auth token. The Umami connection settings come from the UMAMI_URL and UMAMI_WEBSITE_ID environment variables.

0  * * * * GITHUB_TOKEN=... ghtraffic -seen ~/traffic.jsonl >> ~/traffic.jsonl
5  * * * * UMAMI_URL=https://umami.example.com UMAMI_WEBSITE_ID=... ghpush -pushed ~/pushed.db < ~/traffic.jsonl

GitHub’s traffic API requires push access on the repo, so the tool automatically scopes to repositories you actively maintain.

Running it as a service

The cron pair above is the simplest setup. For an always-on deployment the repo also ships a scheduler binary – the entrypoint of a small distroless container image – that runs the collect-then-push cycle on a fixed interval, so the pipeline keeps running without depending on a particular machine being awake.

Token scope shapes how you configure it. The traffic endpoints need the Administration: read permission, and a fine-grained personal access token is bound to a single account or organization. To cover repositories across several owners, ghtraffic takes one token per owner – a list in GHTRAFFIC_OWNERS and a matching GHTRAFFIC_TOKEN_<OWNER> for each – and collects each owner’s repositories with that owner’s token. One owner’s failure is logged and skipped; the rest still run.
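As a rough illustration of the per-owner lookup, the sketch below resolves a token for each owner and reports the ones it cannot find, so one missing token does not stop the rest. The details are assumptions, not documented behaviour: it treats GHTRAFFIC_OWNERS as a comma-separated list and derives the variable suffix by upper-casing the owner and replacing hyphens with underscores. The envKey and tokensByOwner helpers are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envKey derives the token variable name for an owner.
// The upper-casing and hyphen replacement are assumptions.
func envKey(owner string) string {
	up := strings.ToUpper(owner)
	return "GHTRAFFIC_TOKEN_" + strings.ReplaceAll(up, "-", "_")
}

// tokensByOwner reads the owner list (assumed comma-separated) and looks up
// one token per owner. Owners without a token are returned separately so the
// caller can log and skip them while the others still run.
func tokensByOwner(getenv func(string) string) (map[string]string, []string) {
	tokens := make(map[string]string)
	var missing []string
	for _, owner := range strings.Split(getenv("GHTRAFFIC_OWNERS"), ",") {
		owner = strings.TrimSpace(owner)
		if owner == "" {
			continue
		}
		if tok := getenv(envKey(owner)); tok != "" {
			tokens[owner] = tok
		} else {
			missing = append(missing, owner)
		}
	}
	return tokens, missing
}

func main() {
	tokens, missing := tokensByOwner(os.Getenv)
	for owner := range tokens {
		fmt.Println("would collect traffic for", owner)
	}
	for _, owner := range missing {
		fmt.Fprintln(os.Stderr, "skipping", owner, "(no token set)")
	}
}
```

Passing the lookup function in, instead of calling os.Getenv directly, keeps the helper easy to exercise with a plain map in tests.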