[Dashboard panels: Blacklist, Whitelist, Master, Slave]

[Time-series history of total counts: Blacklist, Whitelist, Master, Slave]

[Searchable / Sortable tables of counts & changes]

[Word counts per file in a given repo]

It is not the goal of this dashboard to shame anyone

Rather, it is intended to help those who wish to work on this project know where to focus their time and attention.

Methodology

Individual counts

Once per week (Monday, 5am UTC) we run a script that takes a list of repos and clones them all locally. For each repo, and for each word (regular expression) of interest, we run this command:

ag -c "$word" "$repo_path"

This yields a count for that term. We use `ag` because it is much faster than `grep` and because `ag -c` counts every match rather than every matching line (e.g. 'master of master' is 1 hit with `grep -c`, but 2 hits with `ag -c`).
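The difference between counting lines and counting matches can be checked without `ag` installed. The sketch below is only an illustration, not the backend's actual script: it uses `grep -o` as a stand-in for per-match counting, and then sums some made-up per-file counts in the `path:count` form that `ag -c` normally prints for a directory. The sample paths, numbers, and variable names are all assumptions.

```shell
# Sample line taken from the example in the text above.
line='master of master'

# grep -c counts matching lines, so one line gives one hit.
line_hits=$(printf '%s\n' "$line" | grep -c 'master')

# grep -o prints each match on its own line; counting those lines
# gives the per-match figure that the text attributes to ag.
match_hits=$(printf '%s\n' "$line" | grep -o 'master' | wc -l | tr -d ' ')

echo "matching lines: $line_hits, matches: $match_hits"

# Hypothetical per-file output in path:count form
# (paths and numbers are invented for illustration).
sample_output='docs/setup.md:2
src/db/replica.c:5
README.md:1'

# Sum the field after the last colon to get a single total for the repo.
total=$(printf '%s\n' "$sample_output" | awk -F: '{ sum += $NF } END { print sum + 0 }')

echo "repo total: $total"
```

Taking the last colon-separated field keeps the sum correct if a path itself contains a colon, and `sum + 0` makes `awk` print 0 when there is no output at all.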

Excluded directories

We don't exclude any directories from the ag process. There are 3 main reasons for this:

  • Simple methodology:
  • If we exclude "vendor" or "src" then we start playing whack-a-mole with a proliferation of language-specific places to put libraries. Keeping it simple makes it easy for others to replicate our results.

  • Influencing upstream libraries
  • Our goal is to improve as many projects as we can. If we simply accept the argument that "this isn't my code" then we miss the chance to positively impact many other projects by working with those upstreams.

  • Nowhere to hide (code)
  • It is inevitable that this project will provoke debate, and that's fine. But *we* still want to see what's happening out there, and if we exclude directories, developers may decide to put code in those directories just to avoid indexing.

This approach has a significant consequence: a repo may well contain instances of these terms that it cannot remove, perhaps because of long deprecation cycles, integration with other projects, or internal APIs. This can happen, and it's fine. To repeat the message at the top of the page: the goal of this dashboard is not to shame anyone.

Dashboard Tools

There are 4 main tabs:

  • Dashboard has the current data for the repos
  • History explores the timeline data for the repos
  • Tables lets you get exact values for a given repo
  • Files lets you get exact values for a given file in a repo

The Org & Repo selectors on the left sidebar let you filter any of the graphs to a given subset.

All graphs can be downloaded: hover over a graph for a second and the download button should appear at the bottom-left of the image.

About tooling

This dashboard was built in R and RStudio with shiny, shinydashboard, plotly, the tidyverse, and many more packages.

You can view the source for this dashboard at GitHub/conscious-lang/dashboard-frontend

You can view the source for the backend scraper at GitHub/conscious-lang/dashboard-backend