Get a collection of package.json files for non-NPM packages
We needed package.json files from real projects that aren't packages published to NPM. While NPM can tell you the absolute usage of NPM packages in terms of download numbers, we were interested in the sets of dependencies that people use together in a given project.
For more on how we used a sample of these package.json files to run simulations for StackAid, see StackAid in Beta.
On macOS:
brew install go-task/tap/go-task && task brew:requirements
Use the src CLI to check whether you're authenticated:
task src:login
If you're not logged in, the output should include a link for creating an access token. Once you have an access token, put it in the .env file, which should look like this:
SRC_ACCESS_TOKEN=<your access token>
Once it's configured, rerun the src:login task to confirm your configuration.
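If you want to check the token from a script before running the tasks, here is a minimal Python sketch. It assumes .env holds simple KEY=VALUE lines like the one above; parse_dotenv and has_token are hypothetical helpers, not part of this project or the src CLI, and the sketch does not reproduce how the tasks themselves load the file.

```python
# Placeholder text from the example .env line; a real token replaces it.
PLACEHOLDER = "<your access token>"


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. SRC_ACCESS_TOKEN="abc".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def has_token(values: dict[str, str]) -> bool:
    """True if SRC_ACCESS_TOKEN is set to something other than the placeholder."""
    token = values.get("SRC_ACCESS_TOKEN", "")
    return bool(token) and token != PLACEHOLDER
```

For example, has_token(parse_dotenv(Path(".env").read_text())) with pathlib.Path returns False while the placeholder is still in place.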
To query for all package.json files on GitHub that aren't in node_modules or in directories such as test, fixture, or examples:
task src:query
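The exclusion rule described above can be expressed as a local path filter, for instance to double-check results. This Python sketch assumes "excluded" means any directory component of the path equals one of the listed names; the actual Sourcegraph query may match differently (for example tests or fixtures), and is_project_manifest is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Directory names the query is described as excluding. The exact list used
# by the src:query task may differ; this set is an assumption.
EXCLUDED_DIRS = {"node_modules", "test", "fixture", "examples"}


def is_project_manifest(path: str) -> bool:
    """Return True if path is a package.json outside any excluded directory."""
    parts = PurePosixPath(path).parts
    if not parts or parts[-1] != "package.json":
        return False
    # Check every parent directory component, not just the top level.
    return not any(part in EXCLUDED_DIRS for part in parts[:-1])
```

Checking every component rather than only the first catches nested cases such as packages/a/test/package.json.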
The command takes about a minute and returns just over 1M results. The results are written to ./data/src_github_results.jsonl, which should look like this:
{"type":"path","path":"package.json","repository":"freeCodeCamp/freeCodeCamp","branches":[""],"commit":"382717cce4ea5593eb623ba5ef0bd47c534411d1"}
{"type":"path","path":"web/package.json","repository":"freeCodeCamp/freeCodeCamp","branches":[""],"commit":"382717cce4ea5593eb623ba5ef0bd47c534411d1"}
{"type":"path","path":"curriculum/package.json","repository":"freeCodeCamp/freeCodeCamp","branches":[""],"commit":"382717cce4ea5593eb623ba5ef0bd47c534411d1"}
{"type":"path","path":"tools/crowdin/package.json","repository":"freeCodeCamp/freeCodeCamp","branches":[""],"commit":"382717cce4ea5593eb623ba5ef0bd47c534411d1"}
{"type":"path","path":"tools/scripts/seed/package.json","repository":"freeCodeCamp/freeCodeCamp","branches":[""],"commit":"382717cce4ea5593eb623ba5ef0bd47c534411d1"}
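To get a feel for the JSONL output, a short Python sketch can count matches per repository. It relies only on the repository and type fields visible in the sample lines above; summarize_results is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_results(lines: Iterable[str]) -> Counter:
    """Count package.json matches per repository in src JSONL output."""
    per_repo: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Only "path" records appear in the sample; skip anything else.
        if record.get("type") != "path":
            continue
        per_repo[record["repository"]] += 1
    return per_repo
```

For example, open ./data/src_github_results.jsonl and pass the file object to summarize_results; len() of the result is the number of distinct repositories and sum(result.values()) is the number of package.json files.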
To convert the file to a CSV:
task src:query:csv
The results will be in ./data/src_github_results.csv, which should look like this:
repo,commit_sha,path
freeCodeCamp/freeCodeCamp,382717cce4ea5593eb623ba5ef0bd47c534411d1,package.json
freeCodeCamp/freeCodeCamp,382717cce4ea5593eb623ba5ef0bd47c534411d1,web/package.json
freeCodeCamp/freeCodeCamp,382717cce4ea5593eb623ba5ef0bd47c534411d1,curriculum/package.json
freeCodeCamp/freeCodeCamp,382717cce4ea5593eb623ba5ef0bd47c534411d1,tools/crowdin/package.json
freeCodeCamp/freeCodeCamp,382717cce4ea5593eb623ba5ef0bd47c534411d1,tools/scripts/seed/package.json
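The src:query:csv task's implementation isn't shown here. As a hedged sketch, the same field mapping (repository, commit, and path onto the repo, commit_sha, and path columns) can be done with Python's standard json and csv modules; to_rows and write_csv are hypothetical names.

```python
import csv
import json
from typing import Iterable, Iterator, TextIO


def to_rows(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield [repo, commit_sha, path] for each "path" record in src JSONL output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "path":
            continue
        yield [record["repository"], record["commit"], record["path"]]


def write_csv(lines: Iterable[str], out: TextIO) -> None:
    """Stream rows to out with the repo,commit_sha,path header used in the sample."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["repo", "commit_sha", "path"])
    writer.writerows(to_rows(lines))
```

Because to_rows is a generator, the roughly 1M results are streamed rather than held in memory. Open the output file with newline="", as the csv module documentation recommends, and pass both file objects to write_csv.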
Try the query on Sourcegraph!