Note: the following installation steps have been tested on macOS only. However, because most of the commands can be run inside the Docker container (as well as locally via npm), they should work on other systems without issues.
- Open a terminal window and clone this repo:
git clone git@github.com:rossanthony/github-miner.git
- Run setup:
npm run docker-setup
(this will trigger a download and build of the required Docker containers, plus installation of the npm module dependencies)
- Make a copy of
.env.default
named
.env
(at the same root level in the project folder; note that this file is listed in .gitignore, so it won't be committed and your secrets stay out of the repository), then update the following two lines:
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
with credentials from your own GitHub OAuth app. To obtain these, log in to your GitHub account and go to /settings/developers; for more details on this process see here. Without this step the rate limit will kick in at 10 requests per minute. Authenticated apps are permitted up to 30 requests per minute.
- Install the Neo4j plugins:
npm run install-neo4j-plugins
- Start mining:
npm run mine
(local machine) or
npm run docker-exec mine
(to execute inside the Docker container)
- Import the mined data into Neo4j:
npm run insert
(local) or
npm run docker-exec insert
(Docker)
- Explore the data via the locally running instance of the Neo4j browser: http://localhost:7474
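The credentials step above is easy to get wrong, since an empty value fails silently and only shows up later as a lower rate limit. The following is a minimal sketch of how the copy and a sanity check could be scripted. The function names `setup_env` and `missing_credentials` are hypothetical helpers for illustration and are not part of this repo.

```shell
# Hypothetical helper: create .env from the template, but only if .env
# does not exist yet, so credentials already entered are not overwritten.
setup_env() {
  [ -f .env ] || cp .env.default .env
}

# Hypothetical helper: print the name of each credential that is missing
# or still empty in the given file (defaults to .env).
missing_credentials() {
  env_file="${1:-.env}"
  for key in GITHUB_CLIENT_ID GITHUB_CLIENT_SECRET; do
    # A key counts as set when at least one character follows the "=".
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}
```

Running `setup_env` followed by `missing_credentials` from the project root prints nothing once both values have been filled in.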
To import the dependencies for a specific repository, use the following command:
npm run insert <username> <repo>
For example, to run it against this repo:
npm run insert rossanthony github-miner
Once this has run and imported the data, you can run a Cypher query like the example below:
MATCH (repo:GitRepo {full_name: 'rossanthony/github-miner'})-[:DEPENDS_ON*]->(dependencies)
RETURN repo, dependencies
This should return all Node modules the project depends on, both direct dependencies and sub-dependencies.
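To reuse the query above for other repositories, the repository name can be filled in from the command line. This is a minimal sketch: the function name `dependency_query` is hypothetical, and it does no escaping, so it assumes the username and repo contain no quote characters.

```shell
# Hypothetical helper: print the dependency query for <username> <repo>.
# The variable-length pattern [:DEPENDS_ON*] follows one or more
# DEPENDS_ON relationships, so sub-dependencies are included.
dependency_query() {
  printf "MATCH (repo:GitRepo {full_name: '%s/%s'})-[:DEPENDS_ON*]->(dependencies)\nRETURN repo, dependencies;\n" "$1" "$2"
}
```

For example, `dependency_query rossanthony github-miner` prints a query that can be pasted into the Neo4j browser. If `cypher-shell` is available on your setup (an assumption, it is not mentioned in these instructions), the output could also be piped into it.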
For examples of other queries to run see /documentation/queries.md.
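The single-repository import command shown earlier can also be repeated for a list of repositories. The sketch below assumes a POSIX-style shell such as bash or sh; `insert_repos` and the `INSERT_CMD` variable are hypothetical names introduced for illustration, not something this repo provides.

```shell
# Command run for each repository; override it (e.g. INSERT_CMD=echo)
# to do a dry run that only prints the arguments.
INSERT_CMD="${INSERT_CMD:-npm run insert}"

# Hypothetical helper: import each repository given as <username>/<repo>.
insert_repos() {
  for full_name in "$@"; do
    username="${full_name%%/*}"  # text before the first slash
    repo="${full_name#*/}"       # text after the first slash
    # INSERT_CMD is left unquoted on purpose so it splits into words.
    $INSERT_CMD "$username" "$repo" || return 1
  done
}
```

Calling `insert_repos rossanthony/github-miner` is then equivalent to the example command above, and further `<username>/<repo>` arguments are imported in order, stopping at the first failure.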
- Local:
npm run test
- Docker:
npm run docker-exec test
(GitUser)--[:OWNS]-->(GitRepo)--[:DEPENDS_ON]-->(NodeModule)--[:HOSTED_ON]-->(GitRepo)