A set of commands to manipulate ASTs generated by bigcode-astgen packages.
For OSX users, sbt and graphviz are available through Homebrew:
brew install sbt
brew install graphviz
The project can be compiled into a single executable jar by running
sbt assembly
Once this is done,
./bin/bigcode-ast-tools -h
should print help for the available commands.
AST files generated by bigcode-astgen can be visualized using the visualize-ast command:
./bin/bigcode-ast-tools visualize-ast <filepath> --index <index>
where <filepath> is the path to a generated JSON file and <index> is the index of the AST to visualize. <index> defaults to 0, the first AST in the file.
The command can also read from standard input if - is passed as the filename, which allows piping the astgen output directly:
bigcode-astgen-java MyFile.java | bigcode-ast-tools visualize-ast -
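To illustrate what selecting an AST by index from a file or from standard input involves, here is a minimal Python sketch. It is not the tool's implementation. It assumes that the generated file holds one JSON-encoded AST per line; that layout is not stated above, and load_ast and open_input are hypothetical helper names.

```python
import json
import sys
from typing import Iterable, List


def load_ast(lines: Iterable[str], index: int = 0) -> List[dict]:
    """Return the AST at position `index`.

    Assumption: each non-empty line is one JSON-encoded AST.
    """
    non_empty = (line for line in lines if line.strip())
    for i, line in enumerate(non_empty):
        if i == index:
            return json.loads(line)
    raise IndexError(f"no AST at index {index}")


def open_input(path: str):
    """Treat "-" as standard input, mirroring the convention described above."""
    return sys.stdin if path == "-" else open(path, encoding="utf-8")
```

With these helpers, load_ast(open_input("-"), 0) would read the first AST piped in from another command.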
Vocabulary can be extracted from a file using
./bin/bigcode-ast-tools generate-vocabulary <filepath> --size <size> -o <output>
where <size> is the maximum size of the vocabulary. --strip-identifers can be passed to ignore all identifiers (the value field in the JSON AST).
The output is a TSV file containing the vocabulary.
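The idea behind vocabulary extraction can be sketched in Python as counting node tokens and keeping the most frequent ones. This is an illustration under assumptions, not the tool's code: it assumes each AST is a list of node objects with a type field and an optional value field, and the token format (type:value) and the TSV column layout (id, name, count) are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def node_token(node: dict, strip_identifiers: bool = False) -> str:
    # Assumption: a token is the node type, joined with its value when present.
    if strip_identifiers or "value" not in node:
        return node["type"]
    return f'{node["type"]}:{node["value"]}'


def build_vocabulary(
    asts: Iterable[List[dict]], size: int, strip_identifiers: bool = False
) -> List[Tuple[int, str, int]]:
    """Keep the `size` most frequent tokens as (id, token, count) rows."""
    counts = Counter(
        node_token(node, strip_identifiers) for ast in asts for node in ast
    )
    return [(i, tok, c) for i, (tok, c) in enumerate(counts.most_common(size))]


def to_tsv(vocab: List[Tuple[int, str, int]]) -> str:
    # Hypothetical column layout; the real output file may differ.
    lines = ["id\tname\tcount"]
    lines += [f"{i}\t{tok}\t{c}" for i, tok, c in vocab]
    return "\n".join(lines) + "\n"
```

Ignoring identifiers collapses nodes that differ only by value into a single token, which shrinks the vocabulary considerably on real code.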
The distribution of the vocabulary generated with generate-vocabulary can be visualized by using
./bin/bigcode-ast-tools visualize-vocabulary-distribution -v <vocabulary-path>
where <vocabulary-path> is the path to the vocabulary TSV file.
Skip-gram training data for learning embeddings can be generated from the AST data:
./bin/bigcode-ast-tools generate-skipgram-data <filepath> -v <vocabulary-path> \
--children-window-size 2 --ancestors-window-size 2 -o skipgram-data.txt.gz
See the help for more information about each option.
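As a rough picture of what the two window options might control, the Python sketch below pairs each node with its descendants up to a given depth and with its ancestors up to a given height. This is one plausible reading of the option names, not the tool's documented behavior. It assumes that each AST is a list of nodes and that a node's children field holds indices into that list; all function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def parents_of(ast: List[dict]) -> Dict[int, int]:
    """Map each child index to the index of its parent."""
    return {
        child: i
        for i, node in enumerate(ast)
        for child in node.get("children", [])
    }


def descendants(ast: List[dict], index: int, depth: int) -> Iterator[int]:
    """Yield descendant indices up to `depth` levels below `index`."""
    if depth == 0:
        return
    for child in ast[index].get("children", []):
        yield child
        yield from descendants(ast, child, depth - 1)


def ancestors(parents: Dict[int, int], index: int, depth: int) -> Iterator[int]:
    """Yield ancestor indices up to `depth` levels above `index`."""
    while depth > 0 and index in parents:
        index = parents[index]
        yield index
        depth -= 1


def skipgram_pairs(
    ast: List[dict], children_window: int = 2, ancestors_window: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (target, context) pairs of node types for one AST."""
    parents = parents_of(ast)
    for i in range(len(ast)):
        for j in descendants(ast, i, children_window):
            yield ast[i]["type"], ast[j]["type"]
        for j in ancestors(parents, i, ancestors_window):
            yield ast[i]["type"], ast[j]["type"]
```

In a real pipeline the node types would be replaced by their vocabulary ids before writing the pairs out, with out-of-vocabulary tokens handled according to the tool's own rules.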