bigcode-ast-tools

A set of commands to manipulate ASTs generated by bigcode-astgen packages.

Setup

Requirements

  • JDK >= 8
  • sbt >= 1.0
  • Graphviz (only needed for the visualize-ast command)

On macOS, sbt and Graphviz are available through Homebrew:

brew install sbt
brew install graphviz

Compilation

The project can be compiled into a single executable jar by running

sbt assembly

Once this is done, running

./bin/bigcode-ast-tools -h

should print help about the available commands.

Commands

Visualizing an AST

AST files generated by bigcode-astgen can be visualized using the visualize-ast command.

./bin/bigcode-ast-tools visualize-ast <filepath> --index <index>

where <filepath> is the path to a generated JSON file and <index> is the index of the AST to visualize (defaults to 0, the first AST in the file). The command can also read from standard input if - is passed as the filename, which allows piping the output of bigcode-astgen directly:

bigcode-astgen-java MyFile.java | bigcode-ast-tools visualize-ast -
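To make the input format more concrete, here is a minimal sketch that renders a single AST as Graphviz DOT. It is not the tool's implementation: it assumes the file holds one JSON-encoded AST per line, and that each AST is a list of nodes with a `type`, an optional `value` and an optional `children` list of node indices. The names `load_ast`, `dot_escape` and `ast_to_dot` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def load_ast(lines: Iterable[str], index: int = 0) -> list[dict[str, Any]]:
    """Return the AST at position `index`.

    Assumption (not confirmed by the tool's docs): one JSON-encoded AST,
    i.e. a list of node dicts, per non-empty line.
    """
    non_empty = (line for line in lines if line.strip())
    for i, line in enumerate(non_empty):
        if i == index:
            return json.loads(line)
    raise IndexError(f"no AST at index {index}")


def dot_escape(text: str) -> str:
    """Escape backslashes and double quotes for use inside a quoted DOT string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def ast_to_dot(nodes: list[dict[str, Any]]) -> str:
    """Render one AST as a Graphviz digraph.

    Assumption: each node has a "type", an optional "value" and an optional
    "children" list holding indices into `nodes`.
    """
    lines = ["digraph AST {"]
    for i, node in enumerate(nodes):
        label = dot_escape(str(node["type"]))
        if "value" in node:
            # "\\n" is DOT's line-break escape: show the value under the type.
            label += "\\n" + dot_escape(str(node["value"]))
        lines.append(f'  n{i} [label="{label}"];')
        # One edge per child index, from parent to child.
        for child in node.get("children", []):
            lines.append(f"  n{i} -> n{child};")
    lines.append("}")
    return "\n".join(lines)
```

For example, the string returned by `ast_to_dot(load_ast(open("ast.json"), 0))` can be piped into `dot -Tpng -o ast.png` to produce an image.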

Extracting vocabulary

A vocabulary can be extracted from an AST file using

./bin/bigcode-ast-tools generate-vocabulary <filepath> --size <size> -o <output>

where <size> is the maximum size of the vocabulary. Passing --strip-identifiers ignores all identifiers (the value field in the JSON AST). The output is a TSV file containing the vocabulary.
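The exact tokenization and TSV layout used by generate-vocabulary are not documented here. As a rough sketch, assuming each node contributes its `type` and, unless identifiers are stripped, its `value`, vocabulary extraction amounts to a frequency count truncated to the `size` most common tokens. The `token<TAB>count` output layout and all function names below are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


def node_tokens(node: dict[str, Any], strip_identifiers: bool) -> list[str]:
    """Tokens contributed by one node (assumed): its type and, unless stripped, its value."""
    tokens = [str(node["type"])]
    if not strip_identifiers and "value" in node:
        tokens.append(str(node["value"]))
    return tokens


def build_vocabulary(
    asts: Iterable[list[dict[str, Any]]], size: int, strip_identifiers: bool = False
) -> list[tuple[str, int]]:
    """Count tokens over all ASTs and keep the `size` most frequent ones."""
    counts: Counter[str] = Counter()
    for nodes in asts:
        for node in nodes:
            counts.update(node_tokens(node, strip_identifiers))
    return counts.most_common(size)


def to_tsv(vocabulary: list[tuple[str, int]]) -> str:
    """Serialize as hypothetical `token<TAB>count` rows, most frequent first."""
    return "".join(f"{token}\t{count}\n" for token, count in vocabulary)
```

Keeping only the most frequent tokens bounds the vocabulary size; rarer tokens are simply left out of the output.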

Visualizing vocabulary distribution

The distribution of a vocabulary generated with generate-vocabulary can be visualized using

./bin/bigcode-ast-tools visualize-vocabulary-distribution -v <vocabulary-path>

where <vocabulary-path> is the path to the vocabulary TSV file.
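One way to read such a distribution is as cumulative coverage: the share of all token occurrences accounted for by the k most frequent entries. The sketch below computes it, assuming (hypothetically) that each TSV row is `token<TAB>count` with no header; it is not necessarily what visualize-vocabulary-distribution plots, and the function names are illustrative.

```python
from __future__ import annotations

import csv
from typing import Iterable


def read_counts(lines: Iterable[str]) -> list[int]:
    """Read the count column, assuming hypothetical `token<TAB>count` rows and no header."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Blank lines come back as empty rows and are skipped.
    return [int(row[1]) for row in reader if len(row) >= 2]


def cumulative_coverage(counts: list[int]) -> list[float]:
    """coverage[k - 1] is the share of all occurrences covered by the k most frequent tokens."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return [0.0] * len(ordered)
    coverage = []
    running = 0
    for count in ordered:
        running += count
        coverage.append(running / total)
    return coverage
```

For instance, the first k at which the coverage reaches 0.95 is a candidate value for --size that keeps about 95% of occurrences in the vocabulary.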

Generating "skipgram" data

Training data for learning embeddings from the ASTs (skip-gram style) can be generated using

./bin/bigcode-ast-tools generate-skipgram-data <filepath> -v <vocabulary-path> \
  --children-window-size 2 --ancestors-window-size 2 -o skipgram-data.txt.gz

See the help for more information about each option.
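The precise definition of the two windows is best checked in the command's help. As an illustration only, the sketch below reads --ancestors-window-size as the number of ancestor levels and --children-window-size as the number of descendant levels used as context for each node, and emits (target, context) pairs of node types. These interpretations, the use of node types only, and the function names are assumptions; the real command also takes a vocabulary (-v), which this sketch ignores.

```python
from __future__ import annotations

from typing import Any


def parents_of(nodes: list[dict[str, Any]]) -> dict[int, int]:
    """Map each child index to its parent index (assumes "children" holds node indices)."""
    parents: dict[int, int] = {}
    for i, node in enumerate(nodes):
        for child in node.get("children", []):
            parents[child] = i
    return parents


def skipgram_pairs(
    nodes: list[dict[str, Any]], ancestors_window: int, children_window: int
) -> list[tuple[str, str]]:
    """Emit (target, context) pairs of node types.

    Hypothetical reading of the windows: contexts are the ancestors up to
    `ancestors_window` levels above the node and the descendants up to
    `children_window` levels below it.
    """
    parents = parents_of(nodes)
    pairs: list[tuple[str, str]] = []
    for i, node in enumerate(nodes):
        target = node["type"]
        # Walk up at most `ancestors_window` levels.
        current, depth = i, 0
        while current in parents and depth < ancestors_window:
            current = parents[current]
            depth += 1
            pairs.append((target, nodes[current]["type"]))
        # Walk down level by level, at most `children_window` levels.
        frontier = list(node.get("children", []))
        for _ in range(children_window):
            next_frontier: list[int] = []
            for child in frontier:
                pairs.append((target, nodes[child]["type"]))
                next_frontier.extend(nodes[child].get("children", []))
            frontier = next_frontier
    return pairs
```

With both windows set to 1, each node is paired with its parent and its direct children, which is the AST analogue of a skip-gram context window of one word on each side.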