
Regions File Format

Chaochih Liu edited this page Aug 19, 2019 · 2 revisions

ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we will create a toy BED file as an example and show how we can go from BED file format to ANGSD-wrapper's regions file format.

Create toy BED file

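Let's create an example BED file. You can run the command below from any directory on the command line. It writes a small tab-delimited file with 10 lines, overwriting toy_file.bed if it already exists, so rerunning it does not append duplicate lines.

for i in $(seq 1 10)
do
    # printf expands \t to a tab in any POSIX shell, unlike echo -e
    printf 'chr%s\t1\t10\n' "${i}"
done > toy_file.bed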

The file is created in your current working directory. Let's take a look at the contents of toy_file.bed:

head toy_file.bed

The output should look like this:

chr1	1	10
chr2	1	10
chr3	1	10
chr4	1	10
chr5	1	10
chr6	1	10
chr7	1	10
chr8	1	10
chr9	1	10
chr10	1	10
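
The conversion in the next section relies on the columns being separated by tabs, so it can help to confirm that first. The following is a minimal sketch, assuming the toy file name used on this page; the variable name bad_lines is only illustrative. It recreates the toy file so that it runs on its own, then counts lines that do not have exactly three tab-separated fields.

```shell
# Recreate the toy file so this check runs on its own (overwrites toy_file.bed)
for i in $(seq 1 10); do
    printf 'chr%s\t1\t10\n' "$i"
done > toy_file.bed

# Count lines that do not have exactly three tab-separated fields
bad_lines=$(awk -F'\t' 'NF != 3 { n++ } END { print n + 0 }' toy_file.bed)
echo "Lines without exactly 3 tab-separated fields: ${bad_lines}"
```

For the toy file this reports 0. A nonzero count on your own BED file suggests space-delimited or extra columns that the conversion would not handle as expected.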

Generate correctly formatted regions file

Again, ANGSD-wrapper expects the regions file to be formatted as chr_name:start_position-end_position. We will use a sed one-liner to do the format conversion.

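# The first expression replaces the first tab delimiter on each line with a ':'
# and the second expression replaces the next tab delimiter with a '-'.
# Note: \t is understood as a tab by GNU sed; other sed implementations
# (for example, the default sed on macOS) may not interpret it that way.
sed -e 's,\t,:,' -e 's,\t,-,' toy_file.bed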

The command above writes to STDOUT, and the output should look like this:

chr1:1-10
chr2:1-10
chr3:1-10
chr4:1-10
chr5:1-10
chr6:1-10
chr7:1-10
chr8:1-10
chr9:1-10
chr10:1-10
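
If your sed does not treat \t as a tab, the same conversion can be sketched with awk, which splits on tabs in any POSIX implementation. This is an alternative to the sed approach on this page, not part of ANGSD-wrapper, and bed_to_regions is a hypothetical helper name chosen for the example.

```shell
# bed_to_regions is a hypothetical helper name used only for this sketch
bed_to_regions() {
    # Join the first three tab-separated columns as chr_name:start-end.
    # Reads the files given as arguments, or STDIN when none are given.
    awk -F'\t' '{ print $1 ":" $2 "-" $3 }' "$@"
}

# Example on two inline BED lines
printf 'chr1\t1\t10\nchr2\t1\t10\n' | bed_to_regions
```

With a file on disk, the equivalent call would be bed_to_regions toy_file.bed > toy_file_regions.txt. Any BED columns beyond the third are dropped rather than carried into the output.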

To save this to a new file, let's run:

sed -e 's,\t,:,' -e 's,\t,-,' toy_file.bed > toy_file_regions.txt

When working with your own data, you can use the same sed command: replace toy_file.bed with the path to your own BED file and change the output filename.
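
After converting your own data, a quick format check can catch lines that did not convert cleanly (for example, because the input was space-delimited). The sketch below is one way to do that with grep; check_regions is a hypothetical helper name, and the pattern assumes chromosome names contain no colons or whitespace.

```shell
# check_regions is a hypothetical helper name, not part of ANGSD-wrapper
check_regions() {
    # Refuse to report success for a missing or unreadable file
    if [ ! -r "$1" ]; then
        echo "Cannot read $1" >&2
        return 2
    fi
    # grep -v prints (with line numbers) every line that does NOT match
    # chr_name:start-end, and exits 0 only if it found such a line
    if grep -n -v -E '^[^:[:space:]]+:[0-9]+-[0-9]+$' "$1"; then
        echo "Malformed lines found in $1" >&2
        return 1
    fi
    echo "All lines in $1 look like chr_name:start-end"
}
```

Calling check_regions toy_file_regions.txt prints the offending lines and returns a nonzero status if any line is malformed. It only checks the text format and does not verify that the coordinates are valid for your reference.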