Regions File Format
ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we create a toy BED file as an example and show how to convert it from BED format to ANGSD-wrapper's regions file format.
Let's create an example BED file. You can run the command below from any directory on the command line. It generates a small tab-delimited file with 10 lines.
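# Write one tab-delimited line per chromosome: name, start, end.
# Redirecting the whole loop with '>' overwrites toy_file.bed,
# so re-running the command does not append duplicate lines.
for i in $(seq 1 10)
do
    printf 'chr%s\t1\t10\n' "${i}"
done > toy_file.bed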
The file should be created in your current working directory. Let's take a look at what toy_file.bed looks like:
head toy_file.bed
The output should look like this:
chr1 1 10
chr2 1 10
chr3 1 10
chr4 1 10
chr5 1 10
chr6 1 10
chr7 1 10
chr8 1 10
chr9 1 10
chr10 1 10
Again, ANGSD-wrapper wants the regions file to be formatted as chr_name:start_position-end_position. We will use a bash one-liner to do the format conversion.
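# The first expression replaces the first tab delimiter with a ':'
# and the second expression replaces the next tab delimiter with a '-'.
# Note: '\t' is understood as a tab by GNU sed; other sed
# implementations may not interpret it that way.
sed -e 's,\t,:,' -e 's,\t,-,' toy_file.bed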
The command above writes to STDOUT, and the output should look like this:
chr1:1-10
chr2:1-10
chr3:1-10
chr4:1-10
chr5:1-10
chr6:1-10
chr7:1-10
chr8:1-10
chr9:1-10
chr10:1-10
To save this to a new file, let's run:
sed -e 's,\t,:,' -e 's,\t,-,' toy_file.bed > toy_file_regions.txt
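If your sed does not treat \t as a tab (this depends on the sed implementation), the same conversion can be sketched with awk instead. This is only an alternative under the assumption that the input has at least three tab-separated columns; the block recreates the toy file so that it runs on its own.

```shell
# Recreate the toy BED file so this sketch is self-contained.
for i in $(seq 1 10)
do
    printf 'chr%s\t1\t10\n' "${i}"
done > toy_file.bed

# Split each line on tabs and print column 1, ':', column 2, '-', column 3.
# Any columns after the third are ignored.
awk -F'\t' '{ print $1 ":" $2 "-" $3 }' toy_file.bed
```

As with the sed version, append a redirect such as > toy_file_regions.txt to save the result to a file.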
When working with your own data, you can use the sed command above, replacing toy_file.bed with your own BED file and changing the output filename.
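After converting your own data, it can be worth checking that every line really has the chr_name:start_position-end_position shape. The sketch below is one possible check; check_regions is a hypothetical helper name and not part of ANGSD-wrapper, and the pattern assumes chromosome names contain no colons or whitespace.

```shell
# Hypothetical helper (not part of ANGSD-wrapper): list any lines of a
# regions file that do not look like chr_name:start_position-end_position.
check_regions() {
    regions_file="$1"
    if [ ! -r "$regions_file" ]; then
        echo "Cannot read regions file: $regions_file" >&2
        return 1
    fi
    # grep -v selects lines that do NOT match the pattern; -n prefixes
    # each with its line number. grep succeeds only if it selects a line.
    if grep -n -v -E '^[^:[:space:]]+:[0-9]+-[0-9]+$' "$regions_file"; then
        echo "Found malformed lines in $regions_file" >&2
        return 1
    fi
    echo "All lines in $regions_file match chr_name:start_position-end_position"
}

# Example call (uncomment once the file exists):
# check_regions toy_file_regions.txt
```

A leftover tab in the output, for example from a sed that did not interpret \t, would be reported here with its line number.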