Running sort results on your data

Blekhman Lab edited this page Jul 11, 2017 · 5 revisions

Running hominid_sort_results on your data

What does hominid_sort_results do?

Once you have run hominid and hominid_stability_selection, you have identified the SNPs that are significantly correlated with microbiome abundances, along with the taxa (or other covariates) that are associated with those SNPs.

hominid_sort_results pulls out the (transformed) abundances of those associated taxa so that they are easier to graph, e.g., as a boxplot of the transformed abundances for the three different SNP genotypes, like figure 4 from the HOMINID paper.

hominid_sort_results runs on a single processor.

hominid_sort_results command-line arguments

Command-line arguments 1 through 6 are all required and must be given in this order. Argument 7 is optional.

  1. the output file from hominid_stability_selection

  2. the OTU/taxon table that was used as input to hominid and hominid_stability_selection

  3. the transformation to apply to the input abundance data. Use the same value that you used for hominid and hominid_stability_selection.

  4. cutoff for R2. hominid_sort_results prints out results for the SNPs whose R2 is greater than the cutoff.

  5. cutoff for the stability score. hominid_sort_results prints out results for the taxa whose stability score is greater than or equal to the cutoff.

  6. SNP count: print out results for this many SNPs.

  7. optionally print out extra columns from the hominid_stability_selection output. If you want to retain the SNP annotation columns in the output of hominid_sort_results, include those columns with this optional command-line argument. For example, if you want to include the CHROM, POS, ID, REF, and ALT columns in the output, command-line argument 7 would be specified like this:

    --extra-columns=CHROM,POS,ID,REF,ALT
    

For a sample hominid_sort_results command, see test_sort_results.sh.
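The argument order described above can also be sketched in Python. This is only an illustration, not part of HOMINID: the helper name build_sort_results_command, the file names, the cutoff values, and the "TRANSFORM" placeholder are all hypothetical, and the sketch assumes that hominid_sort_results is on your PATH and is invoked directly by name.

```python
def build_sort_results_command(stability_output, taxon_table, transform,
                               rsq_cutoff, stability_cutoff, snp_count,
                               extra_columns=None):
    """Assemble the argument list in the order described above.

    Hypothetical helper for illustration; it only builds the list
    and does not run anything.
    """
    command = [
        "hominid_sort_results",   # assumed to be on PATH
        str(stability_output),    # 1. output of hominid_stability_selection
        str(taxon_table),         # 2. OTU/taxon table
        str(transform),           # 3. same transformation as earlier steps
        str(rsq_cutoff),          # 4. R2 cutoff (strictly greater than)
        str(stability_cutoff),    # 5. stability score cutoff (>=)
        str(snp_count),           # 6. number of SNPs to print
    ]
    if extra_columns:
        # 7. optional: extra annotation columns to carry through
        command.append("--extra-columns=" + ",".join(extra_columns))
    return command


# All values below are made up for illustration.
command = build_sort_results_command(
    "stability_selection_output.txt",
    "taxon_table.txt",
    "TRANSFORM",  # placeholder: use the value you gave to hominid
    0.5,
    0.8,
    10,
    extra_columns=["CHROM", "POS", "ID", "REF", "ALT"],
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run from the Python standard library. test_sort_results.sh remains the authoritative example of a working command.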

Output file format

The output file is in this format:

GENE_ID SNP_ID rsq_median
  stability_score OTU/taxon/covariate
        abundance       variant_allele_count    genotype        sample_id	[additional columns specified by "--extra-columns" command-line option]
[data row for sample A]
[data row for sample B]
[data row for sample C]
...

The values under the abundance column are the transformed abundances, not the abundances in the input OTU/taxon table.
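As a rough sketch of how the data rows could be prepared for a boxplot, the Python below groups the transformed abundances by variant allele count. It assumes the data rows are tab-delimited with abundance, variant_allele_count, genotype, and sample_id as the first four columns, in that order; check this against your own output file. The function name and the example rows are made up for illustration.

```python
from collections import defaultdict


def group_abundance_by_allele_count(data_rows):
    """Map each variant allele count to a list of transformed abundances.

    Assumes tab-delimited rows: abundance, variant_allele_count,
    genotype, sample_id, then any extra columns.
    """
    groups = defaultdict(list)
    for line in data_rows:
        fields = line.strip().split("\t")
        if len(fields) < 4:
            continue  # not a data row (blank or shorter header line)
        try:
            abundance = float(fields[0])
        except ValueError:
            continue  # column-header line, e.g. "abundance ..."
        groups[fields[1]].append(abundance)
    return dict(groups)


# Made-up rows for one SNP/taxon pair, for illustration only.
example_rows = [
    "abundance\tvariant_allele_count\tgenotype\tsample_id",
    "0.12\t0\tAA\tsample_1",
    "0.30\t1\tAG\tsample_2",
    "0.45\t2\tGG\tsample_3",
    "0.18\t0\tAA\tsample_4",
]
groups = group_abundance_by_allele_count(example_rows)
for allele_count in sorted(groups):
    print(allele_count, groups[allele_count])
```

Each list in groups can then be handed to a plotting library as one box of the boxplot. The allele counts are kept as strings so that the sketch makes no assumption about how they are formatted in the file.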