This repository has been retired and superseded by https://github.com/arvados/lightning
Old README follows
l7g is the main codebase for the Lightning system being developed by Curoverse Research.
The repository contains documents, source code and pipelines for the various aspects of Lightning.
Code here should be considered "research grade" and is a work in progress.
Lightning is a system based on "genomic tiling". Genomes are split into small segments, on average roughly 250 base pairs (bp) long, and these small segments are called "tiles".
For a given population of genomes, the genomic sequences are split into tiles, and tiles with identical sequences are de-duplicated. Coalescing all unique tiles creates a "Lightning tile library", so a source sequence from the population can be stored as position references into the Lightning tile library.
A compact representation of a genome can be created by storing arrays of indexes into the Lightning tile library, where each index references its underlying sequence.
The compact representation we've developed is called "compact genome format" (CGF). It can represent a whole genome in ~30Mb, depending on the amount of low quality data in the original genome sample.
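The idea of a tile library and index arrays can be illustrated with a toy sketch. This is not the Lightning implementation: the fixed-width tiling, the per-position variant lists, the 4 bp tile length, and all names (`split_into_tiles`, `build_library`, `decode`, the sample names) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Toy tile length for illustration; real tiles average roughly 250 bp.
TILE_LEN = 4


def split_into_tiles(seq: str, tile_len: int = TILE_LEN) -> List[str]:
    """Cut a sequence into consecutive fixed-width segments (a simplifying assumption)."""
    return [seq[i:i + tile_len] for i in range(0, len(seq), tile_len)]


def build_library(
    genomes: Dict[str, str], tile_len: int = TILE_LEN
) -> Tuple[List[List[str]], Dict[str, List[int]]]:
    """De-duplicate tiles across a population.

    Returns (library, encoded), where library[pos] holds the unique tile
    sequences seen at tile position pos, and encoded[name] is the array of
    indexes into the library for that genome.
    """
    library: List[List[str]] = []
    encoded: Dict[str, List[int]] = {}
    for name, seq in genomes.items():
        indexes = []
        for pos, tile in enumerate(split_into_tiles(seq, tile_len)):
            if pos == len(library):
                library.append([])  # first genome to reach this position
            variants = library[pos]
            if tile not in variants:
                variants.append(tile)  # store each unique sequence once
            indexes.append(variants.index(tile))
        encoded[name] = indexes
    return library, encoded


def decode(library: List[List[str]], indexes: List[int]) -> str:
    """Rebuild the source sequence from its index array."""
    return "".join(library[pos][i] for pos, i in enumerate(indexes))


# Hypothetical population of three short "genomes".
genomes = {
    "sample_a": "ACGTACGTTTGA",
    "sample_b": "ACGTACCTTTGA",
    "sample_c": "ACGTACGTTTGA",
}
library, encoded = build_library(genomes)
print(library)
print(encoded)
```

In this sketch each genome shrinks to a short list of small integers, and the sequences themselves are stored only once in the shared library.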
Common Workflow Language (CWL) pipelines for creating Lightning data.
Lightning documentation
go (golang) programs used by Lightning.
Image directory for pictures.
A directory for the Lightning system prototype.
Authentication for Lightning prototype.
Subdirectory for experimental code.
Source and tools used by Lightning.