[DO NOT MERGE] hpcSPAdes rebase, cleanup & implementation #1380

Open
wants to merge 109 commits into base: main

Commits
f3db51b
Try to use concurrent pair info buffer in pair info improver
asl Dec 18, 2024
4f34cc0
Ensure we're not losing variance in paired info improver.
asl Dec 20, 2024
2c30689
Make sure .clangd finds compile database by default
asl Dec 20, 2024
e8b4e55
Do better in distance estimator
asl Dec 20, 2024
00e00e4
Perform filtering on-fly during DE
asl Dec 20, 2024
06ef490
Use data structures with less overhead
asl Dec 23, 2024
74dab51
Filter missing paired info on fly
asl Dec 23, 2024
10c541c
CMAKE: Hook in MPI
eodus Oct 19, 2018
94e581b
Add MPI runtime detection
asl Dec 17, 2018
e93ec6b
CMAKE: Install spades-hpc
eodus Aug 14, 2018
cbc71df
Add mpi_console_writer log_writer interface impl
eodus Dec 27, 2018
0c795cb
Factor out mpi log writers as well
asl Sep 25, 2020
34c8047
Add mpi partask core
asl Sep 25, 2020
036f492
Add MPI test
asl May 26, 2020
f8856ff
User-friendly rank reporting
asl Sep 28, 2020
3ada731
Rudimentrary MPI stage manager & MPI stage
asl Sep 25, 2020
80e359c
Add TestMPI stage
eodus Nov 7, 2018
c2acf7d
Separate out SPAdes and hpcSPAdes binaries
asl Sep 28, 2020
de102a6
Even better SPAdes / hpcSPAdes separation
asl Sep 28, 2020
61de068
read_converter: changable chunk_num
olga24912 Oct 7, 2020
6a3260b
MPI: sequence mapper notifier
olga24912 Nov 2, 2020
721b3e4
MPI mismatch_corrector
olga24912 Oct 8, 2020
367c633
MPI: Gap closer
olga24912 Nov 2, 2020
76ee8e5
MPI: pair_info_count
olga24912 Nov 5, 2020
8f6c55f
Move implementation to .cpp. Small cleanup while there
asl Feb 26, 2021
e79d82e
Allow name ot be overriden
asl Feb 26, 2021
cf702b4
Factor out implementation into a separate class
asl Feb 26, 2021
9b23169
Simplify parallel processing
asl Feb 26, 2021
92ed240
Simplify
asl Feb 26, 2021
e883eee
Ensure the streams are fresh
asl Feb 26, 2021
bbe47f4
Add MPI construction stage
asl Feb 26, 2021
d688d05
First real MPI construction step: collect k-mer coverage in parallel
asl Feb 27, 2021
06b243c
add comment to DistanceEstimation
olga24912 Aug 4, 2021
4cc3a93
DistanceEstimator MPI
olga24912 Aug 5, 2021
fab5472
Distance Estimator MPI stage
olga24912 Aug 5, 2021
6774809
DistanceEstimator MPI wrapper
olga24912 Aug 5, 2021
268e301
MPI GraphCondensing
olga24912 Mar 7, 2021
16f829f
remove Seq from extansion index
olga24912 May 14, 2021
2cf1951
remove friend from UnbranchingPathExtractor
olga24912 May 14, 2021
6eb8855
style fix
olga24912 Jul 21, 2021
6065eee
MPI EarlyTipClipper
olga24912 Jun 12, 2021
f3d1517
MPI Build Extension Index
olga24912 Jul 22, 2021
e6c2730
comments for MergeKmerFileTask
olga24912 Jul 22, 2021
6daf2f8
verification that after split the #buckets the same in all storages
olga24912 Jul 22, 2021
a8e7dbb
make kpostorage const
olga24912 Jul 22, 2021
e2fb81d
comment for release_all
olga24912 Jul 22, 2021
2e167df
add comment for Splitter
olga24912 Jul 22, 2021
d5c76a1
resize one time in MergeKMers
olga24912 Jul 22, 2021
01d3bcd
VERIFY kmers in kmer storage are sorted and unique
olga24912 Jul 22, 2021
b8d7e90
fix warnings
olga24912 Jul 22, 2021
28b61aa
MPI KMerCounter
olga24912 Jul 26, 2021
567ded0
close stream after use
olga24912 Jul 27, 2021
ccf332f
constat for contig output stage
olga24912 Jul 27, 2021
14a78a7
add sync for read conversion
olga24912 Jul 27, 2021
86a72a3
MPI ATtipClipper
olga24912 Aug 31, 2021
4b07e3a
reduce code dublication
olga24912 Sep 2, 2021
eb770bb
fix pair_info_counter
olga24912 Oct 14, 2021
7074577
detach edge index on load
olga24912 Oct 27, 2021
8c03c10
detach edge index at the end of pair_info_counter
olga24912 Oct 28, 2021
7cbefe6
create lib Spades-MPI
olga24912 Oct 5, 2021
1c5f7cf
separate distEst
olga24912 Oct 12, 2021
9f99b34
separate construction_mpi
olga24912 Oct 14, 2021
c7172b1
separate test_mpi
olga24912 Oct 14, 2021
ff6d4c4
update distEst Arhitecture
olga24912 Nov 19, 2021
8e0b933
make SeqMapperNot in GCMPI as in GC
olga24912 Nov 23, 2021
878f80e
GapCloserBase
olga24912 Nov 23, 2021
21283a4
Separate MPI gap closer
olga24912 Nov 23, 2021
373c19b
mismatch_corrector with functor
olga24912 Nov 23, 2021
07baa4b
declarate MismatchShallNotPass to hpp
olga24912 Nov 23, 2021
169099c
separate MPI mismatch_correction
olga24912 Nov 23, 2021
0c1d6d6
make pair_info_count consistent with master version
olga24912 Nov 24, 2021
ac3c0c3
functor for FillEdgePairFilter
olga24912 Nov 24, 2021
8c4b374
separate PairInfoCount MPI
olga24912 Nov 25, 2021
34063de
MapLibFabric
olga24912 Nov 27, 2021
ad211e6
Separate SeqMapperNotifier
olga24912 Nov 29, 2021
effdce4
PerfectHashMapperBuilder MPI
olga24912 Nov 30, 2021
65952a8
move mpi_kmer_index_builder to mpi dir
olga24912 Nov 30, 2021
10ec1da
separate perfect_hash_map_builder_mpi
olga24912 Nov 30, 2021
ca82113
move kmer_extension_index_builder_mpi to mpi dir
olga24912 Nov 30, 2021
2fec348
Move partask and mpi stage to hpcSPAdes
asl Sep 12, 2024
5142cd7
run_on_load stage type
olga24912 Dec 2, 2021
55bf23a
namespace spaces
olga24912 Dec 2, 2021
eb63ef9
rename function by code style
olga24912 Dec 2, 2021
6c63149
delete mpi from local pipeline
olga24912 Dec 6, 2021
c8f2eb7
separate logger mpi
olga24912 Dec 6, 2021
5b00afc
Add time tracer annotations for MPI stage manager. Some cleanup here …
asl Jan 11, 2022
d47a7d5
Do not overwrite time traces from different nodes
asl Jan 11, 2022
5de1830
Cleanup
asl Jan 11, 2022
b3ee8f9
Time tracing for partask
asl Jan 11, 2022
e4da9df
Better time tracing
asl Jan 11, 2022
e14abbf
More information
asl Jan 11, 2022
17d0360
Annotate PathExtend
asl Jan 11, 2022
aee18fe
A bit more verbosity
asl Jan 11, 2022
09bb088
More events + some cleanups
asl Jan 11, 2022
3db594a
call process lib func in Mismatch Corrector
olga24912 Jan 13, 2022
77299f5
fix: allreduce only in sync
olga24912 Apr 1, 2022
512596b
fix: block size/NNodes
olga24912 Apr 4, 2022
0dbee0e
sequence mapper notifier MPI in paired info counter
olga24912 Apr 6, 2022
d674ad2
PairedInfoCounter SeqMapNot MPI
olga24912 Apr 6, 2022
70ac242
split streams on allthreads cnt
olga24912 Apr 11, 2022
462a701
Move MPI detection down to project
asl Sep 12, 2024
1c6f78a
Normalize include paths
asl Sep 12, 2024
55a9c93
Add spades.py bits
asl Sep 18, 2024
ff1574f
Add hpcSPAdes to list of known projects
asl Sep 18, 2024
d3ec07c
Add SLURM executor
asl Sep 21, 2024
208fb56
Better job names
asl Sep 21, 2024
4eb3a12
Fix some defaults
asl Sep 21, 2024
0e79a76
Run stuff via srun by default
asl Sep 21, 2024
3132f28
Fix some broken deps
asl Dec 24, 2024
2 changes: 2 additions & 0 deletions .clangd
@@ -0,0 +1,2 @@
+CompileFlags:
+  CompilationDatabase: build_spades/
1 change: 0 additions & 1 deletion .gitignore
@@ -13,7 +13,6 @@ CMakeCache.txt
 cmake_install.cmake
 assembler/src/tools/quality/results*
 __pycache__
-.clangd
 .DS_Store
 compile_commands.json
 .cache
2 changes: 2 additions & 0 deletions src/CMakeListsInternal.txt
@@ -11,9 +11,11 @@ if (SPADES_BUILD_INTERNAL)
   add_subdirectory(test/debruijn)
   add_subdirectory(test/examples)
   add_subdirectory(test/adt)
+  add_subdirectory(test/mpi)
 else()
   add_subdirectory(test/include_test EXCLUDE_FROM_ALL)
   add_subdirectory(test/debruijn EXCLUDE_FROM_ALL)
+  add_subdirectory(test/mpi EXCLUDE_FROM_ALL)
   add_subdirectory(test/adt EXCLUDE_FROM_ALL)
   add_subdirectory(test/examples EXCLUDE_FROM_ALL)
 endif()
1 change: 1 addition & 0 deletions src/cmake/includes.cmake
@@ -9,6 +9,7 @@ include_directories(SYSTEM "${Boost_INCLUDE_DIRS}")
 if (SPADES_USE_TCMALLOC)
   include_directories("${GOOGLE_PERFTOOLS_INCLUDE_DIR}")
 endif()
+
 if (SPADES_USE_JEMALLOC)
   include_directories("$<TARGET_FILE_DIR:jemalloc-static>/../include")
 endif()
2 changes: 1 addition & 1 deletion src/cmake/proj.cmake
@@ -8,7 +8,7 @@

 # Side-by-side subprojects layout: automatically set the
 # SPADES_EXTERNAL_${project}_SOURCE_DIR using SPADES_ALL_PROJECTS
-set(SPADES_ALL_PROJECTS "spades;hammer;ionhammer;corrector;spaligner;spades_tools;binspreader;pathracer")
+set(SPADES_ALL_PROJECTS "spades;hammer;ionhammer;corrector;spaligner;spades_tools;binspreader;pathracer;hpcspades")
 set(SPADES_EXTRA_PROJECTS "mts;online_vis;cds_subgraphs")
 set(SPADES_KNOWN_PROJECTS "${SPADES_ALL_PROJECTS};${SPADES_EXTRA_PROJECTS}")
 set(SPADES_ENABLE_PROJECTS "" CACHE STRING
14 changes: 14 additions & 0 deletions src/common/alignment/long_read_mapper.hpp
@@ -55,6 +55,20 @@ class LongReadMapper: public SequenceMapperListener {
         return g_;
     }

+    void Serialize(std::ostream &os) const override {
+        storage_.BinWrite(os);
+    }
+
+    void Deserialize(std::istream &is) override {
+        storage_.BinRead(is);
+    }
+
+    void MergeFromStream(std::istream &is) override {
+        PathStorage<Graph> remote(g_);
+        remote.BinRead(is);
+        storage_.AddStorage(remote);
+    }
+
 private:

     void ProcessSingleRead(size_t thread_index, const omnigraph::MappingPath<EdgeId>& mapping, const io::SingleRead& r);
18 changes: 18 additions & 0 deletions src/common/alignment/rna/ss_coverage_filler.hpp
@@ -13,6 +13,8 @@
 #include "alignment/sequence_mapper_notifier.hpp"
 #include "assembly_graph/paths/mapping_path.hpp"

+#include "io/binary/binary.hpp"
+
 namespace debruijn_graph {

 class SSCoverageFiller: public SequenceMapperListener {
@@ -65,6 +67,22 @@ class SSCoverageFiller: public SequenceMapperListener {
             storage_.IncreaseKmerCount(it.first, size_t(it.second));
         tmp_storages_[thread_index].Clear();
     }
+
+    void Serialize(std::ostream &os) const override {
+        io::binary::BinWrite(os, storage_);
+    }
+
+    void Deserialize(std::istream &is) override {
+        io::binary::BinRead(is, storage_);
+    }
+
+    void MergeFromStream(std::istream &is) override {
+        SSCoverageStorage remote(g_);
+        io::binary::BinRead(is, remote);
+        for (const auto& it : remote) {
+            storage_.IncreaseKmerCount(it.first, size_t(it.second));
+        }
+    }
 };


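Reviewer note: the `Serialize` / `Deserialize` / `MergeFromStream` overrides added to the two listeners above appear to follow a serialize-then-merge pattern, presumably so that state collected in one process (for example, one MPI rank) can be folded into the state held by another. The sketch below illustrates that pattern only. `CountingListener`, its raw-byte format, and `MergedCount` are hypothetical and are not part of SPAdes; the real listeners delegate to `BinWrite` / `BinRead`, and a `std::stringstream` stands in here for whatever transport carries the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for a listener's storage: key -> accumulated count.
class CountingListener {
public:
    void Add(std::uint64_t key, std::uint64_t count = 1) { counts_[key] += count; }

    std::uint64_t Get(std::uint64_t key) const {
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
    }

    // Write the entry count, then every (key, count) pair as raw bytes.
    void Serialize(std::ostream &os) const {
        WriteU64(os, counts_.size());
        for (const auto &kv : counts_) {
            WriteU64(os, kv.first);
            WriteU64(os, kv.second);
        }
    }

    // Replace the local state with the state stored in the stream.
    void Deserialize(std::istream &is) {
        counts_.clear();
        MergeFromStream(is);
    }

    // Read a remote state and add it to the local one.
    void MergeFromStream(std::istream &is) {
        const std::uint64_t n = ReadU64(is);
        for (std::uint64_t i = 0; i < n; ++i) {
            const std::uint64_t key = ReadU64(is);
            const std::uint64_t count = ReadU64(is);
            counts_[key] += count;
        }
    }

private:
    static void WriteU64(std::ostream &os, std::uint64_t v) {
        os.write(reinterpret_cast<const char *>(&v), sizeof(v));
    }

    static std::uint64_t ReadU64(std::istream &is) {
        std::uint64_t v = 0;
        is.read(reinterpret_cast<char *>(&v), sizeof(v));
        return v;
    }

    std::map<std::uint64_t, std::uint64_t> counts_;
};

// Simulates two processes: the worker serializes, the root merges.
inline std::uint64_t MergedCount(std::uint64_t key) {
    CountingListener root, worker;
    root.Add(key, 2);
    worker.Add(key, 3);
    std::stringstream buffer;  // stands in for the bytes sent between processes
    worker.Serialize(buffer);
    root.MergeFromStream(buffer);
    return root.Get(key);
}
```

The difference between the two read paths is the point of the sketch: `Deserialize` overwrites local state, while `MergeFromStream` accumulates into it, which matches how `SSCoverageFiller::MergeFromStream` reads into a temporary `remote` storage and then calls `IncreaseKmerCount`.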
3 changes: 1 addition & 2 deletions src/common/alignment/sequence_mapper_notifier.cpp
@@ -13,9 +13,8 @@
 #include "io/reads/read_stream_vector.hpp"

 namespace debruijn_graph {
-
 SequenceMapperNotifier::SequenceMapperNotifier(size_t lib_count)
-    : listeners_(lib_count)
+    : listeners_(lib_count)
 {}

 void SequenceMapperNotifier::Subscribe(SequenceMapperListener* listener, size_t lib_index) {
69 changes: 63 additions & 6 deletions src/common/alignment/sequence_mapper_notifier.hpp
@@ -16,6 +16,7 @@
 #include "io/reads/paired_read.hpp"
 #include "io/reads/read_stream_vector.hpp"
 #include "utils/perf/timetracer.hpp"
+#include "utils/stl_utils.hpp"

 #include <string>
 #include <vector>
@@ -36,7 +37,23 @@ class SequenceMapperListener {
     virtual void ProcessSingleRead(size_t /* thread_index */, const io::SingleReadSeq& /* r */, const omnigraph::MappingPath<EdgeId>& /* read */) {}

     virtual void MergeBuffer(size_t /* thread_index */) {}
-
+
+    virtual void Serialize(std::ostream&) const {
+        VERIFY_MSG(false, "Serialize() is not implemented");
+    }
+
+    virtual void Deserialize(std::istream&) {
+        VERIFY_MSG(false, "Deserialize() is not implemented");
+    }
+
+    virtual void MergeFromStream(std::istream&) {
+        VERIFY_MSG(false, "MergeFromStream() is not implemented");
+    }
+
+    virtual const std::string name() const {
+        return utils::type_name(typeid(*this).name());
+    }
+
     virtual ~SequenceMapperListener() {}
 };

@@ -56,8 +73,7 @@ class SequenceMapperNotifier {
                         const SequenceMapperT& mapper, size_t threads_count = 0) {
         return ProcessLibrary(streams, 0, mapper, threads_count);
     }
-
-private:
+
     template<class ReadType>
     void ProcessLibrary(io::ReadStreamList<ReadType>& streams,
                         size_t lib_index, const SequenceMapperT& mapper, size_t threads_count = 0) {
@@ -68,7 +84,7 @@ class SequenceMapperNotifier {
             threads_count = streams.size();

         streams.reset();
-        NotifyStartProcessLibrary(lib_index, threads_count);
+        NotifyStartProcessLibrary(lib_index, streams.size());
         size_t counter = 0, n = 15;

         #pragma omp parallel for num_threads(threads_count) shared(counter)
@@ -97,9 +113,10 @@ class SequenceMapperNotifier {
             counter += size;
         }

-        for (size_t i = 0; i < threads_count; ++i)
+        for (size_t i = 0; i < streams.size(); ++i)
             NotifyMergeBuffer(lib_index, i);

+        streams.close();
         INFO("Total " << counter << " reads processed");
         NotifyStopProcessLibrary(lib_index);
     }
@@ -114,7 +131,47 @@ class SequenceMapperNotifier {

     void NotifyMergeBuffer(size_t ilib, size_t ithread) const;

-    std::vector<std::vector<SequenceMapperListener*> > listeners_; //first vector's size = count libs
+protected:
+    std::vector<ListenersContainer> listeners_; //first vector's size = count libs
 };

+class MapLibBase {
+public:
+    virtual void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::PairedRead>& streams) const = 0;
+    virtual void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::SingleRead>& streams) const = 0;
+    virtual void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::SingleReadSeq>& streams) const = 0;
+    virtual void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::PairedReadSeq>& streams) const = 0;
+
+    template<class Streams>
+    void operator() (SequenceMapperListener* listener, const SequenceMapper<Graph>& mapper, Streams& streams) const {
+        this->operator() (std::vector<SequenceMapperListener*>(1, listener), mapper, streams);
+    }
+};
+
+class MapLibFunc : public MapLibBase {
+public:
+    void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::PairedRead>& streams) const override {
+        MapLib(listeners, mapper, streams);
+    }
+    void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::SingleRead>& streams) const override {
+        MapLib(listeners, mapper, streams);
+    }
+    void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::SingleReadSeq>& streams) const override {
+        MapLib(listeners, mapper, streams);
+    }
+    void operator() (const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<io::PairedReadSeq>& streams) const override {
+        MapLib(listeners, mapper, streams);
+    }
+
+private:
+    template<class ReadType>
+    void MapLib(const std::vector<SequenceMapperListener*>& listeners, const SequenceMapper<Graph>& mapper, io::ReadStreamList<ReadType>& streams) const {
+        SequenceMapperNotifier notifier;
+        for (auto listener: listeners) {
+            notifier.Subscribe(listener);
+        }
+        notifier.ProcessLibrary(streams, mapper);
+    }
+};
+
 } // namespace debruijn_graph
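Reviewer note: `MapLibBase` declares one virtual `operator()` per read stream type, and `MapLibFunc` forwards each of them to a single private template. A likely reason is that C++ does not allow virtual member function templates, so the per-type overloads provide the dynamic dispatch while the template holds the shared body. The sketch below reproduces that shape with simplified, hypothetical types (`SingleStream`, `PairedStream`, `Listener`, `MapBase`, `LocalMap`, `MapOne`); none of these names exist in SPAdes, and the `Map` body only counts reads instead of running a mapper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the read stream types and for a listener.
struct SingleStream { std::vector<std::string> reads; };
struct PairedStream { std::vector<std::pair<std::string, std::string>> reads; };
struct Listener { std::size_t reads_seen = 0; };

class MapBase {
public:
    virtual ~MapBase() = default;

    // One virtual overload per stream type: virtual member templates are not allowed.
    virtual std::size_t operator()(const std::vector<Listener*>& listeners,
                                   SingleStream& streams) const = 0;
    virtual std::size_t operator()(const std::vector<Listener*>& listeners,
                                   PairedStream& streams) const = 0;

    // Non-virtual convenience overload: wrap a single listener into a vector.
    template<class Streams>
    std::size_t operator()(Listener* listener, Streams& streams) const {
        return (*this)(std::vector<Listener*>(1, listener), streams);
    }
};

class LocalMap : public MapBase {
public:
    std::size_t operator()(const std::vector<Listener*>& listeners,
                           SingleStream& streams) const override {
        return Map(listeners, streams);
    }
    std::size_t operator()(const std::vector<Listener*>& listeners,
                           PairedStream& streams) const override {
        return Map(listeners, streams);
    }

private:
    // Shared body for every stream type; here it only counts the reads.
    template<class Streams>
    std::size_t Map(const std::vector<Listener*>& listeners, Streams& streams) const {
        for (Listener* listener : listeners)
            listener->reads_seen += streams.reads.size();
        return streams.reads.size();
    }
};

// Callers hold the base type, so the single-listener template stays visible;
// the overrides declared in LocalMap hide it for calls made on a LocalMap directly.
inline std::size_t MapOne(const MapBase& map_lib, Listener* listener, SingleStream& streams) {
    return map_lib(listener, streams);
}
```

With this layout an alternative implementation (for instance, an MPI-aware one) only needs to override the per-type overloads, and code that takes a `const MapBase&` does not change.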