TODO
====
* Argue for the likely correctness of our algorithm (and implementation) by
  testing on a synthetic data set and on real data:
  - Synthetic data set where we know the number of clusters the algorithm
    should return. E.g. generate k random (non-alike) sequences, make a
    number of permutations of these k sequences, and check that the
    resulting clustering yields k clusters.
  - Real data where we e.g. do multidimensional scaling and color code /
    highlight centroids and the sequences belonging to each centroid. Then
    centroids should not be close together (maybe they can be in
    multidimensional scaling?) and we should get the clusters that are
    apparent from the visualization.
* Order centroids by unique word count and stop the search after a specific
  number of rejects. This idea might still be interesting with our current
  bitset k-mer occurrence algorithm.
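
A possible starting point for the synthetic data test in the first item, as
a minimal Python sketch. Everything in it is an assumption and not taken
from our code base: the DNA alphabet, the function names (random_sequence,
mutate, identity, make_synthetic), the default length and mutation rate, and
above all the reading of "permutations" as lightly mutated copies (random
point substitutions) of each seed sequence.

```python
import random

ALPHABET = "ACGT"  # assumption: nucleotide sequences


def random_sequence(length, rng):
    """Return a uniformly random sequence over ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def mutate(seq, rate, rng):
    """Replace each position, with probability `rate`, by a different letter."""
    out = []
    for ch in seq:
        if rng.random() < rate:
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def make_synthetic(k, variants_per_seed, length=200, rate=0.02, seed=0):
    """Build k seed sequences plus mutated variants of each.

    Returns (seeds, data), where data is a shuffled list of
    (true_cluster_label, sequence) pairs.
    """
    rng = random.Random(seed)
    seeds = [random_sequence(length, rng) for _ in range(k)]
    data = []
    for label, s in enumerate(seeds):
        data.append((label, s))
        for _ in range(variants_per_seed):
            data.append((label, mutate(s, rate, rng)))
    rng.shuffle(data)  # hide the generation order from the clustering
    return seeds, data


if __name__ == "__main__":
    seeds, data = make_synthetic(k=5, variants_per_seed=20)
    print(len(data), "sequences from", len(seeds), "seeds")
```

The sequences in data can then be fed to the clustering, and the number of
returned clusters compared against k. The true labels make it possible to
check that each cluster contains variants of only one seed. The mutation
rate has to stay well below the clustering's similarity threshold, otherwise
variants of the same seed may legitimately end up in different clusters.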