feat: Completely Revamped Clustering Algorithm #200

Merged (64 commits, Oct 27, 2023)

Changes from 1 commit

Commits (64)
1bbce83
feat: initial rtree implementation
TurtIeSocks Sep 19, 2023
4dafe65
refactor: tsp uses s2 cells for splitting as well
TurtIeSocks Sep 19, 2023
af5618f
lots of cleanup and time optimizing
TurtIeSocks Sep 19, 2023
aec6593
fix: faster, better
TurtIeSocks Sep 20, 2023
f017abf
ghetto annealing
TurtIeSocks Sep 20, 2023
4f597a0
add iteration based strategy
TurtIeSocks Sep 20, 2023
af984bf
cleanup
TurtIeSocks Sep 20, 2023
e2ba8a9
refactor: cpp type alises
TurtIeSocks Sep 29, 2023
8178d50
accuracy improvements
TurtIeSocks Oct 18, 2023
d72633c
fix: comment logs
TurtIeSocks Oct 18, 2023
a55c7ec
fix: logs
TurtIeSocks Oct 19, 2023
1278404
refactor: calculating mygod score and add it to stats
TurtIeSocks Oct 19, 2023
bfb43b0
refactor: steps towards normalizing stats
TurtIeSocks Oct 20, 2023
f5d9fb1
feat: add routing time to stats
TurtIeSocks Oct 20, 2023
9c06d04
refactor: move stats into algorithm crate & response into api crate
TurtIeSocks Oct 20, 2023
48c10d4
fix: more complete and streamlined cluster stats
TurtIeSocks Oct 20, 2023
9640510
fix: distance stats
TurtIeSocks Oct 20, 2023
1d4ef5e
feat: route stats calculating api
TurtIeSocks Oct 20, 2023
3728be6
fix: send min_points
TurtIeSocks Oct 20, 2023
5b36753
fix: more friendly towards already associated routes
TurtIeSocks Oct 20, 2023
34da95d
fix: do an early state update while waiting for req
TurtIeSocks Oct 21, 2023
d6a7089
fix: lookup area by geofence_id too
TurtIeSocks Oct 21, 2023
38baafb
fix: deduping perf improvement
TurtIeSocks Oct 21, 2023
b48b2c6
feat: add stats time
TurtIeSocks Oct 21, 2023
c29d704
fix: better stats time
TurtIeSocks Oct 21, 2023
2f02da9
feat: add `max_clusters` to api
TurtIeSocks Oct 21, 2023
c4f131a
refactor: filter and sort clusters
TurtIeSocks Oct 21, 2023
0e3cd76
fix: point_covered stats for rtree
TurtIeSocks Oct 21, 2023
6802c1f
fix: sorting & generate clusters with s2 cells
TurtIeSocks Oct 21, 2023
d4b6087
refactor: remove balanced and brute force from api/client
TurtIeSocks Oct 22, 2023
8a0af73
refactor: remove balanced code
TurtIeSocks Oct 22, 2023
2f033a1
refactor: remove bruteforce code
TurtIeSocks Oct 22, 2023
28bb5d7
refactor: rename fast => fastest
TurtIeSocks Oct 22, 2023
8bf1aee
refactor: deprecated `fast` arg results in `fastest`
TurtIeSocks Oct 22, 2023
bea7d6d
feat: fast, balanced, better modes
TurtIeSocks Oct 22, 2023
0aef45a
fix: remove `.clone()`
TurtIeSocks Oct 22, 2023
5ad083e
fix: stricter cluster filtering
TurtIeSocks Oct 22, 2023
a4b9e00
fix: logged avg stats
TurtIeSocks Oct 22, 2023
d84defa
refactor: normalize sorting/routing logic
TurtIeSocks Oct 23, 2023
82096e6
fix: merge cluster/route modes in client
TurtIeSocks Oct 23, 2023
0c543b7
feat: add `none` option to `sort_by`
TurtIeSocks Oct 23, 2023
0f17d40
fix: hide routing options on bootstrap
TurtIeSocks Oct 23, 2023
50dcc7b
refactor: small cleanup
TurtIeSocks Oct 23, 2023
87c1adc
feat: add progress logger
TurtIeSocks Oct 23, 2023
a8bb859
refactor: better logs
TurtIeSocks Oct 23, 2023
e8ed2b3
chore: package updates
TurtIeSocks Oct 23, 2023
7a693f9
chore: crate version uplifts
TurtIeSocks Oct 23, 2023
058f3e2
refactor: move fake info log to its own fn
TurtIeSocks Oct 23, 2023
b7b1820
feat: impl Display for `Point` & `Cluster`
TurtIeSocks Oct 23, 2023
ea31e39
refactor: move `project` into its own module
TurtIeSocks Oct 23, 2023
21d4939
fix: better intellisense for bbox
TurtIeSocks Oct 23, 2023
eee6263
refactor: deduping fn
TurtIeSocks Oct 23, 2023
d2dca41
possibly lighten memory footprint
TurtIeSocks Oct 24, 2023
c302de2
improve clustering speed by 95% :kek:
TurtIeSocks Oct 25, 2023
30b1735
improve possible cluster generation by 75%
TurtIeSocks Oct 25, 2023
ea5ae93
fix: empty point check
TurtIeSocks Oct 25, 2023
4dcabf9
fix: unnecessary compensating
TurtIeSocks Oct 25, 2023
4072d8e
feat: add mimalloc
TurtIeSocks Oct 25, 2023
91cdf7d
fix: better os targeting for mem allocator
TurtIeSocks Oct 25, 2023
2e06cee
fix: remove unused
TurtIeSocks Oct 25, 2023
8df0d64
refactor: time and memory usage reductions
TurtIeSocks Oct 26, 2023
d425caa
refactor: cleanup
TurtIeSocks Oct 27, 2023
337ed88
refactor: more cleanups & 1 edge case
TurtIeSocks Oct 27, 2023
de6e4ee
docs
TurtIeSocks Oct 27, 2023
add iteration based strategy
TurtIeSocks committed Sep 20, 2023
commit 4f597a0c5fd3521c490b10c95ffd04cc84e9974a
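
The core of this commit is an iteration-based strategy: candidate clusters are visited in random order, a candidate is kept only if it still covers at least `min_points` points that no accepted cluster has covered yet, and the whole pass is repeated many times, keeping the best-scoring result. Below is a condensed, self-contained sketch of one such greedy pass, assuming the `rand` crate as in the diff that follows; `Candidate` and the integer point ids are simplifications of the crate's `Cluster` and `Point` types.

use std::collections::HashSet;

use rand::Rng;

// Stand-in for the crate's `Cluster`: a candidate center plus the points it
// would cover, reduced here to integer ids.
struct Candidate {
    center: usize,
    members: Vec<usize>,
}

// One greedy pass: repeatedly pick a random, untried candidate and keep it if
// it still covers at least `min_points` uncovered points. Gives up after 100
// consecutive picks that cover nothing new, mirroring the diff's `fails` cap.
fn greedy_pass(
    candidates: &[Candidate],
    total_points: usize,
    min_points: usize,
) -> (Vec<usize>, usize) {
    let mut rng = rand::thread_rng();
    let mut kept = Vec::new();
    let mut covered: HashSet<usize> = HashSet::new();
    let mut tried: HashSet<usize> = HashSet::new();
    let mut fails = 0;

    while covered.len() < total_points && fails < 100 && tried.len() < candidates.len() {
        let before = covered.len();

        // Pick a candidate index that has not been tried in this pass.
        let mut idx = rng.gen_range(0..candidates.len());
        while tried.contains(&idx) {
            idx = rng.gen_range(0..candidates.len());
        }
        tried.insert(idx);

        let still_uncovered: Vec<usize> = candidates[idx]
            .members
            .iter()
            .copied()
            .filter(|p| !covered.contains(p))
            .collect();

        if still_uncovered.len() >= min_points {
            covered.extend(still_uncovered);
            kept.push(candidates[idx].center);
        }
        if covered.len() == before {
            fails += 1;
        }
    }
    (kept, total_points - covered.len())
}

fn main() {
    // Hypothetical toy data: three overlapping candidates over six points.
    let candidates = vec![
        Candidate { center: 0, members: vec![0, 1, 2] },
        Candidate { center: 1, members: vec![2, 3] },
        Candidate { center: 2, members: vec![3, 4, 5] },
    ];
    let (kept, missed) = greedy_pass(&candidates, 6, 2);
    println!("kept centers {:?}, missed {} points", kept, missed);
}

The real implementation works on references into an R-tree of `Point`s and tracks coverage with `blocked_points`/`blocked_clusters` sets, but the accept/reject logic has the same shape.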
383 changes: 282 additions & 101 deletions server/algorithms/src/clustering/rtree/mod.rs
@@ -3,16 +3,14 @@ mod point;
use hashbrown::HashSet;
use model::api::{single_vec::SingleVec, stats::Stats, Precision};
use point::Point;
use rand::rngs::mock::StepRng;
use rand::Rng;
use rayon::prelude::{IntoParallelRefIterator, ParallelIterator};
use rstar::RTree;
use shuffle::{irs::Irs, shuffler::Shuffler};
use std::time::Instant;

use crate::s2::create_cell_map;

struct Comparer {
cluster: HashSet<Point>,
struct Comparer<'a> {
cluster: HashSet<&'a Point>,
missed: usize,
score: usize,
}
@@ -41,37 +39,47 @@ pub fn main(
) -> SingleVec {
let time = Instant::now();

log::info!(
"[RTREE] starting algorithm with {} data points",
points.len()
);
stats.total_points = points.len();

let mut return_set = HashSet::new();
let mut missing_count = 0;
let cell_maps = create_cell_map(&points, cluster_split_level);

let mut handlers = vec![];
for (key, values) in cell_maps.into_iter() {
log::debug!("Total {}: {}", key, values.len());
handlers.push(std::thread::spawn(move || {
let tree = point::main(radius, values);
setup(tree, radius, min_points, time)
}));
}
for thread in handlers {
match thread.join() {
Ok((results, missing)) => {
return_set.extend(results);
missing_count += missing;
}
Err(e) => {
log::error!("[S2] Error joining thread: {:?}", e)
let (return_set, missing_count) = if cluster_split_level == 1 {
setup(points, radius, min_points, time)
} else {
let cell_maps = create_cell_map(&points, cluster_split_level);

let mut handlers = vec![];
for (key, values) in cell_maps.into_iter() {
log::debug!("[RTREE] Total {}: {}", key, values.len());
handlers.push(std::thread::spawn(move || {
setup(values, radius, min_points, time)
}));
}
log::info!("[RTREE] created {} threads", handlers.len());

let mut return_set = HashSet::new();
let mut missing_count = 0;
for thread in handlers {
match thread.join() {
Ok((results, missing)) => {
return_set.extend(results);
missing_count += missing;
}
Err(e) => {
log::error!("[RTREE] error joining thread: {:?}", e)
}
}
}
}
(return_set, missing_count)
};

stats.points_covered = stats.total_points - missing_count;
stats.total_clusters = return_set.len();
stats.cluster_time = time.elapsed().as_secs_f64();

println!("total time: {}s", time.elapsed().as_secs_f32());
log::info!("[RTREE] total time: {}s", time.elapsed().as_secs_f32());

return_set.into_iter().map(|p| p.center).collect()
}
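
// For orientation, a standalone sketch of the fan-out used in `main` above:
// when `cluster_split_level` > 1, the points are bucketed by S2 cell and each
// bucket is clustered on its own thread, then the per-cell results are merged.
// `split_into_cells` and `run_cell` are simplified stand-ins for the crate's
// `create_cell_map` and `setup`; only the threading shape is meant to match.
use std::collections::{HashMap, HashSet};
use std::thread;

// Placeholder bucketing; the real code maps each point to its S2 cell id.
fn split_into_cells(points: Vec<[f64; 2]>, level: u8) -> HashMap<u64, Vec<[f64; 2]>> {
    let mut map: HashMap<u64, Vec<[f64; 2]>> = HashMap::new();
    for (i, p) in points.into_iter().enumerate() {
        map.entry((i % level as usize) as u64).or_default().push(p);
    }
    map
}

// Stand-in for the per-cell clustering pass; returns (cluster ids, missed).
fn run_cell(points: Vec<[f64; 2]>) -> (HashSet<u64>, usize) {
    (HashSet::new(), points.len())
}

fn cluster_all(points: Vec<[f64; 2]>, split_level: u8) -> (HashSet<u64>, usize) {
    if split_level == 1 {
        // Single-cell case: skip the thread overhead entirely, as in the diff.
        return run_cell(points);
    }
    let handles: Vec<_> = split_into_cells(points, split_level)
        .into_values()
        .map(|cell_points| thread::spawn(move || run_cell(cell_points)))
        .collect();

    let mut clusters = HashSet::new();
    let mut missed = 0;
    for handle in handles {
        match handle.join() {
            Ok((c, m)) => {
                clusters.extend(c);
                missed += m;
            }
            Err(e) => eprintln!("error joining thread: {:?}", e),
        }
    }
    (clusters, missed)
}

fn main() {
    let points = vec![[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 1.1]];
    let (clusters, missed) = cluster_all(points, 2);
    println!("clusters: {}, missed: {}", clusters.len(), missed);
}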
@@ -89,39 +97,46 @@ fn get_clusters(point: &Point, neighbors: Vec<&Point>, segments: usize) -> HashS
set
}

fn setup(
tree: RTree<Point>,
radius: f64,
min_points: usize,
time: Instant,
) -> (HashSet<Point>, usize) {
println!("made tree: {}", time.elapsed().as_secs_f32());

let points: Vec<&Point> = tree.iter().map(|p| p).collect();
fn get_initial_clusters(points: &SingleVec, radius: f64, time: Instant) -> Vec<Point> {
let double_tree = point::main(radius * 2., points);
log::info!(
"[RTREE] Generated second tree with double radius: {}",
time.elapsed().as_secs_f32()
);

let mut stats = Stats::new();
stats.total_points = points.len();
let tree_points: Vec<&Point> = double_tree.iter().map(|p| p).collect();

let initial_clusters = points
let clusters = tree_points
.par_iter()
.map(|point| {
let neighbors = tree.locate_within_distance(point.center, radius * 2.);
let neighbors = double_tree.locate_all_at_point(&point.center);
get_clusters(point, neighbors.into_iter().collect(), 8)
})
.reduce(HashSet::new, |a, b| a.union(&b).cloned().collect());

println!(
"generated potential clusters: {}s",
log::info!(
"[RTREE] generated {} potential clusters: {}",
clusters.len(),
time.elapsed().as_secs_f32()
);
clusters.into_iter().collect::<Vec<Point>>()
}
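
// Rough standalone sketch of the neighbor lookup behind `get_initial_clusters`
// above (the real code builds a second tree whose `Point`s carry twice the
// radius and calls `locate_all_at_point`); here plain `[f64; 2]` values and a
// squared-distance query are used instead. `get_clusters`, which derives up to
// eight candidate centers per point from these neighbors, is not shown in this
// diff, so the midpoint step below is only a placeholder for that derivation.
use rstar::RTree;

fn candidate_centers(points: &[[f64; 2]], radius: f64) -> Vec<[f64; 2]> {
    let tree = RTree::bulk_load(points.to_vec());
    let mut candidates = Vec::new();
    for p in points {
        // rstar's locate_within_distance takes a *squared* max distance.
        let neighbors: Vec<&[f64; 2]> = tree
            .locate_within_distance(*p, (radius * 2.0).powi(2))
            .collect();
        for n in neighbors {
            candidates.push([(p[0] + n[0]) / 2.0, (p[1] + n[1]) / 2.0]);
        }
    }
    candidates
}

fn main() {
    let points = [[0.0, 0.0], [0.5, 0.5], [3.0, 3.0]];
    println!("{} candidate centers", candidate_centers(&points, 1.0).len());
}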

println!("Data {} Clusters {}", tree.size(), initial_clusters.len());

// let cluster_tree = RTree::bulk_load(initial_clusters.into_iter().collect());
fn setup(
points: Vec<[f64; 2]>,
radius: f64,
min_points: usize,
time: Instant,
) -> (HashSet<Point>, usize) {
let tree = point::main(radius, &points);
log::info!(
"[RTREE] made primary tree: {}s",
time.elapsed().as_secs_f32()
);

let initial_clusters = initial_clusters.into_iter().collect::<Vec<Point>>();
let initial_clusters = get_initial_clusters(&points, radius, time);

let mut clusters_with_data: Vec<Cluster> = initial_clusters
let clusters_with_data: Vec<Cluster> = initial_clusters
.par_iter()
.map(|cluster| {
let points = tree
@@ -133,69 +148,208 @@ fn setup(
}
})
.collect();
println!(
"added potential clusters: {}s",
log::info!(
"[RTREE] added data to cluster structs: {}s",
time.elapsed().as_secs_f32()
);

// let mut cluster_map = HashMap::<Point, Vec<&Point>>::new();
// for (key, values) in clusters_with_data {
// cluster_map.insert(*key, values);
// }
println!("created cluster map: {}s", time.elapsed().as_secs_f32());
iter_clustering(min_points, points.len(), &clusters_with_data, time)
// (comparison.cluster, comparison.missed)
}

fn clustering(
min_points: usize,
total_points: usize,
clusters_with_data: &Vec<Cluster>,
time: Instant,
) -> (HashSet<Point>, usize) {
log::info!("Starting clustering: {}", time.elapsed().as_secs_f32());
let mut new_clusters = HashSet::<&Point>::new();
let mut blocked_clusters = HashSet::<&Point>::new();
let mut blocked_points = HashSet::<&Point>::new();

let mut highest = 100;
while highest > min_points {
let local_clusters = clusters_with_data
.par_iter()
.filter_map(|cluster| {
if blocked_clusters.contains(&cluster.point) {
None
} else {
Some((
&cluster.point,
cluster
.points
.iter()
.filter_map(|p| {
if blocked_points.contains(p) {
None
} else {
Some(*p)
}
})
.collect::<Vec<&Point>>(),
))
}
})
.collect::<Vec<(&Point, Vec<&Point>)>>();

let mut best = 0;
for (cluster, points) in local_clusters.iter() {
let length = points.len() + 1;

if length > best {
best = length;
}
if length >= highest {
if blocked_clusters.contains(*cluster) || length == 0 {
continue;
}
let mut count = 0;
for point in points {
if !blocked_points.contains(*point) {
count += 1;
}
}
if count >= min_points {
for point in points {
blocked_points.insert(point);
}
blocked_clusters.insert(cluster);
new_clusters.insert(*cluster);
}
}
}
highest = best;
// println!("Current: {} | {}", highest, new_clusters.len());
}
log::info!("Finished clustering: {}", time.elapsed().as_secs_f32());
(
new_clusters.into_iter().map(|p| *p).collect(),
total_points - blocked_points.len(),
)
}

fn iter_clustering(
min_points: usize,
total_points: usize,
clusters_with_data: &Vec<Cluster>,
time: Instant,
) -> (HashSet<Point>, usize) {
log::info!("Starting clustering: {}", time.elapsed().as_secs_f32());

let mut stats = Stats::new();
stats.total_points = total_points;

let mut rng = StepRng::new(2, 13);
let mut irs = Irs::default();
let mut comparison = Comparer {
cluster: HashSet::new(),
missed: 0,
score: usize::MAX,
};
let mut tries = 0;

while tries < 50 {
match irs.shuffle(&mut clusters_with_data, &mut rng) {
Ok(_) => {
log::info!("Shuffled!")
let mut rng = rand::thread_rng();
let length = clusters_with_data.len();

// let mut highest = 100;
let mut new_clusters = HashSet::<&Point>::new();
let mut blocked_clusters = HashSet::<usize>::new();
let mut blocked_points = HashSet::<&Point>::new();
let mut total_iterations = 0;

while total_iterations <= 1_000_000 {
// log::info!("Starting iteration {}", total_iterations);
let mut fails = 0;
while blocked_points.len() != total_points {
// log::info!(
// "Looping: {} | {} | {}",
// comparison.cluster.len(),
// blocked_points.len(),
// fails
// );

if fails > 100 {
// log::info!("Breaking iteration {}", total_iterations);
break;
}
Err(e) => {
log::warn!("Error while shuffling: {}", e);
let blocked_point_ref = blocked_points.len();

let mut random_index = rng.gen_range(0..length);
while blocked_clusters.contains(&random_index) {
// log::info!(
// "Checking index: {} | {}",
// random_index,
// blocked_clusters.contains(&random_index)
// );
random_index = rng.gen_range(0..length)
}
// log::info!(
// "Found Index: {} | {} | {}",
// random_index,
// blocked_points.len(),
// fails
// );
blocked_clusters.insert(random_index);

let cluster = &clusters_with_data[random_index];
let valid_points: Vec<&&Point> = cluster
.points
.iter()
.filter(|p| !blocked_points.contains(*p))
.collect();
if valid_points.len() >= min_points {
for point in valid_points.iter() {
blocked_points.insert(*point);
}
new_clusters.insert(&cluster.point);
}
if blocked_point_ref == blocked_points.len() {
fails += 1;
// break;
}
// log::info!("Loop finished: {}", time.elapsed().as_secs_f32());
}

let (new_clusters, missed) =
clustering(min_points, stats.total_points, &clusters_with_data);

let missed = total_points - blocked_points.len();
stats.total_clusters = new_clusters.len();
stats.points_covered = stats.total_points - missed;
stats.points_covered = total_points - missed;
let current_score = stats.get_score(min_points);
if current_score < comparison.score {
println!("Current Score: {}", current_score);
comparison.cluster = new_clusters;

if current_score < comparison.score
// && if comparison.cluster.is_empty() {
// true
// } else {
// comparison.cluster.len() >= stats.total_clusters
// }
{
log::info!(
"Old Score: {} | New Score: {}| Iteration {}",
comparison.score,
current_score,
total_iterations,
);
log::info!(
"Covered: {} | Clusters: {}",
stats.points_covered,
stats.total_clusters
);
comparison.cluster = new_clusters.clone();
comparison.missed = missed;
comparison.score = current_score;
}
tries += 1;
}
// let (new_clusters, missed) = clustering(min_points, stats.total_points, &clusters_with_data);
fails = 0;
new_clusters.clear();
blocked_clusters.clear();
blocked_points.clear();

println!("while loop finished: {}s", time.elapsed().as_secs_f32());

// let missed = tree.iter().count() - block_points.len();
// clustering(min_points, stats.total_points, &clusters_with_data)
(comparison.cluster, comparison.missed)
total_iterations += 1;
}
log::info!("Finished clustering: {}", time.elapsed().as_secs_f32());
(
comparison.cluster.into_iter().map(|p| *p).collect(),
comparison.missed,
)
}
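
// The loop above amounts to random restarts with best-score tracking: each
// pass builds a candidate solution, `Stats::get_score` rates it, and the
// `Comparer` keeps the lowest score seen across up to 1,000,000 iterations.
// A simplified sketch of that outer loop follows; `score` is a hypothetical
// formula, since the real one lives in `Stats::get_score` and is not shown
// in this diff.
fn score(clusters: usize, missed: usize, min_points: usize) -> usize {
    // Hypothetical: penalize both the cluster count and the uncovered points.
    clusters * min_points + missed * 2
}

struct Best {
    clusters: Vec<usize>,
    missed: usize,
    score: usize,
}

fn random_restarts<F>(mut run_pass: F, min_points: usize, iterations: usize) -> Best
where
    F: FnMut() -> (Vec<usize>, usize), // (kept cluster centers, missed points)
{
    let mut best = Best { clusters: Vec::new(), missed: 0, score: usize::MAX };
    for _ in 0..iterations {
        let (clusters, missed) = run_pass();
        let s = score(clusters.len(), missed, min_points);
        if s < best.score {
            best = Best { clusters, missed, score: s };
        }
    }
    best
}

fn main() {
    // Dummy pass for illustration: pretend each restart covers everything.
    let best = random_restarts(|| (vec![1, 2, 3], 0), 3, 10);
    println!(
        "best: {} clusters, {} missed, score {}",
        best.clusters.len(),
        best.missed,
        best.score
    );
}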

fn clustering(
min_points: usize,
total_points: usize,
clusters_with_data: &Vec<Cluster>,
) -> (HashSet<Point>, usize) {
let mut highest = 100;
let mut new_clusters = HashSet::<Point>::new();
let mut block_clusters = HashSet::<&Point>::new();
let mut block_points = HashSet::<&Point>::new();

/*
while highest > min_points {
let local_clusters = clusters_with_data
.par_iter()
@@ -251,16 +405,43 @@ fn clustering(
// println!("Current: {} | {}", highest, new_clusters.len());
}
(new_clusters, total_points - block_points.len())
}

// for point in tree.iter() {
// let neighbors = tree.locate_within_distance(point.center, radius * 2.);
// get_clusters(
// point,
// neighbors.into_iter().collect(),
// 8,
// &mut initial_clusters,
// );
// initial_clusters.insert(*point);
*/

// let mut rng = StepRng::new(2, 13);
// let mut irs = Irs::default();
// let mut comparison = Comparer {
// cluster: HashSet::new(),
// missed: 0,
// score: usize::MAX,
// };
// let mut tries = 0;

// while tries < 10 {
// println!("Starting {} of {}", tries, 10);
// match irs.shuffle(&mut clusters_with_data, &mut rng) {
// Ok(_) => {
// log::info!("Shuffled!")
// }
// Err(e) => {
// log::warn!("Error while shuffling: {}", e);
// continue;
// }
// }

// let (new_clusters, missed) =
// clustering(min_points, stats.total_points, &clusters_with_data);

// stats.total_clusters = new_clusters.len();
// stats.points_covered = stats.total_points - missed;
// let current_score = stats.get_score(min_points);
// if current_score < comparison.score {
// println!("Current Score: {}", current_score);
// comparison.cluster = new_clusters;
// comparison.missed = missed;
// comparison.score = current_score;
// }
// tries += 1;
// }
// let (new_clusters, missed) = clustering(min_points, stats.total_points, &clusters_with_data);

// let missed = tree.iter().count() - block_points.len();
6 changes: 3 additions & 3 deletions server/algorithms/src/clustering/rtree/point.rs
@@ -110,10 +110,10 @@ impl Default for Point {
}
}

pub fn main(radius: f64, points: SingleVec) -> RTree<Point> {
pub fn main(radius: f64, points: &SingleVec) -> RTree<Point> {
let spawnpoints = points
.into_iter()
.map(|p| Point::new(radius, p))
.iter()
.map(|p| Point::new(radius, *p))
.collect::<Vec<_>>();
RTree::bulk_load(spawnpoints)
}
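
The point.rs change above switches `main` to borrow the point list instead of taking it by value, so `setup` can build the primary tree and later hand the same vector to `get_initial_clusters` without cloning it. A rough standalone equivalent of the borrow-and-bulk-load pattern, using plain `[f64; 2]` entries in place of the crate's `Point` type and `SingleVec` alias:

use rstar::RTree;

fn build_tree(points: &[[f64; 2]]) -> RTree<[f64; 2]> {
    // Copy the entries into the tree; the caller's vector is only borrowed.
    RTree::bulk_load(points.iter().copied().collect())
}

fn main() {
    let points = vec![[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]];
    let tree = build_tree(&points);
    // `points` is still usable here because it was never moved.
    println!("tree holds {} of {} points", tree.size(), points.len());
}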