Minor performance improvements #110
```diff
@@ -1,10 +1,12 @@
 //! The build graph, a graph between files and commands.
 
+use rustc_hash::FxHashMap;
+
 use crate::{
     densemap::{self, DenseMap},
     hash::BuildHash,
 };
-use std::collections::HashMap;
+use std::collections::{hash_map::Entry, HashMap};
 use std::path::{Path, PathBuf};
 use std::time::SystemTime;
 
```
```diff
@@ -258,7 +260,7 @@ pub struct Graph {
 #[derive(Default)]
 pub struct GraphFiles {
     pub by_id: DenseMap<FileId, File>,
-    by_name: HashMap<String, FileId>,
+    by_name: FxHashMap<String, FileId>,
 }
 
 impl Graph {
```
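The first two hunks replace the std `HashMap` behind `by_name` with `rustc_hash::FxHashMap`, which is a type alias for std's `HashMap` with a cheaper, unkeyed hasher plugged in instead of the default randomly keyed, SipHash-based `RandomState`. Every path lookup hashes a string, so a cheaper hash is presumably where this part of the speedup comes from; the cost is giving up HashDoS resistance, which matters little for build files. Below is a self-contained sketch of the same idea with no `rustc_hash` dependency. The hypothetical `ToyFxHasher` imitates the classic multiply-rotate Fx scheme one byte at a time and is not the crate's actual implementation.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Multiplier used by the classic 64-bit FxHasher.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Hypothetical, simplified Fx-style hasher (not the `rustc_hash` code):
/// for each byte, rotate the running state, xor the byte in, multiply by SEED.
#[derive(Default)]
pub struct ToyFxHasher {
    hash: u64,
}

impl Hasher for ToyFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.hash = (self.hash.rotate_left(5) ^ u64::from(b)).wrapping_mul(SEED);
        }
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Same shape as `FxHashMap<K, V>`: std's `HashMap` with the hasher swapped out.
pub type ToyFxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFxHasher>>;
```

Because this hasher has no random key, the same string hashes identically across runs and across map instances, unlike with `RandomState`.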
```diff
@@ -310,21 +312,26 @@ impl GraphFiles {
     }
 
     /// Look up a file by its name, adding it if not already present.
-    /// Name must have been canonicalized already.
-    /// This function has a funny API to avoid copies; pass a String if you have
-    /// one already, otherwise a &str is acceptable.
-    pub fn id_from_canonical<S: AsRef<str> + Into<String>>(&mut self, file: S) -> FileId {
-        self.lookup(file.as_ref()).unwrap_or_else(|| {
-            // TODO: so many string copies :<
-            let file = file.into();
-            let id = self.by_id.push(File {
-                name: file.clone(),
-                input: None,
-                dependents: Vec::new(),
-            });
-            self.by_name.insert(file, id);
-            id
-        })
+    /// Name must have been canonicalized already. Only accepting an owned
+    /// string allows us to avoid a string copy and a hashmap lookup when we
+    /// need to create a new id, but would also be possible to create a version
+    /// of this function that accepts string references that is more optimized
+    /// for the case where the entry already exists. But so far, all of our
+    /// usages of this function have an owned string easily accessible anyways.
+    pub fn id_from_canonical(&mut self, file: String) -> FileId {
+        // TODO: so many string copies :<
```

BTW, the comment here was a reminder to myself about my struggle to avoid so many string copies. The hashmap maps string -> File, and File has a `.name` field which is a string, so in theory you don't need a second copy of that string as the hashmap key. But I couldn't get all the lifetimes to work when I tried that. Fixing that would mean we no longer need two copies of every path string, and would also save a lot of allocations on the load path...

```diff
+        match self.by_name.entry(file) {
+            Entry::Occupied(o) => *o.get(),
+            Entry::Vacant(v) => {
+                let id = self.by_id.push(File {
+                    name: v.key().clone(),
+                    input: None,
+                    dependents: Vec::new(),
+                });
+                v.insert(id);
+                id
+            }
+        }
     }
 
     pub fn all_ids(&self) -> impl Iterator<Item = FileId> {
```
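The new `id_from_canonical` replaces a `lookup` followed by an `insert` (two hashes of the name on a miss, plus an allocation whenever a `&str` was passed) with a single `entry` call on an owned key. Below is a minimal sketch of that get-or-insert pattern against simplified, hypothetical stand-ins for `FileId`, `File`, and `GraphFiles`: a `Vec` replaces `DenseMap`, and `File` keeps only its `name`.

```rust
use std::collections::hash_map::{Entry, HashMap};

// Hypothetical simplified stand-ins for the graph types in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(usize);

pub struct File {
    pub name: String,
}

#[derive(Default)]
pub struct GraphFiles {
    by_id: Vec<File>,
    by_name: HashMap<String, FileId>,
}

impl GraphFiles {
    /// Get-or-insert with a single hash lookup. `entry` takes ownership of
    /// `file`; on a miss that same String becomes the map key, so the only
    /// new allocation is the `File::name` copy.
    pub fn id_from_canonical(&mut self, file: String) -> FileId {
        match self.by_name.entry(file) {
            Entry::Occupied(o) => *o.get(),
            Entry::Vacant(v) => {
                let id = FileId(self.by_id.len());
                self.by_id.push(File { name: v.key().clone() });
                v.insert(id);
                id
            }
        }
    }

    pub fn name(&self, id: FileId) -> &str {
        &self.by_id[id.0].name
    }
}
```

On a hit, the caller's `String` is simply dropped. That is the tradeoff the new doc comment describes: the owned-key signature pays off because the callers already own the string.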
```diff
@@ -56,17 +56,6 @@ pub struct Loader {
     builddir: Option<String>,
 }
 
-impl parse::Loader for Loader {
-    type Path = FileId;
-    fn path(&mut self, path: &mut str) -> Self::Path {
-        // Perf: this is called while parsing build.ninja files. We go to
-        // some effort to avoid allocating in the common case of a path that
-        // refers to a file that is already known.
-        let len = canon_path_fast(path);
-        self.graph.files.id_from_canonical(&path[..len])
-    }
-}
-
 impl Loader {
     pub fn new() -> Self {
         let mut loader = Loader::default();
```
```diff
@@ -76,10 +65,19 @@ impl Loader {
         loader
     }
 
+    /// Convert a path string to a FileId. For performance reasons
+    /// this requires an owned 'path' param.
+    fn path(&mut self, mut path: String) -> FileId {
```

It's too bad, I went to all the effort to make this area reuse a single String buffer, but I understand why it must be done. RIP

```diff
+        // Perf: this is called while parsing build.ninja files. We go to
+        // some effort to avoid allocating in the common case of a path that
+        // refers to a file that is already known.
+        let len = canon_path_fast(&mut path);
+        path.truncate(len);
+        self.graph.files.id_from_canonical(path)
+    }
+
     fn evaluate_path(&mut self, path: EvalString<&str>, envs: &[&dyn eval::Env]) -> FileId {
-        use parse::Loader;
-        let mut evaluated = path.evaluate(envs);
-        self.path(&mut evaluated)
+        self.path(path.evaluate(envs))
     }
 
     fn evaluate_paths(
```
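The loader change follows from the new signature. `evaluate_path` already produces an owned `String`, so passing it by value lets `path` canonicalize in place, `truncate`, and hand the same buffer to `id_from_canonical`, where on a miss it becomes the map key without another copy. The old `&mut str` version had to copy `&path[..len]` into a fresh `String` on every miss. The sketch below isolates that buffer-reuse step. `canon_len_sketch` and `canonicalize_owned` are hypothetical names; `canon_len_sketch` is a much-simplified stand-in for `canon_path_fast` that only drops `.` components and is not n2's implementation.

```rust
/// Hypothetical, much-simplified stand-in for `canon_path_fast`: drops "."
/// components in place and returns the new length. It does not handle "..",
/// repeated slashes, or other cases the real function may cover.
pub fn canon_len_sketch(bytes: &mut [u8]) -> usize {
    let n = bytes.len();
    let (mut read, mut write) = (0, 0);
    while read < n {
        // Scan one component, up to the next '/' or the end.
        let start = read;
        while read < n && bytes[read] != b'/' {
            read += 1;
        }
        let comp_len = read - start;
        let has_slash = read < n;
        if has_slash {
            read += 1; // step past the separator
        }
        if comp_len == 1 && bytes[start] == b'.' {
            continue; // skip a "./" component
        }
        // Shift the component (and its trailing '/') left over any gap.
        let copy_len = comp_len + usize::from(has_slash);
        bytes.copy_within(start..start + copy_len, write);
        write += copy_len;
    }
    write
}

/// Mirrors the shape of the new `Loader::path`: take the String by value,
/// canonicalize in place, truncate, and pass the same buffer onward.
pub fn canonicalize_owned(path: String) -> String {
    let mut bytes = path.into_bytes(); // no copy: reuses the String's buffer
    let len = canon_len_sketch(&mut bytes);
    bytes.truncate(len); // shortens the length only, never reallocates
    // Only whole ASCII components were removed, so the bytes stay valid UTF-8.
    String::from_utf8(bytes).expect("still valid UTF-8")
}
```

For example, `canonicalize_owned(String::from("./out/./foo.o"))` returns `"out/foo.o"`, still stored in the original heap allocation.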
I think the case where this might come up is when loading .n2_db, where the string paths are stored canonicalized so we know they don't need any mutation as we parse them. But I think that's only worth really thinking about if/when it comes time to speed up that codepath. (This comment is just refreshing my memory on this.)
True, though currently `read_str` in `db.rs` reads owned strings. Using references would also be a tradeoff that requires keeping the db in memory, but we could do it.
The way it works is the db is read fully at startup and all the paths are mapped to ids as they're read, so it doesn't really need an owned string in there. But I think it's also probably not the slow part, yet...
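The new doc comment on `id_from_canonical` notes that a variant taking string references, optimized for entries that already exist, would also be possible, and the thread above points at `.n2_db` loading (where paths are already canonical) as a place it might matter. A minimal sketch of such a variant follows, under the hypothetical name `id_from_canonical_str` and with simplified stand-in types (names kept in a plain `Vec<String>`). A hit costs one lookup and no allocation; a miss allocates the key and hashes the name a second time.

```rust
use std::collections::hash_map::{Entry, HashMap};

// Hypothetical simplified stand-ins, not n2's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileId(usize);

#[derive(Default)]
pub struct GraphFiles {
    names: Vec<String>,
    by_name: HashMap<String, FileId>,
}

impl GraphFiles {
    /// Owned-key get-or-insert, as in the diff: one lookup, key moved in on a miss.
    pub fn id_from_canonical(&mut self, file: String) -> FileId {
        match self.by_name.entry(file) {
            Entry::Occupied(o) => *o.get(),
            Entry::Vacant(v) => {
                let id = FileId(self.names.len());
                self.names.push(v.key().clone());
                v.insert(id);
                id
            }
        }
    }

    /// Hypothetical borrowed-key variant: `String: Borrow<str>` lets us look
    /// up with `&str` directly, allocating an owned key only on a miss.
    pub fn id_from_canonical_str(&mut self, file: &str) -> FileId {
        if let Some(&id) = self.by_name.get(file) {
            return id;
        }
        self.id_from_canonical(file.to_owned())
    }
}
```

Which variant wins depends on the hit rate and on whether callers already hold an owned `String`; per the thread above, the current `db.rs` reader produces owned strings, so the owned-key path fits it as-is.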