-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf tracking issues #313
Comments
Oh dear; I believe I've just found the issue, and it's right here in this equation:
for libc, I modified cli to output some stats:
Basically, almost all of the memory in panopticon is used by Of course an obvious solution to this problem is to reduce the |
Thanks for the debugging! I guess we have to come up with a way to encode RREIL in a more space saving manner. |
Soooo I already did this cause I got bored one fine Sunday. I'll post the work in progress here, or open a new issue so you can see what I wa thinking. But as of now it does reduce the statement to 24 bytes, which is the minimum target it needs to hit imho. I recommend if/when we do this, we change the Vec to initially a type alias like Statements = Vec Why? I've reviewed the required refactor and it's going to suck, mostly because vec of statements is everywhere (and also a string table needs to e added everywhere but that's not so bad) When we do the refactor, if we change the statement vector to our type alias, or even better a struct wrapper, with a custom accessor, we should be free to explore encoding the vector itself. Why would we want to do that? Well, if we also encode the vector, I'm pretty sure we can achieve some incredible amortized space savings by using leb128 for most values in the RREIL IR. Basically this observation comes from the fact that many values are indexed which will be 1-200, say for register names, etc. this will be a single byte instead of 8. This will add up quickly, and it's not just for string table indexes, but addition in operands, etc. If we use the type alias/wrapper we can then add this additional optimization later on, which essentially should be completely transparent to al other panopticon code except for less memory usage |
Good idea. As long as Statements allows iteration, removal and insertion it should be easy to replace all uses of Vec. |
Good idea, in fact, when we do the refactor we should make statements have that exact API so any changes underneath are completely invisible to clients. For now using Vec it will just use the Vec impl for those. We can then do crazy optimization with variable length integers later on if memory pressure becomes problematic again |
Here is the experimental "binary RREIL" encoding I mentioned, which uses 27 bytes per Statement (compared to previous 220): use scroll::{Pread, Gread, Pwrite, Gwrite};
use {Result, Rvalue, Lvalue, Statement};
use serde::{Serialize,Deserialize};
use std::fmt::{self, Formatter, Debug};
use std::collections::HashMap;
///
pub struct StrTable {
data: HashMap<String, u32>,
idx: u32,
}
impl StrTable {
///
pub fn new() -> Self {
StrTable { data: HashMap::<String, u32>::new(), idx: 1 }
}
///
pub fn get(&self, key: &str) -> Option<&u32> {
self.data.get(key)
}
///
pub fn find(&self, idx: u32) -> Option<&String> {
for (name, name_idx) in self.data.iter() {
if *name_idx == idx {
return Some(name)
}
}
None
}
///
pub fn push(&mut self, key: &str) {
self.data.insert(key.to_owned(), self.idx);
self.idx += 1;
}
}
///
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct RvalueBits {
bits: [u8; 9]
}
impl fmt::Debug for RvalueBits {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
for byte in self.bits.iter() {
write!(f, "{:08b}", byte)?;
}
Ok(())
}
}
impl RvalueBits {
/// bits
pub fn bits(&self) -> &[u8] {
&self.bits[..]
}
///
pub fn into(self, strtable: &StrTable) -> Result<Rvalue> {
let tag = self.bits[0];
if tag == 0xff {
Ok(Rvalue::Undefined)
} else if tag & 0x1 == 0 { // constant
let size = ((tag & 0xfe) >> 1) as usize;
let value: u64 = self.bits.pread(1)?;
Ok(Rvalue::Constant { value, size })
} else if tag & 0x1 == 1 { // variable
let size = ((tag & 0xfe) >> 1) as usize;
let offset = self.bits[1] as usize;
let idx = self.bits.pread::<u32>(2)?;
let subscript = {
let s = self.bits.pread::<u16>(6)?;
if s == 0 { None } else { Some(s as usize) }
};
let name = strtable.find(idx).unwrap();
Ok(Rvalue::Variable { name: name.clone().into(), subscript, offset, size })
} else {
Err(format!("Bad bits {:?}", self.bits).into())
}
}
///
pub fn from_bytes(bytes: &[u8]) -> Result<Self> {
if bytes.len() < 9 {
Err("Rvalue requires 9 byte array".into())
} else {
let mut bits = [0; 9];
bytes.gread_inout(&mut 0, &mut bits[..])?;
Ok(RvalueBits {
bits
})
}
}
///
pub fn new(rvalue: Rvalue, strtable: &StrTable) -> Result<Self> {
// s-vvvv_vvvv
let bits = match rvalue {
Rvalue::Undefined => [0xff; 9],
Rvalue::Constant { value, size } => {
let mut scratch = [0; 9];
let size = size as u8;
let o = &mut 0;
scratch.gwrite((size << 1) & 0xfe, o)?;
scratch.gwrite(value, o)?;
scratch
},
Rvalue::Variable { name, subscript, offset, size } => {
let mut scratch = [0; 9];
let size = size as u8;
let offset = offset as u8;
let subscript = subscript.unwrap_or(0) as u16;
let idx = *strtable.get(&*name).unwrap_or(&0);
let o = &mut 0;
scratch.gwrite((size << 1) | 0x1, o)?;
scratch.gwrite(offset, o)?;
scratch.gwrite(idx, o)?;
scratch.gwrite(subscript, o)?;
scratch
},
};
Ok(RvalueBits { bits })
}
}
///
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct LvalueBits {
bits: [u8; 7]
}
impl Debug for LvalueBits {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
for byte in self.bits.iter() {
write!(f, "{:08b}", byte)?;
}
Ok(())
}
}
impl LvalueBits {
/// bits
pub fn bits(&self) -> &[u8] {
&self.bits[..]
}
///
pub fn into(self, strtable: &StrTable) -> Result<Lvalue> {
let tag = self.bits[0];
if tag == 0xff {
Ok(Lvalue::Undefined)
} else if tag & 0x1 == 1 { // variable
let size = ((tag & 0xfe) >> 1) as usize;
let idx = self.bits.pread::<u32>(1)?;
let subscript = {
let s = self.bits.pread::<u16>(5)?;
if s == 0 { None } else { Some(s as usize) }
};
let name = strtable.find(idx).unwrap();
Ok(Lvalue::Variable { name: name.clone().into(), subscript, size })
} else {
Err(format!("Lvalue bad bits {:?}", self.bits).into())
}
}
///
pub fn from_bytes(bytes: &[u8]) -> Result<Self> {
if bytes.len() < 7 {
Err("Lvalue requires 7 byte array".into())
} else {
let mut bits = [0; 7];
bytes.gread_inout(&mut 0, &mut bits[..])?;
Ok(LvalueBits {
bits
})
}
}
///
pub fn new(lvalue: Lvalue, strtable: &StrTable) -> Result<Self> {
let bits = match lvalue {
Lvalue::Undefined => [0xff; 7],
Lvalue::Variable { name, subscript, size } => {
let mut scratch = [0; 7];
let size = size as u8;
let subscript = subscript.unwrap_or(0) as u16;
let idx = *strtable.get(&*name).unwrap_or(&0);
let o = &mut 0;
scratch.gwrite((size << 1) | 0x1, o)?; // always set this to variable tag
scratch.gwrite(idx, o)?;
scratch.gwrite(subscript, o)?;
scratch
},
};
Ok(LvalueBits { bits })
}
}
///
pub const ADD_OPCODE: u8 = 1;
///
pub const SUBTRACT_OPCODE: u8 = 2;
///
pub const MULTIPLY_OPCODE: u8 = 3;
///
pub const DIVIDEUNSIGNED_OPCODE: u8 = 4;
///
pub const DIVIDESIGNED_OPCODE: u8 = 5;
///
pub const SHIFTLEFT_OPCODE: u8 = 6;
///
pub const SHIFTRIGHTUNSIGNED_OPCODE: u8 = 7;
///
pub const SHIFTRIGHTSIGNED_OPCODE: u8 = 8;
///
pub const MODULO_OPCODE: u8 = 9;
///
pub const AND_OPCODE: u8 = 10;
///
pub const INCLUSIVEOR_OPCODE: u8 = 11;
///
pub const EXCLUSIVEOR_OPCODE: u8 = 12;
///
pub const EQUAL_OPCODE: u8 = 13;
///
pub const LESSOREQUALUNSIGNED_OPCODE: u8 = 14;
///
pub const LESSOREQUALSIGNED_OPCODE: u8 = 15;
///
pub const LESSUNSIGNED_OPCODE: u8 = 16;
///
pub const LESSSIGNED_OPCODE: u8 = 17;
///
pub const ZEROEXTEND_OPCODE: u8 = 18;
///
pub const SIGNEXTEND_OPCODE: u8 = 19;
///
pub const MOVE_OPCODE: u8 = 20;
///
pub const CALL_OPCODE: u8 = 21;
///
pub const SELECT_OPCODE: u8 = 22;
///
pub const LOAD_OPCODE: u8 = 23;
///
pub const STORE_OPCODE: u8 = 24;
///
pub const PHI_OPCODE: u8 = 25;
///
#[derive(Clone,PartialEq,Eq,Serialize,Deserialize)]
pub struct StatementBits {
bits: [u8; 1+7+9+9] // opcode lvalue rvalue rvalue
}
impl Debug for StatementBits {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
for byte in self.bits.iter() {
write!(f, "{:08b}", byte)?;
}
Ok(())
}
}
impl StatementBits {
/// into
pub fn into(self, strtable: &StrTable) -> Result<Statement> {
use Operation::*;
let opcode: u8 = self.bits.pread(0)?;
let lvalue_bits = LvalueBits::from_bytes(&self.bits[1..8])?;
let assignee = lvalue_bits.into(strtable)?;
let op = match opcode {
ADD_OPCODE => {
let r1 = RvalueBits::from_bytes(&self.bits[8..17])?;
let r2 = RvalueBits::from_bytes(&self.bits[17..26])?;
Add(r1.into(strtable)?, r2.into(strtable)?)
},
SUBTRACT_OPCODE => {
let r1 = RvalueBits::from_bytes(&self.bits[8..17])?;
let r2 = RvalueBits::from_bytes(&self.bits[17..26])?;
Subtract(r1.into(strtable)?, r2.into(strtable)?)
},
// MULTIPLY_OPCODE => {
// },
// DIVIDEUNSIGNED_OPCODE => {
// },
// DIVIDESIGNED_OPCODE => {
// },
// SHIFTLEFT_OPCODE => {
// },
// SHIFTRIGHTUNSIGNED_OPCODE => {
// },
// SHIFTRIGHTSIGNED_OPCODE => {
// },
// MODULO_OPCODE => {
// },
// AND_OPCODE => {
// },
// INCLUSIVEOR_OPCODE => {
// },
// EXCLUSIVEOR_OPCODE => {
// },
// EQUAL_OPCODE => {
// },
// LESSOREQUALUNSIGNED_OPCODE => {
// },
// LESSOREQUALSIGNED_OPCODE => {
// },
// LESSUNSIGNED_OPCODE => {
// },
// LESSSIGNED_OPCODE => {
// },
// ZEROEXTEND_OPCODE => {
// },
// SIGNEXTEND_OPCODE => {
// },
// MOVE_OPCODE => {
// },
// CALL_OPCODE => {
// },
// SELECT_OPCODE => {
// },
// LOAD_OPCODE => {
// },
// STORE_OPCODE => {
// },
// PHI_OPCODE => {
// },
opcode => return Err(format!("invalid opcode {}", opcode).into())
};
Ok(Statement { assignee, op })
}
///
pub fn new(statement: Statement, strtable: &StrTable) -> Result<Self> {
use Operation::*;
let mut bits = [0; 1+7+9+9];
let lvalue = LvalueBits::new(statement.assignee, strtable)?;
bits.pwrite(lvalue.bits(), 1)?;
match statement.op {
Add(r1, r2) => {
bits.pwrite(ADD_OPCODE, 0)?;
bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
},
Subtract(r1, r2) => {
bits.pwrite(SUBTRACT_OPCODE, 0)?;
bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
},
Multiply(r1, r2) => {
bits.pwrite(MULTIPLY_OPCODE, 0)?;
bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
},
DivideUnsigned(r1, r2) => {
bits.pwrite(DIVIDEUNSIGNED_OPCODE, 0)?;
bits.pwrite(r1.into_bits(strtable)?.bits(), 8)?;
bits.pwrite(r2.into_bits(strtable)?.bits(), 8+9)?;
},
_ => unimplemented!()
};
Ok(StatementBits { bits })
}
} The StrTable is very much a WIP/prototype for what it might look like, not much thought was put into it; we probably want something like shawshank for string interning, but the returned indices must be stable I think; we should also be able to approximate this behavior with the current model. |
This is tracking issue for performance issues in panopticon.
Currently, memory usage is spiraling out of control ;) running
panop
onlibc.so.6
(a 1.9 MB binary), panopticon uses 2.1GB to perform a full disassembly.This obviously will not scale to larger binaries.
The current hotspots are:
(MAJOR)
function.rs:451
- 1.2GB of 2.1GB comes from the clone on this line:panopticon/core/src/function.rs
Line 451 in b7802d0
i've verified that commenting out this line with the clone reduces memory usage (and runtime) by half, though obviously this is not an appropriate solution
500MB
data-flow/ssa.rs:172
-panopticon/data-flow/src/ssa.rs
Line 172 in b7802d0
amd64/disassembler.rs:277
- There's a lot of slowdown allocating the RREIL statement vectors during disassembly, accounting for about 20-30% of peak allocations, specifically, right here:panopticon/amd64/src/disassembler.rs
Line 277 in b7802d0
I've been using heaptrack via something like:
with the following patch applied (requires nightlyt):
The text was updated successfully, but these errors were encountered: