
Attempt at UnifiedLog Iterator #26

Merged: 2 commits merged into main on Nov 13, 2024

Conversation

puffyCid
Collaborator

Basic PR that adds an iterator for tracev3 files.

It allows the caller to iterate through the chunk(s) of a tracev3 file. The original implementation of the parser parsed all chunks at once and returned everything.

Now we offer a way to parse one chunk at a time, with the goal of reducing memory usage a bit.

@puffyCid
Collaborator Author

puffyCid commented Sep 24, 2024

Hey @jrouaix, I finally have time to try to improve this library a bit to reduce memory usage. Apologies for the delay.

I see in your PR #22 you are also making some changes.

I added a function to iterate through the log files in chunks.
Below is a very high-level overview of the log format:

Tracev3 log format:

header
gzip compressed block
catalog <- Metadata on the logs
chunkset <- Collection of logs
chunkset
chunkset
chunkset
chunkset
End of block

Gzip compressed block
catalog
chunkset
chunkset
chunkset
chunkset
chunkset
chunkset
chunkset
End of block

<repeat XXXX times>

Originally the library parsed the entire tracev3 file at once, looping through, parsing, and decompressing all data, then returning all log data for the user to build (build_log).

This PR adds an iterator that parses one gzip compressed block at a time.
This reduces memory usage a good bit: a ~150-200MB decrease on my test system.
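Conceptually, the idea looks like the toy sketch below (stand-in types, not the real tracev3 parser or this crate's API): an Iterator yields one chunk at a time, so peak memory stays bounded by a single chunk instead of the whole file.

```rust
// Toy sketch of the approach: an iterator that yields one "chunkset"
// at a time instead of materializing all of them up front.
struct ChunkIterator {
    data: Vec<u8>,
    pos: usize,
    chunk_size: usize,
}

impl Iterator for ChunkIterator {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        if self.pos >= self.data.len() {
            return None;
        }
        // Hand back only the next slice of data; earlier chunks can be
        // dropped by the caller before the next one is produced.
        let end = (self.pos + self.chunk_size).min(self.data.len());
        let chunk = self.data[self.pos..end].to_vec();
        self.pos = end;
        Some(chunk)
    }
}

fn main() {
    let iter = ChunkIterator { data: vec![0u8; 10], pos: 0, chunk_size: 4 };
    let sizes: Vec<usize> = iter.map(|c| c.len()).collect();
    assert_eq!(sizes, vec![4, 4, 2]); // only one chunk is live at a time
}
```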

From the example projects in the repo:
unifiedlog_parser - ~500-600MB - uses the original parser to parse the entire tracev3 file at once
unifiedlog_iterator - ~400MB - uses the new iterator approach (example included in this PR)

Based on my tests, it looks like most of the memory usage comes from the strings we read and cache from /var/log/db/uuidtext. On my test system this directory contains ~350MB of data.

I added an example project in this PR that uses the new iterator approach to parsing log files.
If you have time to run/test it, I would be interested to hear if you see a memory decrease.

@jrouaix
Contributor

jrouaix commented Sep 26, 2024

> If you have time to run/test it, I would be interested to hear if you see a memory decrease.

Will do!


let mut count = 0;
for mut chunk in log_iterator {
chunk.oversize.append(&mut oversize_strings.oversize);
Contributor

This feels clunky: you move all of oversize_strings into this chunk, and a few lines later you move the same entries back (plus a few more).
Could you instead lend a &oversize_strings.oversize, and only append the new missing entries later?
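A rough sketch of the suggested shape (toy types and a hypothetical build_log_sketch helper, not the crate's real API): the per-chunk step only reads the cache, so the caller can lend it by reference and append just the newly discovered entries afterwards.

```rust
// Toy sketch: build_log_sketch only *reads* the existing oversize cache,
// so the caller can lend it by reference instead of moving it into the
// chunk and moving it back after every iteration.
fn build_log_sketch(cache: &[String], chunk_id: usize) -> Vec<String> {
    let _ = cache.len(); // the cache is read-only here
    vec![format!("oversize-{chunk_id}")] // newly discovered entries
}

fn run(chunks: usize) -> usize {
    let mut oversize: Vec<String> = Vec::new();
    for chunk_id in 0..chunks {
        // Borrow the cache; no move round-trip per iteration.
        let mut new_entries = build_log_sketch(&oversize, chunk_id);
        // Append only what is new.
        oversize.append(&mut new_entries);
    }
    oversize.len()
}

fn main() {
    assert_eq!(run(3), 3); // one new entry per chunk
}
```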

@jrouaix
Contributor

jrouaix commented Oct 2, 2024

Hi @puffyCid!
So I gave this branch a try, and indeed there is a very nice memory-wise improvement.

On a standard run of our product, I see around 200+ MB max memory used (versus 900+ before, more than a 20% drop; and since it's not all unified logs, it represents an even bigger improvement in your lib alone).

Yet, there is a pretty big perf regression when using this version.
New version:

real    13m29.036s
user    13m56.268s
sys     1m18.550s

Previous one:

real    9m57.806s
user    11m13.398s
sys     0m45.651s

I wonder if it can be caused by the oversize_strings copying I just commented on previously: #26 (comment)

@puffyCid
Collaborator Author

puffyCid commented Oct 3, 2024

Thanks for the benchmarks, I'll take a look and troubleshoot the performance.

@puffyCid
Collaborator Author

puffyCid commented Oct 8, 2024

@jrouaix I made some minor fixes to the unifiedlog_iterator example and ran some basic benchmarks with hyperfine.

Performance for the unifiedlog_iterator and unifiedlog_parser examples appears to be nearly the same.

In the example below, system_logs.logarchive has ~9.6 million log entries:

hyperfine "./unifiedlog_iterator -i system_logs.logarchive -o out.csv" "./unifiedlog_parser -i system_logs.logarchive -o out2.csv"
Benchmark 1: ./unifiedlog_iterator -i system_logs.logarchive -o out.csv
  Time (mean ± σ):     58.799 s ±  0.327 s    [User: 54.750 s, System: 3.507 s]
  Range (min … max):   58.203 s … 59.275 s    10 runs

Benchmark 2: ./unifiedlog_parser -i system_logs.logarchive -o out2.csv
  Time (mean ± σ):     57.804 s ±  0.129 s    [User: 53.366 s, System: 3.984 s]
  Range (min … max):   57.576 s … 57.949 s    10 runs

Summary
  ./unifiedlog_parser -i system_logs.logarchive -o out2.csv ran
    1.02 ± 0.01 times faster than ./unifiedlog_iterator -i system_logs.logarchive -o out.csv

Do you still see a performance regression?

@jrouaix
Contributor

jrouaix commented Nov 13, 2024

I'm so sorry @puffyCid, I didn't have much time in the previous weeks (and it won't be better in the next ones).
If it's OK on your side, don't worry and merge; the perf regression is probably in our usage (we were expecting chunks of data, and now they are chunks of 1, which could have a pretty big side effect!).

@puffyCid puffyCid merged commit e8c3857 into main Nov 13, 2024
5 checks passed
@puffyCid puffyCid deleted the iterator-parser branch November 13, 2024 22:56
@jrouaix
Contributor

jrouaix commented Nov 15, 2024

@puffyCid I have more intel here, yet I can't find where/why it's slightly slower:

  • same counts
  • same overall output in our product
  • new version saves a lot of RAM 🎉
  • yet the new version is about 4% slower than the old one (2s over 50s full process)

Two implementations in our product. The new one:

  fn parse_file_new(&mut self, full_path: &str) {
    let Ok(log_bytes) = fs::read(full_path) else {
      return;
    };

    let log_iterator = macos_unifiedlogs::iterator::UnifiedLogIterator { data: log_bytes, header: Vec::new() };

    let mut missing_unified_log_data_vec = UnifiedLogData { header: Vec::new(), catalog_data: Vec::new(), oversize: Vec::new() };

    for mut chunk in log_iterator {
      chunk.oversize.append(&mut self.oversize_strings);
      let (results, mut missing_logs) = build_log(&chunk, &self.string_results, &self.shared_strings_results, &self.timesync_data, true);
      self.oversize_strings = chunk.oversize;

      self.logs_count += results.len();
      self.chuncks_count += 1;

      missing_unified_log_data_vec.header.append(&mut missing_logs.header);
      missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
      missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }

    self.leftover_logs.push(missing_unified_log_data_vec);
  }

The old one:

  fn parse_file(&mut self, full_path: &str) {
    let Ok(mut log_data) = parse_log(full_path).map_err(|e| tracing::error!("unified log parse_file error : {e}")) else {
      return;
    };

    log_data.oversize.append(&mut self.oversize_strings);

    // Get all constructed logs and any log data that failed to get constructed (exclude_missing = true)
    let Ok(iterator) = iter_log(&log_data, &self.string_results, &self.shared_strings_results, &self.timesync_data, true)
      .map_err(|e| tracing::warn!("Failed to iterate over logs: {e}"))
    else {
      return;
    };

    let mut missing_unified_log_data_vec = UnifiedLogData { header: Vec::new(), catalog_data: Vec::new(), oversize: Vec::new() };

    for (results, mut missing_logs) in iterator {
      // Track Oversize entries
      self.logs_count += results.len();
      self.chuncks_count += 1;

      missing_unified_log_data_vec.header.append(&mut missing_logs.header);
      missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
      missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);
    }

    self.oversize_strings = log_data.oversize;
    self.leftover_logs.push(missing_unified_log_data_vec);
  }

@puffyCid
Collaborator Author

@jrouaix let me see if I can make some local benchmarks based on your implementation.

One interesting observation: it looks like you're appending the data individually:

missing_unified_log_data_vec.header.append(&mut missing_logs.header);
missing_unified_log_data_vec.catalog_data.append(&mut missing_logs.catalog_data);
missing_unified_log_data_vec.oversize.append(&mut missing_logs.oversize);

I think in the example I track missing data as a Vec<UnifiedLogData>:
`missing.push(missing_logs);`

Perhaps appending this data 3 times individually for every iteration (vs once per file in the original implementation) could be the cause of the regression?

As mentioned, let me try to recreate this locally and see if I can replicate the performance regression.
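As a rough illustration (with a stand-in struct, not the crate's real UnifiedLogData), the push-per-chunk approach moves each whole struct once instead of issuing three Vec::append calls per iteration:

```rust
// Stand-in for the crate's UnifiedLogData, for illustration only.
#[derive(Default)]
struct UnifiedLogData {
    header: Vec<u8>,
    catalog_data: Vec<u8>,
    oversize: Vec<u8>,
}

// One cheap move of the whole struct per chunk, rather than three
// appends that each may reallocate the accumulator's vectors.
fn collect_missing(chunks: Vec<UnifiedLogData>) -> Vec<UnifiedLogData> {
    let mut missing = Vec::new();
    for missing_logs in chunks {
        missing.push(missing_logs);
    }
    missing
}

fn main() {
    let chunks = vec![UnifiedLogData::default(), UnifiedLogData::default()];
    assert_eq!(collect_missing(chunks).len(), 2);
}
```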

@puffyCid
Collaborator Author

puffyCid commented Nov 16, 2024

Another idea: what if you check for empty values first? If header, catalog_data, and oversize are all empty,
do not append to missing_unified_log_data_vec and just continue.
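A minimal sketch of that check, assuming a stand-in struct and a hypothetical merge_missing helper (not the crate's API): the three appends are skipped entirely when a chunk produced no missing data.

```rust
// Stand-in for the crate's UnifiedLogData, for illustration only.
#[derive(Default)]
struct UnifiedLogData {
    header: Vec<u8>,
    catalog_data: Vec<u8>,
    oversize: Vec<u8>,
}

// Returns true if anything was merged, false if the chunk was empty.
fn merge_missing(acc: &mut UnifiedLogData, mut missing: UnifiedLogData) -> bool {
    // Skip the appends entirely when this chunk produced no missing data.
    if missing.header.is_empty() && missing.catalog_data.is_empty() && missing.oversize.is_empty() {
        return false;
    }
    acc.header.append(&mut missing.header);
    acc.catalog_data.append(&mut missing.catalog_data);
    acc.oversize.append(&mut missing.oversize);
    true
}

fn main() {
    let mut acc = UnifiedLogData::default();
    assert!(!merge_missing(&mut acc, UnifiedLogData::default())); // empty chunk skipped
    let chunk = UnifiedLogData { header: vec![1], ..Default::default() };
    assert!(merge_missing(&mut acc, chunk));
    assert_eq!(acc.header.len(), 1);
}
```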

@@ -311,13 +311,15 @@ fn parse_trace_file(
}

exclude_missing = false;
println!("Oversize cache size: {}", oversize_strings.oversize.len());
println!("Logs with missing oversize strings: {}", missing_data.len());
println!("Checking Oversize cache one more time...");
Contributor

@puffyCid Are these leftovers? Either way I believe it's best to use eprintln! as per my previous suggestion for separating output and logs.
