Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory refactor #6

Closed
wants to merge 8 commits into from
Closed

Conversation

dgilperez
Copy link

Hi!

I was playing with derailed_benchmarks gem in one of my projects, looking to reduce its memory footprint, and I noticed rest-client was using quite a few, mainly because of it's dependency to domain_name.

I was looking at your code and noticed that we could improve that by using marshaling rather than a huge constant in lib/domain_name/etld_data.rb.

This PR is my take towards this. Please let me know what do you think.

The output of bundle exec derailed bundle:mem (locally adding derailed as a development dependency) was:

TOP: 2.4141 MiB
  domain_name: 2.3789 MiB
    domain_name/etld_data: 2.0859 MiB

And after the changes here is:

TOP: 0.7109 MiB
  domain_name: 0.668 MiB
    yaml: 0.4609 MiB
      psych: 0.457 MiB

Being half of the memory actually been used by yaml, which will be already a dependency quite a lot of apps already down there.

Thanks!

@tijmenb
Copy link

tijmenb commented Sep 16, 2015

👍

@knu
Copy link
Owner

knu commented Sep 17, 2015

Cool, but I'd like to know how this would reduce memory footprint before anything. Is that the cost of running the code to build a large hash and then keep the byte code that is only run once? Just curious, but how much of it will stay on memory forever?

Yaml would be fine, but I'm not fond of distributing Marshal'd binary because it will enforce you to build a gem with the oldest ruby supported in order to make sure to generate a binary that is readable by any version of ruby.

etld_data = { 'data_date' => etld_data_date, 'data' => parse(dat) }

File.open(yaml_file, 'w:utf-8') do |yaml|
yaml.write etld_data.to_yaml
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Yaml.dump(etld_data, yaml) would be better because it'd save the generation of a large string.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@dgilperez
Copy link
Author

About the second part of your question, if I understand correctly how Marshalling works, there should be no problem with ruby >= 1.8.0, as the docs say:

The first two bytes of the stream contain the major and minor version, each as a single byte encoding a digit. The version implemented in Ruby is 4.8 (stored as “x04x08”) and is supported by ruby 1.8.0 and newer.

Different major versions of the Marshal format are not compatible and cannot be understood by other major versions. Lesser minor versions of the format can be understood by newer minor versions. Format 4.7 can be loaded by a 4.8 implementation but format 4.8 cannot be loaded by a 4.7 implementation.

So we might have problems with people using Marshal 4.7 (previous to ruby 1.8), but even if we want to support that (do we?), we may generate the file with that format ...

Anyway, the main reason for using Marshall is speed, and although that is a clear advantage for .dump, that is probably not the point here. Deserializing is also reportedly faster with Marshal, but not that much.

I may be missing a whole bunch of points here anyway. Let me know if you want me to use Yaml instead.

@dgilperez
Copy link
Author

About the first part of the question, I'm not entirely sure. bundle exec derailed bundle:mem calculates the memory used at boot time. I assume that memory stays as a base memory footprint for any app depending on ruby-domain_name. If that's true, and considering rest-client is a common dependency, even if we are talking about just about 2MiB, that would be noticeable by many. I may be wrong.

May I politely ask @schneems about his opinion?

@dgilperez
Copy link
Author

@knu I just pushed a non-marshall version that performs about as well.

@dgilperez
Copy link
Author

Hey @knu, did you have time to review this? Thanks 😄

@dgilperez
Copy link
Author

Hi @knu any news on this?

@knu
Copy link
Owner

knu commented Jun 16, 2016

So sorry for the delay. Could you rebase this on master?

@dgilperez
Copy link
Author

There you go @knu ! I'm closing in favour of #8

@dgilperez dgilperez closed this Jun 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants