
Refactored some of the code and added in extract lines performance #15

Open · wants to merge 9 commits into master
Conversation

ekeric13 commented Feb 14, 2018

Looks like I am having some whitespace issues.

My text editor is set to 4 spaces.

What are you using?

edit: I think I resolved the whitespace issues.

Found this super useful btw! There are not many good options for synchronously reading a file line by line without loading it all into memory.

Also, if you don't want the dependency, we can check whether Buffer.indexOf is supported and, if not, fall back to the old _extractLines function.
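A minimal sketch of that feature check (the helper name findNewline and the assumption that lines are delimited by '\n' are mine, not from this PR):

```javascript
// Buffer.prototype.indexOf is not available on older Node.js runtimes;
// detect it once at load time and fall back to a manual byte scan
// (the approach the old _extractLines took) when it is missing.
const supportsIndexOf = typeof Buffer.prototype.indexOf === 'function';

function findNewline(buf, start) {
  if (supportsIndexOf) {
    return buf.indexOf(0x0a, start); // native search for '\n'
  }
  // Fallback: scan byte by byte for '\n'
  for (let i = start; i < buf.length; i++) {
    if (buf[i] === 0x0a) {
      return i;
    }
  }
  return -1;
}
```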

@ekeric13 ekeric13 mentioned this pull request Feb 14, 2018
nacholibre (Owner)
Hello,

thanks for your contribution, but I cannot merge this before I know for sure it is better than the original version. How did you verify that your code is faster? Can you post some test results?

ekeric13 (Author) commented Mar 8, 2018

Makes total sense!

I just added a test that measures how long it takes to go through a ~27k line file.

That alone doesn't give you much context, though, since it does not run against the previous version.

I can upload my own benchmark file if you want; for now I'll just post the results here.

I used the test helpers in this PR along with these functions to compare the old and new version of node-readlines.

const lineByLine = require('./lineByLine');   // this PR's version
const oldLineByLine = require('n-readlines'); // published version

function percentDecrease(newNum, oldNum) {
  return Math.floor(((oldNum - newNum) / oldNum) * 100) + '%';
}

// runThrough comes from the test helpers added in this PR
function logResultsFor(file, trials) {
  const newAvg = runThrough(file, lineByLine, trials);
  console.log(`new average ${file} x ${trials} in s:`, newAvg.toFixed(5));
  const oldAvg = runThrough(file, oldLineByLine, trials);
  console.log(`old average ${file} x ${trials} in s:`, oldAvg.toFixed(5));
  console.log(`time to parse through ${file} reduced by`, percentDecrease(newAvg, oldAvg));
}

logResultsFor('movies', 100);
logResultsFor('ratings', 3);

Where movies was 27k lines and ratings was 20m lines.
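For reference, here is a minimal sketch of what a runThrough helper like the one in this PR's test helpers might look like; the real helper lives in the PR, so the timing approach and signature below are assumptions:

```javascript
// Hypothetical stand-in for the runThrough test helper referenced above:
// read every line of `file` with the given reader constructor, `trials`
// times, and return the average elapsed time per trial in seconds.
// Both versions of node-readlines expose the same interface:
// `new LineReader(file)` plus `.next()`, which returns a Buffer per line
// and `false` at end of file.
function runThrough(file, LineReader, trials) {
  let totalMs = 0;
  for (let t = 0; t < trials; t++) {
    const start = Date.now();
    const liner = new LineReader(file);
    while (liner.next() !== false) {
      // drain the file line by line; the lines themselves are discarded
    }
    totalMs += Date.now() - start;
  }
  return totalMs / trials / 1000; // average seconds per trial
}
```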

The results were

new average movies x 100 in s: 0.01400
old average movies x 100 in s: 0.03010
time to parse through movies reduced by 53%
new average ratings x 3 in s: 8.51500
old average ratings x 3 in s: 13.73300
time to parse through ratings reduced by 37%

I ran this multiple times; the movies file was consistently ~50% faster, and the ratings file consistently ~35% faster.
