
Refactored some of the code and added in extract lines performance #15

Open · wants to merge 9 commits into master
Conversation

ekeric13 commented Feb 14, 2018

Looks like I am having some whitespace issues.

My text editor is set to 4 spaces.

What are you using?

edit: I think I resolved the whitespace issues.

Found this super useful btw! There are not many good options for synchronously reading a file line by line without loading it all into memory.

Also, if you don't want the dependency, we can check whether Buffer.indexOf is supported and, if not, fall back to the old _extractLines function.
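A minimal sketch of that feature check (the helper name findNewline and the assumption that lines are delimited by '\n' are mine, not from this PR):

```javascript
// Buffer.prototype.indexOf is not available on older Node.js runtimes;
// detect it once at load time and fall back to a manual byte scan
// (the approach the old _extractLines took) when it is missing.
const supportsIndexOf = typeof Buffer.prototype.indexOf === 'function';

function findNewline(buf, start) {
  if (supportsIndexOf) {
    return buf.indexOf(0x0a, start); // native search for '\n'
  }
  // Fallback: scan byte by byte for '\n'
  for (let i = start; i < buf.length; i++) {
    if (buf[i] === 0x0a) {
      return i;
    }
  }
  return -1;
}
```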

@ekeric13 ekeric13 mentioned this pull request Feb 14, 2018
nacholibre (Owner)
Hello,

thanks for your contribution, but I cannot merge this before I know for sure it is better than the original version. How did you verify that your code is faster? Can you post some test results?

ekeric13 (Author) commented Mar 8, 2018

Makes total sense!

I just added a test that measures how long it takes to go through a ~27k line file.

That alone doesn't give you much context, though, since it does not run against the previous version.

I can upload my own benchmark file if you want; for now I'll just post the results here.

I used the test helpers in this PR along with these functions to compare the old and new version of node-readlines.

const lineByLine = require('./lineByLine');   // this PR's version
const oldLineByLine = require('n-readlines'); // published version

function percentDecrease(newNum, oldNum) {
  return Math.floor(((oldNum - newNum) / oldNum) * 100) + '%';
}

// runThrough comes from the test helpers added in this PR
function logResultsFor(file, trials) {
  const newAvg = runThrough(file, lineByLine, trials);
  console.log(`new average ${file} x ${trials} in s:`, newAvg.toFixed(5));
  const oldAvg = runThrough(file, oldLineByLine, trials);
  console.log(`old average ${file} x ${trials} in s:`, oldAvg.toFixed(5));
  console.log(`time to parse through ${file} reduced by`, percentDecrease(newAvg, oldAvg));
}

logResultsFor('movies', 100);
logResultsFor('ratings', 3);

Where movies was 27k lines and ratings was 20m lines.
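For reference, here is a minimal sketch of what a runThrough helper like the one in this PR's test helpers might look like; the real helper lives in the PR, so the timing approach and signature below are assumptions:

```javascript
// Hypothetical stand-in for the runThrough test helper referenced above:
// read every line of `file` with the given reader constructor, `trials`
// times, and return the average elapsed time per trial in seconds.
// Both versions of node-readlines expose the same interface:
// `new LineReader(file)` plus `.next()`, which returns a Buffer per line
// and `false` at end of file.
function runThrough(file, LineReader, trials) {
  let totalMs = 0;
  for (let t = 0; t < trials; t++) {
    const start = Date.now();
    const liner = new LineReader(file);
    while (liner.next() !== false) {
      // drain the file line by line; the lines themselves are discarded
    }
    totalMs += Date.now() - start;
  }
  return totalMs / trials / 1000; // average seconds per trial
}
```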

The results were

new average movies x 100 in s: 0.01400
old average movies x 100 in s: 0.03010
time to parse through movies reduced by 53%
new average ratings x 3 in s: 8.51500
old average ratings x 3 in s: 13.73300
time to parse through ratings reduced by 37%

I ran this multiple times; the movies file was consistently ~50% faster, and the ratings file consistently ~35% faster.
