Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/#61: Improve logfile parsing speed by factor 2 for AbstractDataReaderSun #177

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

geld0r
Copy link
Contributor

@geld0r geld0r commented Aug 18, 2016

This is a follow up for PR #176 that improves the parsing speed for parsers based on AbstractDataReaderSun.

Profiling showed that a large part of the time for parsing a logfile is spent
while parsing the contained datestamps. This is noticeable when either big logfiles or a series of logfiles are parsed.
See https://github.com/geld0r/DateTimeFormatBenchmarks for a speed comparison of various DateTimeFormatters.
FastDateFormat was chosen as a replacement and Apache Commons Lang added as dependency.
Hand timed benchmarks show that parsing logfiles of supported types now works roughly twice as fast.

Note that this required to add Apache commons-lang3 as project dependencies. It contains many utilities that are useful and avoids having to rewrite them. Among them is FastDateFormat, which is a thread-safe and fast alternative to SimpleDateFormat. This is now used for parsing dates in such logfiles.

geld0r added 2 commits August 18, 2016 19:50
…n-shade plugin

Apache commons-lang3 was added to the project dependencies. It contains many utilities that are useful and avoid having to rewrite them.
Among them is FastDateFormat, which is a thread-safe and fast alternative to SimpleDateFormat.

This dependency now has to be delivered together with GCViewer. The maven-shade plugin is now used to build an executable "fat-jar" that contains this and all other future dependencies.
The maven-jar plugin has been removed, as it is no longer required.
…ataReaderSun is used

Profiling showed that a large part of the time for parsing a logfile is spent
while parsing the contained datestamps. This is noticeable when either big logfiles or a series of logfiles are parsed.
See https://github.com/geld0r/DateTimeFormatBenchmarks for a speed comparison of various DateTimeFormatters.
FastDateFormat was chosen as a replacement and Apache Commons Lang added as dependency.
Hand timed benchmarks show that parsing logfiles of supported types now works roughly twice as fast.
@codecov-io
Copy link

codecov-io commented Aug 18, 2016

Codecov Report

Merging #177 into develop will decrease coverage by <.01%.
The diff coverage is 95%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #177      +/-   ##
===========================================
- Coverage    62.62%   62.62%   -0.01%     
===========================================
  Files          140      140              
  Lines         8057     8073      +16     
  Branches      1454     1455       +1     
===========================================
+ Hits          5046     5056      +10     
- Misses        2532     2538       +6     
  Partials       479      479
Impacted Files Coverage Δ
...m/tagtraum/perf/gcviewer/imp/DataReaderFacade.java 83.05% <100%> (+3.45%) ⬆️
...traum/perf/gcviewer/imp/AbstractDataReaderSun.java 83.56% <88.88%> (+0.35%) ⬆️
...va/com/tagtraum/perf/gcviewer/view/GCDocument.java 33.97% <0%> (-1.93%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8389030...40c5a1a. Read the comment docs.

@chewiebug chewiebug added this to the 1.35 milestone Sep 4, 2016
@chewiebug
Copy link
Owner

HI @geld0r

As you mentioned in your comment in #61 I have checked the deployment of GCViewer after your change. At the moment, it doesn't work, as I think, it should. Here a diagram showing a possible approach:
deployment

build + shade jar

  • currently: uber-jar is created with the same name as the jar with dependencies + jar with dependencies is renamed to "original-..."
  • should: uber-jar should have a new name, jar with dependencies should retain the original name

create assembly for mac distribution

  • will just work, because the assembly plugin will use the jar with dependencies + copy the dependent jars along

deploy original jar to maven repository

  • should also just work, because the original jar still has the original name

deploy uber-jar to SourceForge

  • should: copy the uber-jar instead of the original jar to SourceForge (must rename the uber-jar back to its original name before / during copy)

This should work, but I probably, there is another approach using the standard naming generated by the shade-plugin and fix the assembly + the deployment to the maven repository steps.

What do you think? Will you give it a try, or shall I?

Best regards,
Jörg

PS: To check, what the assembly plugin creates, you can run "mvn package -P sourceforge-release". This will create the mac assembly.

@chewiebug
Copy link
Owner

Hi @geld0r

I have tried to reproduce the performance improvement on my machine and was a bit confused, that neither hand stopping (as you implemented with the stop watch) nor profiling showed the performance improvements, that you found.

I have then rerun the benchmark you used to back your improvements on and believe, that the later jdk1.8.0 versions have improved the ZonedDateTimeParser so much, that the Apache implementation is not faster any more (I have created a pull request for your benchmark repository: add benchmark results for different hardware / jdk).

Now I have two questions:

  • It would be very interesting, if you could publish the benchmark results on your hardware with the latest jdk. Do you then also see the improved performance of the standard ZonedDateTimeParser?
  • Might I have made a mistake running the benchmark or interpreting its results? To me, it looks like zonedDateTimeParse + jodaDateTimeFormatter are about equal in speed, while the fastDateFormat implementation slightly lags behind.

Unless there is a mistake in my benchmarking / interpretation, I'd refrain from merging this pull request while it includes the usage of the Apache parser.

I'd still find it interesting though, to have the maven-shade-plugin + StopWatch working for GCViewer, because that would open up a wealth of opportunities, that I've always shunned in the past, because I didn't know how to keep the deployment process simple. This pull request still has it's merits in that area for me.

@chewiebug chewiebug removed this from the 1.35 milestone Aug 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants