Feature/#61: Improve logfile parsing speed by factor 2 for AbstractDataReaderSun #177
base: develop
…n-shade plugin Apache commons-lang3 was added to the project dependencies. It contains many utilities that are useful and avoid having to rewrite them. Among them is FastDateFormat, which is a thread-safe and fast alternative to SimpleDateFormat. This dependency now has to be delivered together with GCViewer. The maven-shade plugin is now used to build an executable "fat-jar" that contains this and all other future dependencies. The maven-jar plugin has been removed, as it is no longer required.
…ataReaderSun is used Profiling showed that a large part of the time for parsing a logfile is spent while parsing the contained datestamps. This is noticeable when either big logfiles or a series of logfiles are parsed. See https://github.com/geld0r/DateTimeFormatBenchmarks for a speed comparison of various DateTimeFormatters. FastDateFormat was chosen as a replacement and Apache Commons Lang added as dependency. Hand timed benchmarks show that parsing logfiles of supported types now works roughly twice as fast.
Codecov Report
@@            Coverage Diff             @@
##           develop    #177     +/-    ##
===========================================
- Coverage    62.62%   62.62%   -0.01%
===========================================
  Files          140      140
  Lines         8057     8073     +16
  Branches      1454     1455      +1
===========================================
+ Hits          5046     5056     +10
- Misses        2532     2538      +6
  Partials       479      479
Hi @geld0r,
As you mentioned in your comment in #61, I have checked the deployment of GCViewer after your change. At the moment it doesn't work the way I think it should. Here are the steps of a possible approach:
build + shade jar
create assembly for mac distribution
deploy original jar to maven repository
deploy uber-jar to SourceForge
This should work, but there is probably another approach: keep the standard naming generated by the shade-plugin and fix the assembly and the deployment to the maven repository steps instead. What do you think? Will you give it a try, or shall I?
Best regards,
PS: To check what the assembly plugin creates, you can run "mvn package -P sourceforge-release". This will create the mac assembly.
Hi @geld0r,
I have tried to reproduce the performance improvement on my machine and was a bit confused that neither hand timing (with the stop watch you implemented) nor profiling showed the improvement you found. I then reran the benchmark you used to back your change and believe that the later jdk1.8.0 versions have improved the ZonedDateTimeParser so much that the Apache implementation is no longer faster (I have created a pull request for your benchmark repository: add benchmark results for different hardware / jdk). Now I have two questions:
Unless there is a mistake in my benchmarking or interpretation, I'd refrain from merging this pull request while it includes the usage of the Apache parser. I'd still find it interesting, though, to have the maven-shade-plugin + StopWatch working for GCViewer, because that would open up a wealth of opportunities that I've always shunned in the past because I didn't know how to keep the deployment process simple. This pull request still has its merits in that area for me.
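To make the comparison above easier to repeat on other hardware and JDK versions, here is a minimal hand-timed sketch that uses only JDK classes. It is not the benchmark from the linked repository and it leaves out FastDateFormat so that it runs without extra dependencies. The class name, the method names, the sample datestamp and the pattern are all assumptions made for illustration. Hand timing like this is noisy, so treat the printed numbers as a rough indication only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of GCViewer.
public class DatestampParseTiming {
    // Assumed datestamp layout: ISO-like date and time with a numeric offset.
    static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSZ";
    static final String SAMPLE = "2012-04-18T14:23:59.890+0200";

    // Immutable, so a single instance can be reused for every parse.
    static final DateTimeFormatter JAVA_TIME =
            DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT);

    // Parses with java.text.SimpleDateFormat and returns epoch milliseconds.
    static long parseLegacy(String datestamp) {
        try {
            return new SimpleDateFormat(PATTERN, Locale.ROOT).parse(datestamp).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable datestamp: " + datestamp, e);
        }
    }

    // Parses with java.time and returns epoch milliseconds.
    static long parseJavaTime(String datestamp) {
        return ZonedDateTime.parse(datestamp, JAVA_TIME).toInstant().toEpochMilli();
    }

    // Times repeated SimpleDateFormat parsing with one reused instance; returns nanoseconds.
    static long timeLegacy(int iterations) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        long checksum = 0;
        long start = System.nanoTime();
        try {
            for (int i = 0; i < iterations; i++) {
                checksum += format.parse(SAMPLE).getTime();
            }
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
        long elapsed = System.nanoTime() - start;
        // Use the checksum so the loop body cannot be optimized away entirely.
        if (checksum == 0) {
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed;
    }

    // Times repeated java.time parsing; returns nanoseconds.
    static long timeJavaTime(int iterations) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += parseJavaTime(SAMPLE);
        }
        long elapsed = System.nanoTime() - start;
        if (checksum == 0) {
            throw new IllegalStateException("unexpected checksum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 200_000;
        // Warm-up runs so the JIT has compiled both paths before measuring.
        timeLegacy(iterations);
        timeJavaTime(iterations);
        System.out.println("SimpleDateFormat: " + timeLegacy(iterations) / 1_000_000 + " ms");
        System.out.println("java.time:        " + timeJavaTime(iterations) / 1_000_000 + " ms");
    }
}
```

For results that should back a merge decision, a harness such as JMH would be more trustworthy than a loop around System.nanoTime().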
This is a follow-up to PR #176 that improves the parsing speed for parsers based on AbstractDataReaderSun.
Profiling showed that a large part of the time for parsing a logfile is spent parsing the contained datestamps. This is noticeable when either big logfiles or a series of logfiles are parsed.
See https://github.com/geld0r/DateTimeFormatBenchmarks for a speed comparison of various DateTimeFormatters.
FastDateFormat was chosen as a replacement and Apache Commons Lang added as a dependency.
Hand-timed benchmarks show that parsing logfiles of supported types is now roughly twice as fast.
Note that this required adding Apache commons-lang3 as a project dependency. It contains many useful utilities and avoids having to rewrite them. Among them is FastDateFormat, a thread-safe and fast alternative to SimpleDateFormat, which is now used for parsing dates in such logfiles.
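Thread safety matters here because a SimpleDateFormat instance keeps mutable state and must not be shared between threads without synchronization, whereas a thread-safe formatter can be created once and reused everywhere. The sketch below illustrates that usage pattern with the JDK's immutable DateTimeFormatter as a stand-in, so it runs without commons-lang3; it does not show FastDateFormat itself. The class name, method name and datestamp pattern are assumptions made for illustration.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example, not part of GCViewer.
public class SharedFormatterSketch {
    // One immutable formatter shared by all threads (assumed datestamp pattern).
    static final DateTimeFormatter SHARED =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // Parses the same datestamp from several threads and returns how many
    // parses produced the expected instant.
    static int parseConcurrently(String datestamp, int threads, int parsesPerThread) {
        long expected = ZonedDateTime.parse(datestamp, SHARED).toInstant().toEpochMilli();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int correct = 0;
                    for (int i = 0; i < parsesPerThread; i++) {
                        long millis = ZonedDateTime.parse(datestamp, SHARED)
                                .toInstant().toEpochMilli();
                        if (millis == expected) {
                            correct++;
                        }
                    }
                    return correct;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int correct = parseConcurrently("2012-04-18T14:23:59.890+0200", 4, 10_000);
        System.out.println("correct parses: " + correct);
    }
}
```

Running the same experiment with a single shared SimpleDateFormat instance may throw exceptions or return wrong dates, which is why per-thread instances or a thread-safe formatter are needed when parsing in parallel.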