-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPENNLP-421 - Replace String.intern() in StringList #568
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood the issue and read the benchmark gist results correctly, with the new code JMH benchmark starts faster, then the old code catchs up and plateaus around 1.2K ops/s. But in overall the non interned version still process more ops/s, with the advantage of not having the JVM max memory issue.
Do you think we could look at memory usage running these tests too? Maybe through GC metrics in JMH or VisualVM, YourKit, etc.?
Thanks!
opennlp-tools/src/jmh/java/opennlp/tools/jmh/ExecutionPlan.java
Outdated
Show resolved
Hide resolved
if we look at the mean values:
vs.
Systemarray.copy(...) is the reason for the speed improvement from merely 1k ops/s to 328k op/s (worst case) So what we see is a huge improvement in terms of throughut because we avoid copying and creating objects all the time ;-) - I agree, that we could investigate memory consumption, too. |
opennlp-tools/src/jmh/java/opennlp/tools/jmh/ExecutionPlan.java
Outdated
Show resolved
Hide resolved
Added GC Profiling output to JMH. Here it is Old Code
New Code
|
Thanks for the GC numbers. The new code seems a bit heavier on the GC and mem allocations, but I think that was expected. A bit late here, but I decided to give it a try. Excuse if I say anything silly, I can check these results again tomorrow after having some ☕ . I opened your branch on IntelliJ, then made the jmh profile active by default (how do you tell IntelliJ to include the deps of a profile? anywho), invalidated cache, etc.. Executed the tests just to make sure they were running, as you already provided the results. Then executed with the profiler, to generate the Opened both in IntelliJ and selected the CPU samplesI think it just confirms the change of calls to intern functions. Also the increase in GC calls. And I think the Random calls are from JMH for having more/less samples due to higher throughput. Memory allocationsWith a lot more Strings, as expected, as well as bytes due to the array copy calls. Unfortunately I don't have any better or more practical test to compare memory (maybe a single run of a large example would be better for comparing memory, instead of multiple jmh runs…). But I don't think this should really be a problem. I wonder if that was some premature optimization or if at some point someone had memory issues. But a change that's easy to revert if needed, and that way users won't have to modify heap settings. So I am inclined to approve it! 🎉 Will wait until tomorrow so you, Martin, Jeff, others had some time to review 🛌 Thanks! Bruno |
@kinow thanks for the details and the flight recording :-) I think, that the current use of String interning was an attempt to optimize string comparisons by using the string constant pool (until it explodes). I guess, that the idea was also to save memory. There is a really nice blog article which summarizes the pros/cons: https://www.codecentric.de/wissens-hub/blog/save-memory-by-using-string-intern-in-java The state, that
I think, that we can remove a lot of the current object creation related to the use of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, performance tradeoff also seems OK IMO. Thanks @rzo1 !
@mawiesne and myself just started an eval run from the office. Let's see if we see anything ;-) |
I think, we c/should consider using a https://shipilev.net/jvm/anatomy-quarks/10-string-intern/ The removal of the String deduplication, as found in the code segments of this PR, could otherwise lead to more required RAM at runtime - as @kinow already expressed. Wdyt? @jzonthemtn @rzo1 (A) Removal (PR as is now), or |
I am OK with A or B. Feel free to add B on-top of this PR (don't have time in the next days) |
Let's make it a draft and add B. Saving memory is always a good thing. |
(B) looks interesting, +1 for trying that out, and for @rzo1 suggestion to add that on top of this one.
First time I've seen an intern made in the code (intentionally like that). Sounds like a good idea! You do the intern without using the heap. Not sure if it will perform as fast as String.intern, but I think it might work! Thanks @mawiesne |
I reproduced the basic String deduplication benchmark by Shipilev (Java 8) under MacOS with Java 17; the results are consistent with his findings:
|
Looks great, @mawiesne that it also applies to Java 17. Think we should give it a try and add our own |
Feel free to proceed after I've pushed the basic string dedup benchmark, leave Jira as is. |
Ack. |
- adds StringDeduplicationBenchmark (jmh), adapted from A. Shipilëv for benchmarking on different operating systems / Java environments - fixes inconsistent path in build-helper-maven-plugin config for jmh test addition
8e9f0a3
to
c1acc8c
Compare
OPENNLP-412 - Add some interner implementations based on the examples of Aleksey Shipilëv. Introduces mechanismn similar to Hadoop for project wide usage. OPENNLP-412 - Use our user provided StringInterner in StringList OPENNLP-412 - Update Interner Benchmarker with new OpenNLP classes.
Hi all, I added related interner / dedup implementations inspired / based on the code provided by @mawiesne from Aleksey. I added the following interner impls:
The implementation of the static
Currently, an updated benchmark of the different impls is running as well as a full eval build with the default. Will update this PR with the JMH results in a few hours. |
Here is the JMH result for the interner impls only: Benchmark (size) Mode Cnt Score Error Units
StringDeduplicationBenchmark.chm 1 thrpt 25 36070236,872 ± 1204122,063 ops/s
StringDeduplicationBenchmark.chm 100 thrpt 25 354445,896 ± 2893,432 ops/s
StringDeduplicationBenchmark.chm 10000 thrpt 25 2323,076 ± 25,596 ops/s
StringDeduplicationBenchmark.chm 1000000 thrpt 25 13,137 ± 0,199 ops/s
StringDeduplicationBenchmark.chmd05 1 thrpt 25 45289035,126 ± 143794,468 ops/s
StringDeduplicationBenchmark.chmd05 100 thrpt 25 433279,497 ± 3165,815 ops/s
StringDeduplicationBenchmark.chmd05 10000 thrpt 25 2692,779 ± 8,173 ops/s
StringDeduplicationBenchmark.chmd05 1000000 thrpt 25 13,413 ± 0,155 ops/s
StringDeduplicationBenchmark.hm 1 thrpt 25 35123958,776 ± 472485,649 ops/s
StringDeduplicationBenchmark.hm 100 thrpt 25 371997,311 ± 6780,622 ops/s
StringDeduplicationBenchmark.hm 10000 thrpt 25 2311,588 ± 115,117 ops/s
StringDeduplicationBenchmark.hm 1000000 thrpt 25 14,073 ± 0,068 ops/s
StringDeduplicationBenchmark.intern 1 thrpt 25 10040026,472 ± 77470,327 ops/s
StringDeduplicationBenchmark.intern 100 thrpt 25 87644,048 ± 844,053 ops/s
StringDeduplicationBenchmark.intern 10000 thrpt 25 764,752 ± 34,300 ops/s
StringDeduplicationBenchmark.intern 1000000 thrpt 25 2,956 ± 0,024 ops/s
StringDeduplicationBenchmark.noop 1 thrpt 25 148719353,102 ± 743493,703 ops/s
StringDeduplicationBenchmark.noop 100 thrpt 25 1491614,947 ± 1587,406 ops/s
StringDeduplicationBenchmark.noop 10000 thrpt 25 9848,732 ± 11,076 ops/s
StringDeduplicationBenchmark.noop 1000000 thrpt 25 78,726 ± 0,064 ops/s https://gist.github.com/rzo1/754b15381041b28718cf44073f4986c5 |
Love it. Curious to see the impact on things like training. |
Here is the other benchmark: Benchmark (internerClazz) (size) Mode Cnt Score Error Units
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringDeduplicator 1 thrpt 25 45717169,968 ± 4894532,156 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringDeduplicator 100 thrpt 25 544627,570 ± 9900,671 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringDeduplicator 10000 thrpt 25 3876,808 ± 211,760 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringDeduplicator 1000000 thrpt 25 11,190 ± 1,009 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringInterner 1 thrpt 25 43774046,611 ± 11369890,960 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringInterner 100 thrpt 25 557563,692 ± 12776,109 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringInterner 10000 thrpt 25 2951,642 ± 189,985 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.CHMStringInterner 1000000 thrpt 25 8,232 ± 0,847 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.HMStringInterner 1 thrpt 25 63103971,334 ± 1958932,641 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.HMStringInterner 100 thrpt 25 854268,631 ± 22261,811 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.HMStringInterner 10000 thrpt 25 3191,138 ± 241,943 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.HMStringInterner 1000000 thrpt 25 8,891 ± 1,040 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.JvmStringInterner 1 thrpt 25 8881121,516 ± 108767,585 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.JvmStringInterner 100 thrpt 25 81073,526 ± 342,867 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.JvmStringInterner 10000 thrpt 25 729,416 ± 6,782 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.JvmStringInterner 1000000 thrpt 25 2,346 ± 0,076 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.NoOpStringInterner 1 thrpt 25 132530907,915 ± 368546,226 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.NoOpStringInterner 100 thrpt 25 6446247,778 ± 14485,272 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.NoOpStringInterner 10000 thrpt 25 53441,292 ± 107,629 ops/s
StringListBenchmark.newWithArrayConstructor opennlp.tools.util.jvm.NoOpStringInterner 1000000 thrpt 25 124,158 ± 1,921 ops/s https://gist.github.com/rzo1/1e85f65ff5d152e168c58e72b2de5375 |
Basically, it is now ready for review. Latest eval run also green: https://ci-builds.apache.org/job/OpenNLP/job/eval-tests-configurable/14/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @rzo1 for this substantial improvement, tackling a three digit Issue number!
I've left minor comments, which can easily be addressed.
opennlp-tools/src/main/java/opennlp/tools/util/jvm/CHMStringDeduplicator.java
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/CHMStringDeduplicator.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/CHMStringInterner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/HMStringInterner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/JvmStringInterner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/StringInterners.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/StringInterners.java
Outdated
Show resolved
Hide resolved
If this code is integrated, there is another branch based on this PR ( 7d55599 ) which will remove the unnecessary creation of |
I'm working until noon today, then I should have time to test it again 🎉 (sorry the delay) Trying the branch again after rebuilding it, but I keep getting # Run progress: 0.74% complete, ETA 00:01:51
# Fork: 1 of 1
<failure>
java.lang.NoClassDefFoundError: opennlp/tools/jmh/ExecutionPlan
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:375)
at org.openjdk.jmh.util.ClassUtils.loadClass(ClassUtils.java:73)
... |
Did you activate the JMH profile, so it adds the related dependencies? Also note, that Execution plan was moved to |
I did, like the previous time. I invalidated the cache and restart IntelliJ, deleted the Run configuration… that's really odd — although it's Friday afternoon, so my 🧠 is probably giving up on me already. Will try again later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a mvn clean install -DskipTests
and that solved it.
Then I left it running, and after 40 minutes on my old Thinkpad:
# Run progress: 7.50% complete, ETA 05:13:04
😅 I think I will skip running the memory tests on my machine. Code looks OK, so no objection to merge it from me (we can revert if needed). Thanks
opennlp-tools/src/main/java/opennlp/tools/util/jvm/HMInterner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/DefaultStringInterner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/Interner.java
Outdated
Show resolved
Hide resolved
opennlp-tools/src/main/java/opennlp/tools/util/jvm/Interner.java
Outdated
Show resolved
Hide resolved
@kinow thx for trying. Both JMH benchmark took around 8h together on my old tower. I will address your comments, squash it up and merge it. Stay tuned for the wrapper removal after 😁 |
Thx to all reviewers / comments with feedback & ideas! Think we solved a good thing here for 2.3.2 ! Other PR for OPENNLP-421 incoming in a few minutes. |
For all changes:
Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
Has your PR been rebased against the latest commit within the target branch (typically main)?
Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
I found OPENNLP-421 in our issue tracker and added a simple JMH benchmark to look into the impact of interning strings and our multiple copy-operations in
StringList
.Here is a JMH Benchmark for the vanilla version of OpenNLP: https://gist.github.com/rzo1/d2ba3e48c6bc190977baf9ee42388823
Here is a JMH Benchmark with String interning removed from
StringList
and usage ofSystem.arraycopy(...)
: https://gist.github.com/rzo1/327c84ebac18b62baf927f1a87ec7480As stated by the original issue opener, String interning was most likely used because of:
So if we have huge
StringLists
orDictionaries
, users need to increase-XX:MaxPermSize=
option to avoid an OutOfMemoryException.There might also be room for improvement in the
Dictionary
implemention as we mess around withStringLists
,StringListWrappers
, etc. - but that would be something to look into next.Just want to have some thoughts on the interning removal here ;-)