
Document timestamp encoder precision fix #1623

Open — wants to merge 9 commits into base: series/0.18
Conversation

@msosnicki (Contributor) commented Dec 2, 2024

PR Checklist (not all items are relevant to all PRs)

fixes #1622

  • Added unit-tests (for runtime code)
  • Added bootstrapped code + smoke tests (when the rendering logic is modified)
  • Added build-plugins integration tests (when reflection loading is required at codegen-time)
  • Added alloy compliance tests (when simpleRestJson protocol behaviour is expanded/updated)
  • Updated dynamic module to match generated-code behaviour
  • Added documentation
  • Updated changelog

DNumber(epochSecondsWithNanos)
val bd = (BigDecimal(ts.epochSecond) + (BigDecimal(ts.nano) * BigDecimal(10).pow(-9))).underlying.stripTrailingZeros
@msosnicki (Contributor, author) commented:
potential issue: this is different from the current behavior in the regular encoders, because it strips all the trailing zeros rather than always setting the scale to 9. Based on this comment, jsoniter will respect the scale set on the BigDecimal, so stripping all trailing zeros seems like a more general solution than doing so only when nano == 0.

Please let me know what you think, it definitely has to be consistent across both visitors, so I will adjust once this discussion is resolved.
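For illustration, a small Java sketch (with made-up sample values, using the same `java.math.BigDecimal` API that backs both codepaths) of the two behaviors under discussion — stripping trailing zeros versus always pinning the scale to 9:

```java
import java.math.BigDecimal;

public class StripVsScale {
    public static void main(String[] args) {
        // epochSecond = 1733000000, nano = 123450000 (sample values)
        BigDecimal withNanos = BigDecimal.valueOf(1733000000L)
                .add(BigDecimal.valueOf(123450000L, 9)); // 1733000000.123450000

        // stripping trailing zeros drops only the redundant fraction digits
        System.out.println(withNanos.stripTrailingZeros().toPlainString()); // 1733000000.12345

        // pinning the scale to 9 always keeps nine fraction digits
        System.out.println(withNanos.setScale(9).toPlainString()); // 1733000000.123450000

        // caveat: for nano == 0, stripTrailingZeros produces a negative scale,
        // so toString switches to scientific notation
        BigDecimal whole = new BigDecimal("1733000000.000000000").stripTrailingZeros();
        System.out.println(whole.toString());      // 1.733E+9
        System.out.println(whole.toPlainString()); // 1733000000
    }
}
```

The negative-scale caveat in the last lines is worth keeping in mind for the nano == 0 case, since the serialized form then depends on whether the writer uses the plain or the scientific string.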

A reviewer (Contributor) replied:

I'm happy with stripping trailing zeros, as long as the adjustment on the jsoniter-based codecs is not too complicated.

@msosnicki (Contributor, author) replied:

Perhaps I missed something, but to me it looks as straightforward as the Document fix.

@msosnicki msosnicki changed the title Fix encoder Document timestamp encoder precision fix Dec 3, 2024
@kubukoz (Member) commented Dec 6, 2024

@Baccata can you take a look?

} else {
out.writeVal(BigDecimal(x.epochSecond) + x.nano / 1000000000.0)
}
val bd = BigDecimal(x.epochSecond) + (x.nano * BigDecimal(10).pow(-9))
@ghostbuster91 (Contributor) commented Dec 9, 2024:

issue: shouldn't it be BigDecimal(x.epochSecond) + (BigDecimal(x.nano) * BigDecimal(10).pow(-9))?

otherwise we might lose some information due to overflow before the conversion to BigDecimal happens.

@msosnicki (Contributor, author) replied:

The result of this multiplication is already a BigDecimal (the Int is promoted to BigDecimal before the arithmetic happens), so I think it is fine.
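As a side note on BigDecimal-from-double pitfalls (a hedged Java sketch with illustrative values, not a claim about where the original bug lay): nano is always below 1_000_000_000, so int overflow is not a risk in this expression, but routing the fraction through a binary double is a different story:

```java
import java.math.BigDecimal;

public class NanoExactness {
    public static void main(String[] args) {
        int nano = 123456789; // always in [0, 999999999], so no int overflow here

        // exact: a long unscaled value with scale 9 is nano * 10^-9 with no rounding
        BigDecimal exact = BigDecimal.valueOf(nano, 9);
        System.out.println(exact.toPlainString()); // 0.123456789

        // inexact: 0.123456789 has no finite binary representation, so the
        // double-derived BigDecimal differs from the exact decimal in the low digits
        BigDecimal viaDouble = new BigDecimal(nano / 1000000000.0);
        System.out.println(exact.compareTo(viaDouble) != 0); // true
    }
}
```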

@Baccata (Contributor) commented Dec 10, 2024

I'm happy with the change. Did you want to do anything else on this PR?

@plokhotnyuk (Contributor) commented Dec 10, 2024

I stored the following script in `timestamp.scala` and ran it with `scala-cli --power timestamp.scala`:

//> using jmh
//> using jmhVersion 1.37
package bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.*

@BenchmarkMode(Array(Mode.Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS)
@State(Scope.Thread)
@Fork(1)
class Timestamp {

  @Param(Array("123456789", "123456000", "123000000", "0"))
  var nano: Int = 123456789

  var epochSecond: Long = System.currentTimeMillis() / 1000

  @Benchmark
  def pow() = BigDecimal((BigDecimal(epochSecond) + (nano * BigDecimal(10).pow(-9))).underlying.stripTrailingZeros)

  @Benchmark
  def scale() = BigDecimal(java.math.BigDecimal.valueOf(epochSecond).add(java.math.BigDecimal.valueOf(nano.toLong, 9).stripTrailingZeros))
}

On Intel® Core™ i7-11800H I got the following result using JDK 21:

Benchmark           (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow    123456789  thrpt    2     689083.298          ops/s
Timestamp.pow    123456000  thrpt    2     753617.071          ops/s
Timestamp.pow    123000000  thrpt    2     745549.753          ops/s
Timestamp.pow            0  thrpt    2     768658.620          ops/s
Timestamp.scale  123456789  thrpt    2  184984953.592          ops/s
Timestamp.scale  123456000  thrpt    2  100797574.077          ops/s
Timestamp.scale  123000000  thrpt    2   83752128.337          ops/s
Timestamp.scale          0  thrpt    2  204584419.459          ops/s
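The roughly two-orders-of-magnitude gap above is plausibly explained by construction cost: `pow(-9)` has to compute an inverse power (with rounding under a MathContext), while `valueOf(unscaledValue, scale)` just pairs a long with a scale and performs no arithmetic at all. A hedged Java sketch of the scale trick, with made-up sample values:

```java
import java.math.BigDecimal;

public class ScaleTrick {
    public static void main(String[] args) {
        long epochSecond = 1733000000L; // sample value
        int nano = 123456789;           // sample value

        // no powers, no division: nano with scale 9 already *is* nano * 10^-9
        BigDecimal frac = BigDecimal.valueOf(nano, 9);
        System.out.println(frac.toPlainString()); // 0.123456789

        BigDecimal ts = BigDecimal.valueOf(epochSecond).add(frac.stripTrailingZeros());
        System.out.println(ts.toPlainString()); // 1733000000.123456789
    }
}
```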

@msosnicki (Contributor, author) commented:

@plokhotnyuk

The version you posted unfortunately breaks bootstrappedJS/DocumentSpec (possibly something that could be reported to Scala.js).

Below are the results for the version that passes all tests; it performs slightly worse, but still much better than the pow alternative:

@Benchmark 
def scalaBd() = BigDecimal(epochSecond) + BigDecimal(nano * 1/1000000000.0).underlying().stripTrailingZeros()
Benchmark             (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow      123456789  thrpt    2     657035.513          ops/s
Timestamp.pow      123456000  thrpt    2     716802.325          ops/s
Timestamp.pow      123000000  thrpt    2     714326.181          ops/s
Timestamp.pow              0  thrpt    2     719915.018          ops/s
Timestamp.scalaBd  123456789  thrpt    2   10934885.665          ops/s
Timestamp.scalaBd  123456000  thrpt    2    9444356.670          ops/s
Timestamp.scalaBd  123000000  thrpt    2    9101303.611          ops/s
Timestamp.scalaBd          0  thrpt    2   51482186.648          ops/s
Timestamp.scale    123456789  thrpt    2  164914297.116          ops/s
Timestamp.scale    123456000  thrpt    2   84470614.413          ops/s
Timestamp.scale    123000000  thrpt    2   69055873.696          ops/s
Timestamp.scale            0  thrpt    2  184479155.075          ops/s

measured on JDK 21, AMD Ryzen 7 3700X.

EDIT: unfortunately this one breaks the same test for Scala Native

@plokhotnyuk (Contributor) commented Dec 11, 2024

The workaround for Scala.js and Scala Native is to use a yet more optimized version:

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf(nano, 9).stripTrailingZeros)
})

Here are the benchmark results:

Benchmark              (nano)   Mode  Cnt          Score   Error  Units
Timestamp.pow       123456789  thrpt    2     748598.854          ops/s
Timestamp.pow       123456000  thrpt    2     672352.329          ops/s
Timestamp.pow       123000000  thrpt    2     749301.419          ops/s
Timestamp.pow               0  thrpt    2     695728.197          ops/s
Timestamp.scalaBd   123456789  thrpt    2   12535456.203          ops/s
Timestamp.scalaBd   123456000  thrpt    2   12591612.068          ops/s
Timestamp.scalaBd   123000000  thrpt    2   12442059.688          ops/s
Timestamp.scalaBd           0  thrpt    2   59858539.270          ops/s
Timestamp.scaleOpt  123456789  thrpt    2  181264531.330          ops/s
Timestamp.scaleOpt  123456000  thrpt    2   92364559.709          ops/s
Timestamp.scaleOpt  123000000  thrpt    2   78387325.699          ops/s
Timestamp.scaleOpt          0  thrpt    2  203003351.828          ops/s

@msosnicki (Contributor, author) commented:

@plokhotnyuk thanks for the help, all green now.

@plokhotnyuk (Contributor) commented Dec 11, 2024

@msosnicki I've provided a serialization method for timestamps in the 2.32.0 release of jsoniter-scala, so after a version upgrade you can serialize timestamps without any allocations or redundant CPU overhead, like this:

def encodeValue(x: Timestamp, out: JsonWriter): Unit = out.writeTimestampVal(x.epochSecond, x.nano)

Also, DNumber creation could be improved by adding support for negative epochSecond values:

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf({
      if (epochSecond < 0) -nano
      else nano
   }.toLong, 9).stripTrailingZeros)
})

@msosnicki msosnicki marked this pull request as ready for review December 12, 2024 10:03
@plokhotnyuk (Contributor) commented:

@msosnicki Please read my previous comment, which I updated recently, where I propose a more correct and efficient solution.

@msosnicki (Contributor, author) replied:

BigDecimal({
   val es = java.math.BigDecimal.valueOf(epochSecond)
   if (nano == 0) es
   else es.add(java.math.BigDecimal.valueOf({
      if (epochSecond < 0) -nano
      else nano
   }.toLong, 9).stripTrailingZeros)
})

Will add a comment about the new jsoniter method.

Question about the negative nano in your example: where is it documented? The same value before 1970 could be expressed in two ways: by flipping the sign of nano as you did, or by adding a positive nano to a more negative epochSecond.

@plokhotnyuk (Contributor) commented Dec 12, 2024

@msosnicki @Baccata I've found only the following documentation about the timestamp serialization methods: https://smithy.io/2.0/spec/protocol-traits.html#timestampformat-trait

It requires truncation to milliseconds and does not mention any support for dates before the Unix epoch (Jan 1, 1970).
But from the current sources of smithy4s.Timestamp I see that we support parsing and serialization up to nanoseconds (without truncation to milliseconds) and for dates before the Unix epoch.

Also, some method implementations of smithy4s.Timestamp require fixes to support negative epochSecond. For example, def epochMilli: Long = epochSecond * 1000 + nano / 1000000 should be replaced by

def epochMilli: Long = epochSecond * 1000 + {
  if (epochSecond < 0) -nano
  else nano
} / 1000000

@msosnicki (Contributor, author) commented Dec 13, 2024

> (quoting @plokhotnyuk's comment above)

Yeah, I think the logic needs to change in multiple places to correctly handle values before the epoch. Will look into that.

EDIT: I only changed the decoding side; nanos are expected to always be positive (as in the Timestamp class — the tests I added were raising constructor exceptions before).
So during decoding we always round the seconds part down (also for negative values), and there is no need to flip the sign of nano in other places.
This is more consistent with other parts of the code, like the Timestamp constructors.
For example, 1969-12-31T23:59:58.123Z (-1877 in millis) is expected to be expressed as Timestamp(-2L, 123000000) rather than Timestamp(-1, 877000000).
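For what it's worth, this convention matches java.time.Instant, which also keeps nano in [0, 999999999] and floors the seconds for pre-epoch instants. A small Java check using the value from the example above:

```java
import java.time.Instant;

public class PreEpoch {
    public static void main(String[] args) {
        // 1969-12-31T23:59:58.123Z is -1877 ms relative to the epoch
        Instant i = Instant.ofEpochMilli(-1877L);
        System.out.println(i.getEpochSecond()); // -2
        System.out.println(i.getNano());        // 123000000
        System.out.println(i);                  // 1969-12-31T23:59:58.123Z
    }
}
```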

@ghostbuster91 (Contributor) commented:

@Baccata are we good to go?

@kubukoz (Member) left a review comment:

please add a CHANGELOG entry for the next patch version

@msosnicki (Contributor, author) replied:

> please add a CHANGELOG entry for the next patch version

done


Successfully merging this pull request may close these issues.

Timestamp inconsistent JSON representation when using Document
5 participants