FoundationDB

Using unsigned 64-bit integers with the Go tuple package


(Justin Lowery) #1

I noticed that the Go tuple package does not support encoding/decoding uint64 values.

The Java implementation appears to first decode all integers into BigInteger, which supports arbitrary precision, and then converts the value to Long if the value is within range.

The Go implementation currently decodes all integers to int64. *big.Int from the math/big package would be a closer Go equivalent: the decoder could decode into a *big.Int and then return the appropriate concrete type based on the value's range (using bigVal.IsInt64(), bigVal.IsUint64(), etc.).

If this were implemented, I imagine that the returned types might be limited to int64, uint64, and *big.Int for simplicity.
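To make the idea concrete, here is a minimal sketch of what I have in mind. This is not the real fdb tuple codec; decodeTupleInt is a hypothetical helper that takes the magnitude bytes and a sign flag, decodes through *big.Int, and narrows to a built-in type when the value fits:

```go
package main

import (
	"fmt"
	"math/big"
)

// decodeTupleInt is a hypothetical helper (not the actual fdb codec):
// it decodes the magnitude bytes into a *big.Int first and then
// narrows to a built-in type when the value is in range.
func decodeTupleInt(magnitude []byte, negative bool) interface{} {
	v := new(big.Int).SetBytes(magnitude)
	if negative {
		v.Neg(v)
	}
	switch {
	case v.IsInt64():
		return v.Int64()
	case v.IsUint64():
		return v.Uint64()
	default:
		return v // only values outside both ranges stay *big.Int
	}
}

func main() {
	fmt.Printf("%T\n", decodeTupleInt([]byte{0x04, 0x2a}, false)) // 1066 -> int64
	fmt.Printf("%T\n", decodeTupleInt(
		[]byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}, false)) // MaxUint64 -> uint64
}
```

This keeps the returned types limited to int64, uint64, and *big.Int as described above.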

Has this been attempted before in Go and abandoned because of some unexpected result, or does handling these cases similarly to the Java tuple implementation seem like a reasonable approach?

Edit: I submitted a PR that adds uint64 support.


(Alec Grieser) #2

I reviewed your PR, and I think the most relevant comment I had for this space is that there is some concern that because there isn’t any distinction in the type codes between a uint64 and an int64, then there has to be some amount of type merging one way or the other (in other words, because of types, the encode composed with decode does not form the identity). We already crossed that Rubicon in the Java bindings, so maybe it’s not such a big deal to cross it with the Go bindings, though.

I don’t think we’d be opposed to using big.Int from math/big in the Go tuple implementation. The Java bindings didn’t always use big integers (they used Longs), but when they were updated, we just didn’t also update the Go bindings. My main concern would be that, memory and performance wise, using big integers could have more overhead than one might want. (This issue already exists in the Java bindings, and it’s something that we should probably address sooner rather than later.) But handling larger integers in the general case sounds fine to me.


(Justin Lowery) #3

I will leave the inline replies on GitHub, though I’ll post my reply to your summary comments here as it’s more discussion-oriented in some ways:

  1. I think this probably needs additional tests. There is a tuple test that was recently added, the tuple_test.go file, that should probably be updated with a uint64 example. It would also be great if the bindingtester for go could be made to generate uint64 types, but that’s a little hard given that it can’t support negative numbers with 64 bits of information.

I did manually add a test for it in a local stash, but since the test data appears to need manual verification, I left it uncommitted until I received some feedback and could set aside time to make sure the end result does not interfere with the types I may not be using in my keys.

  2. I think you ran into this (and I had kind of meant to ask this to you on the forums but it slipped my queue), but I’m a little concerned that it doesn’t honor the contract that if I give you x and you tuple pack it and unpack it, I always get x back (in that I could give the tuple encoder a uint64 containing, say, the integer 1066, encode, decode, and then I get the int64 1066 back). We already don’t quite honor that contract, so maybe it’s not a huge deal, but it’s something I’m somewhat concerned about.

You are right; I did have to pick a side here. These changes externally behave like the Java implementation: they return an int64 (Long in Java) when the value is in range, preserving the existing behavior for int64 values.

At first, I had decodeInt return an int64 only for negative values (its minimum value through -1) and a uint64 for everything the unsigned type can represent (0 through its maximum). In practice, there are int64 type assertions in the directory package that panic with this approach.

I tend to default to the switch v := v.(type) pattern and list the type cases I expect, but it is generally reasonable in Go to simply assert an int64, as the directory package does, so existing user code could also break on int64 type assertions.
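For readers less familiar with Go, the difference between the two styles looks like this (describe is just an illustrative function, not anything from the bindings):

```go
package main

import "fmt"

// describe shows the type-switch pattern against a decoded tuple
// element, which may now be either int64 or uint64.
func describe(v interface{}) string {
	// n := v.(int64) // a bare assertion like this, as used in the
	// directory package, panics when the dynamic type is uint64

	switch n := v.(type) { // the pattern I default to instead
	case int64:
		return fmt.Sprintf("int64 %d", n)
	case uint64:
		return fmt.Sprintf("uint64 %d", n)
	default:
		return fmt.Sprintf("unexpected %T", v)
	}
}

func main() {
	fmt.Println(describe(int64(-7)))
	fmt.Println(describe(uint64(1) << 63)) // too large for int64
}
```

The type switch degrades gracefully to the default case, whereas the bare assertion turns an unexpected type into a runtime panic.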

I also considered adding a new type code and updating the bindings. You probably already realize this, but like the first approach in Go, it wouldn’t be backwards compatible: existing bindings, the Java ones for example, would not recognize the new type code when decoding.

So for those reasons, and the way that the Java implementation was already converting BigInteger to Long where possible, I found the current behavior to be the most intuitive.

I do have to use the switch v := v.(type) pattern wherever a value’s range spans multiple integer types. I understand the big picture and why this is a reasonable compromise, so it doesn’t bother me personally, though I can see it being a con of the current approach.

  I don’t think we’d be opposed to using big.Int from math/big in the Go tuple implementation. The Java bindings didn’t always use big integers (they used Longs), but when they were updated, we just didn’t also update the Go bindings. My main concern would be that, memory and performance wise, using big integers could have more overhead than one might want. (This issue already exists in the Java bindings, and it’s something that we should probably address sooner rather than later.) But handling larger integers in the general case sounds fine to me.

I now see why BigInteger makes more sense in Java, where unsigned operations on long are a more recent addition (Java 8). If the overhead is noticeable enough in the Java implementation that you might consider deprecating BigInteger support, in favor of a byte string for example, it would make sense to me to skip implementing big.Int and instead encourage the use of byte slices.

That would of course raise the issue of backwards compatibility, though there might not be nearly as many users who rely on BigInteger encoding/decoding for sizes larger than the current built-in types. That is purely anecdotal, and I am sure a scientist somewhere would disagree, so some sort of data or at least a poll raising the issue of performance costs might be helpful. I am not opinionated here for this reason.

I am interested if you have any thoughts on other potential approaches after reading my comments. I’ll reply to your inline comments shortly.

Thanks for quickly reviewing this PR!