[Data Modeling] Efficient encoding for WGS84-coordinate key IDs

umpc · May 5, 2018, 11:41pm

Hello, and thank you all for rereleasing FoundationDB as an open source project.

I am reading about and planning to use FoundationDB, with one of the uses being to store data that would be referenced by a key containing an ID returned from a latitude/longitude-based R-tree query, from within a client application.

Is there is a consensus on the appropriate type of key IDs to use for retrieving location-specific data for individual values based on WGS84 coordinates?

I ask because I’ve found conflicting info about the behavior of tuple encoding in the Go library/package, so I am curious about how that will work in the near future.

I don’t believe that my use-case would require key range queries, as any queries would be handled by the client application. Some precision-loss, such as when using a Geohash string with a precision of 12, would be acceptable.

So for example, would it be more space-efficient to use a tuple type (37.3316812, -122.0301672), or to use a Geohash string 9q9hrhh2g1b2?

Another, more direct way of phrasing this:
In this case, would the FDB client parse and encode the values contained within the tuples as the bytes representing the individual single/double precision floats, or would it encode the character values as an array of UTF-8 bytes?

To add to this, it says in the Go docs:

“FoundationDB tuples can currently encode byte and unicode strings, integers and NULL values.”

while under the TupleElement type it says:

“The valid types for TupleElement are []byte (or fdb.KeyConvertible), string, int64 (or int), float, double, bool, UUID, Tuple, and nil.”

Since posting this, I have looked at the code and it seems that the confusion stemmed from a doc comment that wasn’t updated with the code itself. Tuple encoding does handle floats efficiently, and that was added recently, in 5.0.0.

spullara · May 7, 2018, 5:47pm

Not answering with any authority, but I would think it would be convenient to use geohashes as you could potentially do prefix scans to find entries in a particular bounding box very cheaply.

dave · May 7, 2018, 9:37pm

Geohashes are on a Z order curve, which is a great way to do spatial indexing in FDB. If you care, you could get a little more precision in a little less space by skipping the base 32 encoding part and storing them in binary form.

If you don’t care about range queries, basically any representation will work.

umpc · May 7, 2018, 9:37pm

This is a much better solution than what I was thinking of. Thanks!

umpc · May 7, 2018, 9:47pm

I had not looked into the way that Geohashes are encoded logically, so that is a very nice surprise.

Thanks for the suggestion about skipping base32 encoding.

alloc · May 9, 2018, 5:22am

Oops. It looks like different parts of our docs are inconsistent. The actual state of affairs is that it can do all of the types listed in the second list (i.e., the larger list). So, it will order floats correctly (assuming that all floats are stored using the same precision–i.e., single floats will sort correctly with other single floats, and double floats will sort correctly with other double floats, but single floats will sort incorrectly with double floats).

alloc · May 9, 2018, 5:23am

Oh, but if you use a geo-hash instead of a tuple, maybe that doesn’t matter.

umpc · May 9, 2018, 2:22pm

Right. In this case specifically, the locations are represented as a grid that rarely changes, so I had first considered using a traditional spatial index in my application, though now using Geohashes with FDB’s key sorting seems to be a better solution.

I opened a PR that updates the Go tuple package docs to contain the rest of the supported types.

amirouche · May 9, 2018, 6:21pm

amirouche · December 28, 2018, 9:48pm

This might be of interest Xz-ordering: A space-filling curve for objects with spatial extension (1999)

amirouche · July 11, 2021, 6:00pm

There is an interesting article about the subject at Building a Spatial Index Supporting Rectangular Range Query using Hilbert Curve | Lobsters

Topic		Replies	Views
Impact of encoding on range read performance Using FoundationDB performance	5	1083	May 10, 2019
Small library for building geospatial layers in Go FoundationDB Layers	0	1097	May 26, 2018
Using the same keys and values across different languages Using FoundationDB	6	518	January 24, 2021
Application design using Subspace and Tuple Using FoundationDB	8	3648	July 14, 2018
Request for feedback: tuple encoding bug Using FoundationDB	12	975	January 24, 2021

[Data Modeling] Efficient encoding for WGS84-coordinate key IDs

Related topics