[Data Modeling] Efficient encoding for WGS84-coordinate key IDs

Hello, and thank you all for rereleasing FoundationDB as an open source project.

I am reading about and planning to use FoundationDB, with one of the uses being to store data that would be referenced by a key containing an ID returned from a latitude/longitude-based R-tree query, from within a client application.

Is there is a consensus on the appropriate type of key IDs to use for retrieving location-specific data for individual values based on WGS84 coordinates?

I ask because I’ve found conflicting info about the behavior of tuple encoding in the Go library/package, so I am curious about how that will work in the near future.

I don’t believe that my use-case would require key range queries, as any queries would be handled by the client application. Some precision-loss, such as when using a Geohash string with a precision of 12, would be acceptable.

So for example, would it be more space-efficient to use a tuple type (37.3316812, -122.0301672), or to use a Geohash string 9q9hrhh2g1b2?

Another, more direct way of phrasing this:
In this case, would the FDB client parse and encode the values contained within the tuples as the bytes representing the individual single/double precision floats, or would it encode the character values as an array of UTF-8 bytes?

To add to this, it says in the Go docs:

  • “FoundationDB tuples can currently encode byte and unicode strings, integers and NULL values.”

while under the TupleElement type it says:

  • “The valid types for TupleElement are []byte (or fdb.KeyConvertible), string, int64 (or int), float, double, bool, UUID, Tuple, and nil.”

Since posting this, I have looked at the code and it seems that the confusion stemmed from a doc comment that wasn’t updated with the code itself. Tuple encoding does handle floats efficiently, and that was added recently, in 5.0.0.

1 Like

Not answering with any authority, but I would think it would be convenient to use geohashes as you could potentially do prefix scans to find entries in a particular bounding box very cheaply.

Geohashes are on a Z order curve, which is a great way to do spatial indexing in FDB. If you care, you could get a little more precision in a little less space by skipping the base 32 encoding part and storing them in binary form.

If you don’t care about range queries, basically any representation will work.

1 Like

This is a much better solution than what I was thinking of. Thanks!

I had not looked into the way that Geohashes are encoded logically, so that is a very nice surprise.

Thanks for the suggestion about skipping base32 encoding.

Oops. It looks like different parts of our docs are inconsistent. The actual state of affairs is that it can do all of the types listed in the second list (i.e., the larger list). So, it will order floats correctly (assuming that all floats are stored using the same precision–i.e., single floats will sort correctly with other single floats, and double floats will sort correctly with other double floats, but single floats will sort incorrectly with double floats).

Oh, but if you use a geo-hash instead of a tuple, maybe that doesn’t matter.

Right. In this case specifically, the locations are represented as a grid that rarely changes, so I had first considered using a traditional spatial index in my application, though now using Geohashes with FDB’s key sorting seems to be a better solution.

I opened a PR that updates the Go tuple package docs to contain the rest of the supported types.

1 Like

This might be of interest Xz-ordering: A space-filling curve for objects with spatial extension (1999)

There is an interesting article about the subject at Building a Spatial Index Supporting Rectangular Range Query using Hilbert Curve | Lobsters