FoundationDB

Copying data from one "table" to another


(Thomas Johson) #1

This may be incredibly basic, but I’m having trouble figuring it out on my own.

I have this structure
component/sub/type/key = timestamp
so for example
rack/test-1/measurement1/space = 1541879017

Now I want to copy data from one subspace to another, e.g. from rack/test-1/measurement1 to rack/test-1/measurement2, so that rack/test-1/measurement2/space = 1541879017

I have millions of keys in test-1 and I want to get them to test-2. So I wrote this Go code that reads through the range and copies the data into the next one. Because a transaction has to finish within 5s, I copy only a small portion of the range per transaction. However, I don’t know how to define the end of the range properly, so I use 0xFF. It works fine, but it never finishes; it essentially runs forever.

So my question is: how do I properly define the end of the range?

func main() {

	fdb.MustAPIVersion(510)
	db, err := fdb.Open("./fdb.cluster", []byte("DB"))
	if err != nil {
		panic(err)
	}

	dir, err := directory.CreateOrOpen(db, []string{"rack"}, nil)

	if err != nil {
		panic(err)
	}

	sub := dir.Sub("test-1")

	var lastKey string
	var i int
	for {
		_, err = db.Transact(func(tr fdb.Transaction) (ret interface{}, err error) {

			ri := tr.GetRange(fdb.KeyRange{
				Begin: sub.Pack(tuple.Tuple{"measurement1", lastKey}),
				End:   fdb.Key{0xFF},
			}, fdb.RangeOptions{Limit: 10000}).Iterator()

			for ri.Advance() {
				kv := ri.MustGet()
				r, err := sub.Unpack(kv.Key)
				if err != nil {
					return nil, err
				}
				lastKey = r[1].(string)

				tr.Set(sub.Pack(tuple.Tuple{"measurement2", lastKey}), []byte{})
			}

			return
		})
		if err != nil {
			panic(err)
		}

	}
}

(Alec Grieser) #2

The way the range is set up, it will scan from the beginning of your table all the way to the end of the database, whereas I think you only want your scan to go to the end of the table. In some of the other languages, there is a method on subspaces to get the range of keys corresponding to that subspace. That doesn’t appear to be the case in our Go bindings, for whatever reason, but something like:

begin := sub.Pack(tuple.Tuple{"measurement1", lastKey})
end := append(sub.Pack(tuple.Tuple{"measurement1"}), 0xff)

Should work to define your begin and end keys so that the scan no longer runs all the way to the end of the database. Also, I think there’s a bug where the last key of each batch gets written twice: the next batch begins at lastKey, and the beginning of a range is inclusive. (That isn’t the end of the world, because the operation is idempotent.) I think you could fix that by appending a zero byte to your begin key. Something like:

begin := append(sub.Pack(tuple.Tuple{"measurement1", lastKey}), 0x00)
end := append(sub.Pack(tuple.Tuple{"measurement1"}), 0xff)
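
As a rough illustration of the batching logic described above, here is a minimal sketch that uses a sorted in-memory slice as a stand-in for the database, so it runs without a cluster. The names kv, getRange, and copyPrefix are made up for this example and are not part of the FoundationDB Go bindings, and the plain string keys are an assumption (real keys would be tuple-encoded). It shows the three pieces discussed here: an end key of prefix + 0xFF, a resume key of last key + 0x00, and a stop condition once a batch comes back empty. It also copies each value, which is an assumption about what is wanted.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// kv is one key-value pair in a toy ordered store (a stand-in for the database).
type kv struct {
	key, value []byte
}

// getRange returns up to limit pairs with begin <= key < end from a slice that
// is sorted by key, imitating a forward range read with a limit.
func getRange(store []kv, begin, end []byte, limit int) []kv {
	i := sort.Search(len(store), func(i int) bool {
		return bytes.Compare(store[i].key, begin) >= 0
	})
	var out []kv
	for ; i < len(store) && len(out) < limit; i++ {
		if bytes.Compare(store[i].key, end) >= 0 {
			break // reached the exclusive end of the range
		}
		out = append(out, store[i])
	}
	return out
}

// copyPrefix reads every pair under srcPrefix in batches of at most limit and
// returns the pairs rewritten under dstPrefix, plus the number of non-empty batches.
func copyPrefix(store []kv, srcPrefix, dstPrefix []byte, limit int) ([]kv, int) {
	begin := append([]byte{}, srcPrefix...)
	end := append(append([]byte{}, srcPrefix...), 0xFF) // end of this prefix only
	var copied []kv
	batches := 0
	for {
		batch := getRange(store, begin, end, limit)
		if len(batch) == 0 {
			return copied, batches // nothing left, so stop instead of looping forever
		}
		batches++
		for _, p := range batch {
			suffix := p.key[len(srcPrefix):]
			newKey := append(append([]byte{}, dstPrefix...), suffix...)
			copied = append(copied, kv{newKey, p.value})
		}
		// Resume just after the last key read, so it is not read a second time.
		last := batch[len(batch)-1].key
		begin = append(append([]byte{}, last...), 0x00)
	}
}

func main() {
	store := []kv{
		{[]byte("m1/a"), []byte("1")},
		{[]byte("m1/b"), []byte("2")},
		{[]byte("m1/c"), []byte("3")},
		{[]byte("m1/d"), []byte("4")},
		{[]byte("m1/e"), []byte("5")},
		{[]byte("m9/z"), []byte("6")}, // outside the source prefix
	}
	copied, batches := copyPrefix(store, []byte("m1/"), []byte("m2/"), 2)
	fmt.Println("batches:", batches)
	for _, p := range copied {
		fmt.Printf("%s = %s\n", p.key, p.value)
	}
}
```

In the real program, each call to getRange would correspond to one transaction, and the empty-batch check is what lets the outer loop terminate.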

(Thomas Johson) #3

That is very helpful. I don’t think I would have figured this one out on my own.

One other question.

Is there a way to scan in reverse, or from the end to the beginning?


(Alec Grieser) #4

Yeah, you can use the RangeOptions struct and set Reverse to true so that the range is read in reverse. Something like:

ri := tr.GetRange(fdb.KeyRange{Begin: begin, End: end},
                            fdb.RangeOptions{Limit: limit, Reverse: true})

Then much of the logic for applying the operation across multiple transactions remains the same, except that you keep Begin set to sub.Pack(tuple.Tuple{"measurement1"}) and End is what you update to sub.Pack(tuple.Tuple{"measurement1", lastKey}). Note that the end of a range is always exclusive, so you don’t need to append anything to the key to keep it from being read and written twice.
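
To make the reverse case concrete, here is a small sketch of the same idea against a sorted in-memory slice rather than a real cluster. The helpers getRangeReverse and scanReverse are hypothetical names for this illustration, and the plain string keys are an assumption. The point it tries to show is that the begin key stays fixed while the smallest key read so far becomes the next exclusive end, with no extra byte appended.

```go
package main

import (
	"bytes"
	"fmt"
)

// getRangeReverse returns up to limit keys with begin <= key < end from a
// sorted slice, largest key first, imitating a reverse range read.
func getRangeReverse(sorted [][]byte, begin, end []byte, limit int) [][]byte {
	var out [][]byte
	for i := len(sorted) - 1; i >= 0 && len(out) < limit; i-- {
		k := sorted[i]
		if bytes.Compare(k, end) >= 0 {
			continue // at or past the exclusive end
		}
		if bytes.Compare(k, begin) < 0 {
			break // below the inclusive begin
		}
		out = append(out, k)
	}
	return out
}

// scanReverse visits every key under prefix from last to first, in batches.
func scanReverse(sorted [][]byte, prefix []byte, limit int) []string {
	begin := prefix // stays fixed for the whole scan
	end := append(append([]byte{}, prefix...), 0xFF)
	var seen []string
	for {
		batch := getRangeReverse(sorted, begin, end, limit)
		if len(batch) == 0 {
			return seen
		}
		for _, k := range batch {
			seen = append(seen, string(k))
		}
		// The end is exclusive, so the smallest key read so far can be
		// used directly as the next end without being read again.
		end = batch[len(batch)-1]
	}
}

func main() {
	sorted := [][]byte{[]byte("a/1"), []byte("a/2"), []byte("a/3"), []byte("b/1")}
	fmt.Println(scanReverse(sorted, []byte("a/"), 2))
}
```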


(Thomas Johson) #5

Fantastic. Thank you


(A.J. Beamon) #6

Subspace implements ExactRange, so it can be passed in as the first argument of a GetRange call. You can also call subspace.FDBRangeKeys() to get the begin and end keys of the range.
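
For intuition about what such a begin/end pair covers, here is a tiny sketch. It assumes the range of a subspace runs from the prefix followed by 0x00 up to (but not including) the prefix followed by 0xFF; that is an assumption about the bindings worth checking against their source. The functions prefixRange and inRange are hypothetical helpers, not library calls, and the string prefix stands in for a real subspace prefix.

```go
package main

import (
	"bytes"
	"fmt"
)

// prefixRange returns assumed begin/end bounds for the keys that extend prefix:
// prefix + 0x00 (inclusive) and prefix + 0xFF (exclusive).
func prefixRange(prefix []byte) (begin, end []byte) {
	begin = append(append([]byte{}, prefix...), 0x00)
	end = append(append([]byte{}, prefix...), 0xFF)
	return begin, end
}

// inRange reports whether begin <= key < end in byte order.
func inRange(key, begin, end []byte) bool {
	return bytes.Compare(key, begin) >= 0 && bytes.Compare(key, end) < 0
}

func main() {
	begin, end := prefixRange([]byte("rack/test-1/"))
	for _, k := range []string{"rack/test-1/a", "rack/test-1", "rack/test-2/a"} {
		fmt.Printf("%q in range: %v\n", k, inRange([]byte(k), begin, end))
	}
}
```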