Avoiding unnecessary copies from movable futures

In the following code, populateData does an unnecessary copy of the vector:

struct A {
  vector<int> data;

  ACTOR static Future<Void> populateData(A* self, int size) {
    vector<int> _data = wait(produce(size));
    self->data = std::move(_data);
  }

  static Future<vector<int>> produce(int size) {
    return vector<int>(size, 0);
  }
};

Calling wait(function_producing_future(args)) is a common pattern in FDB code, so reducing the number of copies in this scenario could potentially lead to large performance improvements, as https://github.com/apple/foundationdb/pull/2915 did. There are also other scenarios where it is possible to move the result of a future, but the actor compiler generates code that creates unnecessary copies.

The current plan is to create a new class:

template<class T>
struct MovableSAV : public SAV<T> {
 ...
};

which overrides the finishSendAndDelPromiseRef function to move the SAV contents instead of copy them. The actor compiler can then detect scenarios where it’s safe to use a MovableSAV instead of SAV. Other changes in the actor compiler can generate code that supports perfect forwarding.

This issue has already been discussed some with various people at Snowflake and Apple, but I created this forum post as a place for people to post other ideas to help reduce the number of unnecessary copies.

1 Like

I am not 100% sure but I think this should be possible without introducing this MovableSAV type. I think the following could work:

  1. Change the contract of SAV so that the future-count is decremented when the callback is called. This would mean that the caller of the callback is not allowed to keep a future or would need to increment the future-count by one more.
  2. Change the send message to something like this:
	template <class U>
	void send(U&& value) {
		ASSERT(canBeSet());
		if (futures == 1 && Callback<T>::next != this) {
			// we have exactly one future and it is a callback:
			Callback<T>::next->fire(std::forward<U>(value));
			ASSERT(Callback<T>::next == this);
			delFutureRef();
		} else {
			new (&value_storage) T(std::forward<U>(value));
			this->error_state = Error::fromCode(SET_ERROR_CODE);
			while (Callback<T>::next != this) {
				Callback<T>::next->fire(this->value());
				delFutureRef();
			}
		}
	}

This obviously will need a change in the actor-compiler. But if done correctly one could then also do the following:

Future<vector<string>> f = someActor();
vector<string> v = wait(std::move(f));

Or (the more common case):

vector<string> v = wait(someActor());

In the above case where there is only one future and it is a callback, the future-count is already decremented in the callback with SAV<T>::unwait. It also looks like the above send function would not work in cases like:

ACTOR Future<vector<int>> listen(Promise p) {
  vector<int> v = wait(p.getFuture());
  return v;
}

ACTOR Future<Void> driver() {
  Promise<vector<int>> promise;
  Future<vector<int>> future1 = listen(promise.getFuture());
  promise.send(vector<int>(1000, 0));
  vector<int> v = wait(promise.getFuture()); // this will hang forever if send simply forwards the sent value, instead of storing it.
  printf("%d\n", v.size()); // should print 1000
  return Void();
}

As long as promise references exist, we must store the value of the SAV, because future references can be created.

In the first sample code, there is also a copy into the SAV to begin with which isn’t necessary.

In produce(), in order to make a Future<T> from a T, the source object is copied even if it is an rvalue reference because Future only has a constructor for const T & and not T &&. I verified that this is what is happening by replacing the vector with a struct that prints out move/copy/construct activity.

Adding a T && constructor to Future avoids the copy in this situation - where a Future is constructed directly from an rvalue. I’m not sure how much copying this will save in practice, but unless it is somehow incorrect it can’t hurt to add.

The constructor is defined as

	Future(T&& presentValue)
		: sav(new SAV<T>(1, 0))
	{
		sav->send(std::forward<T>(presentValue));
	}

In playing around with this, I ended up going down a very deep rabbit hole. I wanted to see what other situations would avoid copying into the SAV from an rvalue, so I wrote an actor with a bunch of return situations in which I would want to avoid copying into the SAV to see if they do. What I found is lots of extra copies I didn’t expect. I traced it to that the actor compiler generates a_callback_fire() as taking the result by value instead of const reference. So I changed the actor compiler, got the results I expected, and was happy right up until I pulled master and saw that this same fix was done 11 days ago. D’oh!

That’s okay though, because in the process I learned a lot about the actor compiler generated code. I also thought of this: In this pattern,

ACTOR static Future<B> produce() {
	state B b;
	// call actors, wait for them, add stuff to b, have a great time
	return b;
}

b will be copied into the SAV. However, using std::move(b) instead will avoid the copy, and should always be done because it should always be valid, I think. When returning from an actor, the sequence is

  • initialize the SAV
  • destroy the actor state
  • call callbacks

so at return time any actor state variable could be treated as an rvalue reference because it’s about to be destroyed anyway. Can anyone think of any reason this is not the case? If not, it would be neat for the actor compiler to automatically std::move() state variables when they are returned. At the very least it could recognize the syntax return <exactStateVarName>; because even if the state variable is masked by a same-named local var moving it shoudl still be valid.

You do not need the std::move() as long as named return values are already treated as xvalues for constructor overload resolution as of C++14*. As long as Future<T>::Future(T&&) exists, that will be chosen. In this particular case, std::move() won’t harm anything. I generally recommend against it since adding std::move() will prevent NRVO whenever the return value and return type are the same (non cv-qualified) type. There are zero cases where adding std::move() will help and not zero cases when it will hurt.

EDIT: After looking at the PR, I realized that the actor compiler is compiling the return statement to a non-return statement, so none of this applies.

* This was changed in this standard defect report: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1579. Because it was a defect report, GCC backported it to C++11 as of GCC 5, and clang backported it even earlier.

I have a proposal. The actor compiler would recognize two forms of wait()

const T &x = wait(f);

and

T &x = wait(f);

The first would behave like what we are used to now (and we would have to update all the code to do this). The callback generated would take a const T & for the result and everything would work as it does now.

In the second form, the callback generated would take T & so the consumer can modify or std::move() it. When this callback is called, the SAV is modified somehow to indicate that further attempts to get its value are invalid, and those attempts would cause an error to be thrown, perhaps value_already_consumed.

This would be similar to writing code today where you know for certain that some wait() is the only one for some future (even if other future/promise reference exist) and so you do

T x = wait(f);
self->y = std::move(const_cast<T &>(x));

because you “know what you’re doing”. This is dangerous of course and abstraction breaking. In my proposal, this would be safer because for the cost of some small run-time overhead the fact that you read f in a context that might modify it has been recorded so future attempts to read it can be blocked.

Some thoughts…

  • This would mean of course that we will rely on run-time errors in simulation to find bad patterns, which isn’t ideal.

  • With this change, IDEs would now correctly know the type of the return of a wait() which is neat.

  • One way of implementing this would require two kinds of callbacks that can be added to the SAV, the const & receiving kind and the & receiving kind. Only one of the latter kind is allowed to be called, the rest will have their catch blocks called instead with value_already_consumed.

-When a value is sent to the SAV, the const & callbacks would be called first, which would allow non-mutating and mutating callbacks to both receive the value of a SAV, though use this pattern might be too fragile.

Thoughts?

1 Like

I’m on board with this for the improved IDE experience alone

I think this makes sense, but it sounds like a fair bit of work. Do we have any use cases in mind where we would take advantage of this?

One example is in storageserver.actor.cpp in replaceInterface:

when(GetStorageServerRejoinInfoReply _rep = wait( ... )) {
   state GetStorageServerRejoinInfoReply rep = _rep;
   ...
}

GetStorageServerRejoinInfoReply contains a vector, so there is an unnecessary copy here. With the proposed change this could be made:

when(GetStorageServerRejoinInfoReply &_rep = wait( ... )) {
   state GetStorageServerRejoinInfoReply rep = std::move(_rep);
   ...
}

I think the two types of callbacks should be:

const T &t = wait(f);

and

T &&t = wait(f);

The only advantage of passing a non-constant reference is that we can use move, and using an rvalue reference makes in clear to the waiter that this is safe.

1 Like

Very good point, I agree.

Also, perhaps Future<T> could have a .consume() method which is like get() but returns T && and marks the underlying SAV as consumed. This would enable things like:

state std::vector<Future<T>> futures;
...
wait(waitForAll(futures));
self->thing = futures[0].consume();

Or maybe this is more clear;

self->thing = futures[0].move();

Or perhaps for clarity it should return T &, so

self->thing = std::move(futures[0].consume());

The .consume() feature can avoid a copy in getAll(), but of course only when it is safe to do so. There would perhaps have to be two versions of getAll(), though I think the way we use it is normally fine for it to consume the futures in the vector passed to it.

There are probably some other tricky cases with generics we need to think about.

Actually I guess

Future<std::vector<T>> getAll(std::vector<Future<T>> &&futures)

could use .consume() and

Future<std::vector<T>> getAll(std::vector<Future<T>> futures)

would not. When the vector passed in is a local (non state) var it should use the first one which will avoid copies into the result vector. And if the vector is a state variable (for example, because it is added to across wait boundaries) then it will have to be explicitly std::move()d.

I like the idea of having this, though I have one question:

How should this SAV<T>::consume behave if there is more than one Future? I can think of multiple possibilities and none of them strikes me intuitively as more correct:

  • It could just throw an error. While this would be easy it would mean that the client needs to check the future count which seems kind of wrong.
  • It could move the object out and then throw an exception for every other future that wants to read it. However, it is super unclear to me what this would mean for callbacks.

Consider this (rather stupid but valid) example:

ACTOR Future<int>
foo(Future<vector<int>> f) {
  wait(onReady(f));
  auto v = f.consume();
  int res = 0;
  for (val : v) { res += val; }
  return res;
}

void bar() {
  Promise<vector<int>> p;
  std::vector<Future<int>> futures{ foo(p.getFuture()), foo(p.getFuture() };
  wait(delay(1.0));
  p.send(std::vector<int>{1,2,3,4,5});
  wait(allReady(futures));
}

What should happen in the program above?

In my proposal, the second call to foo() would throw an error on consume. This is the kind of issue I was referring to regarding catching issues like this at run-time in simulation. Getting a value from a future more than once via a .consume() or a consuming wait() is just not valid to write, but I’m not sure how we could reliably detect these situations at compile time.

I don’t like this though. You say the second, but in practice there’s also a problem with priorities here. So it would be the second that gets scheduled. But to make things worse we don’t actually have priorities in simulation - so the behavior will be different in simulation than on a real cluster.

If consume would fail if there’s more than one future (or if the future was created from a promise after consume was called), at least the semantics would be well defined and consistent.

Good points. I think the most strict implementation would be this:

  • Only allow .consume() when the future reference count is 1
  • Only allow a consuming callback to be added when
    • The reference count is 1
    • There are no other callbacks presently
    • There have been no callbacks, consuming or not, in the past.
  • Only allow .get() on futures where .consume() has never been called and no consuming callback has ever been added.

I think this would have the highest chance of catching misuse in simulation, though it won’t catch everything as a promise holder could get a new future at any time and try to consume or wait for it.

Just for fun I changed vector<T> getAll(vector<Future<T>>) to be abstraction-breaking and std::move(const_cast<T &>(x)) each item out of the vector of ready futures into the result vector. I was not surprised that I can still run tests without errors because I really doubt any futures in vectors sent to getAll() have any other reference holders.

I ran an instance of APICorrectness.txt, and it used 51 seconds of CPU. In that time, these are the number of calls to getAll<T>() counted by their T types.

      1 InitializeBackupReply
      1 MasterProxyInterface
      1 ResolverInterface
      3 WorkloadInterface
      4 long int
     63 std::vector<UID>
    279 Optional<Standalone<StringRef> >
    977 ResolveTransactionBatchReply
   1892 Standalone<StringRef> (vector size up to 9)
  19319 GetReadVersionReply  (vector size always 1)

This of course doesn’t represent a great deal of work difference between copy and move, so perhaps there isn’t a lot to be gained from removing copies from getAll(). I ran a few other tests and the results were similar.

As discussed in person, I think we shouldn’t use this pattern:

T &&t = wait(f);

The T&& only indicates that wait() must return an xvalue, but the variable t is itself an lvalue. Most people forget that fact, so it’s really easy to forget a std::move() and end up with an unnecessary copy when using t.

2 Likes