Emit events according to their arrival order
Re-order events before emitting them.
Thread execution is unchanged, but threads are now synced before their results are collected. This guarantees ordering (local to a partition) but might introduce some latency, which I don't expect to be significant.
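
To illustrate the idea for reviewers, here is a minimal Python sketch of "sync threads, then collect results in arrival order". It is not the code in this MR: `process`, `emit_in_arrival_order`, and `max_workers` are hypothetical names, and a thread pool stands in for whatever threading the real code uses.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def process(event):
    # Hypothetical per-event work; the random sleep lets threads finish out of order.
    time.sleep(random.uniform(0, 0.01))
    return event * 2


def emit_in_arrival_order(events, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit in arrival order and keep the futures in that same order.
        futures = [pool.submit(process, event) for event in events]
        # Leaving the `with` block waits for all workers to finish (the sync point).
    # Collect by submission order, not completion order.
    return [future.result() for future in futures]
```

The latency cost comes from the sync point: nothing is emitted until the slowest thread in the batch has finished.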
I created a separate MR to ask for feedback, and because I have not (yet) run extensive integration tests on this change. This approach seems to work in local tests with up to a few thousand records, but I'd want to test it with longer runs.