I read the medium post (https://medium.com/replay-io/how-replay-works-5c9c29580c...

bhackett · on Sept 15, 2021

(Replay employee)

1. Rather than having to restore state to the point at the previous step, we can step backwards by replaying a separate process to the point before the step, and looking at the state there (this post talks about how that works: https://medium.com/replay-io/inspecting-runtimes-caeca007a4b...). Because everything is deterministic it doesn't matter if we step around 10 times and use 10 different processes to look at the state at those points.

2. We record the calls made by the browser, though it is the calls into the system libraries rather than the syscalls themselves (the syscall interfaces aren't stable/documented on macOS or Windows).

3. Maintaining ordering like this isn't normally necessary for ensuring that behavior is the same when replaying. In the case of memory locations, the access made by thread 2 to location B will behave the same regardless of accesses made by thread 1 to location A, because the values stored in locations A and B are independent from one another.
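As a rough illustration of point 1 (a toy sketch, not Replay's actual code): if replaying is deterministic, "stepping back" can just mean replaying a fresh copy up to the previous point and reading the state there. The names `replay_to` and `step_back`, and the list-of-integers "recording", are made up for this example.

```python
def replay_to(recording, point):
    """Replay from scratch up to `point` and return the resulting state.

    Stands in for spinning up a separate replaying process.
    """
    state = {"counter": 0, "history": []}
    for value in recording[:point]:
        state["counter"] += value
        state["history"].append(state["counter"])
    return state


def step_back(recording, current_point):
    """Inspect the state one step earlier by replaying a fresh copy."""
    target = max(current_point - 1, 0)
    return target, replay_to(recording, target)


recording = [5, -2, 7]
point, state = step_back(recording, 3)
print(point, state["counter"])
```

Because `replay_to` depends only on the recording, it does not matter how many separate replays are used to look at different points; each one sees the same state at the same point.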

glass_of_water · on Sept 15, 2021

Thanks for the explanation! Do you ever run into performance issues with replaying from the start on each backward step or is this not really an issue in practice? I imagine for most websites and short replays it's probably fine, but for something like a game with a physics engine it sounds like it would be too expensive and you'd need snapshots or something. I guess that's a super small percentage of the market though.

For question 3 on the ordering, I was imagining the following kind of scenario: one thread maybe calls a system library function to read a cursor position and another calls a system library function to write a cursor position. So even though they're separate functions, they interact with the same state. Do you require users to manually call into the recorder library to give the recorder runtime extra info in this kind of scenario? Sorry if this is a dumb question, I haven't really done any programming at this level.

bhackett · on Sept 15, 2021

We definitely need to avoid replaying from the start every time we want to inspect the state at some point. This is kind of an internal detail, but we can avoid having to replay parts of the recording over and over again by using fork() to create new processes at points within the recording.
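A minimal sketch of the fork() idea, assuming a Unix-like system and a toy deterministic "replay" (the functions `apply_event` and `inspect_from_checkpoint` are hypothetical, not part of Replay): a process that has already replayed up to a checkpoint forks a child, the child replays only the remaining steps and reports the state, and the parent stays parked at the checkpoint for later reuse.

```python
import os


def apply_event(state, event):
    # Deterministic transition standing in for one replayed step.
    return (state * 31 + event) & 0xFFFFFFFF


def inspect_from_checkpoint(state, events_after, steps):
    """Fork at the checkpoint; the child replays `steps` more events."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: continue replaying from the checkpoint state, then report.
        try:
            os.close(read_fd)
            for event in events_after[:steps]:
                state = apply_event(state, event)
            os.write(write_fd, str(state).encode())
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: still at the checkpoint; just collect the child's answer.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        result = int(pipe.read())
    os.waitpid(pid, 0)
    return result


events = [3, 1, 4, 1, 5, 9, 2, 6]
checkpoint = 0
for e in events[:4]:  # replay the first four events only once
    checkpoint = apply_event(checkpoint, e)

print(inspect_from_checkpoint(checkpoint, events[4:], 2))
print(inspect_from_checkpoint(checkpoint, events[4:], 3))
```

Both inspections reuse the same checkpoint, so the first four events are never replayed again.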

Ordering constraints between different library functions do crop up from time to time. In cases like this the recorder library uses ordered locks internally (basically emulating the synchronization which the system library has to do) to ensure that the calls execute in the expected order when replaying.
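One way to picture an ordered lock, as a sketch under my own assumptions rather than the recorder library's real implementation: while recording, the lock notes the order in which threads acquire it; while replaying, it makes each thread wait until it is that thread's turn in the recorded order. `OrderedLock`, `run_calls`, and the integer thread ids are invented for this example.

```python
import threading


class OrderedLock:
    def __init__(self, replay_order=None):
        self._cond = threading.Condition()
        self._held = False
        self.recorded = []  # acquisition order seen while recording
        self._replay = None if replay_order is None else list(replay_order)
        self._next = 0      # index of the next expected acquirer when replaying

    def _my_turn(self, thread_id):
        if self._held:
            return False
        if self._replay is None:
            return True  # recording: any thread may take a free lock
        return (self._next < len(self._replay)
                and self._replay[self._next] == thread_id)

    def acquire(self, thread_id):
        with self._cond:
            self._cond.wait_for(lambda: self._my_turn(thread_id))
            self._held = True
            if self._replay is None:
                self.recorded.append(thread_id)
            else:
                self._next += 1

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()


def run_calls(lock, thread_ids):
    """Each thread makes one 'library call' guarded by the ordered lock."""
    log = []

    def worker(tid):
        lock.acquire(tid)
        log.append(tid)
        lock.release()

    threads = [threading.Thread(target=worker, args=(t,)) for t in thread_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


recording_lock = OrderedLock()
recorded_log = run_calls(recording_lock, [0, 1, 2])
replay_lock = OrderedLock(replay_order=recording_lock.recorded)
assert run_calls(replay_lock, [0, 1, 2]) == recorded_log
```

Two library functions that touch the same underlying state (like the cursor read/write example) would share one such lock, so their relative order is the same on replay.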

glass_of_water · on Sept 15, 2021

Oh that's cool, using fork() to create checkpoints. Thank you again for taking the time to explain!

IshKebab · on Sept 15, 2021

Thanks for the links to the blogs. I was wondering how it worked and the "How it works" bit on that page said nothing. Nice that they've explained it. It looks like the blog does answer your questions though:

> The interface which Replay uses for the recording boundary is the API between an executable and the system libraries it is dynamically linked to.

I assume the ordered locks use a global order.

glass_of_water · on Sept 15, 2021

As bhackett confirmed, you're right about recording at the system library call level. I wasn't sure if it was more of an analogy or only referred to a version of Replay targeting backend servers written in other languages like Go, especially since the author mentioned hooking into the JS runtime in https://medium.com/replay-io/effective-determinism-54cc91f56.... But it looks like I misunderstood, and their browser product is their generic record/replay library integrated into Firefox, rather than a reimplementation of the same concepts.