
While you might be slightly right, my experience tuning Windows machines leads me to believe you're missing the mark.

I'm going to say the three largest contributors to general desktop lag are:

Animations and intentional delays. It's hard to overstate how much faster a machine feels when something like MenuShowDelay is set to 0, or the piles of animations are sped up. (A sketch of that registry tweak follows the third point below.)

Too many layers between the application's draw commands and the actual display. All this compositing, vsync, and minimal 2D acceleration creates a low-level, persistent lag. Disabling Aero on a Windows 7 machine does wonders for its responsiveness. And yet, pre-Vista, much of the Win32 GDI/drawing API was basically implemented in hardware on the GPU. If you run an old 2D Win32 API benchmark, you will notice that modern machines don't tend to fare well in raw API call performance. Thirty seconds of poking around on YouTube should find you a bunch of comparisons like this: https://www.youtube.com/watch?time_continue=25&v=ay-gqx18UTM.... Keep in mind that even in 2020, pretty much every application on the machine still relies on GDI (just as Linux apps still rely on Xlib).

Input and processing lag. USB polls at a fairly slow interval (think a hundred or so ms). On top of that, keystrokes/events end up queued and scheduled through multiple subsystems before eventually finding their way to the correct window, and then the application has to be rescheduled to retrieve and process them via GetMessage()/etc. (the message pump sketched below). Basically, this is more a function of modern software bloat, where all those layers of 'correct' architecture add more overhead than the old-school path: take the PS/2 interrupt, post a message to the active window's queue, schedule the process that owns the window. (https://social.technet.microsoft.com/Forums/windows/en-US/b1...)
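
For what it's worth, MenuShowDelay lives under HKEY_CURRENT_USER\Control Panel\Desktop as a string value in milliseconds. A minimal sketch of zeroing it from code (normally you'd just use regedit, and you may need to log off before everything picks the change up):

    /* Minimal sketch: set MenuShowDelay to "0" (milliseconds, stored as a
     * REG_SZ string) under HKCU\Control Panel\Desktop. May need a log-off
     * before every app picks it up. */
    #include <windows.h>
    #include <string.h>

    int main(void)
    {
        HKEY key;
        if (RegOpenKeyExA(HKEY_CURRENT_USER, "Control Panel\\Desktop",
                          0, KEY_SET_VALUE, &key) != ERROR_SUCCESS)
            return 1;

        const char *value = "0";
        LONG rc = RegSetValueExA(key, "MenuShowDelay", 0, REG_SZ,
                                 (const BYTE *)value, (DWORD)(strlen(value) + 1));
        RegCloseKey(key);
        return rc == ERROR_SUCCESS ? 0 : 1;
    }

And here is roughly what that last hop of the input path looks like: a minimal Win32 program whose only job is to pump messages. Everything that happens before GetMessage() returns has already gone through the lower layers. (Class and window names here are arbitrary.)

    /* Minimal Win32 message pump: by the time GetMessage() hands the app a
     * keystroke, it has already been queued and scheduled through several
     * lower layers before reaching this thread's queue. */
    #include <windows.h>

    static LRESULT CALLBACK wnd_proc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        switch (msg) {
        case WM_CHAR:                 /* a fully "cooked" keystroke lands here */
            MessageBeep(MB_OK);
            return 0;
        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wp, lp);
    }

    int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev, LPSTR cmdline, int show)
    {
        WNDCLASSA wc = {0};
        wc.lpfnWndProc   = wnd_proc;
        wc.hInstance     = inst;
        wc.hCursor       = LoadCursor(NULL, IDC_ARROW);
        wc.lpszClassName = "PumpDemo";
        RegisterClassA(&wc);

        HWND hwnd = CreateWindowA("PumpDemo", "Input path demo",
                                  WS_OVERLAPPEDWINDOW, CW_USEDEFAULT,
                                  CW_USEDEFAULT, 400, 300,
                                  NULL, NULL, inst, NULL);
        ShowWindow(hwnd, show);

        /* The classic pump: retrieve queued messages, hand them to wnd_proc. */
        MSG m;
        while (GetMessage(&m, NULL, 0, 0) > 0) {
            TranslateMessage(&m);     /* turn WM_KEYDOWN into WM_CHAR, etc. */
            DispatchMessage(&m);      /* deliver to the window procedure */
        }
        return 0;
    }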

There are a number of other issues, but you can retune those three areas to some extent, and the result is a pretty noticeable improvement. Having someone at MS/etc. go in and actually focus on fixing this could have a massive effect for little effort. But that doesn't appear to be in the cards, since they seem more interested in telemetry and personal assistants.



Sure. Mind you, my argument wasn't comparing Windows to these sorts of "lightning-fast" systems—(modern) Windows isn't even a contender. Nor is macOS, nor KDE or GNOME.

My points were under the mode of thought where you look at a modern system already aimed at being "fast because it's lightweight" (e.g. XFCE), and then you ask why it still feels laggy compared to BeOS/AmigaOS/etc.

Where such "lightweight" DEs do have perceivable latency, that latency mostly comes down to operations hitting the disk where these older systems didn't. (Also the input stack, yes, but that's not universal: modern hardware still has PS/2 ports, and modern OSes still access those with rather direct reads. Many gamers swear by PS/2 peripherals—though probably mostly for cargo-cult reasons.)

Some of the most minimal Linux WMs, e.g. Fluxbox, run entirely from memory once started, and so are comparably fast, but need an explicit restart/reload to change anything. Plus, the apps launched in the WM aren't designed with the same paradigm, so most of your experience there is still slow.


BTW: One of the larger contributors to desktop lag on Linux continues to be the lack of integration between the scheduler and the WM. The idea that demand paging is causing a lot of general latency doesn't ring true to me. Maybe on initial startup, but once the machine gets going, most OSes aren't doing actual disk I/O to satisfy random user interactions. Things like the Windows SendTo list, which is actually a bunch of links in a directory (unlike most of the rest of the shell, which tends to be registry-based), end up cached in RAM and don't actually result in disk I/O (and you should delete entries you don't use). You might argue that all the user->kernel crossings that build these lists on the fly are a problem, but frankly, unless you have incredibly long lists of file associations and the like, a few hundred API calls don't contribute much to the overall lag.

On Linux (which always seems to trail on responsiveness metrics) a lightweight DE can help, but scheduler tuning makes an even bigger difference. Just switching the power profile to performance is far more noticeable on Linux than on most other OSes. If you're not aware of the work of Con Kolivas, you should read up on the history there. Everything is a lot better than it was 15 years ago, but a number of the core problems really haven't been solved.
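
For the curious, here's roughly what "switching the power profile to performance" boils down to, assuming a cpufreq driver that exposes the usual scaling_governor files (cpupower and the various DE power applets do the same thing under the hood; this is only a sketch and needs root):

    /* Rough sketch: switch every CPU's cpufreq governor to "performance" by
     * writing the usual sysfs files. Needs root; assumes a driver such as
     * intel_pstate or acpi-cpufreq that exposes scaling_governor. */
    #include <stdio.h>
    #include <glob.h>

    int main(void)
    {
        glob_t g;
        if (glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor",
                 0, NULL, &g) != 0) {
            fprintf(stderr, "no cpufreq sysfs entries found\n");
            return 1;
        }
        for (size_t i = 0; i < g.gl_pathc; i++) {
            FILE *f = fopen(g.gl_pathv[i], "w");
            if (!f) {
                perror(g.gl_pathv[i]);   /* typically EACCES unless root */
                continue;
            }
            fputs("performance\n", f);
            fclose(f);
        }
        globfree(&g);
        return 0;
    }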


To expand on your point, one of my favourite examples of how slow a lot of software has become: a few years back I tested the load time of a default Ubuntu Emacs install in a console against "booting" the Linux-hosted version of AROS (an AmigaOS reimplementation) with a customized startup-sequence (AmigaOS boot script) that started FrexxEd (an AmigaOS editor co-written by the author of curl).

AROS goes through the full AmigaOS-style boot, registering devices, and everything.

It handily beat my Emacs startup.

Now, you can make Emacs start quicker, and FrexxEd is not as capable by default (though it's scriptable with a C-like scripting language), but it's no wonder people feel software has gotten slower, because so often the defaults assume we're prepared to wait.

E.g. one of my pet peeves with typical Emacs installations: Try misconfiguring your DNS and watch it hang until the DNS lookups fail.... It's not an inherent flaw in Emacs; you can certainly prevent it from happening, but so many systems have Emacs installations where you face a long wait. Normally of course this is not a big issue, but those setups still have a DNS lookup in the critical path for startup that adds yet one more little delay.

All of these things add up very quickly. In the cases where these things are an issue in older software, they tend either to require explicit opt-in or to happen concurrently.

I submitted some patches to AROS years ago, to implement scrollback buffers and cut-and-paste in the terminal, and one of the things it really brought back is how careful AmigaOS was to make everything painstakingly concurrent all over the place, at the cost of throughput but cutting apparent latency.

E.g. when you cut and paste from a console window on AmigaOS, data about the copied region gets passed to a daemon that will write it to the clips: device. It gets passed to a separate daemon because the clips: device that the clipboards are stored to, like everything else in AmigaOS, can be "assigned" to another location. By default it is stored in T: (temporary), and by default T: points to a RAM disk. However, clips: or T: could very well have been reassigned to MyClipboardFloppy:, in which case copying would prompt you to insert the floppy labelled MyClipboardFloppy:, after which writing the copied section would take way too long. (The actual write to the floppy would happen in yet another task.)

So copying as a high level system service is handled in a separate task (thread).

Throughout the system, everything that could potentially be slow (and on a 7.16 MHz 68k machine with floppies and limited RAM, that was a lot of things) would be done in separate tasks so that the user could just get on with other things in the meantime. As a consequence, "other things" often meant "just keep using the current application", because the concurrency meant a lot of this work would lazily happen in the background.

In e.g. a typical shell session, just pressing a key will involve half a dozen tasks or so, gradually "cooking" the input from a raw keyboard event, to an event for a specific window, to an event for a specific console device attached to a window, to an event for a high-level "console handler" that handles complex events such as auto-complete, to the shell itself. It is inefficient in terms of throughput, but because the system will preempt in favour of high-priority tasks (including user-input-related ones), react with low latency, and offload less latency-critical work to other tasks, the system feels fast.
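
To make the shape of that concrete, here is a small sketch in portable C (pthreads) rather than the actual Exec message-port API; the names and the fake delay are just for illustration. The latency-critical side only enqueues an event and returns immediately; a separate task drains the queue and does the slow part, so input handling never waits on it.

    /* Illustrative only: a tiny two-task pipeline, not the AmigaOS API.
     * post_event() is the latency-critical path; worker() is the slow consumer. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define QUEUE_SIZE 64

    static char queue[QUEUE_SIZE];
    static int head, tail;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    /* Called on the latency-critical path: just enqueue and return. */
    static void post_event(char key)
    {
        pthread_mutex_lock(&lock);
        queue[tail] = key;
        tail = (tail + 1) % QUEUE_SIZE;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }

    /* The slow consumer: think "write the clip to clips:", auto-complete, etc. */
    static void *worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (head == tail)
                pthread_cond_wait(&nonempty, &lock);
            char key = queue[head];
            head = (head + 1) % QUEUE_SIZE;
            pthread_mutex_unlock(&lock);

            usleep(100000);              /* pretend this is slow I/O */
            printf("processed '%c'\n", key);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);

        /* The "input task": never blocks on the slow work above. */
        for (char k = 'a'; k <= 'e'; k++)
            post_event(k);

        sleep(1);                        /* let the worker drain the queue */
        return 0;
    }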

A key to this was that on a system that slow, a lot of the things that could be slow, would regularly be slow for the developers and would get fixed so that you'd opt in to the slow behaviour. E.g. I doubt most people using Emacs are ever aware if their installation is slowed down by DNS lookups on startup for example, because most of the time it's fast enough that you won't really notice that one extra little papercut.


> Throughout the system, everything that could potentially be slow (and on a 7.16 MHz 68k machine with floppies and limited RAM, that was a lot of things) would be done in separate tasks so that the user could just get on with other things in the meantime.

BeOS also has this design. Modern programming languages and platforms do make async and event-based programming a lot more intuitive, so we might well see this paradigm make a comeback.


I guess it also depends on how much of it gets forced upon developers.

The Windows NT family has had asynchronous and event-based support since the early days, being heavily multi-threaded, yet not many cared to use it properly.

To the point that Microsoft pushed for an async-only world with WinRT (UWP), and it got mixed responses. Now, with Project Reunion going forward, it remains to be seen how async will fare.

However, at least .NET, C++20, Rust, and JS, as the main Windows desktop stacks, now have full asynchronous support.

On Android, Google only had AsyncTask, with multiple caveats about how to use it properly; then came the fashion for RxJava and Java executors; now it seems Kotlin coroutines are the future, assuming a #KotlinFirst world, with C++20 on the NDK.

On the Apple side we have GCD still as the main workhorse.

As for the other OSes, I guess they are pretty much still the same as they always have been, so that leaves language runtimes for better async and event-based programming.


Win32 UI libs, however, were heavily optimized asynchronous systems based on callbacks from the OS, where you avoided storing bitmaps up front or anything like that, except as optional caching.

This is what enabled fast drawing despite a very primitive drawing system (essentially, GDI would get you a pointer into a window's region of VRAM, which is the origin of various fun graphical bugs people remember from Windows, like dragging broken dialog boxes around that leave a "trace").
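
To make that concrete, here is a sketch of the redraw-on-demand model: nothing is cached up front, the OS calls back with WM_PAINT whenever some region needs repainting, and the app just draws it again with GDI. (Window-procedure fragment only; names are arbitrary.)

    /* Win32 "redraw on demand": the OS invalidates a region and sends WM_PAINT;
     * the app repaints just that region, rather than keeping a bitmap around. */
    #include <windows.h>

    LRESULT CALLBACK paint_demo_proc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
    {
        if (msg == WM_PAINT) {
            PAINTSTRUCT ps;
            HDC dc = BeginPaint(hwnd, &ps);     /* clipped to the dirty region */
            FillRect(dc, &ps.rcPaint, (HBRUSH)(COLOR_WINDOW + 1));
            TextOutA(dc, 10, 10, "redrawn on demand", 17);
            EndPaint(hwnd, &ps);
            return 0;
        }
        return DefWindowProcA(hwnd, msg, wp, lp);
    }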

WinNT, OTOH, has realized async support across the whole I/O stack, essentially finishing the never-finished concurrent QIO of Digital's VMS (VMS, AFAIK, to this day hasn't got fully working concurrent QIO: the API is there, but if you actually enable concurrent operation, too many applications die). It also has a bunch of undocumented async mechanisms (including mostly free-form asynchronous calls from kernel to user, again a VMS invention, which are in many ways similar to POSIX signals, except that you aren't limited to a small table of events to attach handlers to, and they support concurrency by default).
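
The documented face of that async I/O stack, for anyone who hasn't touched it, is overlapped I/O with completion routines, which Windows delivers as user-mode APCs once the thread enters an alertable wait. A minimal sketch (the file name is just a placeholder, and error handling is skipped):

    /* Minimal sketch of Win32 overlapped I/O: ReadFileEx queues the read, and
     * the completion routine runs as a user-mode APC during an alertable wait. */
    #include <windows.h>
    #include <stdio.h>

    static char buffer[4096];

    static VOID CALLBACK on_read_done(DWORD error, DWORD bytes, LPOVERLAPPED ov)
    {
        printf("read completed: error=%lu, bytes=%lu\n", error, bytes);
    }

    int main(void)
    {
        /* "example.txt" is just a placeholder. */
        HANDLE file = CreateFileA("example.txt", GENERIC_READ, FILE_SHARE_READ,
                                  NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
        if (file == INVALID_HANDLE_VALUE)
            return 1;

        OVERLAPPED ov = {0};               /* read from offset 0 */
        if (!ReadFileEx(file, buffer, sizeof(buffer), &ov, on_read_done))
            return 1;

        /* Do other work here; the read proceeds in the background. An
         * alertable wait lets the queued APC (completion routine) run. */
        SleepEx(INFINITE, TRUE);

        CloseHandle(file);
        return 0;
    }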


Event-driven is only comparable if you explicitly make the events go on a queue and allow preemption.

In languages like JavaScript, for example, people all too often write evented code with the expectation that events will be processed 'practically immediately'. They're writing evented code, but not code designed for concurrency.

To get this right, you need to actually run the code under preemptive multitasking, and test it with random, substantial delays in the processing of messages.


I’ve been thinking of this quite a lot, and I beg everybody’s pardon if I now proceed to veer off topic, but...

I’ve been thinking about the landmark announcement by Apple that surprised absolutely nobody by informing its developers and the wider world that it will be switching to “Apple Silicon”, a yet-to-be-filled placeholder for a brand name to be. (The only thing it almost guarantees is that officially there won’t be any reference to ‘ARM’.)

So, famously, Gil Amelio quipped that he had chosen to buy NeXT “rather than Plan Be” when, panicked by declining sales and hampered by an obsolete OS, he plonked down a sizeable chunk of an on-the-verge-of-bankruptcy Apple’s money to buy a working OS to succeed their decidedly musty old eighties tech; something they hadn’t been able to develop internally. NeXTStep had excellent developer tools, a close tie to graphic design (by way of its Display PostScript), and a rather Frankenstein-ish UNIX core, though I’d hesitate to call it a “beating heart”. The core OS, the kernel, and so forth were not the main strong point and were not great performers. It’s what stood above that had value, and that has, apparently, driven and undergirded much of what Apple did from then until now.

And yet, just as they finally abandon the charade of macOS X being “Mac OS Ten” and move forward to “macOS 11”, I can’t help but think that the path they have traced for themselves will lead them to need to reimplement much of their core technology in a manner more akin to how BeOS’s “pervasive multithreading” was conceived almost thirty years ago. What do our devices provide us with now? Real-time media streams and lag-free interactivity whilst connected to a network and running on a fairly modest platform. When I first heard Be’s motto “One Processor Per Person Is Not Enough!” I was bewitched (to the point that my teenage self insisted obstinately that his next PC be a dual-processor machine, to experience the exotic thrill of it). But now it’s normal.

Apple’s switching to its own silicon, which probably means larger grids of the cores it has already shown to be so incredibly overpowered for the likes of the iPad Pro (so much so that what every iPad Pro owner has secretly suspected, that their machine could comfortably run macOS and a bunch of demanding applications, has been confirmed). I heard somewhere that the A12Z that powers the current-generation iPad Pros (and is now coincidentally pulling very honourable double duty as the SoC in the Developer Transition Kit) is a 4W part, and that the current MacBook Air is apparently designed for a 16W thermal capacity, so we can expect something along the lines of “four times as many of whatever goes in the A14”, and that’s a fair first-order approximation. So, as with the AMD case, we’re going to have a lot of cores.

They’re going to need the BeOS ethos to master all those cores. And it’ll be interesting to watch, because our collective success with GPUs and some embarrassingly-parallelisable non-graphics workloads has given us a false sense of having somehow ‘mastered’ parallelism. Ultimately, our systems are built on mechanisms and assumptions that only scale favourably so far. Maybe it’s time to go back and have a good look at what those folks did in the early nineties, when they delivered realtime multimedia streams on interactive devices with an absolute minimum of lag, and did it all with what we’d now consider a pittance of resources.


> Animations and intentional delays. It's hard to overstate how much faster a machine feels when something like MenuShowDelay is set to 0, or the piles of animations are sped up.

These animations effectively increase the input lag significantly. Even with them turned off there are extra frames of lag between a click and the updated widget fully rendering.

(Everything below refers to a 60 Hz display)

For example, opening a combo-box in Windows 10 with animations disabled takes two frames; the first frame draws just the shadow, the next frame the finished open box. With animations enabled, it seems to depend on the number of items, but generally around 15 frames. That's effectively a quarter second of extra input lag.

A menu fading in takes ~12 frames (0.2 seconds), but at least you can interact with it partially faded in.

Animated windows? That'll be another 20 frame delay, a third of a second. Without animations you're down to six, again with some half-drawn weirdness where the empty window appears in one frame and is filled in the next. (So if you noticed pop-ups looking slightly weird in Windows, that's why).

I assume these two-frame redraws are due to Windows Widgets / GDI and DWM not being synchronized at all, much like the broken redraws you can get on X11 with a compositor.

> USB polls at a fairly slow interval (think a hundred or so ms).

The lowest polling rate typically used by HID input devices is 125 Hz (bInterval=8), while gaming hardware usually defaults to 500 or 1000 Hz (bInterval=2 or 1). Most input devices aren't that major a cause of input lag, although curiously a number of even new products implement debouncing incorrectly, which adds 5-10 ms; rather unfortunate.

https://epub.uni-regensburg.de/40182/1/On_the_Latency_of_USB...
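
If you want to sanity-check those numbers, the USB spec makes the arithmetic simple: low/full-speed interrupt endpoints count bInterval in 1 ms frames, while high-speed endpoints use 2^(bInterval-1) microframes of 125 µs. A tiny sketch:

    /* Polling period implied by an interrupt endpoint's bInterval:
     * low/full speed counts 1 ms frames; high speed uses 2^(bInterval-1)
     * microframes of 125 us. So full-speed bInterval=8 -> 8 ms -> 125 Hz. */
    #include <stdio.h>

    static double period_ms(int bInterval, int high_speed)
    {
        if (high_speed)
            return (double)(1 << (bInterval - 1)) * 0.125;
        return (double)bInterval;
    }

    int main(void)
    {
        printf("full speed, bInterval=8: %.3f ms (%.0f Hz)\n",
               period_ms(8, 0), 1000.0 / period_ms(8, 0));
        printf("full speed, bInterval=1: %.3f ms (%.0f Hz)\n",
               period_ms(1, 0), 1000.0 / period_ms(1, 0));
        printf("high speed, bInterval=4: %.3f ms (%.0f Hz)\n",
               period_ms(4, 1), 1000.0 / period_ms(4, 1));
        return 0;
    }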


> For example, opening a combo-box in Windows 10 with animations disabled takes two frames; the first frame draws just the shadow, the next frame the finished open box. With animations enabled, it seems to depend on the number of items, but generally around 15 frames. That's effectively a quarter second of extra input lag.

This isn't usually what I think of when I think of "latency." Latency is, to me, the time between when the user inputs, and when the system recognizes the action.

This becomes especially problematic in situations where events get queued up, and then the extra latency causes an event to attach to something that is now in a different state than the user perceived it to be when they did the input—e.g. double-clicking on an item in a window you're closing right after telling the system to close the window, where you saw the window as open, but your event's processing was delayed until after the window finished closing, such that now you've "actually" clicked on something that was, at the time, behind the window.

On the other hand, the type of latency you're talking about—between when the system recognizes input, and when it finishes displaying output—seems much less troublesome to me.

We're not playing competitive FPS games here. Nobody's trying to read-and-click things as fast as possible, lest something horrible happen.

And even if they were, the "reading" part of reading-and-clicking needs to be considered. Can people read fast enough that shaving off a quarter-second of display time benefits them?

And, more crucially, does cutting that animation time actually cause users to be able to read the text faster? Naively you'd assume it would; but remember that users have to move their eyes to align with the text to start reading it. If the animated version "snaps" the user's eyes to the text more quickly than the non-animated version does, then in theory the user of the animated combo-box might actually be able to select an option faster!

(And remember, none of this matters for users who are acting on reflex; without the kind of recognition latency I mentioned above, the view controller for the combo-box will be instantly responsive to e.g. keyboard input, even while the combo-box's view is still animating into existence. Users who already know what they want, don't need to see the options in order to select them. In fact, such users won't usually bother to open the combo-box at all, instead just tabbing into it and then typing a text-prefix to select an option.)


The "latency caused incorrect handling" is IMO the worst thing in all the "modern desktop is slow" complaints.

I can deal with 9 seconds of latency if I can mentally precompute the expected path and the results match it - this comes just from being familiar with what you're doing, and can be compared to using Vi in edit mode with complex commands.

I can't deal with 400ms lag if the result is that a different action than the one I wanted is executed.


USB polls at 1000 Hz now if the device supports it; it's really not bad. On the hardware side, displays are really the part that still needs some work in terms of latency, and FreeSync and G-Sync are evidence that the problems are at least being considered. The hardware in the PC itself is pretty good.

On the software side, operating systems, desktop environments, GUI SDKs, frameworks, and so on could all take the problem a lot more seriously, but I wouldn't hold my breath waiting for that. There are too many people involved who believe it's reasonable to pause and play a little animation before doing what the user asked for.


> Disabling Aero on a Windows 7 machine does wonders for its responsiveness.

Tragically, it seems the only way to completely get rid of tearing in YouTube is to re-enable Aero :( (at least with Nvidia hardware). RIP classic.


I don't know if this still works, but Nvidia had a "program settings" override, where you could select a .exe and force vsync on for particular programs. Might be worth messing with that. I did it for Media Player Classic a long time ago.


The thing is, you will always have tearing if you disable vsync (and compositing) for the sake of low latency. But tearing is only really perceivable when watching full-screen videos and animations, which is quite a different use case from general computer use.



