> For example the C++ threads are limited only to the features provided by the POSIX pthreads, even if most operating systems have additional features for multi-threading.
What facilities are you thinking of?
> PL/I on the other hand, already in 1965 had more powerful features than the C++ threads.
Given that the term "threads" originates in the 65/66/67 era, this is an interesting claim.
The term "thread" became fashionable only after 1990.
Nobody used the terms "thread" or "multi-threading" when PL/I was designed. Everybody used the terms "task" and "multi-tasking", with the same meaning.
So in PL/I you have "tasks" for what are now called "threads".
The most annoying feature of the C++ thread support library, which is inherited from the POSIX pthreads specification, is the pathetic operation join which waits for the termination of a designated thread.
This is almost never what you want. The most frequent case is to wait for the termination of any one of a group of threads, which cannot be done with join.
Already in 1965, PL/I had a wait that could do anything that can be done with WaitForMultipleEvents/WaitForSingleEvent, i.e. it could wait for any or all of any combinations of a set of events, e.g. thread terminations.
This is such a basic feature that I cannot understand how it could have been omitted from the POSIX pthreads and in consequence also from the C++ thread support library. The other features of pthreads/C++ that can be used to implement such a functionality, e.g. condition variables, also suck badly in comparison with a good wait for events.
> This is almost never what you want. The most frequent case is to wait for the termination of anyone of a group of threads, which cannot be done with join.
I also have to say that this doesn't match my experience at all. Typically, my threads run for the whole duration of the program/object and I only need to join them when the program/object is done. In fact, this pattern is so common that C++20 added std::jthread which automatically joins the thread in the destructor.
I literally never had to wait for "the termination of anyone of a group of threads". For you it's apparently the most frequent case. This only teaches us not to make too broad assumptions. The world of programming is large.
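To make the pattern I described concrete, here is a minimal sketch of an object that owns one thread for its whole lifetime and joins it in the destructor. The class name Ticker and its members are made up for illustration. It uses plain std::thread so it does not depend on C++20; with std::jthread the join in the destructor happens automatically.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object that owns one worker thread for its whole lifetime.
class Ticker {
public:
    Ticker() : worker_([this] { run(); }) {}

    // The only join in the program: it happens when the object is done.
    ~Ticker() {
        stop_.store(true);
        assert(worker_.joinable());
        worker_.join();
    }

    Ticker(const Ticker&) = delete;
    Ticker& operator=(const Ticker&) = delete;

    long ticks() const { return ticks_.load(); }

private:
    // Worker loop: counts until the destructor asks it to stop.
    void run() {
        while (!stop_.load()) {
            ticks_.fetch_add(1);
            std::this_thread::yield();
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<long> ticks_{0};
    std::thread worker_;  // declared last so the flags exist before the thread starts
};
```

Letting a Ticker go out of scope is all the thread management the caller ever does.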
Your pattern is normal when you do not need to create a large number of threads.
When you have enough work for a very large number of threads, there are 2 main styles to do it.
Both styles avoid creating an excessive number of active threads in comparison with the number of available cores, because that can reduce the performance a lot.
One style is to have a permanent pool of worker threads and to use some means of communication between threads to know when any thread has finished its current job, so that a new piece of work can be given to that thread. The simplest way to do this is again some kind of wait for any of multiple events in the dispatching thread, where the event is that some thread has signaled the end of its job.
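A rough sketch of this first style in standard C++, where the "event" has to be emulated with a mutex, a condition variable and a queue of idle worker ids. The function name run_on_pool and the slot/idle bookkeeping are my own assumptions for illustration, not anything from a real library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical dispatcher: a fixed pool of workers. The dispatching thread
// waits until ANY worker reports that it is idle, then hands it the next job.
void run_on_pool(const std::vector<std::function<void()>>& jobs,
                 std::size_t workers) {
    assert(workers > 0);
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> idle;  // ids of workers waiting for work
    std::vector<const std::function<void()>*> slot(workers, nullptr);
    bool stop = false;

    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (;;) {
                const std::function<void()>* job = nullptr;
                {
                    std::unique_lock<std::mutex> lock(m);
                    idle.push(w);  // the "I have finished" event
                    cv.notify_all();
                    cv.wait(lock, [&] { return slot[w] != nullptr || stop; });
                    if (slot[w] == nullptr) return;  // stop requested, nothing pending
                    job = slot[w];
                    slot[w] = nullptr;
                }
                (*job)();  // run the job outside the lock
            }
        });
    }

    for (const auto& job : jobs) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !idle.empty(); });  // wait for any idle worker
        std::size_t w = idle.front();
        idle.pop();
        slot[w] = &job;
        cv.notify_all();
    }

    {
        std::lock_guard<std::mutex> lock(m);
        stop = true;
    }
    cv.notify_all();
    for (auto& t : pool) t.join();
}
```

A single condition variable with notify_all is the lazy choice here; it wakes more threads than necessary but keeps the sketch short.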
This style is appropriate for environments where thread creation and destruction is expensive, like Windows.
When thread creation and destruction are cheap and the individual pieces of work are relatively large, requiring an execution time much greater than the thread creation/deletion time, a second style of programming is much simpler.
You just create one thread for each piece of work, until you reach the limit for the maximum number of active threads. Then you wait until any thread terminates and then you create a new thread for the next piece of work. When there is no more work, you wait until all threads terminate.
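For illustration, this is roughly what the second style looks like when only join and condition variables are available. Since there is no "wait for any thread", each thread has to announce its own termination through a queue, and the dispatcher then joins exactly that thread. The name run_bounded and the bookkeeping are assumptions made up for this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper: runs every job on its own thread, but never keeps more
// than max_active threads alive at the same time.
void run_bounded(const std::vector<std::function<void()>>& jobs,
                 std::size_t max_active) {
    assert(max_active > 0);
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::size_t> finished;  // ids of threads that have finished
    std::unordered_map<std::size_t, std::thread> active;

    // Emulated "wait for any": block until some thread reports completion,
    // then join exactly that thread (the join returns almost immediately).
    auto reap_one = [&] {
        std::size_t id;
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !finished.empty(); });
            id = finished.front();
            finished.pop();
        }
        active.at(id).join();
        active.erase(id);
    };

    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (active.size() == max_active) reap_one();  // limit reached: wait for any
        active.emplace(i, std::thread([&, i] {
            jobs[i]();
            {
                std::lock_guard<std::mutex> lock(m);
                finished.push(i);
            }
            cv.notify_one();
        }));
    }

    while (!active.empty()) reap_one();  // no more work: wait for all
}
```

With a native wait for any of a set of thread terminations, the queue, the mutex and the condition variable would all disappear.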
In conclusion, any kind of work where you could launch an arbitrarily high number of parallel threads needs to wait for any of a group of events.
For all problems with limited parallelism, where you can create only a few threads, you do not need this feature, but my work was always about problems where I could launch many more threads than the available hardware cores, where some kind of wait for multiple events is mandatory.
It's clear that you know a lot about this stuff, but I think maybe not quite enough.
The Linux kernel retains the term "task" for "execution context", and the term "thread" is used exclusively in user space purely because of programmer conventions (pthreads, specifically). This makes a limited amount of sense because "thread" has also come to mean things that are very definitely not full execution tasks (user-space and/or so-called "green" threads). This overloading of the terminology in user-space is not reflected in kernels, but allows for developers to play with alternate implementations without changing their fundamental terminology. After all, there are some things where user-space threads are precisely what you want (and these absolutely do not correspond to kernel tasks).
> This is almost never what you want.
I would beg to differ. I've been writing multithreaded code in C++ for nearly 30 years, and I don't remember any time that I wanted to join on a group of threads. My current project (20 years old) is heavily multithreaded (typically 20-60 threads) and there is nowhere that a thread primitive to join on a group of threads would make our lives easier. The only context where I could even think of using it might be to wait on a thread pool, but in general this tends to be unnecessary and if it is, can be trivially handled without a multi-join.
> Already in 1965, PL/I had a wait that could do anything that can be done with WaitForMultipleEvents/WaitForSingleEvent,
WaitForMultipleEvents/WaitForSingleEvent are not part of the C or C++ language, but Windows system calls. While their semantics are very powerful and useful (which obviously would thus apply to PL/1's wait), their absence cannot be laid at the feet of C or C++: no Unix-like OS has had this operation, ever (it is almost possible with contemporary Linux).
You could make the argument that it's not part of pthreads because this operation was never a part of POSIX, and so pthreads could not build on it (you can use those calls on Windows even in a single-threaded task, and that would be true in a POSIX system if the call existed). So it really has almost nothing to do with pthreads at all, other than that one could imagine a pthread wanting to block on "wait-for-something-anything".
> The Linux kernel retains the term "task" for "execution context"
This has historical roots from the first UNIX versions written for DEC computers. Even if the UNIX authors usually preferred the term "process", in the DEC documentation for their computers and operating systems the term "task" was always used for "process". So the term was also used in UNIX in various places.
> I don't remember any time that I wanted to join on a group of threads
This is not surprising, because there are many kinds of multi-threaded applications and many styles of programming them.
I do not doubt that what you say is correct for your applications, but my experience happened to be the opposite. I have never encountered a case when I wanted to join a single thread, but I have encountered a lot of cases when I wanted to join any of a group of threads (e.g. for keeping a number of active threads matching the number of available cores) and also a smaller number of cases when I wanted to join all of a group of threads, typically at the end of an operation. The latter case is less important, because it can be done by repeating a join with a single thread, even if that is much less efficient than a wait that waits for all.
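The "wait for all" case done by repeated single joins looks like this. It is only a sketch; join_all and count_with_threads are names invented for the example. The loop returns once the slowest thread has finished, whatever the join order; what it costs compared to a single wait-for-all is one blocking call per thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Waits for ALL threads by joining them one after another.
void join_all(std::vector<std::thread>& threads) {
    for (std::thread& t : threads) {
        if (t.joinable()) t.join();
    }
}

// Hypothetical usage: n threads each add 1 to a shared counter.
int count_with_threads(int n) {
    assert(n >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&counter] { counter.fetch_add(1); });
    }
    join_all(threads);  // after this, every increment has happened
    return counter.load();
}
```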
> are not part of the C or C++ language, but Windows system calls
This is precisely what I have already said in my first post, i.e. that standardized languages like C/C++ are forced to specify only the minimal features that are available on all operating systems, so they cannot include WaitForMultipleEvents, while PL/I was free of such portability concerns, so it could specify more powerful features.
> This is precisely what I have already said in my first post, i.e. that standardized languages like C/C++ are forced to specify only the minimal features that are available on all operating systems, so they cannot include WaitForMultipleEvents, while PL/I was free of such portability concerns,
I think you missed my point. WaitForMultipleEvents is not part of a thread API on any platform. It's a part of the platform API, and is used by single-threaded and multi-threaded code. There's no reason for pthreads (or any other thread API) to represent this system call, because the system call either exists, and can be used directly, or does not exist, and cannot be used.
In essence, you're really just noting that POSIX (not pthreads) never had a wait-for-just-about-anything API. That's a legitimate complaint, just not very relevant for multithreaded programming.
> This is not surprising
Well, given that you said "This is almost never what you want.", I'd count it as at least a little surprising. My point was that multi-join is not "almost never what you want", but has always been "useful in certain contexts". I have never come across a multi-join API that blocks until all threads have completed (they typically return when any of the specified threads completes), and so the efficiency of this version of multi-join would be essentially identical to that of a loop+single-join.
>This has historical roots from the first UNIX versions written for DEC computers.
I don't see much evidence for this claim. task_t exists in early versions of AIX and Mach, and the terminology was already common in Multics (as you know). I don't think that Linux' use of task_t has any relationship to the Ultrix use, but maybe you have some specific insight here?
> Even if the UNIX authors usually preferred the term "process"
The OS I learned on was called CTOS, which used the term "process" to refer to "the basic unit of code that competes in the scheduler for access to the CPU". A "task" was essentially what we'd now call a program, complete with libraries and sub-processes. We didn't use the term "thread". I think CTOS dates to about 1981.
unless you've got 1000 cores OR are severely i/o bound, 1000 threads seems mostly useless. OTOH, "severely i/o bound" tends to describe client/server computing quite well, so maybe that's just what you need.
my stuff isn't client/server computing, my threads are generally compute bound, and having (lots) more than there are cores would be counter-productive.
1000 threads blocking on I/O operations is the same as an epoll() over 1000 file descriptors under the hood, except you also get pre-emption guarantees from the kernel.
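For reference, the epoll() side of that comparison is a wait for any of a set of file descriptors. A minimal Linux-only sketch, with pipes standing in for the I/O sources; the function name wait_any_readable is an assumption for the example, while the epoll calls themselves are the real Linux API.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: blocks until ANY descriptor in fds is readable and
// returns its index in the vector, or -1 on error. Linux only.
int wait_any_readable(const std::vector<int>& fds) {
    int ep = epoll_create1(0);
    if (ep < 0) return -1;

    for (std::size_t i = 0; i < fds.size(); ++i) {
        epoll_event ev{};
        ev.events = EPOLLIN;                           // readiness for reading
        ev.data.u32 = static_cast<std::uint32_t>(i);   // remember which one it was
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(ep);
            return -1;
        }
    }

    epoll_event out{};
    int n;
    do {
        n = epoll_wait(ep, &out, 1, -1);               // -1: no timeout
    } while (n < 0 && errno == EINTR);                 // retry if interrupted

    close(ep);
    return n == 1 ? static_cast<int>(out.data.u32) : -1;
}
```

A real server would keep the epoll instance around instead of recreating it for every wait.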
Yeah, so basically this. Except that "I/O bound" usually implies critical moments of computation too, and pre-emption is really nice here.