Why is my TCP not reliable? (2009)

Animats · on April 2, 2017

It's a Berkeleyism. It's not TCP, it's old Berkeley Sockets semantics. The TCP protocol supports half-close. When you're done writing you're supposed to close your write side, then read until you get an EOF. But the Berkeley people didn't support that.

Linux does support it, with the "shutdown(2)" call.[1] When you're done writing, you shut down the write side. The other end sees an EOF, and they close. You read to EOF, and you close. If shutdown and close return normally, all data was delivered. Assuming this was implemented right.

[1] http://man7.org/linux/man-pages/man2/shutdown.2.html
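The half-close sequence described above could look roughly like this in C. This is a minimal sketch, not code from the linked article: the function names `send_and_half_close` and `drain_until_eof_and_close` are made up for illustration, and it assumes a blocking, connected stream socket whose peer closes only after it has read our EOF.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Step 1 (hypothetical helper): send everything, then shut down the write
 * side so the peer reads EOF after the last byte of data. */
int send_and_half_close(int fd, const char *buf, size_t len)
{
    assert(fd >= 0);
    if (write_all(fd, buf, len) < 0)
        return -1;
    return shutdown(fd, SHUT_WR);
}

/* Step 2 (hypothetical helper): discard anything the peer still sends until
 * read() reports EOF, then close. Returns 0 only if both steps succeed. */
int drain_until_eof_and_close(int fd)
{
    char scratch[4096];

    for (;;) {
        ssize_t n = read(fd, scratch, sizeof scratch);
        if (n == 0)
            return close(fd);          /* peer closed its side */
        if (n < 0 && errno != EINTR) {
            close(fd);
            return -1;
        }
    }
}
```

Calling `send_and_half_close()` and then `drain_until_eof_and_close()` on the same descriptor gives the "write, shut down, read to EOF, close" order. An EOF here only tells you the peer closed its side; whether the peer's application actually processed the data is a separate question.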

cperciva · on April 2, 2017

> It's a Berkeleyism. It's not TCP, it's old Berkeley Sockets semantics. [...] Linux does support it, with the "shutdown(2)" call.

The shutdown(2) system call was added to BSD on January 8th, 1983: https://svnweb.freebsd.org/csrg?view=revision&revision=10208

So... Berkeley Sockets supported this a bit more than 8 years before Linux even existed. This is totally not a Berkeley Sockets problem.

ahubert · on April 2, 2017

What you describe (the shutdown technique) is exactly what my linked blogpost describes, by the way. Whether this is the fault of BSD or of the sockets API definition is indeed an interesting question. Is RFC 1122 TCP/IP or a BSD document? Anyhow, use the shutdown technique or, even better, an explicit length or chunked encoding.
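To illustrate the explicit-length option, here is a rough sketch of length-prefixed framing. The wire format (a 4-byte big-endian length followed by the payload) and the names `send_frame` and `recv_frame` are assumptions chosen for the example, not something the blog post prescribes. The point is that the receiver can tell a complete message from one that was cut off.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes. Returns 1 on success, 0 on EOF before the first
 * byte, -1 on error or on EOF in the middle of the requested span. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n == 0)
            return got == 0 ? 0 : -1;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return 1;
}

/* Hypothetical framing: 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *data, uint32_t len)
{
    uint32_t be = htonl(len);
    if (write_all(fd, &be, sizeof be) < 0)
        return -1;
    return write_all(fd, data, len);
}

/* Returns 1 when a whole frame was read into buf, 0 on a clean EOF between
 * frames, -1 on error, on a truncated frame, or if the frame exceeds cap. */
int recv_frame(int fd, void *buf, uint32_t cap, uint32_t *out_len)
{
    uint32_t be, len;
    int r;

    assert(out_len != NULL);
    r = read_exact(fd, &be, sizeof be);
    if (r <= 0)
        return r;
    len = ntohl(be);
    if (len > cap)
        return -1;
    if (len > 0 && read_exact(fd, buf, len) != 1)
        return -1;
    *out_len = len;
    return 1;
}
```

With this shape, a connection that dies halfway through a message shows up as `-1` from `recv_frame()` instead of looking like a short but valid message.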

TwoBit · on April 1, 2017

So if we could go back in time, could we create an interface that is less ridden with pitfalls?

toast0 · on April 2, 2017

The interface could be better, but depending on what you're doing, you likely want a protocol ack, not a network ack anyway. Ex: If you send a file, you probably want to know that the file write succeeded, not just that the program that's supposed to write the file received it.

In today's world of janky TCP optimizing middleboxes, an ack just means you don't need to resend it, not that it arrived at the kernel of your peer; and it never meant the program at your peer read it.
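A protocol-level ack for the "send a file" case might be sketched like this. Everything about the protocol here is invented for illustration: the 4-byte length prefix, the one-byte status codes `ACK_OK` and `ACK_FAIL`, and the function names. The receiver replies only after the data has been written and `fsync()` has returned, so the sender learns about the file write, not just about delivery to the peer's kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical protocol constants, chosen only for this sketch. */
enum { ACK_OK = 'K', ACK_FAIL = 'E', MAX_PAYLOAD = 4096 };

static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; 0 on success, -1 on error or premature EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sender, step 1: length prefix plus payload. */
int send_payload(int sock, const void *data, uint32_t len)
{
    uint32_t be = htonl(len);
    if (write_all(sock, &be, sizeof be) < 0)
        return -1;
    return write_all(sock, data, len);
}

/* Sender, step 2: block until the receiver reports the outcome.
 * Returns 0 only if the receiver answered ACK_OK. */
int wait_for_ack(int sock)
{
    unsigned char status;
    if (read_exact(sock, &status, 1) < 0)
        return -1;
    return status == ACK_OK ? 0 : -1;
}

/* Receiver: read the payload, store it at path, fsync, and only then answer. */
int receive_store_and_ack(int sock, const char *path)
{
    unsigned char buf[MAX_PAYLOAD];
    unsigned char status = ACK_FAIL;
    uint32_t be, len;

    assert(path != NULL);
    if (read_exact(sock, &be, sizeof be) < 0)
        return -1;
    len = ntohl(be);
    if (len <= sizeof buf && read_exact(sock, buf, len) == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd >= 0) {
            if (write_all(fd, buf, len) == 0 && fsync(fd) == 0)
                status = ACK_OK;
            if (close(fd) < 0)
                status = ACK_FAIL;
        }
    }
    if (write_all(sock, &status, 1) < 0)
        return -1;
    return status == ACK_OK ? 0 : -1;
}
```

The sender calls `send_payload()` and then `wait_for_ack()`. If the receiver cannot open or sync the file, the sender sees a failure even though every byte was acked at the TCP level.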

lkrubner · on April 2, 2017

Do we need to go back in time? Can't we just start now, in 2017, to build something better than what they finalized back in the spring of 1978? Don't we know more now than we did then? What are the forces that hold us back? If those forces are political, can we fight them?

Chai-T-Rex · on April 2, 2017

There are already things like SCTP.

JdeBP · on April 2, 2017

Or we could read http://cr.yp.to/tcpip/twofd.html .

dom0 · on April 2, 2017

The nice thing is that we have applications that give us this. E.g. this is literally how ssh(1) works when used as a transport layer. (You have stdin, stdout, and also stderr). The only issue is some in-band signaling on stderr and error codes when SSH itself has problems.

benchaney · on April 2, 2017

I've always thought that this is ridiculous. Once a TCP socket reports that data has been sent, it should continue working to send it until every byte has been acked (with the exception of an unrecoverable failure case). Also, I wish there were an API to detect how much data still hasn't been acked.

gmazza · on April 2, 2017

> I wish there were an API to detect how much data still hasn't been acked.

From example code attached to the original article (Linux only):

  int outstanding;
  ioctl(fd, SIOCOUTQ, &outstanding);
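Fleshed out with headers and error checking, that snippet could look something like the sketch below (Linux only). The wrapper names `unacked_bytes` and `wait_until_acked` are made up for this example, the 10 ms polling interval is arbitrary, and `tcp_loopback_pair` is a demo-only helper for trying the calls on a loopback connection.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux only) */
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bytes in the socket's send queue that have not been acked yet, or -1 on error. */
int unacked_bytes(int fd)
{
    int outstanding = 0;
    if (ioctl(fd, SIOCOUTQ, &outstanding) < 0)
        return -1;
    return outstanding;
}

/* Hypothetical helper: poll the send queue every 10 ms until it is empty.
 * Returns 0 when drained, 1 on timeout, -1 on error. */
int wait_until_acked(int fd, int timeout_ms)
{
    int waited_ms = 0;

    assert(timeout_ms >= 0);
    for (;;) {
        int left = unacked_bytes(fd);
        if (left <= 0)
            return left;             /* 0 = drained, -1 = ioctl failed */
        if (waited_ms >= timeout_ms)
            return 1;
        poll(NULL, 0, 10);           /* sleep for about 10 ms */
        waited_ms += 10;
    }
}

/* Demo-only helper: a connected TCP pair over 127.0.0.1 on an ephemeral port. */
int tcp_loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;               /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto fail;
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0)
        goto fail;
    if (connect(*client, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(*client);
        goto fail;
    }
    *server = accept(lfd, NULL, NULL);
    if (*server < 0) {
        close(*client);
        goto fail;
    }
    close(lfd);
    return 0;
fail:
    close(lfd);
    return -1;
}
```

On a loopback pair, `wait_until_acked()` can return 0 even though the accepting side never calls `read()`: the count reflects TCP-level acks from the peer's kernel, not what the peer's application has consumed.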

wyldfire · on April 2, 2017

You have to be pretty careful about what conclusions you draw from this info. It means relatively little about what your actual peer has seen. It tells you some info about the channel between you and your peer, though.

If you need detailed feedback, you need to implement a feature that closes the loop.

astrobe_ · on April 2, 2017

I guess that select() already had a meaning so it couldn't be used for this?

p4bl0 · on April 1, 2017

Very interesting read. Thanks for sharing.