Transmission Control Protocol
The Transmission Control Protocol (TCP) is one of the core protocols of the Internet protocol suite. Using TCP, applications on networked hosts can create connections to one another, over which they can exchange data. The protocol guarantees reliable, in-order delivery of data from sender to receiver. TCP also distinguishes data for multiple concurrent applications (e.g. a web server and an email server) running on the same host.
TCP supports many of the Internet's most popular application protocols and the applications built on them, including the World Wide Web, email and Secure Shell.
Technical overview
Transmission Control Protocol (TCP) is a connection-oriented, reliable-delivery byte-stream transport layer communication protocol, currently documented in IETF RFC 793.
In the Internet protocol suite, TCP is the intermediate layer between the Internet Protocol below it, and an application above it. Applications often need reliable pipe-like connections to each other, whereas the Internet Protocol does not provide such streams, but rather only unreliable packets. TCP does the task of the transport layer in the simplified OSI model of computer networks.
Bit offset | Fields (bit positions within the 32-bit word)
---|---
0 | Source Port (0 - 15), Destination Port (16 - 31)
32 | Sequence Number (0 - 31)
64 | Acknowledgement Number (0 - 31)
96 | Data Offset (0 - 3), Reserved (4 - 9), Flags (10 - 15), Window (16 - 31)
128 | Checksum (0 - 15), Urgent Pointer (16 - 31)
160 | Options (optional)
192 | Options (cont.), Padding (to 32)
224 | Data
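As an illustration of the layout above, the following minimal sketch decodes the fixed 20-byte part of a header in Python. The names `TcpHeader` and `parse_tcp_header` and the sample values are hypothetical, chosen only for this example; options and data are ignored.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int  # header length in 32-bit words
    flags: int        # low 6 bits: URG, ACK, PSH, RST, SYN, FIN
    window: int
    checksum: int
    urgent: int


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed 20-byte TCP header (fields are in network byte order)."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    # The fourth 32-bit word starts with Data Offset (4 bits), Reserved (6 bits)
    # and Flags (6 bits), followed by the 16-bit Window.
    return TcpHeader(src, dst, seq, ack, off_flags >> 12, off_flags & 0x3F,
                     window, checksum, urgent)


# Hypothetical sample: a SYN (flag bit 0x02) from port 49152 to port 80.
sample = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(sample)
```

A data offset of 5 means the header is five 32-bit words (20 bytes) long, i.e. it carries no options.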
Applications send streams of 8-bit bytes to TCP for delivery through the network, and TCP divides the byte stream into appropriately sized segments (usually bounded by the maximum transmission unit (MTU) of the data link layer of the network to which the computer is attached). TCP then passes the resulting segments to the Internet Protocol for delivery through an internet to the TCP module of the entity at the other end. TCP detects lost segments by giving each one a sequence number, which is also used to make sure that the data are delivered to the entity at the other end in the correct order. The TCP module at the far end sends back an acknowledgement for segments which have been successfully received; a timer at the sending TCP causes a timeout if an acknowledgement is not received within a reasonable round-trip time (RTT), and the (presumably lost) data are then retransmitted. TCP checks that no bytes are damaged by using a checksum; one is computed at the sender for each block of data before it is sent, and checked at the receiver.
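The segmentation step can be sketched as follows. This is a simplified illustration, not how any particular stack is implemented: the function name `segment_stream`, the 1500-byte MTU and the assumption of minimal 20-byte IP and TCP headers are all choices made for the example.

```python
from typing import List, Tuple


def segment_stream(data: bytes, isn: int, mss: int) -> List[Tuple[int, bytes]]:
    """Split a byte stream into (sequence number, payload) pairs of at most mss bytes."""
    segments = []
    for offset in range(0, len(data), mss):
        # The first data byte is numbered ISN + 1, because the SYN uses up the ISN.
        seq = (isn + 1 + offset) % 2**32
        segments.append((seq, data[offset:offset + mss]))
    return segments


MTU = 1500           # assumed link MTU in bytes
MSS = MTU - 20 - 20  # room left after minimal IP and TCP headers
segments = segment_stream(b"x" * 4000, isn=100, mss=MSS)
```

With these assumed values, 4000 bytes are carried in three segments of 1460, 1460 and 1080 bytes.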
Protocol operation in detail
TCP connections contain three phases: connection establishment, data transfer and connection termination. A 3-way handshake is used to establish a connection. A four-way handshake is used to disconnect. During connection establishment, parameters such as sequence numbers are initialized to help ensure ordered delivery and robustness.
Connection establishment (3-way handshake)
While it is possible for a pair of end hosts to initiate a connection between themselves simultaneously, typically one end opens a socket and listens passively for a connection from the other. This is commonly referred to as a passive open, and it designates the server side of a connection. The client side of a connection initiates an active open by sending an initial SYN segment to the server as part of the 3-way handshake. The server side should respond to a valid SYN request with a SYN/ACK. Finally, the client side should respond to the server with an ACK, completing the 3-way handshake and the connection establishment phase.
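The exchange described above can be modelled with a small simulation. The `Segment` class and `three_way_handshake` function are hypothetical names for this sketch, and the initial sequence numbers are arbitrary. It relies on the fact that a SYN consumes one sequence number, so each side acknowledges the other's ISN plus one.

```python
from dataclasses import dataclass

SEQ_MOD = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged during connection establishment."""
    syn = Segment("SYN", seq=client_isn)
    syn_ack = Segment("SYN/ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_MOD)
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


exchange = three_way_handshake(client_isn=1000, server_isn=5000)
```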
Port states
A connection progresses through a series of states: LISTEN, SYN-SENT, SYN-RECEIVED, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT, and CLOSED.
LISTEN: represents waiting for a connection request from any remote TCP and port.
TIME-WAIT: represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request. According to RFC 793 a connection can stay in TIME-WAIT for a maximum of four minutes (twice the maximum segment lifetime, which the RFC sets at two minutes).
Data transfer
During the data transfer phase, a number of key mechanisms determine TCP's reliability and robustness. These include using sequence numbers for ordering received TCP segments and detecting duplicate data, checksums for segment error detection, and acknowledgements and timers for detecting and adjusting to loss or delay.
During the TCP connection establishment phase, initial sequence numbers (ISNs) are exchanged between the two TCP speakers. These sequence numbers identify (and count) application data bytes in the byte stream. Every TCP segment carries a pair of numbers: the sequence number and the acknowledgement number. The sequence number identifies the sender's own position in its byte stream, while the acknowledgement number identifies the next byte the sender expects to receive from the other end. To maintain reliability, a receiver acknowledges TCP segment data by indicating that it has received all contiguous bytes up to some location in the stream. An enhancement to TCP, called selective acknowledgement (SACK), allows a TCP receiver to acknowledge out-of-order blocks.
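A minimal sketch of cumulative acknowledgement follows. The function name `cumulative_ack` is hypothetical, and the sketch assumes segments do not overlap and ignores sequence number wraparound for brevity.

```python
def cumulative_ack(next_expected: int, received: list) -> int:
    """Return the acknowledgement number after receiving (seq, length) segments.

    Only contiguous data is acknowledged: the result stops at the first gap.
    """
    buffered = dict(received)  # start sequence number -> payload length
    while next_expected in buffered:
        next_expected += buffered.pop(next_expected)
    return next_expected


# The segment starting at 1501 is missing, so the receiver keeps acknowledging 1501.
with_gap = cumulative_ack(1001, [(1001, 500), (2001, 500)])
# Once the gap is filled, the acknowledgement jumps past all buffered data.
gap_filled = cumulative_ack(1001, [(1001, 500), (2001, 500), (1501, 500)])
```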
Through the use of sequence and acknowledgement numbers, TCP can properly deliver received segments in the correct byte stream order to a receiving application. Sequence numbers are 32-bit, unsigned numbers, which wrap to zero on the next byte in the stream after 2^32 - 1. One key to maintaining robustness and security for TCP connections is in the selection of the ISN.
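Because sequence numbers wrap, arithmetic on them is done modulo 2^32. The sketch below uses hypothetical helper names; the circular "before" comparison is a common technique and is an assumption of this example rather than something stated above.

```python
SEQ_MOD = 2**32


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n bytes, wrapping to zero after 2^32 - 1."""
    return (seq + n) % SEQ_MOD


def seq_before(a: int, b: int) -> bool:
    """True if a precedes b when the sequence space is treated as circular."""
    return a != b and (b - a) % SEQ_MOD < SEQ_MOD // 2


wrapped = seq_add(2**32 - 1, 1)
```

Under this comparison, a number just below 2^32 counts as earlier than a small number just after the wrap.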
A 16-bit checksum, consisting of the one's complement of the one's complement sum of the contents of the TCP segment header and data, is computed by a sender, and included in a segment transmission. (The one's complement sum is used because the end-around carry of that method means that it can be computed in any multiple of that length - 16-bit, 32-bit, 64-bit, etc - and the result, once folded, will be the same.) The TCP receiver recomputes the checksum on the received TCP header and data. The complement was used (above) so that the receiver does not have to zero the checksum field, after saving the checksum value elsewhere; instead, the receiver simply computes the one's complement sum with the checksum in situ, and the result should be -0. If so, the segment is assumed to have arrived intact and without error.
Note that the TCP checksum also covers a 96 bit pseudo header containing the Source Address, the Destination Address, the Protocol, and TCP length. This provides protection against misrouted segments.
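The checksum computation and the receiver-side check can be sketched as follows. The helper names are hypothetical, the addresses are documentation examples, and the sketch assumes IPv4 (protocol number 6 for TCP) with the checksum field at byte offset 16 of the TCP header.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, folding carries back in (end-around carry)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """96-bit pseudo header: source, destination, zero, protocol, TCP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 6, tcp_length))


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a segment whose checksum field is currently zero."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ~ones_complement_sum(data) & 0xFFFF


# Sender: build a segment with a zeroed checksum field, then fill it in.
header = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
segment = header + b"hello"
csum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)
filled = segment[:16] + struct.pack("!H", csum) + segment[18:]

# Receiver: sum everything with the checksum in place; all ones (0xFFFF) is "-0".
received_sum = ones_complement_sum(
    pseudo_header("192.0.2.1", "192.0.2.2", len(filled)) + filled
)
intact = received_sum == 0xFFFF
```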
The TCP checksum is quite a weak check by modern standards. Data link layers with high bit error rates may require additional link error correction/detection capabilities. If TCP were to be redesigned today, it would most probably have a 32-bit cyclic redundancy check specified as an error check instead of the current checksum. The weak checksum is partially compensated for by the common use of a CRC or better integrity check at layer 2, below both TCP and IP, such as is used in PPP or the Ethernet frame. However, this does not mean that the 16-bit TCP checksum is redundant: remarkably, surveys of Internet traffic have shown that software and hardware errors that corrupt packets between CRC-protected hops are common, and that the end-to-end 16-bit TCP checksum catches most of these simple errors. This is the end-to-end principle at work.
Acknowledgements for data sent, or the lack of them, are used by senders to infer network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as flow control, congestion control and/or network congestion avoidance. TCP uses a number of mechanisms to achieve high performance and avoid congesting the network (i.e. sending data faster than either the network, or the host on the other end, can handle it). These mechanisms include the use of a sliding window, the slow-start algorithm, the congestion avoidance algorithm, the fast retransmit and fast recovery algorithms, and more. Enhancing TCP to reliably handle loss, minimize errors, manage congestion and go fast in very high-speed environments is an ongoing area of research and standards development.
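The interplay of slow start and congestion avoidance can be illustrated with a deliberately simplified model. It counts the congestion window in whole segments per round trip, restarts from one segment after a loss, and omits fast retransmit and fast recovery; real implementations count bytes and react to individual acknowledgements. The function name and parameters are hypothetical.

```python
def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds: set) -> list:
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss is taken as a sign of congestion: halve the threshold and restart.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: the window grows by one segment per round trip.
            cwnd += 1
    return history


history = simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6})
```

In this run the window grows 1, 2, 4, 8 during slow start, then 9, 10, 11 during congestion avoidance, and falls back to 1 after the simulated loss in round 6.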
TCP window size
The TCP receive window size is the amount of received data (in bytes) that can be buffered during a connection. The sending host can send only that amount of data before it must wait for an acknowledgment and window update from the receiving host. Recent versions of the Windows TCP/IP stack are designed to tune themselves in most environments, and use larger default window sizes than earlier versions.
Window scaling
For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and is limited to 2 bytes, or a window size of 65,535 bytes.
Since the size field cannot be expanded, a scaling factor is used. TCP window scale, as defined in RFC 1323, is an option used to increase the maximum window size from 65,535 bytes to 1 Gigabyte. Scaling up to larger window sizes is a part of what is necessary for TCP Tuning.
The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14.
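The effect of the shift can be worked through in a few lines. The function names are hypothetical; the throughput bound follows from the fact that a sender can have at most one window of unacknowledged data in flight per round trip, and the 100 ms round-trip time is an assumed figure.

```python
def effective_window(window_field: int, shift: int) -> int:
    """Window size in bytes after applying the window scale shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= 14:
        raise ValueError("window scale must be between 0 and 14")
    return window_field << shift


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


largest = effective_window(0xFFFF, 14)          # just under 1 gigabyte
unscaled_limit = max_throughput_bps(65535, 0.1)  # assumed 100 ms round trip
```

The largest scaled window is 1,073,725,440 bytes. Without scaling, a 100 ms path is limited to roughly 5.2 Mbit/s regardless of link speed.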
Connection termination
The connection termination phase uses a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical teardown requires a pair of FIN and ACK segments from each TCP endpoint.
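A connection can be "half-closed", in which case one side has terminated its end, but the other has not. The side which has terminated can no longer send any data into the connection, but the other side can. (The term "half-open" refers to a different situation, in which one end has closed or aborted the connection without the other end knowing.)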
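The teardown can be followed as a walk through the connection states named earlier. The table below is a partial sketch covering only the usual close sequence; the event labels and the `run` helper are hypothetical, and simultaneous close (the CLOSING state) is omitted.

```python
# (current state, event) -> next state, for the common teardown paths only.
TRANSITIONS = {
    # Side that closes first (active close)
    ("ESTABLISHED", "send FIN"): "FIN-WAIT-1",
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN"): "TIME-WAIT",  # an ACK is sent in reply
    ("TIME-WAIT", "timeout"): "CLOSED",
    # Side that closes second (passive close)
    ("ESTABLISHED", "recv FIN"): "CLOSE-WAIT",  # an ACK is sent in reply
    ("CLOSE-WAIT", "send FIN"): "LAST-ACK",
    ("LAST-ACK", "recv ACK"): "CLOSED",
}


def run(state: str, events: list) -> list:
    """Return the list of states visited while processing the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


active = run("ESTABLISHED", ["send FIN", "recv ACK", "recv FIN", "timeout"])
passive = run("ESTABLISHED", ["recv FIN", "send FIN", "recv ACK"])
```

While the passive side sits in CLOSE-WAIT it can still send data, which corresponds to the one-sided termination described above.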
TCP ports
TCP uses the notion of port numbers to identify sending and receiving applications. Each side of a TCP connection has an associated 16-bit unsigned port number assigned to the sending or receiving application. Ports are grouped into three basic categories: well known, registered and dynamic/private. The well known ports are assigned by the Internet Assigned Numbers Authority (IANA) and are typically used by system-level or root processes. Well known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP (21), TELNET (23), SMTP (25) and HTTP (80). Registered ports are typically used by end user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but are less commonly so. Dynamic/private ports do not carry any meaning outside of a particular TCP connection. The 16-bit port field allows 65,536 values (0 to 65535); port 0 is reserved, leaving 65,535 usable ports.
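The three categories correspond to numeric ranges in the IANA registry: 0 to 1023 (well known), 1024 to 49151 (registered) and 49152 to 65535 (dynamic/private). A small sketch, with a hypothetical function name, classifies a port accordingly.

```python
def port_category(port: int) -> str:
    """Classify a TCP port number by its IANA range."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit unsigned values")
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


categories = {port: port_category(port) for port in (21, 80, 8080, 49152)}
```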
TCP development
TCP is both a complex and an evolving protocol. However, while significant enhancements have been made and proposed over the years, its most basic operation has not changed significantly since RFC 793, published in 1981. RFC 1122, Requirements for Internet Hosts - Communication Layers, clarified a number of TCP protocol implementation requirements. RFC 2581, TCP Congestion Control, one of the most important TCP related RFCs in recent years, describes updated algorithms to be used in order to avoid undue congestion. In 2001, RFC 3168 was written to describe explicit congestion notification (ECN), a congestion avoidance signalling mechanism. In the early 21st century, TCP is typically used in approximately 95% of all Internet packets. Common applications that use TCP include HTTP/HTTPS (World Wide Web), SMTP/POP3/IMAP (e-mail) and FTP (file transfer). Its widespread use is a testament to the quality of the original design.
The classic TCP congestion control scheme is known as TCP Reno, but more recently several alternative congestion control algorithms have been proposed:
- High Speed TCP proposed by Sally Floyd in RFC 3649
- TCP Vegas by Brakmo and Peterson at University of Arizona
- TCP Westwood by UCLA
- BIC TCP by Injong Rhee at North Carolina State University
- H-TCP by Hamilton Institute
- Fast TCP (Fast Active queue management Scalable Transmission Control Protocol) by Caltech.
There have been many studies comparing the fairness and TCP performance of the different algorithms.
TCP Over Wireless
TCP was optimized for wired networks. Any packet loss is treated as a sign of congestion, and hence the window size is reduced dramatically as a precaution. However, wireless links are known to experience sporadic and usually temporary losses due to fading, shadowing, handoff, etc., which cannot be considered congestion. An erroneous back-off of the window size due to wireless packet loss is followed by a congestion avoidance phase with a conservative increase, which causes the radio link to be underutilized. Radio resources are extremely valuable in wireless communications, and extensive research has been done on how to combat these harmful effects. Suggested solutions can be categorized as end-to-end solutions (which require modifications at the client and/or server), link layer solutions (such as RLP in CDMA2000), or proxy based solutions (which require some changes in the network without modifying end nodes).
Hardware TCP Implementations
TCP Offload Engines
One way to overcome the processing power requirements of TCP is to build hardware implementations of it, widely known as TCP Offload Engines (TOE). The main problem with TOEs is that they are hard to integrate into computing systems, requiring extensive changes in the operating system of the computer or device. The first company to develop such a device was Alacritech.
Alternatives to TCP
TCP is not appropriate for many applications. The main problem (at least with normal implementations) is that the application cannot get at the packets coming after a lost packet until the lost packet is retransmitted. This causes problems for real-time applications such as streaming multimedia (such as Internet radio), real-time multiplayer games and voice over IP (VoIP), where it is sometimes more useful to get most of the data in a timely fashion than it is to get all of the data in order.
The complexity of TCP can also be a problem for embedded systems. The best known example of this is netbooting, which generally uses TFTP. Finally, some tricks, such as transmitting data between two hosts that are both behind NAT (using STUN or similar systems), are far simpler without a protocol as complex as TCP in the way.
Generally, where TCP is unsuitable, the User Datagram Protocol (UDP) is used. UDP provides the application multiplexing and checksums that TCP does, but does not handle building streams or retransmission. This gives the application developer the ability to code those in a way suitable for the situation, and/or to replace them with other methods such as forward error correction or interpolation.
SCTP is another IP protocol which provides reliable stream oriented services not so dissimilar from TCP. It is newer and considerably more complex than TCP, so it has not yet seen widespread deployment. However, it is especially designed to be used in situations where reliability and near-realtime considerations are important.
See also
- TCP congestion avoidance algorithms for more on TCP Reno, TCP Vegas, TCP Westwood, BIC and Hybla
- Port numbers for a complete (growing) list of ports/services
- Connection-oriented protocol
- TCP Tuning for high performance networks