Computer Networking Chapter 3 Transport Layer

3.1 Introduction and Transport-Layer Services

Transport services and protocols

  • provide logical communication between app processes running on different hosts
  • transport protocols run in end systems
    • send side: breaks app messages into segments, passes them to network layer
    • rcv side: reassembles segments into messages, passes them to app layer
  • more than one transport protocol available to apps
    • Internet: TCP and UDP

3.1.1 Relationship Between Transport and Network Layers

  • network layer: logical communication between hosts

  • transport layer: logical communication between processes

    • relies on, enhances, network layer services
  • household analogy

    ​ 12 kids in Ann's house sending letters to 12 kids in Bill's house:

    • hosts = houses
    • processes = kids
    • app messages = letters in envelopes
    • transport protocol = Ann and Bill who demux to in-house siblings
    • network-layer protocol = postal service
  • services that transport protocol provides are often constrained by the service that network-layer protocol provides

3.1.2 Overview of the Transport Layer in the Internet

  • reliable, in-order delivery (TCP)
    • congestion control
    • connection setup
    • reliable data transfer
      • flow control, sequence numbers, acknowledgements, timers
  • unreliable, unordered delivery: UDP
    • no-frills extension of “best-effort” IP
    • data delivery
    • error checking
  • services not available:
    • delay guarantees
    • bandwidth guarantees

3.2 Multiplexing and Demultiplexing

multiplexing at sender: handle data from multiple sockets, add transport header (later used for demultiplexing)

demultiplexing at receiver: use header info to deliver received segments to correct socket

  • host uses IP addresses & port numbers to direct segment to appropriate socket
    • each datagram has source IP address, destination IP address
    • each datagram carries one transport-layer segment
    • each segment has source, destination port number
Figure 3.1 Source and destination port-number fields in a transport-layer segment

Connectionless Multiplexing and Demultiplexing

A UDP client can bind its socket to a specific port; if it doesn't, the OS assigns an unused ephemeral port (1024-65535) automatically:

```python
from socket import *

# if the client does not bind, the OS picks an unused port from 1024 to 65535
clientSocket = socket(AF_INET, SOCK_DGRAM)

# bind to a specific port number instead
clientSocket.bind(('', 19157))
```
  • when host receives UDP segment
    • checks destination port # in segment
    • directs UDP segment to socket with that port #
    • IP datagrams with same dest. port #, but different source IP addresses and/or source port numbers will be directed to same socket at dest
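The last bullet can be demonstrated on the loopback interface: datagrams sent from two different client sockets (hence two different source ports) are delivered to the same bound UDP socket. A minimal sketch; all addresses and ports here are chosen locally by the OS.

```python
from socket import socket, AF_INET, SOCK_DGRAM

# one receiving UDP socket; demultiplexing uses the destination port only
server = socket(AF_INET, SOCK_DGRAM)
server.bind(('127.0.0.1', 0))        # port 0: let the OS pick a free port
server.settimeout(2)
server_addr = server.getsockname()

# two independent clients get two different ephemeral source ports
c1 = socket(AF_INET, SOCK_DGRAM)
c2 = socket(AF_INET, SOCK_DGRAM)
c1.sendto(b'from c1', server_addr)
c2.sendto(b'from c2', server_addr)

# both datagrams arrive at the SAME socket, tagged with different sources
received = {}
for _ in range(2):
    data, (src_ip, src_port) = server.recvfrom(2048)
    received[src_port] = data
```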

Connection-Oriented Multiplexing and Demultiplexing

  • TCP socket identified by 4-tuple
    • source IP address
    • source port number
    • dest IP address
    • dest port number
  • demux: receiver uses all four values to direct segment to appropriate socket
  • server host may support many simultaneous TCP sockets:
  • each socket identified by its own 4-tuple
  • web servers have different sockets for each connecting client
    • non-persistent HTTP will have different socket for each request
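The 4-tuple rule can be sketched without a network: model the receiving host's socket table as a dictionary keyed by the full 4-tuple, so segments that share a destination port but differ in source address or port land in different sockets (contrast UDP, which keys on destination port alone). The names here are illustrative, not a real API.

```python
# toy TCP demultiplexer: one queue ("socket") per connection 4-tuple
socket_table = {}

def demux(src_ip, src_port, dst_ip, dst_port, payload):
    key = (src_ip, src_port, dst_ip, dst_port)
    socket_table.setdefault(key, []).append(payload)

# two clients hit the same server port 80: same dest port, different sources
demux('10.0.0.1', 26145, '192.168.0.9', 80, b'GET /a')
demux('10.0.0.2', 26145, '192.168.0.9', 80, b'GET /b')
```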

3.3 Connectionless Transport: UDP

  • "no frills", "bare bones" Internet transport protocol
  • "best effort" service, UDP segments may be:
    • lost
    • delivered out-of-order to app
  • connectionless:
    • no handshaking between UDP sender, receiver
    • each UDP segment handled independently of others
  • UDP use:
    • streaming multimedia apps (loss tolerant, rate sensitive)
    • DNS
    • SNMP
  • reliable transfer over UDP:
    • add reliability at application layer
    • application-specific error recovery
  • Why UDP?
    • finer application-layer control
    • no connection establishment
    • no connection state
    • small packet header overhead

3.3.1 UDP Segment Structure

Figure 3.2 UDP segment structure
  • defined in RFC768
  • UDP header has four fields (two bytes per field)
  • length: in bytes of UDP segment, including header
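The four 16-bit header fields can be packed with Python's struct module in network byte order. A sketch; checksum computation is omitted here (0 over IPv4 means "no checksum").

```python
import struct

def udp_segment(src_port, dst_port, payload, checksum=0):
    # header = four 16-bit fields: src port, dst port, length, checksum
    length = 8 + len(payload)      # length counts header AND data, in bytes
    return struct.pack('!HHHH', src_port, dst_port, length, checksum) + payload

seg = udp_segment(19157, 53, b'query')
src, dst, length, csum = struct.unpack('!HHHH', seg[:8])
```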

3.3.2 UDP Checksum

Goal: detect "errors" (e.g., flipped bits) in transmitted segment

sender:

  • treat segment contents, including header fields, as sequence of 16-bit integers
  • checksum: addition (one’s complement sum) of segment contents
  • sender puts checksum value into UDP checksum field

receiver:

  • compute checksum of received segment
  • check if computed checksum equals checksum field value:
    • NO - error detected
    • YES - no error detected. But maybe errors nonetheless?
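The sender/receiver steps above are the standard Internet checksum: sum the 16-bit words with end-around carry, then take the one's complement. A sketch; the receiver's test "sum of everything, including the checksum, equals 0xFFFF" is equivalent to recomputing the checksum over segment plus checksum and getting 0.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b'\x00'                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)     # end-around carry
    return ~total & 0xFFFF

segment = b'\x45\x00\x00\x73\x00\x00\x40\x00'        # arbitrary sample bytes
csum = internet_checksum(segment)
# receiver side: checksum over segment plus its checksum field comes out 0
check = internet_checksum(segment + csum.to_bytes(2, 'big'))
```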

3.4 Principles of Reliable Data Transfer

Figure 3.3 Reliable data transfer: Service model and service implementation

We'll:

  • incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  • consider only unidirectional data transfer
    • but control info will flow on both directions!
  • use finite state machines (FSM) to specify sender, receiver

3.4.1 Building a Reliable Data Transfer Protocol

Reliable Data Transfer over a Perfectly Reliable Channel: rdt1.0

Figure 3.4 rdt1.0: a protocol for a completely reliable channel
  • underlying channel perfectly reliable
    • no bit errors
    • no loss of packets
  • separate FSMs for sender, receiver:
    • sender sends data into underlying channel
    • receiver reads data from underlying channel

Reliable Data Transfer over a Channel with Bit Errors: rdt2.0

Figure 3.5 rdt2.0: a protocol for a channel with bit errors
  • underlying channel may flip bits in packet
    • checksum to detect bit errors
  • the question: how to recover from errors
    • acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
    • negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
    • sender retransmits pkt on receipt of NAK
  • new mechanisms in rdt2.0 (beyond rdt1.0):
    • error detection
    • feedback: control msgs (ACK,NAK) from receiver to sender
  • a fatal flaw
    • if ACK/NAK corrupted
      • sender doesn't know what happened at receiver
      • can't just retransmit possible duplicate
    • handling duplicates:
      • sender retransmits current pkt if ACK/NAK corrupted
      • sender adds sequence number to each pkt
      • receiver discards (doesn't deliver up) duplicate pkt

Reliable Data Transfer over a Channel with Bit Errors: rdt2.1

Figure 3.6 rdt2.1 sender
Figure 3.7 rdt2.1 receiver

sender:

  • seq # added to pkt
  • two seq. #’s (0,1) will suffice. Why?
  • must check if received ACK/NAK corrupted
  • twice as many states
    • state must “remember” whether “expected” pkt should have seq # of 0 or 1

receiver:

  • must check if received packet is duplicate
    • state indicates whether 0 or 1 is expected pkt seq #
  • note: receiver cannot know whether its last ACK/NAK was received OK at sender

A NAK-Free Protocol: rdt2.2

Figure 3.8 rdt2.2 sender
Figure 3.9 rdt2.2 receiver
  • same functionality as rdt2.1, using ACKs only
  • instead of NAK, receiver sends ACK for last pkt received OK
    • receiver must explicitly include seq # of pkt being ACKed
  • duplicate ACK at sender results in same action as NAK: retransmit current pkt

Reliable Data Transfer over a Lossy Channel with Bit Errors: rdt3.0

Figure 3.10 rdt3.0 sender

new assumption: underlying channel can also lose packets (data, ACKs)

  • checksum, seq. #, ACKs, retransmissions will be of help … but not enough

approach: sender waits "reasonable" amount of time for ACK

  • retransmits if no ACK received in this time
  • if pkt (or ACK) just delayed (not lost):
      • retransmission will be duplicate, but seq. #s already handle this
    • receiver must specify seq # of pkt being ACKed
  • requires countdown timer

3.4.2 Pipelined Reliable Data Transfer Protocols

Figure 3.11 Operation of rdt3.0, the alternating-bit protocol

Performance of rdt3.0

  • rdt3.0 is correct, but performance stinks

  • e.g.: 1 Gbps link, 15 ms prop. delay, 8000-bit packet: \(D_{trans}=\frac{L}{R}=\frac{8000\ bits}{10^9\ bits/sec}=8\ \mu s\)

    • Usender: utilization - fraction of time sender busy sending

      \(U_{sender}=\frac{L/R}{RTT+L/R}=\frac{0.008}{30.008}=0.00027\)

    • if RTT = 30 msec, 1 KB pkt every 30 msec: 33 kB/sec throughput over 1 Gbps link

  • network protocol limits use of physical resources
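The stop-and-wait numbers above can be reproduced directly; the inputs (1 Gbps, 30 ms RTT, 8000-bit packet) come from the example itself.

```python
L = 8000          # packet size, bits
R = 1e9           # link rate, bits/sec
RTT = 0.030       # round-trip time, sec

d_trans = L / R                          # transmission delay: 8 microseconds
U_sender = d_trans / (RTT + d_trans)     # fraction of time sender is busy
throughput = (L / 8) / (RTT + d_trans)   # bytes/sec actually carried
```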

Pipelined protocols

​ (acked = acknowledged)

  • pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
    • range of sequence numbers must be increased
    • buffering at sender and/or receiver
  • two generic forms of pipelined protocols: GBN, SR

3.4.3 Go-Back-N (GBN)

Figure 3.12 Sender's view of sequence numbers in Go-Back-N
  • sender can have up to N unacked packets in pipeline
  • receiver only sends cumulative ack
    • doesn’t ack packet if there’s a gap
  • sender has timer for oldest unacked packet
  • when timer expires, retransmit all unacked packets
  • k-bit seq # in pkt header
  • "window" of up to N, consecutive unacked pkts allowed
  • ACK(n): ACKs all pkts up to, including seq # n - "cumulative ACK"
    • may receive duplicate ACKs (see receiver)
  • timer for oldest in-flight pkt
  • timeout(n): retransmit packet n and all higher seq # pkts in window
Figure 3.13 Extended FSM description of GBN sender
Figure 3.14 Extended FSM description of GBN receiver
  • ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
    • may generate duplicate ACKs
    • need only remember expectedseqnum
  • out-of-order pkt:
    • discard (don't buffer): no receiver buffering
    • re-ACK pkt with highest in-order seq #
Figure 3.15 GBN in action
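The sender-side rules above (window of N, cumulative ACKs, go-back-N on timeout) can be sketched as a small class. This is a simulation skeleton with no real network or timer, and the class name is ours.

```python
class GBNSender:
    """Go-Back-N sender sketch (simulation only): window N, cumulative ACKs."""
    def __init__(self, N):
        self.N = N
        self.base = 0              # oldest unACKed seq #
        self.nextseqnum = 0
        self.transmissions = []    # log of every (re)transmission

    def send(self):
        if self.nextseqnum < self.base + self.N:    # room in the window?
            self.transmissions.append(self.nextseqnum)
            self.nextseqnum += 1
            return True
        return False                                # window full: refuse data

    def ack(self, n):
        if self.base <= n < self.nextseqnum:        # cumulative ACK(n): all <= n OK
            self.base = n + 1

    def timeout(self):
        for s in range(self.base, self.nextseqnum): # go back N: resend them all
            self.transmissions.append(s)
```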

3.4.4 Selective Repeat (SR)

Figure 3.16 SR sender and receiver views of sequence-number space
  • sender can have up to N unacked packets in pipeline
  • rcvr sends individual ack for each packet
  • sender maintains timer for each unacked packet
    • when timer expires, retransmit only that unacked packet
  • receiver individually acknowledges all correctly received pkts
    • buffers pkts, as needed, for eventual in-order delivery to upper layer
  • sender only resends pkts for which ACK not received
    • sender timer for each unACKed pkt
  • sender window
    • N consecutive seq #’s
    • limits seq #s of sent, unACKed pkts
  • sender:
    • data from above: if next available seq # in window, send pkt
    • timeout(n): resend pkt n, restart timer
    • ACK(n) in [sendbase,sendbase+N]:
      • mark pkt n as received
      • if n smallest unACKed pkt, advance window base to next unACKed seq #
  • receiver:
    • pkt n in [rcvbase, rcvbase+N-1]
      • send ACK(n)
      • out-of-order: buffer
      • in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
    • pkt n in [rcvbase-N,rcvbase-1]
      • ACK(n)
    • otherwise:
      • ignore
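The receiver rules above translate almost line for line into code. A sketch (no network; the names are ours): ACK anything inside the window, buffer out-of-order arrivals, deliver and slide on in-order arrivals, and re-ACK packets in [rcvbase-N, rcvbase-1].

```python
class SRReceiver:
    """Selective Repeat receiver sketch: individual ACKs, out-of-order buffering."""
    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}       # out-of-order pkts held for later delivery
        self.delivered = []

    def receive(self, n, data):
        if self.rcvbase <= n < self.rcvbase + self.N:    # inside the window
            self.buffer[n] = data
            while self.rcvbase in self.buffer:           # deliver in-order run
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return ('ACK', n)
        if self.rcvbase - self.N <= n < self.rcvbase:    # before the window
            return ('ACK', n)                            # must re-ACK
        return None                                      # otherwise: ignore
```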
Figure 3.17 SR operation

3.5 Connection-oriented Transport: TCP

defined in RFCs: 793, 1122, 1323, 2018, 2581

3.5.1 The TCP Connection

  • point-to-point: one sender, one receiver
  • reliable, in-order byte stream: no "message boundaries"
  • pipelined: TCP congestion and flow control set window size
  • full duplex data:
    • bidirectional data flow in same connection
    • MSS: maximum segment size
  • connection-oriented: handshaking (exchange of control msgs) inits sender, receiver state before data exchange
  • flow controlled: sender will not overwhelm receiver

3.5.2 TCP Segment Structure

Figure 3.18 TCP segment structure
  • 32-bit sequence number and 32-bit acknowledgement number fields
    • reliable data transfer
  • 16-bit receive window
    • flow control
  • 4-bit header length field
    • the length of the TCP header in 32-bit words
  • optional and variable-length options field
    • negotiate the maximum segment size (MSS)
  • 6-bit flag field
    • ACK: ACK # valid
    • RST, SYN and FIN: connection establishment (setup, teardown)
    • PSH: push data now (generally not used)
    • URG: urgent data (generally not used)

Sequence Numbers and Acknowledgement Numbers

  • Sequence numbers:

    • byte stream "number" of first byte in segment's data

      Figure 3.19 Dividing file into TCP segments
  • Acknowledgement

    • seq # of next byte expected from other side
    • cumulative ACK
    • how receiver handles out-of-order segments
      • TCP spec doesn't say; up to the implementor

Telnet: A Case Study of Sequence and Acknowledgment Numbers

Figure 3.20 Sequence and acknowledgment numbers for a simple Telnet application over TCP
Figure 3.20 Sequence and acknowledgment numbers for a simple Telnet application over TCP

3.5.3 Round-Trip Time Estimation and Timeout

Estimating the Round-Trip Time

  • how to set TCP timeout value
    • longer than RTT, but RTT varies
    • too short: premature timeout, unnecessary retransmissions
    • too long: slow reaction to segment loss
  • how to estimate RTT
    • SampleRTT: measured time from segment transmission until ACK receipt
      • ignore retransmissions
    • SampleRTT will vary, want estimated RTT "smoother"
      • average several recent measurements, not just current SampleRTT
  • \(EstimatedRTT = (1-\alpha)*EstimatedRTT + \alpha*SampleRTT\)
    • exponential weighted moving average (EWMA)
    • influence of past sample decreases exponentially fast
    • typical value: α = 0.125
  • \(DevRTT = (1-\beta)*DevRTT+\beta*|SampleRTT-EstimatedRTT|\)

    • RTT variation, the variability of the RTT
    • typical value: β = 0.25
    Figure 3.20 RTT samples and RTT estimates

Setting and Managing the Retransmission Timeout Interval

  • timeout interval:
    • EstimatedRTT plus "safety margin"
    • large variation in EstimatedRTT -> larger safety margin
    • \(TimeoutInterval = EstimatedRTT + 4*DevRTT\)
    • initial TimeoutInterval: 1 second
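The two EWMAs and the timeout rule can be combined into one update function. A sketch: DevRTT is updated with the old EstimatedRTT before EstimatedRTT itself is updated (the order RFC 6298 uses), and initializing from the first sample is an assumption of this sketch.

```python
ALPHA, BETA = 0.125, 0.25   # typical values from the text

def rtt_update(est, dev, sample):
    """One SampleRTT observation -> new EstimatedRTT, DevRTT, TimeoutInterval."""
    dev = (1 - BETA) * dev + BETA * abs(sample - est)   # RTT variability
    est = (1 - ALPHA) * est + ALPHA * sample            # EWMA of SampleRTT
    return est, dev, est + 4 * dev                      # timeout = est + margin

# initialize from a first sample (sketch assumption), then feed measurements
est, dev = 0.100, 0.050
for sample in [0.106, 0.120, 0.094, 0.100] * 10:
    est, dev, timeout = rtt_update(est, dev, sample)
```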

3.5.4 Reliable Data Transfer

  • TCP creates rdt service on top of IP's unreliable service

    • pipelined segments
    • cumulative acks
    • single retransmission timer
  • retransmissions triggered by:

    • timeout events
    • duplicate acks
  • simplified TCP sender:

    • ignore duplicate acks
    • ignore flow control, congestion control
    Figure 3.21 Simplified TCP sender

A Few Interesting Scenarios

  • Scenario 1: Figure 3.23 lost ACK scenario
    • the ACK from B to A gets lost
    • timeout occurs, A retransmits the same segment
  • Scenario 2: Figure 3.24 premature timeout
    • neither of the ACKs arrives at Host A before the timeout
    • timeout occurs, Host A retransmits the first segment and restarts the timer
    • if the ACK for segment 100 arrives before the new timeout, segment 100 won't be resent
  • Scenario 3: Figure 3.25 cumulative ACK
    • the ACK of the first segment is lost
    • Host A receives ACK 120, so it knows Host B received everything up through byte 119
    • Host A doesn't resend

Doubling the Timeout Interval

  • each time TCP retransmits, it sets the next timeout interval to twice the previous value

Fast Retransmit

Figure 3.26 TCP ACK Generation Recommendation
  • timeout period often relatively long
    • long delay before resending lost packet
  • detect lost segments via duplicate ACKs
    • sender often sends many segments back-to-back
    • if segment is lost, there will likely be many duplicate ACKs
  • if sender receives 3 ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
    • likely that unacked segment lost, so don’t wait for timeout
Figure 3.27 fast retransmit after sender receipt of triple duplicate ACK
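The triple-duplicate-ACK rule can be sketched as a counter over the incoming ACK stream; seq #s here number whole segments rather than bytes, and the function name is ours.

```python
def fast_retransmits(acks):
    """Seq #s resent on the 3rd duplicate ACK (sketch; one seq # per segment)."""
    resent = []
    last_ack, dup_count = None, 0
    for a in acks:
        if a == last_ack:
            dup_count += 1
            if dup_count == 3:          # triple duplicate ACK: don't wait for timeout
                resent.append(a + 1)    # resend smallest unACKed segment
        else:
            last_ack, dup_count = a, 0
    return resent

# segment 2 is lost: every later arrival makes the receiver re-ACK segment 1
trace = fast_retransmits([0, 1, 1, 1, 1, 2, 3])
```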

3.5.5 Flow Control

receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

  • receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
    • RcvBuffer size set via socket options (typical default is 4096 bytes)

    • many operating systems auto adjust RcvBuffer

  • sender limits amount of unacked ("in-flight") data to receiver’s rwnd value

  • guarantees receive buffer will not overflow

Figure 3.28 rwnd and RcvBuffer

3.5.6 TCP Connection Management

Before exchanging data, sender/receiver "handshake":

Figure 3.29 TCP three-way handshake: segment exchange
  • agree to establish connection (each knowing the other willing to establish connection)

  • agree on connection parameters

  • Step 1: client side sends a special segment (no application data, SYN = 1, random client_isn): the SYN segment

  • Step 2: server host extracts the TCP SYN segment from the datagram, allocates TCP buffers and variables for the connection, and sends a connection-granted segment (SYN = 1, ack = client_isn + 1, its own server_isn): the SYNACK segment

  • Step 3: client allocates buffers and variables for the connection, and sends a segment acknowledging the server's connection-granted segment (ack = server_isn + 1, SYN = 0)

    Figure 3.30 A typical sequence of TCP states visited by a client TCP
    Figure 3.31 A typical sequence of TCP states visited by a server-side TCP

client, server each close their side of connection

  • send TCP segment with FIN bit = 1
  • respond to received FIN with ACK
  • on receiving FIN, ACK can be combined with own FIN
  • simultaneous FIN exchanges can be handled

3.6 Principles of Congestion Control

too many sources sending too much data too fast for network to handle

3.6.1 The Causes and the Costs of Congestion

Scenario 1: Two Senders, a Router with Infinite Buffers

Figure 3.32 Two connections sharing a single hop with infinite buffers
Figure 3.33 Throughput and delay as a function of host sending rate

Scenario 2: Two Senders, a Router with Finite Buffers

Figure 3.34 Two hosts (with retransmissions) and a router with finite buffers
Figure 3.35 performance with finite buffers
  • λ'in: the offered load, original data plus retransmissions
  • sender retransmits timed-out packets
    • application-layer input = application-layer output: λin
    • transport-layer input includes retransmissions: λ'in ≥ λin
  • unrealistic case: Host A sends a packet only when a buffer is free
    • no loss
    • λ'in = λin = λout
  • more realistic case: Host A resends a packet only when a packet is known for certain to be lost
  • costs of congestion:
    • more work (retrans) for given “goodput”
    • unneeded retransmissions: link carries multiple copies of pkt
    • decreasing goodput

Scenario 3: Four Senders, Routers with Finite Buffers, and Multihop Paths

Figure 3.36 Four Senders, Routers with Finite Buffers, and Multihop Paths

as red λin increases, all arriving blue pkts at upper queue are dropped, blue throughput -> 0

Figure 3.37 performance with finite buffers and multihop paths

when packet dropped, any upstream transmission capacity used for that packet was wasted

3.6.2 Approaches to Congestion Control

  • end-to-end congestion control
    • no explicit feedback from network
    • congestion inferred from end-system observed loss, delay
    • approach taken by TCP
  • network-assisted congestion control
    • routers provide feedback to end systems
    • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
    • explicit rate for sender to send at

3.7 TCP Congestion Control

  • approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
    • additive increase: increase cwnd by 1 MSS every RTT until loss detected
    • multiplicative decrease: cut cwnd in half after loss
    • \(LastByteSent-LastByteAcked≤min\{cwnd,rwnd\}\)
    • TCP sending rate: roughly cwnd/RTT bytes/sec (send cwnd bytes, wait one RTT for ACKs, then send more)
  • three guiding principles for TCP sender's rate
    • A lost segment implies congestion, and hence, the TCP sender's rate should be decreased when a segment is lost
    • An acknowledgement segment indicates that the network is delivering the sender's segments to the receiver
    • Bandwidth probing

TCP congestion control algorithm

  1. Slow Start

    • when connection begins, increase rate exponentially until first loss event:
      • initially cwnd = 1 MSS
      • double cwnd every RTT
      • done by incrementing cwnd for every ACK received
      • initial rate is slow but ramps up exponentially fast
    • loss indicated by timeout:
      • cwnd set to 1 MSS
      • window then grows exponentially (as in slow start) to threshold, then grows linearly
      • ssthresh set to cwnd/2
    • when cwnd reaches ssthresh, slow start ends and TCP transitions into congestion-avoidance mode
    • loss indicated by 3 duplicate ACKs: TCP RENO
      • dup ACKs indicate network capable of delivering some segments
      • cwnd is cut in half; window then grows linearly
    Figure 3.38 FSM description of TCP congestion control
  2. Congestion Avoidance

    • cwnd is roughly half its value

    • increase cwnd by MSS*(MSS/cwnd) bytes on each new ACK (roughly 1 MSS per RTT), rather than doubling cwnd

    • TCP Tahoe always cuts cwnd to 1 MSS on a loss event (timeout or 3 duplicate ACKs)

    • when a timeout occurs, congestion avoidance behaves the same as slow start:

      • cwnd set to 1 MSS
      • ssthresh set to half the value of cwnd
      Figure 3.39 Evolution of TCP's congestion window
  3. Fast Recovery

    • cwnd increased by 1 MSS for every duplicate ACK received
    • if a timeout occurs: same as in slow start and congestion avoidance, cwnd set to 1 MSS and ssthresh set to half the old cwnd
    • when the ACK for the missing segment arrives, TCP deflates cwnd and enters congestion avoidance
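Slow start and congestion avoidance together produce the classic cwnd trace: doubling each RTT up to ssthresh, then one MSS per RTT. A per-RTT sketch (real TCP works per ACK and in bytes; this counts MSS units, and the function name is ours).

```python
def cwnd_trace(ssthresh, rtts):
    """cwnd (in MSS) sampled once per RTT: slow start, then congestion avoidance."""
    cwnd, trace = 1, []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2          # slow start: double every RTT
        else:
            cwnd += 1          # congestion avoidance: +1 MSS every RTT
    return trace

growth = cwnd_trace(ssthresh=8, rtts=6)
```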

TCP Congestion Control: Retrospective

  • linear increase in cwnd of 1 MSS per RTT

  • halving of cwnd on a triple duplicate-ACK event

  • AIMD (additive-increase, multiplicative-decrease)

    Figure 3.40 additive-increase, multiplicative-decrease congestion control

Macroscopic Description of TCP Throughput

  • avg. TCP throughput as function of window size, RTT
    • ignore slow start, assume always data to send
  • W: window size (measured in bytes) where loss occurs
    • avg. window size (# in-flight bytes) is ¾ W
    • avg. throughput is ¾ W per RTT
    • avg TCP throughput = \(\frac34\frac{W}{RTT}bytes/sec\)
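Plugging numbers into the macroscopic formula; the W and RTT values here are arbitrary illustrations, not from the text.

```python
W = 100_000      # window size (bytes) when loss occurs (illustrative)
RTT = 0.1        # round-trip time, sec (illustrative)

# average TCP throughput = (3/4) * W / RTT bytes/sec
avg_throughput = 0.75 * W / RTT
```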

3.7.1 Fairness

fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K

Figure 3.41 Two TCP connections sharing a single bottleneck link

Why fair? Additive increase gives slope of 1 as throughput increases

  • multiplicative decrease decreases throughput proportionally
Figure 3.42 Throughput realized by TCP connections 1 and 2
  • loss: decrease window by factor of 2
  • congestion avoidance: additive increase

Fairness and UDP

  • multimedia apps often do not use TCP
    • do not want rate throttled by congestion control
  • instead use UDP:
    • send audio/video at constant rate, tolerate packet loss

Fairness and Parallel TCP Connections

  • application can open multiple parallel connections between two hosts
  • web browsers do this
  • e.g., link of rate R with 9 existing connections:
    • new app asks for 1 TCP, gets rate R/10
    • new app asks for 11 TCPs, gets R/2
