SIP sets up the call. It never carries the voice.
The single most important idea on this page. SIP is a signaling protocol (RFC 3261): it finds the other party, negotiates how to talk, and ends the session. The audio itself travels separately over RTP, usually on a completely different port — and often a completely different network path.
SIP — the handshake
Text messages over UDP, TCP, or TLS, usually port 5060 (5061 for TLS). Says who wants to talk to whom, and carries an SDP body describing how.
RTP — the voice
Binary audio packets over UDP, on high ports negotiated in the SDP (e.g. 10000–20000). If a call connects but nobody hears anything, signaling worked and media failed.
Who's who in a SIP network
User Agent (UA)
Anything that originates or answers calls: a Yealink desk phone, a softphone, a WebRTC client. Acts as UAC (client) when sending a request, UAS (server) when receiving one.
Registrar
Accepts REGISTER requests and remembers where each user currently is ("alice is at 10.0.1.24:5060"). This address book is the location service.
Proxy
Routes requests toward their destination. Doesn't answer calls itself — it forwards. Hosted platforms like NetSapiens play this role (plus much more).
B2BUA
Back-to-back user agent: terminates the call on one side and re-originates it on the other, giving it full control (hold, transfer, recording). Most PBXes — 3CX included — are B2BUAs, not pure proxies.
SBC
Session Border Controller: sits at the network edge handling security, NAT traversal, and protocol normalization between carriers and the platform.
SIP Trunk
A carrier connection (Telnyx, Twilio…) delivering PSTN calls to the platform over SIP instead of physical phone lines.
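The registrar's location service described above is essentially a lookup table with expiry. A minimal sketch, assuming a single contact per user (real registrars may hold several) and a caller-supplied clock; the `LocationService` class and its method names are hypothetical, not taken from any platform:

```python
class LocationService:
    """Toy location service: address-of-record -> (contact, expiry time)."""

    def __init__(self):
        self._bindings = {}

    def register(self, aor, contact, expires, now):
        # An expires value of 0 removes the binding (un-register).
        if expires == 0:
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now):
        # Return the current contact, or None if unknown or expired.
        binding = self._bindings.get(aor)
        if binding is None or binding[1] <= now:
            return None
        return binding[0]


location = LocationService()
location.register("sip:alice@example.com", "sip:alice@10.0.1.24:5060",
                  expires=3600, now=0)
print(location.lookup("sip:alice@example.com", now=1800))
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read the clock itself.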
Tap any line of this INVITE
A SIP request is a start line, a stack of headers, a blank line, and (sometimes) a body. This is a real INVITE: tap each line to decode it. Five headers (Via, From, To, Call-ID, CSeq) appear in every message, and a sixth, Contact, appears in every dialog-creating request and its 2xx response; together they answer 90% of trace-reading questions.
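To make the start line / headers / blank line / body layout concrete, here is a rough sketch that splits a message into those parts. The INVITE below is hypothetical (every address, tag, and identifier is invented for illustration), and the parser deliberately ignores compact header names and folded lines:

```python
# Hypothetical INVITE with no body; all values are illustrative.
RAW_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.0.1.24:5060;branch=z9hG4bK776asdhds;rport\r\n"
    "Max-Forwards: 70\r\n"
    "From: \"Alice\" <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@10.0.1.24\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Contact: <sip:alice@10.0.1.24:5060>\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

CORE_HEADERS = ("Via", "From", "To", "Call-ID", "CSeq", "Contact")


def parse_sip_message(raw):
    """Split a SIP message into (start_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, and some (Via) can repeat,
        # so store a list of values under the lower-cased name.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return lines[0], headers, body


start_line, headers, body = parse_sip_message(RAW_INVITE)
method, request_uri, version = start_line.split(" ")
core = {name: headers[name.lower()][0] for name in CORE_HEADERS}
for name, value in core.items():
    print(f"{name}: {value}")
```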
The six core verbs (and the extensions you'll meet)
Every SIP request starts with a method — a verb saying what the sender wants. RFC 3261 defines six; extensions added more that you'll see daily in PBX work.
Core methods
INVITE
Start (or modify) a session. Carries the SDP offer. A mid-call INVITE is a "re-INVITE" — used for hold, codec changes, or moving media.
ACK
Confirms receipt of the final response to an INVITE. Completes the three-way handshake. The only request that never gets a response.
BYE
Ends an established session. Either side can send it. Answered with 200 OK.
CANCEL
Aborts a pending INVITE that hasn't received a final response yet — the caller hung up while it was still ringing. Triggers a 487 on the INVITE.
REGISTER
Tells the registrar where you are. Refreshed periodically (the expires value). Phone shows "offline"? Check registration first.
OPTIONS
Asks "what do you support?" — but in practice it's SIP's ping. Trunk keep-alives and monitoring almost always use OPTIONS.
Extensions you'll actually see
REFER
"Please call this other party" — the mechanism behind call transfer. The Refer-To header carries the target.
SUBSCRIBE / NOTIFY
Event packages: BLF lamp state, voicemail message-waiting (MWI), presence. SUBSCRIBE asks, NOTIFY delivers.
UPDATE
Modify session parameters before the call is fully answered (early dialog), without a full re-INVITE.
PRACK
Reliable provisional responses (RFC 3262): acknowledges a 180/183 the way ACK acknowledges a 200.
INFO
Mid-call application info. Historically used for DTMF (now mostly RFC 4733 telephone-events inside RTP instead).
MESSAGE
Instant messaging over SIP — pager-mode text without setting up a session.
Read the first digit, then the rest
Responses are three-digit codes, HTTP-style. The first digit is the class — it tells you instantly whether things are progressing, done, or broken. Filter by class; these codes cover nearly every trace you'll read.
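Reading the class from the first digit is simple enough to sketch in a few lines. The class names follow RFC 3261; the function names `classify` and `is_final` are hypothetical helpers for illustration:

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def classify(status_code):
    """Map a SIP status code to its class using the first digit."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code):
    # 1xx responses are provisional; everything from 200 up ends the transaction.
    return status_code >= 200


for code in (180, 200, 487):
    print(code, classify(code), "final" if is_final(code) else "provisional")
```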
Watch a call happen
Pick a scenario and step through it. Each arrow is a real SIP message — tap any arrow (or just step forward) to inspect the actual packet below the ladder. This is exactly how you'll read traces in NetSapiens, sngrep, or Wireshark.
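As a static stand-in for one such scenario, the simplest case (a direct call that is answered and then hung up by the callee) can be sketched as a list of messages rendered into a text ladder. The party names and the `render_ladder` helper are hypothetical; calls through a proxy would typically also show a 100 Trying:

```python
# (sender, receiver, message) for a basic answered call; Bob hangs up.
BASIC_CALL = [
    ("Alice", "Bob", "INVITE"),
    ("Bob", "Alice", "180 Ringing"),
    ("Bob", "Alice", "200 OK"),
    ("Alice", "Bob", "ACK"),
    ("Bob", "Alice", "BYE"),
    ("Alice", "Bob", "200 OK"),
]


def render_ladder(flow, left="Alice", right="Bob"):
    """Render each message as one arrow between the two parties."""
    lines = []
    for sender, _receiver, message in flow:
        if sender == left:
            lines.append(f"{left} |--- {message} --->| {right}")
        else:
            lines.append(f"{left} |<--- {message} ---| {right}")
    return lines


for line in render_ladder(BASIC_CALL):
    print(line)
```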
The body that negotiates the media
SIP carries a Session Description Protocol body (RFC 8866) to negotiate media as an offer/answer exchange: the INVITE offers ("here's where to send my audio, and the codecs I speak"), the 200 OK answers. Tap each line.
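The three things a trace reader usually wants from an SDP body are the media address, the media port, and the codec list. A minimal sketch, assuming a single audio stream with a session-level c= line; the offer below is hypothetical and the `summarize_sdp` helper is not part of any library:

```python
# Hypothetical SDP offer; address, port, and payload types are illustrative.
SDP_OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.0.1.24",
    "s=-",
    "c=IN IP4 10.0.1.24",
    "t=0 0",
    "m=audio 10020 RTP/AVP 0 8 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
    "a=sendrecv",
]) + "\r\n"


def summarize_sdp(sdp):
    """Pull the RTP address, audio port, and codec map out of an SDP body."""
    summary = {"address": None, "port": None, "codecs": {}}
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            summary["address"] = value.split()[2]
        elif kind == "m" and value.startswith("audio"):
            # m=<media> <port> <proto> <payload types...>
            summary["port"] = int(value.split()[1])
        elif kind == "a" and value.startswith("rtpmap:"):
            payload_type, encoding = value[len("rtpmap:"):].split(" ", 1)
            summary["codecs"][int(payload_type)] = encoding
    return summary


offer = summarize_sdp(SDP_OFFER)
print(offer["address"], offer["port"], offer["codecs"])
```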
How a phone proves who it is
SIP never sends the password. It uses digest authentication: the server issues a one-time challenge (a nonce), and the client returns a hash that could only be computed with the password. Run the "Registration with auth" scenario in Module 05 to watch it live.
The 401 dance
First REGISTER goes out bare → registrar replies 401 Unauthorized with a WWW-Authenticate header carrying the realm and nonce → client computes the digest and resends REGISTER with an Authorization header → 200 OK. Proxies do the same dance on INVITEs using 407 and Proxy-Authenticate.
The math (MD5 digest)
HA1 = MD5( username : realm : password )
HA2 = MD5( method : request-uri )
response = MD5( HA1 : nonce : HA2 )
# with qop=auth, a nonce-count and client nonce join the final hash:
# response = MD5( HA1 : nonce : nc : cnonce : qop : HA2 )
# that's what stops an attacker replaying a captured response.
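The digest calculation can be sketched with Python's standard `hashlib`. This follows the MD5 scheme from RFC 2617 with and without qop=auth; the `digest_response` helper and the credentials in the usage line are hypothetical, and other algorithms or qop=auth-int are not covered:

```python
import hashlib


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    qop=None, nc=None, cnonce=None):
    """Compute the 'response' value for an Authorization header (MD5)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop == "auth":
        # Nonce-count and client nonce are mixed into the final hash.
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical REGISTER credentials and nonce, for illustration only.
response = digest_response("alice", "example.com", "s3cret",
                           "REGISTER", "sip:example.com",
                           nonce="84f1c1ae6cbe5ua9c8e88dfa3ecm3459")
print(response)
```

The password only ever enters through HA1, which is why a captured Authorization header reveals a hash rather than the secret itself.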
Two registration facts that save support tickets: the expires value is how long the binding lives (phones re-register at roughly half of it), and a 403 Forbidden after correct credentials usually means a policy block, not a wrong password (on most platforms a wrong password just gets re-challenged with 401, forever).
Why SIP and NAT hate each other
SIP writes IP addresses inside its messages — and a NAT router only rewrites the packet headers, not the SIP payload. So a phone behind NAT confidently tells the world to reach it at 192.168.1.50. The world cannot.
The three lies a NATed phone tells
Via header
Responses route back via this address. Private IP here → responses go nowhere → retransmissions and dropped calls.
Contact header
Where in-dialog requests (ACK, BYE) should go. Private IP here → BYEs never arrive → ghost calls that won't hang up.
SDP c= / m= lines
Where to send RTP. Private IP here → audio sent into the void → the classic one-way audio ticket.
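When reading a trace, the three places listed above are where to look for leaked private addresses. A rough sketch that scans a message for RFC 1918 addresses in Via, Contact, and the SDP c= line; the `nat_leaks` helper and the example message are hypothetical, and compact header names are not handled:

```python
import ipaddress
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_rfc1918(text):
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return False  # looked like an IPv4 address but was not valid
    return any(address in network for network in RFC1918)


def nat_leaks(message):
    """Return {location: [private addresses]} for Via, Contact, and SDP c=."""
    leaks = {}
    for line in message.splitlines():
        if line.lower().startswith(("via:", "contact:")):
            label = line.split(":", 1)[0]
        elif line.startswith("c="):
            label = "SDP c="
        else:
            continue
        hits = [ip for ip in IPV4.findall(line) if is_rfc1918(ip)]
        if hits:
            leaks.setdefault(label, []).extend(hits)
    return leaks


# Hypothetical message fragment from a phone behind NAT.
EXAMPLE = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.1.50:5060;branch=z9hG4bKabc",
    "Contact: <sip:alice@192.168.1.50:5060>",
    "",
    "c=IN IP4 192.168.1.50",
    "m=audio 10020 RTP/AVP 0",
])
print(nat_leaks(EXAMPLE))
```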
The toolbox
rport (RFC 3581)
The client adds ;rport to its Via; the server replies to the actual source IP and port it observed, not the address written in the header.
STUN
"What's my public address?" — the client asks an outside server and writes the public mapping into its messages. Works for most NATs; fails on symmetric NAT.
TURN
When direct media is impossible, relay RTP through a server. Costs bandwidth and adds latency, but works in almost any network. The fallback of last resort.
ICE
Tries every candidate pair — host, STUN-derived, TURN relay — and picks the best path that actually works. Mandatory in WebRTC.
SBC / far-end NAT traversal
The platform's border element ignores the addresses written in SIP/SDP and latches signaling and media to wherever packets actually arrive from.
Keep-alives
NAT mappings expire when idle. Short registration expirations, OPTIONS pings, or CRLF keep-alives hold the pinhole open so inbound calls still reach the phone.
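An OPTIONS keep-alive that also requests rport handling might be assembled like this. It is a sketch only: the `build_options_ping` helper is hypothetical, the hosts and user are illustrative, and the function builds the text without sending it (a real client would send it over UDP on a timer shorter than the NAT mapping lifetime):

```python
import uuid


def build_options_ping(target_host, local_ip, local_port, user, seq=1):
    """Build a bodyless OPTIONS request with ;rport in the Via header."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 branch magic cookie
    call_id = uuid.uuid4().hex + "@" + local_ip
    tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{target_host}>;tag={tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        f"CSeq: {seq} OPTIONS",
        "Content-Length: 0",
    ]
    # Headers end with CRLF; an empty line closes the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


ping = build_options_ping("sip.example.com", "192.168.1.50", 5060, "alice")
print(ping)
```

The Via still carries the private address; the ;rport parameter is what lets the server answer to the source it actually observed.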