[asterisk-dev] PJSIP and RTP address selection

Discussion:

Jaco Kroon

2018-09-11 18:51:16 UTC

Hi,

I've got a scenario where (when using PJSIP, using chan_sip does what I
expect) PJSIP will advertise one address in the SDP during a
conversation but then start transmitting from another. In my case PJSIP
is advertising 197.96.209.1 in the SDP, but 197.96.209.251 is being used
to send.

I can manipulate that by altering the IPv4 routing table to influence
address selection.

This happens because, when using PJSIP, the RTP socket is bound to ANY
([::] specifically, so that both IPv4 and IPv6 will function). chan_sip,
on the other hand, binds the RTP port to the same address as the
transport. After discussion with Joshua on IRC it became clear that the
PJSIP behaviour may be preferred in many cases, and that things are
plainly more complicated than one would hope.
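
To make the difference concrete, here is a minimal Python sketch (not Asterisk code) of how the kernel picks the source address for a UDP socket bound to ANY versus one bound to a specific address. The helper name source_address_for is made up for illustration, and it uses an IPv4 wildcard rather than [::] to keep things short. connect() on a UDP socket sends nothing; it only asks the kernel to resolve the route and source address.

```python
import socket


def source_address_for(dest, bind_addr="0.0.0.0"):
    """Return the local address the kernel would use to reach `dest`.

    `dest` is a (host, port) tuple. With bind_addr left at the wildcard the
    kernel chooses the source from its routing table; with a specific
    bind_addr the source is fixed to that address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((bind_addr, 0))
        s.connect(dest)  # no packet is sent for UDP
        return s.getsockname()[0]
```

On the setup described in this mail, calling source_address_for(("165.16.203.126", 5004)) would be expected to report the .251 address, while passing bind_addr="197.96.209.1" pins the source to the advertised address.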

I have two potential fixes (plus two that I don't think are practical,
but might be with knowledge I don't have), each with advantages and
disadvantages:

1. Bind the socket against the advertised address.
2. Upon receiving the first RTP packet, "narrow" the socket listening
address to the received "to" address.
(3.) Have the RTP sent to my primary address to begin with, not the
socket address as for PJSIP transport.
(4.) Update the rtp engine to be able to have multiple socket pairs and
switch between them as the remote side does.

The first has various disadvantages as I understand from Joshua, most
of them over my head. The advantage is that the source address would be
(more) deterministic upon sending RTP. This can be done by passing the
transport address to the RTP instance, presumably similar to what chan_sip
does. This would in some cases break things like signaling on IPv4 and
RTP on IPv6 if the PJSIP transport is not bound to ANY. This was, as I
understood, one of Joshua's bigger concerns.

The second option has the advantage that unless the address to which the
remote side sends changes things should just work. This can be
implemented by creating a new socket, binding it to the more specific
address and then using dup2() to replace the old socket file descriptor,
before closing the newly created file descriptor. It can be returned
to "ANY" in a similar manner if required. RTCP ports will need to be
re-bound as well.
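
The dup2() approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Asterisk's RTP engine is structured: the helper names new_udp_socket and narrow_socket are hypothetical, only IPv4 is handled, and both sockets set SO_REUSEADDR so that (on Linux) the specific bind can coexist with the wildcard bind on the same port for a moment. Any datagrams still queued on the old socket are lost when it is replaced.

```python
import os
import socket


def new_udp_socket(addr, port):
    """Create a UDP socket bound to (addr, port) with SO_REUSEADDR set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        s.bind((addr, port))
    except OSError:
        s.close()
        raise
    return s


def narrow_socket(sock, local_addr):
    """Re-bind `sock` to `local_addr` on the same port, keeping its fd number."""
    port = sock.getsockname()[1]
    replacement = new_udp_socket(local_addr, port)
    # Atomically point the existing descriptor at the new socket ...
    os.dup2(replacement.fileno(), sock.fileno())
    # ... then drop the temporary descriptor; `sock` now refers to the new socket.
    replacement.close()
```

Calling narrow_socket(sock, "0.0.0.0") returns the socket to ANY in the same way. An RTCP socket would need the same treatment.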

This should probably be a configurable option either way, and one could
add a transport option "bind_rtp_to_transport_address", and/or a
"narrow_rtp_address" (the latter would make no sense if the former is
active, unless the bind address is an ANY of sorts). These can be
implemented in conjunction or separately.

The third option basically involves binding the socket to ANY and
pretending to send data to the known addresses for the peer and using
those addresses in the SDP (if we've seen SDP for the conversation
already, those addresses, otherwise the remote address of the SIP
communication). This would potentially break a number of things, so it
is likely not a serious option. For example, if we're sending an INVITE
to a web-socket transport, the web-socket connection may have been
proxied, in which case the remote address of the web-socket connection
isn't actually where the remote side is. When proxying via httpd/apache
to localhost:8088, for instance, asterisk sees 127.0.0.1 as the
"remote".

I'm tending towards option 2. This would perhaps also have a side
effect of minimizing attack surface for things like RTP bleed.

I suspect this has not come to light before since most setups are likely
to only have a single IPv4 and single IPv6 global address, or in the
case of multi-homing would have one on each interface with the kernel
RPF filter getting rid of traffic from a source other than where it
would route back to, basically forcing an IP match based on route-based
address selection.

Joshua suggested that before coding on this is started all use-cases
should be explored and documented, which I think is a good idea. I'd be
happy to drive that process; however, I'd need to understand where this
should be documented. So in this respect this email serves as a
request for pointers.

DISCLAIMER: As I've realized, I'm no SIP expert, and anything beyond
what's available in chan_sip currently is for me a massive learning
curve. A challenge I'm quite enjoying.

For further explanation, my setup is explained below. This perhaps just
gives more background information to the problem I'm experiencing, and
may or may not be useful to other people reading this.

My setup is a bit convoluted (but no more so than required for my
needs). I run multiple asterisk instances on a single host. For
each instance I assign a unique IP to the host (one IPv4 and one IPv6,
where the IPv6 is of the form pre:fix::i.p.v.4; I have a /64 prefix
delegated for this purpose). Currently IPv6 is NOT advertised in DNS
until such time as I can get everything else working.

The HOST itself thus has the following addresses assigned:

    inet 197.96.209.251/24 brd 197.96.209.255 scope global bond0
    inet6 2c0f:f720:0:2:21e:67ff:fea0:671e/64 scope global dynamic mngtmpaddr

My system has these IPs assigned for my asterisk test instance:

    inet 197.96.209.1/32 scope global bond0
    inet6 2c0f:f720:0:2::c560:d101/128 scope global

IPv6 address selection works differently than IPv4 in the case of ANY,
but I suspect (untested) the same problem will occur. For IPv4 the
problem lies in:

197.96.209.0/24 proto kernel scope link src 197.96.209.251
default via 197.96.209.252 metric 6

So when the default route is selected, the default src for the local LAN
applies, which is .251. I do have a mechanism that can work around this,
which I call rtdaemon. It's basically given a pcap filter, and it will
dynamically add routes to the routing table to influence the source
address selection, e.g.:

ip ro ad 165.16.203.126/32 via 197.96.209.252 src 197.96.209.1
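
As a rough model of why such a host route changes the outcome, the following Python sketch does a longest-prefix match over a toy routing table and returns the route's src hint, falling back to the interface's primary address when the route carries none. It is a deliberate simplification of what the kernel does (no policy routing, scopes or metrics), and the function name select_source is made up for illustration.

```python
import ipaddress


def select_source(dest, routes, primary_src):
    """Pick a source address for `dest` from (network, src) route entries.

    The most specific matching route wins. If it has no src hint, the
    primary address of the outgoing interface is used instead.
    """
    dest_ip = ipaddress.ip_address(dest)
    matches = [(net, src) for net, src in routes if dest_ip in net]
    if not matches:
        raise LookupError("no route to %s" % dest)
    _net, src = max(matches, key=lambda m: m[0].prefixlen)
    return src or primary_src


# The two routes shown above: the LAN route carries src .251, the default has none.
routes = [
    (ipaddress.ip_network("197.96.209.0/24"), "197.96.209.251"),
    (ipaddress.ip_network("0.0.0.0/0"), None),
]
before = select_source("165.16.203.126", routes, "197.96.209.251")  # .251

# The host route added by rtdaemon is more specific and carries src .1.
routes.append((ipaddress.ip_network("165.16.203.126/32"), "197.96.209.1"))
after = select_source("165.16.203.126", routes, "197.96.209.251")  # .1
```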

I'd prefer to avoid 1500+ routes in my routing table if possible, which
is what I currently have on systems where that is deployed (completely
different use case, and the below "concern" doesn't apply).

Assuming that 165.16.203.126 only needs to communicate with a single IP
address on my side this works. Unfortunately ... I really am starting
to develop a severe distaste for NAT and ISPs that won't bother giving
their clients publicly routable IPs, but I do understand the IPv4
depletion problem too so won't be too harsh on them.

My PJSIP config has ten transports declared, (IPv4+IPv6) x (udp, tcp,
tls, ws, wss), of which at the moment I'm only using IPv4 udp + tcp.
I'll only post the UDP and TCP ones here:

[pjsip-udp](!)
type=transport
protocol=udp
allow_reload=yes

[pjsip-tcp](!)
type=transport
protocol=tcp
allow_reload=yes

[pjsip-4]
local_net=192.168.0.0/16
local_net=10.0.0.0/8
local_net=172.16.0.0/12

[pjsip-udp6](pjsip-udp)
bind=[2c0f:f720:0:2::197.96.209.1]:5060

[pjsip-tcp6](pjsip-tcp)
bind=[2c0f:f720:0:2::197.96.209.1]:5060

[pjsip-udp4](pjsip-udp,pjsip-4)
bind=197.96.209.1:5060

[pjsip-tcp4](pjsip-tcp,pjsip-4)
bind=197.96.209.1:5060

chan_sip is only bound to the IPv4 address:

udpbindaddr=197.96.209.1:5059
tcpbindaddr=197.96.209.1:5059

So in my use case things are actually pretty simple:

I always want exactly two candidate addresses for any given instance:
197.96.209.1 for IPv4, or 2c0f:f720:0:2::197.96.209.1 for IPv6. ANY is
not an option due to address selection at kernel routing level picking
the wrong addresses unless I manipulate the routing table, which will
break (existing) use cases where I've got contact from the same external
address to multiple addresses on my side.


Matt Fredrickson

2018-09-13 22:00:04 UTC

Post by Jaco Kroon
Hi,
I've got a scenario where (when using PJSIP, using chan_sip does what I
expect) PJSIP will advertise one address in the SDP during a
conversation but then start transmitting from another. In my case PJSIP
is advertising 197.96.209.1 in the SDP, but 197.96.209.251 is being used
to send.
I can manipulate that by altering the IPv4 routing table to influence
address selection.
This is due to when using PJSIP the RTP socket is bound against ANY
([::] specifically so that both IPv4 and IPv6 will function). chan_sip
on the other hand has the RTP port bound to the same address as the
transport. After discussion with Joshua on IRC it became clear that the
PJSIP behaviour may be preferred in many cases, and that things are
plainly more complicated than one would hope.

Ugh. This sounds like it's in the belly of the address selection code
of PJSIP and squarely in Josh's territory.

Post by Jaco Kroon
I have two potential fixes (and two that aren't practical options I
don't think but might be with knowledge I don't have) both with
1. Bind the socket against the advertised address.

That seems interesting, although I'm not sure what that means in a
multi-homed world with multiple address/media streams (IPv4 + IPv6).
Also, I wonder how this works with ICE/STUN/TURN across many
interfaces and address families. Multi-home is hard to get right for
all scenarios. I can't help but wonder if instead of binding to the
wildcard address we should be explicitly binding to each
interface/address and making our own source address selection rather
than letting the kernel decide. Sometimes the kernel will decide in a
way that surprises you and I think that's what you're hitting.
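
A minimal sketch of that "one socket per local address" idea, in Python rather than Asterisk's C and ignoring ICE/STUN/TURN entirely: each local address gets its own bound UDP socket, and a reply goes out through whichever socket received the packet, so the source address always matches the address the peer sent to. The class name PerAddressSockets and its methods are hypothetical, and the list of local addresses is assumed to be supplied by the caller.

```python
import selectors
import socket


class PerAddressSockets:
    """One UDP socket per local address instead of a single wildcard socket."""

    def __init__(self, local_addrs, port=0):
        self.selector = selectors.DefaultSelector()
        self.sockets = {}
        for addr in local_addrs:
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            s.bind((addr, port))
            self.selector.register(s, selectors.EVENT_READ)
            self.sockets[addr] = s

    def recv(self, timeout=None, bufsize=2048):
        """Return (data, peer, sock) for the first readable socket, or None."""
        for key, _events in self.selector.select(timeout):
            data, peer = key.fileobj.recvfrom(bufsize)
            return data, peer, key.fileobj
        return None

    def close(self):
        for s in self.sockets.values():
            self.selector.unregister(s)
            s.close()
        self.selector.close()
```

Replying with sock.sendto(payload, peer) on the returned socket keeps the source address equal to the address the peer contacted. What this sketch does not answer is which socket to use before anything has been received.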

Post by Jaco Kroon
2. Upon receiving the first rtp, "narrow" the socket listening address
to the received "to" address.

That also doesn't seem unreasonable, but I'd rather hear what Josh
thinks since he spent lots of time with his head in this code.

Post by Jaco Kroon
(3.) Have the RTP sent to my primary address to begin with, not the
socket address as for PJSIP transport.
(4.) Update the rtp engine to be able to have multiple socket pairs and
switch between them as the remote side does.

That seems "most right", and matches my ideal solution from above. But
then again, I'm curious how it would affect our ICE/STUN/TURN stack.

Post by Jaco Kroon
The first has various disadvantages as I understand from Joshua. Most
of them over my head. The advantage is that the source address would be
(more) deterministic upon sending RTP. This can be done by passing the
transport address to rtp instance, presumably similar to what chan_sip
does. This would in some cases break things like signaling on ipv4 and
rtp on ipv6 if pjsip transport is not bound to ANY. This was as I
understood one of Joshua's bigger concerns.

Yeah....

Post by Jaco Kroon
The second option has the advantage that unless the address to which the
remote side sends changes things should just work. This can be
implemented by creating a new socket, binding it to the more specific
address and then using dup2() to replace the old socket file descriptor,
before closing the newly created file descriptor. It can be returned
to "ANY" in a similar manner if required. RTCP ports will need to be
re-bound as well.
This should probably be a configurable option either way, and one could
add a transport option "bind_rtp_to_transport_address", and/or a
"narrow_rtp_address" (the latter would make no sense if the former is
active, unless the bind address is an ANY of sorts). These can be
implemented in conjunction or separately.

I'd hate having to add another option for this behavior. It seems
like there should be a path forward that gets most of the right cases
most of the time without it being an optional behavior.

Post by Jaco Kroon
The third option basically involves binding the socket to ANY and
pretending to send data to the known addresses for the peer and using
those addresses in the SDP (if we've seen SDP for the conversation
already, those addresses, otherwise for the remote address of the SIP
communication - this would break a number of things potentially, thus
likely not a serious option. For example, if we're sending an INVITE to
a web-socket transport, then potentially the web-socket connection has
been proxied and the remote address of the web socket connection isn't
actually where the remote side is, for example, if proxying via
httpd/apache to localhost:8088 then asterisk sees 127.0.0.1 as the
"remote".
I'm tending towards option 2. This would perhaps also have a side
effect of minimizing attack surface for things like RTP bleed.

It might be the lowest friction way forward (without rewriting the
RTP/ICE/STUN/TURN layers).

Post by Jaco Kroon
I suspect this has not come to light before since most setups are likely
to only have a single IPv4 and single IPv6 global address, or in the
case of multi-homing would have one on each interface with the kernel
RPF filter getting rid of traffic from a source other than where it
would route back to, basically forcing an IP match based on route-based
address selection.

Multiple IPv4 addresses are not very common among non-carriers.

Post by Jaco Kroon
Joshua suggested that before coding on this is started all use-cases
should be explored and documented, which I think is a good idea. I'd be
happy to drive that process, I'd however need to understand where this
should be documented. So in this respect this email serves as a
request for pointers.

Post by Jaco Kroon
I always want exactly two candidate addresses for any given instance:
197.96.209.1 for IPv4, or 2c0f:f720:0:2::197.96.209.1 for IPv6. ANY is
not an option due to address selection at kernel routing level picking
the wrong addresses unless I manipulate the routing table, which will
break (existing) use cases where I've got contact from the same external
address to multiple addresses on my side.

Yeah, I don't see a great way around the kernel address selection
problem without dropping the wildcard binding approach and doing
individual binding on the required interfaces.

Maybe one of your alternatives could get us a little further down the
road though.

Best wishes!
--
Matthew Fredrickson
Digium - A Sangoma Company | Asterisk Project Lead
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA

Joshua Colp

2018-09-13 22:16:11 UTC

On Thu, Sep 13, 2018, at 7:00 PM, Matt Fredrickson wrote:

<snip - see previous messages for full context, cause this thing is big>

Post by Matt Fredrickson

Indeed. The other problematic area of binding to the advertised address is that, due to asynchronous DNS resolution, what you end up going out on may not be what you thought. The RTP instance and SDP then have to be updated, or else you could get IPv4 SDP but the traffic going out over IPv6, which is technically acceptable but which some things don't like. In a pure environment where you know with greater certainty ahead of time, it's easier to choose early in the process and use one.

Post by Matt Fredrickson

Post by Jaco Kroon
2. Upon receiving the first rtp, "narrow" the socket listening address
to the received "to" address.

That also doesn't seem unreasonable, but I'd rather hear what Josh
thinks since he spent lots of time with his head in this code.

The problem is getting this information. You'd need to read in the full IP packet from the socket, parse the IP header itself, and look at that information. It should be possible but it's not something that has been done in Asterisk, and I'm not sure if it alters the underlying permissions required if running as a user.
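
On Linux, one way to learn the destination address without reading raw IP headers is the IP_PKTINFO socket option together with recvmsg(), which delivers the address as ancillary data on an ordinary unprivileged UDP socket. The Python sketch below illustrates this for IPv4 only. The helper name recv_with_dst is hypothetical, the fallback value 8 for IP_PKTINFO is the Linux constant and is an assumption on any other platform, and whether this fits Asterisk's RTP read path is not something this sketch addresses.

```python
import socket
import struct

# Not every Python version exports IP_PKTINFO; 8 is its value on Linux.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)


def enable_pktinfo(sock):
    """Ask the kernel to attach destination info to each received datagram."""
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)


def recv_with_dst(sock, bufsize=2048):
    """Receive one datagram and return (data, source, destination_ip).

    destination_ip is the local address the datagram was sent to, or None
    if the kernel supplied no IP_PKTINFO ancillary data.
    """
    data, ancdata, _flags, src = sock.recvmsg(bufsize, socket.CMSG_SPACE(12))
    dst = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            # struct in_pktinfo { int ipi_ifindex; in_addr ipi_spec_dst; in_addr ipi_addr; }
            _ifindex, _spec_dst, addr = struct.unpack("=i4s4s", cdata[:12])
            dst = socket.inet_ntoa(addr)
    return data, src, dst
```

The returned destination address is what option 2 would narrow the socket to after the first RTP packet arrives.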

Post by Matt Fredrickson

That seems "most right", and matches my ideal solution from above. But
then again, I'm curious how it would affect our ICE/STUN/TURN stack.

Indeed - that is the most right. A question arises though - which one do you use for sending early media if you haven't received any media yet?

Post by Matt Fredrickson

Yeah....

Indeed, and we (both Matt and I as well as others) actually use this every day for meetings. Our video conference server (using Asterisk of course) has both IPv4 and IPv6 ICE candidates. Matt ends up using IPv4, I use IPv6.

Post by Matt Fredrickson

I'd hate having to add another option for this behavior. It seems
like there should be a path forward that gets most of the right cases
most of the time without it being an optional behavior.

I don't think it's possible to please every scenario without an option, short of the major rework of having multiple sockets which I'd only be comfortable with in master.

Post by Matt Fredrickson

It might be the lowest friction way forward (without rewriting the
RTP/ICE/STUN/TURN layers).

It would be.

Post by Matt Fredrickson

Multiple IPv4 addresses are not very common among non-carriers.

We could grant you wiki access if you'd like to make a wiki page there to organize things in an easier fashion.

As for an overall response, I've been seeing if I could come up with any other alternative options which are less invasive but would still be effective. I'm continuing to research and look.

Cheers,
--
Joshua Colp
Digium - A Sangoma Company | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org

Joshua Colp

2018-09-16 19:18:05 UTC

Post by Joshua Colp
<snip - see previous messages for full context, cause this thing is big>

Post by Jaco Kroon
I have two potential fixes (and two that aren't practical options I
don't think but might be with knowledge I don't have) both with

I gave some further thought over this weekend to any other alternative approaches which would have less of an impact but sadly came up empty. I think the list you provided is indeed the available options.
--
Joshua Colp

Jaco Kroon

2018-09-18 09:20:18 UTC
