
IP-based authentication not working with Telnyx TLS / FQDN trunk


mcbsys


12 hours ago, mcbsys said:

 

So it sounds like the FIN-ACK coming from Telnyx would be a reply (ACKnowledgement) to a FIN from the PBX. But I don't see the FIN. Maybe it was encrypted?

Hmm, why would the PCAP miss the FIN? Maybe some firewall in between did it? FIN (still, not being the expert) is not encrypted because it's TCP, not TLS.


10 hours ago, Vodia PBX said:

Hmm, why would the PCAP miss the FIN? Maybe some firewall in between did it? FIN (still, not being the expert) is not encrypted because it's TCP, not TLS.

No idea. Firewall seems unlikely--this is a pretty basic Linux machine running on Azure with no outbound restrictions, only inbound port forwarding. Also, I'm using the following command on that machine to capture the trace, which is logging "any" interface, so it should capture anything coming from the PBX, even if a firewall blocked something coming back to the PBX:

sudo tcpdump -i any -nn '(host 192.76.120.10 or host 64.16.250.10)' -v -w telnyx.pcap

The trace ran about 11 hours before the "FIN, ACK" was logged. If this is the correct Wireshark filter (also not an expert!), there are no pure FIN packets in the 11 hours preceding the "FIN, ACK":

20230717.Registrationtimeout.FINpackets.png
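
A note on reading that filter result: once a TCP connection is established, every segment normally carries the ACK flag, so a close request usually shows up in Wireshark as "FIN, ACK" rather than as a bare FIN. Finding no pure FIN packets is therefore expected and does not by itself show that a FIN went missing. The Python sketch below decodes the TCP flags byte to illustrate the distinction. The helper names are made up for this example, and the display filters in the comments are standard Wireshark fields as far as I know.

```python
# TCP header flag bits (RFC 9293).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

_FLAG_NAMES = [(FIN, "FIN"), (SYN, "SYN"), (RST, "RST"),
               (PSH, "PSH"), (ACK, "ACK"), (URG, "URG")]


def flag_names(flags: int) -> str:
    """Return the set flags as a comma-separated string, e.g. 'FIN, ACK'."""
    return ", ".join(name for bit, name in _FLAG_NAMES if flags & bit)


def has_fin(flags: int) -> bool:
    """Any segment that asks to close, with or without ACK.
    Roughly the Wireshark display filter: tcp.flags.fin == 1"""
    return bool(flags & FIN)


def is_pure_fin(flags: int) -> bool:
    """FIN without ACK, which is rare on an established connection.
    Roughly: tcp.flags.fin == 1 && tcp.flags.ack == 0"""
    return bool(flags & FIN) and not (flags & ACK)


if __name__ == "__main__":
    closing_segment = FIN | ACK  # what a normal close request looks like
    print(flag_names(closing_segment), has_fin(closing_segment),
          is_pure_fin(closing_segment))
```

To find which side started the close, it is probably more useful to filter on any FIN and look at the source address of the first match.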

Unless of course Microsoft sent the FIN outside the machine because ... they wanted to reconnect the interface or something.

I thought TCP would mean more stable connections for a PBX, with TLS a nice security bonus, but it seems to be just the opposite... TCP/TLS, at least as provided by Azure, reduces stability and reliability for long-running connections.


17 hours ago, mcbsys said:

Unless of course Microsoft sent the FIN outside the machine because ... they wanted to reconnect the interface or something.

I thought TCP would mean more stable connections for a PBX, with TLS a nice security bonus, but it seems to be just the opposite... TCP/TLS, at least as provided by Azure, reduces stability and reliability for long-running connections.

For me, the bottom line is that Microsoft and others choose to establish a new connection for incoming calls because they also found out that keeping a TCP connection alive for a long time is hard to achieve, and it is more reliable to establish a new connection (possibly retrying if it does not work on the first attempt). After all, browsers literally connect billions of times every hour and it works very reliably. The price for this is that the client cannot be hidden behind NAT.

In other words, as far as I can see it, if you want to stay behind NAT and avoid manual setup of the PBX address, you will have to live with the fact that the connection drops from time to time for a short while and incoming calls will not make it.

BTW the Teams clients are also living with that fact. I am pretty sure that if you watch their connection stability, it will also have its ups and downs. But it is limited to one extension, and not the whole organization.
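
The "new connection per call, retry if needed" idea above can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Microsoft, Telnyx, or the Vodia PBX implement it. The function name, the default retry counts, and the `opener` parameter are all assumptions made for this example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=3, timeout=2.0, delay=0.5,
                       opener=socket.create_connection):
    """Open a fresh TCP connection, retrying a few times on failure.

    `opener` defaults to socket.create_connection; it is a parameter only
    so the retry logic can be exercised without a real network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return opener((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Simple exponential backoff between attempts.
                time.sleep(delay * (2 ** attempt))
    raise ConnectionError(
        f"could not connect to {host}:{port} after {attempts} attempts"
    ) from last_error
```

Because the connection is short-lived, there is no idle period for a NAT device or load balancer to time out. The trade-off, as noted above, is that the receiving side must be reachable at a known address.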


Finally got another timeout. This time, when I started the trace, I disabled and re-enabled the trunk, hoping to force a handshake, which I guess Wireshark needs to be able to decrypt the traffic. Didn't work--all of the traffic before the timeout is encrypted. After the connection dropped and was re-established, I do see some decrypted SIP traffic, but I also see lots of encrypted traffic.

Why would NAT not allow TCP connections on the fly? Isn't that what was working when I tested with a FQDN back at the beginning of this thread? The problem wasn't establishing the connection; the problem was that "Azure closes idle connections after 4 minutes without sending a TCP Reset (TCP RST) to let the other party know." But the idea of making a PBX like a bi-directional web server, allowing potentially dozens of short-lived connections... seems like that could bring its own issues.
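
For reference, the usual way an application works around an idle timeout like the 4 minutes mentioned above is to turn on TCP keep-alive with an idle time shorter than the timeout. Here is a minimal Python sketch of that at the socket level. It assumes a Linux-style socket API (the options are guarded with `hasattr` because not every platform has them), and the function name and default values are made up for this example. Whether the PBX exposes such a setting is not something this thread establishes.

```python
import socket


def enable_keepalive(sock, idle=120, interval=30, count=4):
    """Ask the OS to send TCP keep-alive probes on an otherwise idle socket.

    idle:     seconds of silence before the first probe (keep this well
              below the middlebox idle timeout, e.g. 240 s)
    interval: seconds between unanswered probes
    count:    unanswered probes before the OS declares the peer dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below exist on Linux; other platforms
    # may lack some of them, hence the guards.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keep-alive on:", bool(s.getsockopt(socket.SOL_SOCKET,
                                              socket.SO_KEEPALIVE)))
    s.close()
```

These probes are sent below the TLS layer, so they work the same for plain TCP and TLS connections and carry no SIP payload.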

Here's the new trace. I don't see much different other than it got three FIN,ACKs instead of two, like it had a problem re-establishing the connection.

 

20230722.Registration timeout.png


  • 2 weeks later...

Thanks for the investigation; it's quite interesting, not only for the Vodia PBX but more generally for how to keep a connection up 24/7.

For me, the bottom line is that we will never be able to achieve 100% connectivity with a client and there will always be gaps in availability. It's the case for SIP trunk clients, but it's also true for VoIP phones. For VoIP phones, it's probably just a general fact that they have their ups and downs when working from home and nobody complains.

It would be possible to address this with the OPTIONS architecture where the UAC initiates a connection and the UAS receives it, depending on the call direction. And maybe in situations where inbound calls are very critical, this should be the preferred setup. But even there we will still have the problem that the clients also have their time when they are offline, and calls would still not connect. I would dare say that those VoIP client gaps are much more significant than the ones in the connection between the PBX and a SIP trunk (it would be interesting to have some real numbers).

What speaks for the SIP trunk TCP/TLS connection is that it is easy to set up, it can be secure, the stability is probably better than with the VoIP phone connection when working from home, and maybe one more thing to keep in mind: the PBX can report the availability, e.g. through emails when the trunk status changes. That is a lot harder when using UDP or TCP/TLS with OPTIONS, where it might be very hard to find out when the trunk is actually down.


3 hours ago, Vodia PBX said:

Thanks for the investigation; it's quite interesting, not only for the Vodia PBX but more generally for how to keep a connection up 24/7.

Thanks for thinking about it.

The TLS trunk dropped twice today. I guess I’m giving up. I’ve never had these kinds of problems before, but I’ve never tried TLS before. I’ll change it back to UDP registered or maybe UDP with FQDN.

I have to think that others have solved the persistent TCP connection issue. I’m setting up a Grandstream PBX appliance for another customer. It’s based on Asterisk. There are various keep-alive options, e.g. when setting up a registered UDP trunk, you can enable “heartbeat detection”: “If enabled, the PBX will regularly send SIP OPTIONS to check if the device is online.” For (Grandstream) phones, you can set a policy: “Keep Alive Interval. Specify how often the phone will send a blank UDP packet to the SIP server to keep the ‘ping hole’ on the NAT router open.”
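
To make the "heartbeat" idea concrete, here is roughly what a SIP OPTIONS ping looks like on the wire, built in Python. This is a sketch based on the general SIP message format (RFC 3261), not on what Grandstream or Asterisk actually send. The function name, the `ping` user part, and the host names are placeholders for illustration.

```python
import uuid


def build_options_ping(target_host, local_host, local_port=5060,
                       transport="UDP", cseq=1):
    """Build a minimal SIP OPTIONS request as a string.

    Sending one of these periodically keeps NAT/firewall state fresh and
    doubles as a liveness check, since the far end should answer it.
    """
    # RFC 3261 requires the Via branch to start with this magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    from_tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/{transport} {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:ping@{local_host}>;tag={from_tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} OPTIONS",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Host names below are placeholders, not real endpoints.
    print(build_options_ping("sip.example.com", "pbx.example.net"))
```

Unlike a TCP-level keep-alive, this is real application traffic, so a missing reply also tells the sender that the SIP stack on the other side is not responding.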

Another example of persistent TCP is surveillance cameras. Synology Surveillance Station uses OPTIONS as the default keep-alive method.

And what about site-to-site VPNs? They have to keep a TCP tunnel open…

The best summary I’ve seen in the VoIP world is the article I cited earlier:

https://www.asterisk.org/wanted-dead-or-alive/


20 hours ago, mcbsys said:

I’ll change it back to UDP registered or maybe UDP with FQDN.

I am afraid this might actually be a pragmatic solution. Maybe someone should dig out the original decision to favor UDP for SIP; who knows, maybe Henning Schulzrinne & Co. actually had connection stability in mind.

I am not aware of anyone using DTLS to add security, but that might be the next step.

All that said, UDP connections are also not necessarily stable, depending on the behavior of the firewall. When the port changes on the NAT bindings, you'll have the same short window where inbound calls will bounce (unless the firewall keeps both ports open during that time, which I doubt). 

Firewall manufacturers could care more about VoIP and what a persistent connection means, especially the manufacturers that don't have the slightest clue about and/or interest in SIP and whose interference does not add anything in terms of security.

