
IP-based authentication not working with Telnyx TLS / FQDN trunk


mcbsys


I've set up a TLS / FQDN Telnyx trunk pretty much following https://doc.vodia.com/docs/telnyx-secure-sip-trunking. However I'm trying to use IP-based rather than REGISTER-based authentication:

[Screenshot: TelnyxIP-based.png]

[Screenshot: VodiaIPbasedwithoutnames.annotated.png]

What happens is that the first outbound call works fine, but a second call to the same number five minutes later fails.

Machine-level tracing shows me the first call has a full TLS handshake, but the next call just sends an INVITE with no handshake, which Telnyx ignores. According to Telnyx support, "As far as I know Every calls requires a new TLS handshake which is failing here."

I've tried this with the trunk configured as a SIP Proxy and a SIP Gateway and still see the same behavior.

Is there some way to get Vodia to start a TLS handshake on each call? Or do I just need to use REGISTER-based authentication with TLS?

Thanks.

 


The PBX generally pools connections, which is for example also necessary for Teams. Usually there is some keep-alive traffic, so that the one TLS connection stays up for days, weeks and hopefully many months and then there is only one TLS handshake in the beginning. This is also the only way to keep TLS working behind NAT. 

Five minutes sounds to me like the connection idles and then gets torn down. If there is no REGISTER, another option would be OPTIONS, which would generate keep-alive traffic when no REGISTER is in use.


This is a single-tenant install with 5-15 calls per day, mostly one at a time. There isn't much to pool!

So is OPTION something I can add to a trunk and if so, how? The documentation says that the Keep-alive time setting applies to registration, which is not in use here.

I guess I can switch to registration; I just didn't think it was necessary if I have a static IP.


On 4/14/2023 at 5:10 PM, mcbsys said:

So is OPTION something I can add to a trunk and if so, how? The documentation says that the Keep-alive time setting applies to registration, which is not in use here.

I guess I can switch to registration; I just didn't think it was necessary if I have a static IP.

In the type drop-down for the trunk, there is an entry for "Options". You can try that.

But even if there are only a few calls and the amount of keep-alive traffic might look ludicrous, it makes sense to use the registration model. Compared with how much traffic is generated by watching a video, the keep-alive looks more like a rounding error. Actually, depending on the firewall setup (or if there is no firewall), you might be fine refreshing the connection every 5 minutes or so, which would keep the overall overhead small.


  • 1 month later...

Came back to this issue in the last few days.

I believe I've uncovered the main issue:  Azure closes idle connections after 4 minutes without sending a TCP Reset (TCP RST) to let the other party know. The timeout can be extended up to 30 minutes on a static IP address, but sending a TCP RST is only possible if you add another layer, a load balancer:

https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-idle-timeout

So without TCP Reset, we need some kind of keep-alive in under four minutes to keep the connection open. There’s a pretty good article on that here:  https://www.asterisk.org/wanted-dead-or-alive/. I tried the network-level keep-alives suggested here https://serverfault.com/a/851251/166311 and documented here https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html, but I had an issue with the standard registered UDP trunk where inbound calls were failing. It may have been unrelated, but I backed out the Linux changes.
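To make the network-level keep-alive idea concrete, here is a minimal Python sketch (not Vodia's implementation) of enabling TCP keep-alive probes on a single socket, so that an otherwise idle connection carries traffic before Azure's 4-minute idle timeout. The 120-second idle value, the 30-second probe interval, and the function name are assumptions for illustration. The `TCP_KEEP*` constants are platform-specific, so the code checks for them before use.

```python
import socket

def enable_tcp_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Turn on TCP keep-alive probes for one socket.

    idle_s:     seconds of silence before the first probe (assumed value,
                chosen to stay well under Azure's 240-second idle timeout)
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning constants are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock

if __name__ == "__main__":
    s = enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
    print("keep-alive enabled:", enabled)
    s.close()
```

One caveat: the kernel-wide keep-alive defaults only affect sockets on which the application has itself enabled `SO_KEEPALIVE`, so changing the Linux settings helps only if the PBX process sets that option on its connections. Whether Vodia does so is not stated in this thread.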

Also, I’m thinking that the TLS connection is established in each direction, so even if I kept the outbound connection open from the PBX to Telnyx, unless Telnyx is also sending keep-alives, inbound calls will fail after four idle minutes. Telnyx support says their TCP timeout is 3604 seconds, so just over an hour, after which I assume they would redo the TLS handshake.

I tried the Vodia trunk of type OPTIONS, but I didn’t see any OPTIONS traffic. Is there some way to set its frequency?

Last, I tried using TLS with registration:

  • If I left inbound as FQDN (as defined at Telnyx) but did registration for outbound, outbound worked after a five-minute delay, but inbound failed after a delay—confirming that Telnyx didn’t restart the TLS conversation.
  • If I set up the connection at Telnyx for registration in both directions, and set the proposed duration in Vodia to 6 minutes with re-registrations at 50% of expiry time, outbound worked but inbound failed immediately. I then realized that Telnyx doesn’t allow setting the inbound SIP Transport Protocol to TLS on a registered connection—that’s only available on FQDN connections.

I'm slowly coming to the conclusion that TLS won't work with Azure hosting the virtual machine unless I set up a load balancer and turn on TCP RST. Theoretically that would gracefully close the connection so the PBX would re-handshake for outbound and Telnyx would re-handshake for inbound.

 


Thanks for the great analysis. 

We had a similar problem many years ago with UDP and a missing SBC on the SIP trunk provider side. OPTIONS work only for SIP trunks that don't register. If the SIP trunk does register, you can instead use the "Keep-alive time" setting to force the PBX to re-register every 3.5 minutes or so.


5 hours ago, Vodia PBX said:

OPTIONS work only for SIP trunks that don't register.

When I tried OPTIONS, it was without registration, with TLS. But it looked the same as SIP Proxy--I never saw an OPTIONS packet sent in the machine-level trace. How often are the OPTIONS packets supposed to go out and can I control that?

5 hours ago, Vodia PBX said:

If the SIP trunk does register, you can instead use the "Keep-alive time" setting to force the PBX to re-register every 3.5 minutes or so.

Unfortunately, it seems Telnyx doesn't allow TLS on a registered connection. I may need the "keep-alive time" if I try TCP without TLS. UDP seems fine with the default 1-hour timeout, I guess because it doesn't depend on keeping the connection open.


You would definitely see the OPTIONS packets, e.g. by filtering SIP packets by the IP address. If you don't see them, something is wrong. Originally we did that for Teams (not sure why, probably just the "Microsoft way"), but other vendors are starting to use OPTIONS as well, and this seems to work fine.


5 hours ago, Vodia PBX said:

You would definitely see the OPTIONS packets, e.g. by filtering SIP packets by the IP address. If you don't see them, something is wrong. Originally we did that for Teams (not sure why, probably just the "Microsoft way"), but other vendors are starting to use OPTIONS as well, and this seems to work fine.

I changed the FQDN/TLS trunk from SIP Proxy to Options and traced it for 2.5 hours, decrypting with the keylogfile. Not a single OPTIONS packet. I tried one outbound call, which failed. That shows up as one encrypted packet (in spite of the keylogfile) and 15 failed retransmissions. Maybe it's waiting on the other side to send OPTIONS first?

I was struggling to understand this setting of Options as a trunk type instead of Registration, Gateway, or Proxy. Then the mention of Teams took me to this article:

https://learn.microsoft.com/en-us/microsoftteams/direct-routing-protocols-sip

"Before an incoming or outbound call can be processed, OPTIONS messages are exchanged between SIP Proxy and the SBC. These OPTIONS messages allow SIP Proxy to provide the allowed capabilities to SBC."

That does, in fact, sound like Microsoft's approach to SIP conversations. It may be a legitimate use of OPTIONS (to discover capabilities of the other party). In that case, maybe the registration type should be called "Teams TLS with Options".

The rest of the world seems to use OPTIONS basically as a SIP "ping." Like the Asterisk article says, "OPTIONS messages are full SIP messages that a user agent can send to a peer and expect to get a standard SIP response. You can configure Asterisk to send these messages to a peer by setting the 'qualify_frequency' parameter in the peer’s aor object.   At that interval, Asterisk will send the OPTIONS and will mark the peer’s contact as available and record the round trip time if it gets a response.  If not, the contact is marked as unavailable..."

From what I can tell, the standard doesn't require TLS or even TCP for OPTIONS. To solve this kind of issue, where we need a keep-alive, there would need to be a setting on any trunk, "Send OPTIONS every x seconds", and then, if there's no response and TCP/TLS was active, close the connection (TCP RST).
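As a rough illustration of what such a "SIP ping" looks like on the wire, here is a minimal Python sketch that builds an OPTIONS request with the headers RFC 3261 requires and decides when one is due. It is not a Vodia feature or setting. The function names, the `ping` user part, port 5061, and the 120-second interval are assumptions for illustration only.

```python
import uuid

def build_options_ping(target, local_host, local_port, transport="TLS", cseq=1):
    """Build a minimal SIP OPTIONS request to use as a keep-alive ping.

    All arguments are illustrative. A real stack would also wait for the
    response and close the connection if none arrives.
    """
    branch = "z9hG4bK" + uuid.uuid4().hex  # RFC 3261 magic-cookie prefix
    tag = uuid.uuid4().hex[:8]
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    lines = [
        f"OPTIONS sip:{target} SIP/2.0",
        f"Via: SIP/2.0/{transport} {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:ping@{local_host}>;tag={tag}",
        f"To: <sip:{target}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} OPTIONS",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line terminates the headers.
    return "\r\n".join(lines) + "\r\n\r\n"

def ping_due(last_traffic_s, now_s, interval_s=120):
    """Return True once the connection has been idle for interval_s seconds."""
    return now_s - last_traffic_s >= interval_s

if __name__ == "__main__":
    print(build_options_ping("sip.telnyx.com", "10.2.0.4", 5061))
```

Sending the message over the already-open TLS connection is what would reset an idle timer in between; the sketch leaves the transport out on purpose.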

What I still don't quite understand is whether a TLS conversation, once opened from the PBX for outbound calls, can (and will) be used by the vendor (Telnyx) for inbound calls. Can Telnyx "find" the open connection and put the inbound call on it? If so, a recurring OPTIONS would help. If not, it doesn't matter how many OPTIONS the PBX sends if the vendor always tries to start their own TLS handshake.


Good news:  I was mistaken. Telnyx does support TLS over a registered connection. You don't specify it; you just send the REGISTER over TLS and it accepts it. It does allow specifying that the media be encrypted and I've confirmed that that works with SRTP.

By setting registration to 6 minutes with a 50% renewal, I think I’m maintaining the connection okay. (I’ll want to do some more testing to confirm.) The one thing I don’t get is why some of the packets in my trace are still encrypted even after applying the keylogfile.
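The arithmetic behind that choice can be checked with a short Python sketch. It assumes the PBX re-registers at exactly the stated fraction of the proposed duration and that the 240-second Azure idle timeout is the only limit in play; the function names are made up for illustration.

```python
def refresh_interval_s(proposed_duration_s, renew_fraction):
    """Seconds between re-registrations for a given expiry and renewal point."""
    return proposed_duration_s * renew_fraction

def stays_under_idle_timeout(proposed_duration_s, renew_fraction, idle_timeout_s=240):
    """True if re-registration traffic arrives before the idle timeout expires."""
    return refresh_interval_s(proposed_duration_s, renew_fraction) < idle_timeout_s

if __name__ == "__main__":
    for minutes in (6, 60):
        secs = minutes * 60
        verdict = "ok" if stays_under_idle_timeout(secs, 0.5) else "too slow"
        print(f"{minutes} min at 50%: refresh every {refresh_interval_s(secs, 0.5)} s ({verdict})")
```

With 6 minutes at 50% the refresh comes every 180 seconds, inside the 240-second window, whereas the 1-hour default at 50% refreshes only every 1800 seconds.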


Going back to my longer post above:  am I seeing this correctly, that Vodia will not let me set up OPTIONS as a kind of keep-alive SIP "ping"? That would seem preferable over re-registering every three minutes, but I'd need to be able to set the frequency that OPTIONS are sent.


Right. I was thinking of SIP Proxy with the option of adding OPTIONS every x seconds. I never got Options to work as a trunk type (no OPTIONS are sent in 2.5 hours). I was guessing that that trunk type may be specific to Teams, or at least to exchange OPTIONS for information purposes, not keep-alive purposes.

I can probably live with re-registering for TLS, just trying to understand all the options, so to speak.

 


  • 5 weeks later...

Still having trouble with this under Vodia 68.0.28 running on Azure.

You'll recall that Azure closes idle connections after 4 minutes without sending a TCP Reset (TCP RST) to let the other party know. The suggested workaround was to use a Registered trunk with frequent re-registrations. So I set it up like this:

[Screenshot: VodiaregisteredTLS.png]

Calls work fine; call quality is good. But I'm finding that registration fails every two or three days, at random times, and takes the system offline for a minute. I get this email from Vodia:

Trunk Telynx - Registered - TLS (2) changed to "408 Request Timeout"
(Registration failed, retry after 60 seconds). This is a notification email. Do not reply.

The re-registration succeeds, and life goes on. But it shouldn't be this unstable.

In the trunk, I see there is a Routing/Redirection > Request timeout that is not set. Does that control the registration timeout? What is it defaulting to?
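For reference, the SIP standard itself defines a default here: RFC 3261 sets the non-INVITE transaction timeout (Timer F) to 64 times T1, with T1 defaulting to 500 ms, and says a transaction timeout is to be treated as if a 408 response had been received. Whether Vodia follows that default, or whether the trunk's Request timeout field overrides it, is not confirmed anywhere in this thread. A small Python sketch of the standard's arithmetic, with a hypothetical function name:

```python
T1_DEFAULT_S = 0.5  # RFC 3261 default round-trip time estimate

def timer_f_s(t1_s=T1_DEFAULT_S):
    """RFC 3261 Timer F: how long a non-INVITE client transaction
    (such as REGISTER) waits for a final response before giving up."""
    return 64 * t1_s

if __name__ == "__main__":
    print("Timer F:", timer_f_s(), "seconds")
```

If that default applies, a 408 in the notification email could be generated locally by the PBX after about 32 seconds without an answer, rather than being sent by Telnyx.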

I'm wondering if I just need to go back to the FQDN, non-TCP, non-TLS, UDP-based connection that has always been reliable with 3CX. But I don't know if 3CX is doing some kind of keep-alive--I think that's not needed for UDP?

Thanks for any suggestions to get this back to a reliable state.


19 minutes ago, Vodia support said:

Hi, Can you review this blog on Telnyx TLS, I've had no problem with TLS registration since I wrote this blog, might be worth a shot. 

https://blog.vodia.com/SRTP_Vodia_Telnyx

Thanks. My settings match that very closely. Caller ID presentation is a bit different (maybe this was the default?) but seems to work:

[Screenshot: CallerIDPresentation.png]

The main difference is the Proposed duration. You've left the default as 1 hour/50%. I set it to 6 minutes/50% to overcome the Azure limitation that I described above (June 5):

Quote

Azure closes idle connections after 4 minutes without sending a TCP Reset (TCP RST) to let the other party know.

Calls with TLS and SRTP are working, so the setup seems okay. Except the REGISTER times out sometimes. Can I set the timeout?


I upgraded to 68.0.30 at about 3pm yesterday. Registration timed out at 7:42pm:

   Trunk Telynx - Registered - TLS (2) changed to "408 Request Timeout"
   (Registration failed, retry after 60 seconds). This is a notification email. Do not reply.

What timeout is Vodia using and how do I adjust it?

 


5 hours ago, mcbsys said:

What timeout is Vodia using and how do I adjust it?

If you can, set up a Wireshark trace on your server (you can filter by IP address and port to keep it small). The question is if there is any TCP reset and who is sending it.

It could be the firewall, but it could also be that the refresh traffic is slower than the TCP timeout. If it's random, it's probably not the firewall, because the firewall would always enforce it at the same time. Randomness could come from the lack of other traffic, e.g. call-related SIP traffic. It could even be something at the OS level; e.g. in Linux there are parameters under /proc/sys/net that control how long missing TCP traffic will be tolerated.
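A small Python sketch of how such a capture could be narrowed down: it only builds a pcap filter expression and a tcpdump argument list, it does not run anything. Port 5061 (the usual SIP-over-TLS port), the interface name `eth0`, the output file name, and the function names are assumptions for illustration.

```python
import shlex

def capture_filter(peer_ip, port, teardown_only=False):
    """Build a pcap filter expression for traffic to and from one SIP peer."""
    expr = f"host {peer_ip} and tcp port {port}"
    if teardown_only:
        # Keep only segments that have the FIN or RST flag set.
        expr += " and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)"
    return expr

def tcpdump_command(interface, outfile, expr):
    """Return the tcpdump argument list; pass it to subprocess to run it."""
    return ["tcpdump", "-n", "-i", interface, "-w", outfile, expr]

if __name__ == "__main__":
    expr = capture_filter("192.76.120.10", 5061)
    print(shlex.join(tcpdump_command("eth0", "telnyx.pcap", expr)))
```

The `teardown_only` variant keeps a long-running capture tiny and shows who closes the connection, but it drops the TLS handshake, so it is not suitable when the trace also needs to be decrypted.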


I kept a tcpdump running most of the weekend, aborting and re-starting it every few hours when there was no registration failure. Finally this morning at 12:36am there was a 408 failure. Unfortunately even with the KEYLOGFILE, everything in the PCAP is encrypted until the re-registration at 12:38am. I assume this is because Wireshark needs the handshake to decrypt the following traffic? For some reason, even after the re-registration (and associated TLS handshake), much of the traffic is encrypted.

However, just before the re-registration, I am able to see a FIN coming from Telnyx to the PBX, which seems to kick off the disconnect/reconnect/re-registration. There's plenty of traffic just before the FIN, so it wasn't a lack of traffic. Is it normal to get a FIN periodically? [Update a bit later:  well really it's a FIN, ACK coming from Telnyx to the PBX. So does that mean the PBX sent a FIN first?]

10.2.0.4 is the internal IP of the PBX. 192.76.120.10 is sip.telnyx.com. Click for full size image.

[Screenshot: 20230717.Registrationtimeout.Tracenotes.png]

 


On 7/17/2023 at 8:26 PM, mcbsys said:

Is it normal to get a FIN periodically? [Update a bit later:  well really it's a FIN, ACK coming from Telnyx to the PBX. So does that mean the PBX sent a FIN first?]

It should not be normal... The FIN, ACK looks good to me (not being a huge TCP/IP expert); I guess the ACK is kind of preemptive. Anyhow, the PBX sends the same thing back, and that means the TCP connection is closed.

Which raises the question of what Telnyx (or any provider) is doing to keep a TCP connection alive for weeks, months, years. How do they do software updates? UDP is much easier in that respect; you would not see anything in the PCAP. Microsoft Teams opens TCP connections in both directions, which does solve that problem: for an inbound call it opens a new TCP connection. However, that approach has the disadvantage that the PBX needs to be on a public IP address. Maybe the disconnect time is the price to pay for TLS with a device behind NAT.


According to https://ipwithease.com/what-is-tcp-fin-packet/:

Quote
  • FIN-ACK — Indicates acknowledgment of FIN packet.
  • FIN — Indicates no more data will be transmitted from the sender.

So it sounds like the FIN-ACK coming from Telnyx would be a reply (ACKnowledgement) to a FIN from the PBX. But I don't see the FIN. Maybe it was encrypted?
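One detail that may help when reading the trace: once a TCP connection is established, the ACK flag is set on essentially every segment, so a segment that Wireshark labels "FIN, ACK" is usually just a FIN carrying the routine acknowledgment, not necessarily a reply to an earlier FIN. TCP header flags also sit outside the TLS-encrypted payload, so a FIN from either side should be visible in the capture if it was sent. The Python sketch below decodes the flag byte using the standard bit values; the function name is an assumption for illustration.

```python
TCP_FLAGS = [
    (0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
    (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG"),
]

def decode_tcp_flags(flag_byte):
    """Return the names of the flags set in the TCP header flag byte."""
    return [name for bit, name in TCP_FLAGS if flag_byte & bit]

if __name__ == "__main__":
    print(decode_tcp_flags(0x11))  # a FIN with the routine ACK: ['FIN', 'ACK']
    print(decode_tcp_flags(0x18))  # an ordinary data segment: ['PSH', 'ACK']
```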

I've had a trace running for three days, have captured 33,005 packets, but no timeout has occurred. I'm going to try to reset it now, and force a re-registration, which hopefully will capture the initial handshake.

