ndemou Posted October 15, 2018 Report Share Posted October 15, 2018 We had problems with calls not connecting or getting no audio today and when I examined the logs I show many serious errors like these: [1] ... Could not open cdr....bin for writing [1] ... Port ...: Could not allocate sockets! [0] ... UDP(IPv4): Could not open socket (EMFILE) lsof | grep ^pbxctrl | wc -l showed more that 8160 lines. Looking at the output of lsof I noticed that 99% of the lines are like this: COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME pbxctrl 1280 1307 root 1005u IPv6 10107481 0t0 UDP *:49689 I wonder if so many open files(UDP sockets) is normal. It doesn't seem to be so I restarted pbxctrl and the problems went away (after a painful 5 minutes). I'm monitoring the output of `lsof | grep ^pbxctrl | wc -l` and it shows me anywhere from 1200 to 1500 open files (with moderate traffic). I've raised my soft & hard limit of open files from 1024 & 4096 to 4000 & 8000 with prlimit -n -p`pidof pbxctrl` --nofile=4000:8000 Some basic facts about my system: Software-Version: 60.0.3 (CentOS32) Build Date: May 1 2018 04:42:47 uname -a Linux ... x86_64 x86_64 x86_64 GNU/Linux Questions: Have we hit a bug or is such an amount of open sockets normal? If it's normal what's the recommended configuration for CentOS? Is the 5 minutes to restart normal? How could we cut it down? Is it OK that I'm running 64bit CentOS but pbxctrl reports CentOS32? Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted October 15, 2018 Report Share Posted October 15, 2018 There were a couple of problems in that area, especially for very busy systems. If you can move to 61.0 or maybe even 61.1 then those problems should be fixed. Quote Link to comment Share on other sites More sharing options...
ndemou Posted October 15, 2018 Author Report Share Posted October 15, 2018 Thanks! Upgrading a production system is always tough but "a man's got to do...". Any ideas about my other 2 questions: Is the 5 minutes to restart normal? How could we cut it down? Is it OK that I'm running 64bit CentOS but pbxctrl reports CentOS32? Quote Link to comment Share on other sites More sharing options...
Support Posted October 15, 2018 Report Share Posted October 15, 2018 Hi, Maybe this helps for one of your questions. If the open files are the issue then the restart time will not come down unless this gets fixed, but if you had a restart time of 5 minutes even before this issue occurred then you will have to upgrade your cores on your server running the PBX. Increasing the RAM, and / or space on the server should also help if you have a lot of CDR's or PCAPs or recordings that might be pulling the processing speed down. Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted October 15, 2018 Report Share Posted October 15, 2018 When the PBX restarts, it needs to read the configuration files. Especially if there are a lot of CDR this can take its time. If you have a lot, well that can take time (the newer versions keep an index for the CDR tables which make this a lot faster). Quote Link to comment Share on other sites More sharing options...
ndemou Posted October 15, 2018 Author Report Share Posted October 15, 2018 Well keeping the number of CDRs left on the PBX low is definitely something we can accomplish. Is there a way to have the PBX clean up old recordings (e.g. remove recordings older than 48 hours). I already copy them to an ftp server anyway so I don't mind. Quote Link to comment Share on other sites More sharing options...
Support Posted October 15, 2018 Report Share Posted October 15, 2018 Hi, On the latest version of the PBX 61.0 and above you can set the recordings to get cleaned up after specified interval of times from this page: /reg_recording.htm If you're not on the latest then you can manually delete them from the recordings folder under the PBX directory for now. Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted October 15, 2018 Report Share Posted October 15, 2018 Some Linux distributions keep the PBX from binding to port 80/443 for some time, which could be 5 minutes. When you restart the PBX check top if it CPU is actually busy. Quote Link to comment Share on other sites More sharing options...
ndemou Posted October 15, 2018 Author Report Share Posted October 15, 2018 Thanks everybody for the advice. Quote Link to comment Share on other sites More sharing options...
ndemou Posted June 7, 2019 Author Report Share Posted June 7, 2019 I've upgraded to 63.0.1 about 12 days ago and the output of lsof -n | grep pbxctrl has grown from about ~2000 lines to about ~3300 lines (IPv4 and IPv6 connections). I've set a new warning level at 4000 lines and will let you know if/when I'll hit it Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted June 9, 2019 Report Share Posted June 9, 2019 Well 2000 s quite a big number - but possible if you have a lot of registrations over TCP or TLS. UDP should be around the number of active calls x 4. Quote Link to comment Share on other sites More sharing options...
ndemou Posted June 11, 2019 Author Report Share Posted June 11, 2019 Unfortunately within a week 'lsof -n | grep pbxctrl' did give more than 4000 lines of output. Usage of the system is stable so I don't like that this number keeps climbing up week by week and I'm afraid of the system crashing again. Looking at today's graphs I see these PEAKS: ~770 Registrations+Subscriptions, 52 call legs 112 http(s) 112. ~ 10 for TFTP & LDAP together In case it helps: Examining the output of ngrep on port 5060 I see that I'm sending packets to 310 IP:port couples. The output of lsof | grep pbxctrl is along the lines of this: pbxctrl 1486 1536 root 442u IPv4 18445564 0t0 UDP *:49460 pbxctrl 1486 1536 root 443u IPv4 18437010 0t0 TCP 1.2.3.4:https->9.10.11.12:53054 (ESTABLISHED) pbxctrl 1486 1536 root 444u IPv4 18445565 0t0 UDP *:49461 pbxctrl 1486 1536 root 452u IPv6 18472707 0t0 UDP *:49869 pbxctrl 1486 1536 root 453u IPv4 18506734 0t0 UDP *:62612 pbxctrl 1486 1536 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1536 root 457u IPv6 18445567 0t0 UDP *:49461 pbxctrl 1486 1536 root 458u IPv4 18472815 0t0 TCP 1.2.3.4:https->5.6.7.8:34917 (ESTABLISHED) pbxctrl 1486 1536 root 461u IPv4 18506735 0t0 UDP *:62613 pbxctrl 1486 1536 root 472u IPv4 18505204 0t0 UDP *:52692 pbxctrl 1486 1536 root 473u IPv4 18505205 0t0 UDP *:52693 pbxctrl 1486 1536 root 478u IPv4 18505208 0t0 UDP *:52192 pbxctrl 1486 1536 root 479u IPv4 18505209 0t0 UDP *:52193 pbxctrl 1486 1536 root 482u IPv4 18505038 0t0 UDP *:58870 pbxctrl 1486 1536 root 483u IPv4 18505039 0t0 UDP *:58871 pbxctrl 1486 1536 root 484u IPv4 18505042 0t0 UDP *:61862 pbxctrl 1486 1536 root 485u IPv4 18505043 0t0 UDP *:61863 What's your opinion? Would you like more data? Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted June 11, 2019 Report Share Posted June 11, 2019 Well that does look like there are a lot of dangling calls. Are you doing anything unusual? Like multicast page a lot? Or is the system under attack all the time? Seems there are also IPv6 ports open which is a sign that the call gets allocated but is not used later. Quote Link to comment Share on other sites More sharing options...
ndemou Posted June 11, 2019 Author Report Share Posted June 11, 2019 We have a few hundred extensions in small companies. Nothing unusual I would guess (some inter-domain, plenty external calls and a conference call from time to time). Surely no multicast paging. Regarding attacks, I've run for a few minutes ngrep (a packet sniffer) looking for SIP replies that are not "SIP 200 OK" and I see nothing unusual. Regarding IPv6 ports in lsof's output they account for1352 out of 3300 lines. But I should note that they contain only 169 unique port numbers. E.g.: lsof -n| grep ^pbxctrl | grep -i ipv6 |grep 49460 pbxctrl 1486 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1530 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1531 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1532 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1533 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1534 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1535 root 456u IPv6 18445566 0t0 UDP *:49460 pbxctrl 1486 1536 root 456u IPv6 18445566 0t0 UDP *:49460 Should I send you log files or anything else? Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted June 11, 2019 Report Share Posted June 11, 2019 Looks like those port all belong to the same process - what is the "TID" in the lsof output? Quote Link to comment Share on other sites More sharing options...
ndemou Posted June 12, 2019 Author Report Share Posted June 12, 2019 Quote Looks like those port all belong to the same process They certainly belong to the same process: pbxctrl with PID 1486. But that's to be expected since I'm looking specifically for pbxctrl. Quote what is the "TID" in the lsof output? Not sure what you're asking here so I'll provide anything I can think of: "man lsof" reads: "TID is the task (thread) IDentification number". In the example output above the 3rd column is the TID column (empty for the first line and 1530 to 1536 for the next 7 lines). B.T.W. I see the same pattern for almost all other ports: each port appears 8 times, once for the main process and another 7 times for threads. I've restarted pbxctrl and now I'm running lsof -nP | grep pbxctrl every hour and logging the full output on a series of timestamped files. Just in case it helps. Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted June 12, 2019 Report Share Posted June 12, 2019 I was just wondering if there is a misinterpretation of the data - sockets are available on all threads and maybe it just shows those sockets multiple times. Maybe we need to filter them by the process ID and count only unique sockets. Quote Link to comment Share on other sites More sharing options...
ndemou Posted June 12, 2019 Author Report Share Posted June 12, 2019 Hmmm good thinking. I'll have to read a bit more on lsof and linux internals. Will let you know Quote Link to comment Share on other sites More sharing options...
ndemou Posted July 7, 2019 Author Report Share Posted July 7, 2019 I'm logging the output of lsof for the pbxctrl process every hour for about a month. It is clear that the number of lines of output grows by about 20% per week. I've just restarted pbxctrl as a precaution. I'm attaching a graph. The y-axis is the number of lines of the output of the command: lsof | grep pbxctrl . Weekends and holidays are painted in green color. Most of my customers are businesses so I have many calls with a peak at about midday during working days and very few calls during weekends and holidays. I confess I haven't dived into the details of the output of lsof. However I'm almost sure that no matter the exact meaning it doesn't seem right for the number of these lines to grow by ~4% every working day. Most likely a bug in pbxctrl and less likely one in lsof. I need your help for further debugging though. I have kept all the log files and the full output of lsof for this period. Anything else I can do? Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted July 8, 2019 Report Share Posted July 8, 2019 I don't think there is a bug in lsof. In the previous post it looked like there were multiple listings for the same port - which is of course raising questions. What would make sense is to check /proc/xxx/fd if that matches the output of lsof in terms of open sockets. Quote Link to comment Share on other sites More sharing options...
ndemou Posted July 8, 2019 Author Report Share Posted July 8, 2019 First of all many thanks for the help Now, regarding the correlation between the output of lsof and the contents of /proc/`pidof pbxctrl`/fd: No they don't match and the difference is clearly on the IP sockets (see [3]).The reason for the difference is that lsof gives 7 lines of output for each socket. So if I count only unique socket ids then both outputs match: $ lsof -nP | grep 'pbxctrl' | grep 'IPv' | awk -F' ' '{print $7 }' | sort -n | uniq | wc -l 116 $ ls -l /proc/`pidof pbxctrl`/fd | grep 'socket' | wc -l 116 A quick check[1] on my saved lsof outputs during the past few weeks reveals that this factor (x7) is constant. And another check [2] reveals that: The unique socket ids at the output of lsof (that match the number of sockets in /proc/.../fd) are increasing with the pattern mentioned in my previous post (i.e. ~4% increase during working days and no increase during weekends and holidays). By the way: when you exclude sockets the number of "files" left is more-or-less constant during these weeks that I'm testing (97 open "files" with a rare peak to at most 137 now and then). Here's a plot of unique socket ids per hour for two weeks. Weekends are colored green (it's of-course the same as above but scaled down by a factor of 7 but I like pictures :-) ______________________________ [1] ls lsof.2019-0*|sort | while read FILE; do cat $FILE | grep 'IPv' | awk -F' ' '{print $7 }' | grep -v '0t0' | sort -n | uniq -c | awk -F' ' '{print $1 }' | uniq -c | sort -nr | head -1; done [2] ls lsof.2019-0*|sort | while read FILE; do echo -n "$FILE "; cat $FILE | grep 'IPv' | awk -F' ' '{print $7 }' | grep -v '0t0' | sort -n | uniq | wc -l; done [3] $ ls -l /proc/`pidof pbxctrl`/fd | grep 'socket'|wc -l 125 $ lsof -nP | grep 'pbxctrl' | grep 'IPv'|wc -l 992 Quote Link to comment Share on other sites More sharing options...
ndemou Posted July 10, 2019 Author Report Share Posted July 10, 2019 Any ideas at all? Quote Link to comment Share on other sites More sharing options...
Vodia PBX Posted July 10, 2019 Report Share Posted July 10, 2019 I do trust in /proc/xxx/fd. There must be some kind of operation that is causing this - e.g. some problem with a FAX or some problem with streaming MoH that is causing this. Other services do not seem to have this problem. Is there any clue in this direction? Do you have "expired" files in the working directors? The ports seem to be on RTP ports, so this must be something media related. Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.