ndemou Posted September 20, 2015

Today the service went down. The ps output showed the process still running, and netstat -anp showed it bound to the network interface. However, we could not place any calls and could not open the web interface. The logs had hundreds of messages like these:

[5] 20150920121742: Did not receive ACK, disconnecting call 2684293046@192.168.1.23
[5] 20150920121742: Did not receive ACK, disconnecting call 2e3d18e0-15eb03@127.0.0.1
[4] 20150920121742: 11 registration messages pending
[4] 20150920121748: Last message repeated 13 times

A simple /etc/init.d/pbx restart was enough to restore the service. We're happy it did, but we feel kind of scared. This was version 5.2.4 on CentOS 32bit. We need your help to find the root cause.

Vodia PBX Posted September 20, 2015

Is this on a snom ONE plus appliance? In that case check if the file system is full. The PSTN gateway that was installed there was writing log files without deleting them and this way creating a "time bomb".

ndemou Posted September 21, 2015

No problem with a full file system (we have plenty of space available). Note that the only thing we did to solve the issue was to restart the service, and it has kept working since then. What about the log lines I pasted? Do they reveal something? Would the full log help you, and if so, can I share it privately? Do you want me to open a ticket for this?

_______________________
System Status Overview
Please use the information shown on this web page when you request help from the support team.
Software-Version: 5.2.4 (CentOS32)
Build Date: Aug 22 2014 06:11:46
License Status: Vodia PBX Hosted 5AY-T9L-MWT-13A
License Duration: Active subscription

Vodia PBX Posted September 21, 2015

Did you install the software using the install script? There must be a script for the PBX in /etc/init.d, otherwise every reboot will require a manual start of the PBX service (which is extremely risky).

ndemou Posted September 21, 2015

Sure, there is an /etc/init.d/pbxsrv script. In fact, we used it to restart the service (/etc/init.d/pbxsrv restart). But what does the existence of the init script have to do with this issue? Sorry if I'm missing something obvious here.

Vodia PBX Posted September 21, 2015

Ahhhh okay, sorry, I now got it. You are right, this has nothing to do with the restart script; it seems that the PBX simply became unresponsive. It can happen for a short time when the non-realtime thread is crunching numbers (or doing a large table lookup); however, it should resume operations after a few seconds at most. If that is not the case, then this would be a lot more serious. If it should happen again, it would be great if you could generate a core dump, so that we can see what the problem is.

Also, there is a reason why we are building 64 bit versions. Maybe you have just exhausted the memory size limit. The problem is mainly that each thread takes up a lot of virtual memory space (not even physical), so that memory allocation eventually fails and then things go south pretty quickly, with all sorts of effects. You can check with ps how many threads you have and how much virtual memory has been taken already, and depending on how it looks, upgrade to 64 bit.

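One way to run the thread and virtual memory check suggested above is sketched here. It is a minimal sketch, assuming Linux with /proc mounted; the function name proc_mem_report is hypothetical and not part of the PBX. It reads the same figures that ps -o nlwp,vsz,rss -p PID should report on systems with procps.

```shell
#!/bin/sh
# Print thread count, virtual size and resident size for one process.
# Assumption: Linux with /proc mounted. proc_mem_report is a made-up helper name.
proc_mem_report() {
    pid="$1"
    status_file="/proc/$pid/status"
    if [ ! -r "$status_file" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Threads = number of threads, VmSize = virtual memory, VmRSS = resident memory
    grep -E '^(Threads|VmSize|VmRSS):' "$status_file"
}

# Example (process name taken from the thread):
#   for pid in $(pgrep pbxctrl); do proc_mem_report "$pid"; done
```

On a 32 bit build the VmSize figure is the one to watch, since the concern raised above is virtual address space rather than physical RAM.
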
ndemou Posted September 22, 2015

Thanks for the detailed explanation. I only have 1GB of RAM, so I have plenty of room to grow before moving to 64bit. I've installed atop with the default logging options and I'll keep an eye on memory usage. It already seems that memory is tight: one observation of the system under moderate load shows that pbxctrl has allocated 610M of virtual memory, with 480M of it resident in physical memory. Adding everything else (the kernel slab being considerable), I only have about 250MB of free memory.

Unfortunately, I didn't have atop or something similar installed before, so I have no logs to check, and the default graphs the system sends don't include RAM usage. You could consider that a nice-to-have addition. What's your opinion based on this preliminary data? (I'm attaching the output of atop and free below.)

$ atop
PRC | sys 0.41s | user 1.02s | #proc 100 | #tslpi 109 | #tslpu 0 | #zombie 0 | #exit 0 |
CPU | sys 4% | user 11% | irq 0% | idle 82% | wait 2% | curf 3.09GHz | curscal ?% |
CPL | avg1 0.46 | avg5 0.38 | avg15 0.15 | csw 24571 | intr 13795 | | numcpu 1 |
MEM | tot 1.0G | free 31.2M | cache 129.1M | dirty 0.8M | buff 90.0M | slab 250.9M | |
SWP | tot 1.0G | free 991.7M | | | | vmcom 367.1M | vmlim 1.5G |
LVM | vgpbx-lvpbx1 | busy 2% | read 0 | write 114 | MBr/s 0.00 | MBw/s 0.04 | avio 1.46 ms |
LVM | pbxrec-lvrec | busy 1% | read 0 | write 251 | MBr/s 0.00 | MBw/s 0.10 | avio 0.54 ms |
LVM | xcdrt-lvcdrt | busy 1% | read 0 | write 19 | MBr/s 0.00 | MBw/s 0.01 | avio 4.53 ms |
LVM | roup-lv_root | busy 0% | read 0 | write 29 | MBr/s 0.00 | MBw/s 0.01 | avio 1.24 ms |
LVM | pbxlog-lvlog | busy 0% | read 0 | write 4 | MBr/s 0.00 | MBw/s 0.00 | avio 6.25 ms |
DSK | vdb | busy 2% | read 0 | write 99 | MBr/s 0.00 | MBw/s 0.04 | avio 1.69 ms |
DSK | vde | busy 1% | read 0 | write 11 | MBr/s 0.00 | MBw/s 0.10 | avio 12.3 ms |
DSK | vdd | busy 1% | read 0 | write 9 | MBr/s 0.00 | MBw/s 0.01 | avio 9.56 ms |
DSK | vda | busy 0% | read 0 | write 26 | MBr/s 0.00 | MBw/s 0.01 | avio 1.38 ms |
DSK | vdc | busy 0% | read 0 | write 4 | MBr/s 0.00 | MBw/s 0.00 | avio 6.25 ms |
NET | transport | tcpi 150 | tcpo 113 | udpi 12333 | udpo 12347 | tcpao 2 | tcppo 0 |
NET | network | ipi 12504 | ipo 12465 | ipfrw 0 | deliv 12488 | icmpi 2 | icmpo 3 |
NET | eth0 ---- | pcki 12406 | pcko 11962 | si 1173 Kbps | so 986 Kbps | erri 0 | erro 0 |
NET | eth1 ---- | pcki 100 | pcko 504 | si 15 Kbps | so 85 Kbps | erri 0 | erro 0 |

  PID TID MINFLT MAJFLT VSTEXT VSLIBS  VDATA VSTACK  VSIZE  RSIZE VGROW RGROW SWAPSZ RUID AMEM CMD      1/1
11463   -      0      0  6316K  3008K 600.1M    88K 609.4M 478.4M    0K    0K 25852K root  48% pbxctrl
26864   -    465      0   184K  2236K  2392K    88K  4980K  4972K    0K    0K     0K root   0% atop

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1006        987         18          0         90        142
-/+ buffers/cache:        754        251
Swap:         1023         34        989

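The "about 250MB" figure in the post above corresponds to the -/+ buffers/cache line of free: free plus buffers plus cached, here 18 + 90 + 142 = 250 MB, which matches the reported 251 up to rounding. A minimal sketch of the same arithmetic read directly from /proc/meminfo follows. It assumes Linux; the function name reclaimable_mib is hypothetical, and the sum ignores the kernel slab that the post also mentions.

```shell
#!/bin/sh
# Sum MemFree + Buffers + Cached (reported in kB) and print the total in MiB.
# Assumption: a Linux-style meminfo file. reclaimable_mib is a made-up helper name.
reclaimable_mib() {
    meminfo="${1:-/proc/meminfo}"
    # The anchored pattern keeps SwapCached out of the sum.
    awk '/^(MemFree|Buffers|Cached):/ { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "$meminfo"
}

# Example: reclaimable_mib
```

Logging this value periodically (for example from cron) would give the memory history that was missing when the outage happened.
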
ndemou Posted September 22, 2015

Also, regarding your recommendation:

"it would be great if you can generate a core dump"

As far as I know, I should give these commands to force pbxctrl to crash and generate a core dump:

ulimit -c unlimited
killall -SIGSEGV pbxctrl

Am I correct, or do you need something else?

Vodia PBX Posted September 22, 2015

I think you need to set the ulimit before starting the service. If you have gcore you can use that one to generate the core dump.

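Resource limits are inherited when a process starts, so running ulimit -c unlimited in a separate shell does not change the limit of an already running pbxctrl. The sketch below shows one way to see which core size limit a running process actually has. It assumes a Linux kernel that exposes /proc/PID/limits; the function name core_limit_of is hypothetical.

```shell
#!/bin/sh
# Show the core file size limit that applies to a running process.
# Assumption: the kernel provides /proc/PID/limits. core_limit_of is a made-up helper name.
core_limit_of() {
    limits_file="/proc/$1/limits"
    if [ ! -r "$limits_file" ]; then
        echo "cannot read limits for pid $1" >&2
        return 1
    fi
    # Columns: limit name, soft limit, hard limit, units
    grep '^Max core file size' "$limits_file"
}

# Example (process name taken from the thread):
#   core_limit_of "$(pgrep -o pbxctrl)"
```

If the soft limit shows 0, a crash signal would most likely not leave a core file behind; the limit would need to be raised in the environment that launches the service, for instance in the init script before the daemon is started.
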
ndemou Posted September 22, 2015

Thanks Admin. I didn't know about gcore. I've tested it, and

kill -STOP `pgrep pbxctrl` && gcore `pgrep pbxctrl`

did the job on CentOS (after a yum install gdb, of course). I'd still like to hear your opinion about post #7.

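A small wrapper around the gcore step might look like the sketch below. It rests on some assumptions: gcore is the script shipped with gdb, its -o option sets the output prefix (the file is written as PREFIX.PID), and the function name dump_core and the default prefix are hypothetical. As far as I know, gcore attaches through gdb, which pauses the process while the dump is written and lets it continue on detach, so a separate kill -STOP should not be required. If one was sent, the process stays stopped until kill -CONT is issued or the service is restarted.

```shell
#!/bin/sh
# Write a core file for a running process without killing it.
# Assumptions: gcore (from gdb) is installed. dump_core and the default prefix are made up.
dump_core() {
    pid="$1"
    prefix="${2:-/tmp/pbxctrl-core}"
    if ! command -v gcore >/dev/null 2>&1; then
        echo "gcore not found (it ships with gdb)" >&2
        return 127
    fi
    # gcore writes the dump to "$prefix.$pid"
    gcore -o "$prefix" "$pid"
}

# Example (process name taken from the thread):
#   dump_core "$(pgrep -o pbxctrl)"
```

Core files of a process this size are several hundred megabytes, so the prefix should point at a file system with enough free space.
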
Vodia PBX Posted September 22, 2015

Yeah, the process size is not too big for 32 bit. I think the only thing we can do at this point is have that gcore stuff ready if it should happen again.