ndemou

Members
  • Posts

    171

Everything posted by ndemou

  1. Oh! I see. I hope your problems end sooner rather than later. Not sure what you meant by "Password changes should work now". Does it have something to do with my posts being left waiting for moderator approval? Should I change my password? SNMP? Sorry, I lost you. Anyway, the behaviour is exactly as described in posts #1 and #3. As stated, I tested on v5.5.0, which I think is the latest. It's extremely easy to test on any version: set 127.0.0.1 as the mail server, set the debug level to 7, start the PBX and look at the logs.
  2. Just noticed that this bug is still an issue on v5.5.0. (oh! and the annoying "This post will need approval from a moderator before this post is shown." still appears above the text box where I'm writing this)
  3. I'm trying to download the backup of a domain that is ~68MB and it always stops at 8.4M. Firefox says "Failed" and I can find nothing relevant in the log files (at least up to debug level 7). My version is 5.3.2a (CentOS32). I'm trying to move the domain from my production server to my test server, which runs the latest version. Any ideas on how to either complete the backup or move the domain to another server some other way?
  4. Yes, it's a straightforward procedure and everything goes as you describe:
         You used a gmail address in the email field? [yes] <---- do note however that the email WAS NOT at the gmail.com domain
         Then you pressed "Get Auth Code" button, which opened a new window [yes] ...
         ...where you entered your login email and password? [yes]
         It entered your account where you pressed the button Allow Access [yes], ...
         ...which opens another page with the Authorization code, asking you to paste it in your application? [yes]
         Then you pasted that code in the "Paste Auth Code Here:" field and pressed "Login/Refresh"? [yes]
     There are many more log lines filled with hexadecimal noise. If it is safe to paste them here and they could help you, just ask. Same if there is anything else I can do to help you help me.
  5. OK thanks. Indeed they are less than 1% of the calls.
  6. I've tried to sync my address book from two different Google accounts without 2-stage authentication and both times it failed with these lines in the logs:
         [7] 20160619212105: https:accounts.google.com:443: TCP disconnect
         [7] 20160619212105: https:accounts.google.com:443: Return code 500
         [9] 20160619212105: Carddav: token response: code=500
         [5] 20160619212105: Carddav: Failed to get token. Error code = 500
         [7] 20160619212105: Closing connection https:accounts.google.com:443
     Tried with v5.3.2 and 5.4.1. Your help is welcome.
  7. One of our customers is getting a few of those "Other redirected calls" in his daily reports. What are they? Is there any documentation for the metrics of the Qstatus report?
  8. Help! Where do my posts go? I've replied within hours of your question but my post doesn't show up after many days. Then and now I see a green label "This post will need approval from a moderator before this post is shown." This is the second thread with the same problem.
  9. Again!? This is my second thread with this obstacle when posting. Is it only me or does it happen to everyone? Anyway, back to your questions:
     Regarding the setting: it's 127.0.0.1
     Regarding the logs: yes, it does appear there also:
         tail /opt/pbxnsip/pbx/log/2016-03-04.txt
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         [7] 20160304150933: 127.0.0.1: Connected
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         [7] 20160304150933: 127.0.0.1: Connected
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         [7] 20160304150933: 127.0.0.1: Connected
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         [7] 20160304150933: 127.0.0.1: Connected
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         [7] 20160304150933: 127.0.0.1: Connected
         # root @ testpbx.zanadoo.gr ~ grep "Connect to 127.0.0.1" /opt/pbxnsip/pbx/log/2016-03-04.txt |head -1
         [7] 20160304150851: 127.0.0.1: Connect to 127.0.0.1:25
         # root @ testpbx.zanadoo.gr ~ grep "Connect to 127.0.0.1" /opt/pbxnsip/pbx/log/2016-03-04.txt |tail -1
         [7] 20160304150933: 127.0.0.1: Connect to 127.0.0.1:25
         # root @ testpbx.zanadoo.gr ~ grep "Connect to 127.0.0.1" /opt/pbxnsip/pbx/log/2016-03-04.txt |wc -l
         353085
  10. When we tried to start v5.4.0 on our development edition without a local SMTP server active, we noticed that the PBX was not starting and the CPU was at 99% usage. strace revealed about 5,000 requests per second to connect to the localhost SMTP port:
          0.000199 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000277 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000177 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000176 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000175 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000175 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
          0.000284 connect(1, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
      I'm noting it because a failure in one part of the system (the SMTP service) causes the PBX to fail to start, and there is no clear clue as to what is happening. BTW just accepting the connection with a dummy process (ncat -l 25 --keep-open --exec /bin/cat) fixes the problem; no actual SMTP response is needed.
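A pre-start check along the following lines could make this failure mode visible before the PBX is launched. It is only a sketch under assumptions: `port_accepts` is a hypothetical helper name, it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and it only tests whether something accepts a TCP connection on the given address. It does not speak SMTP.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PBX): succeed if something accepts
# a TCP connection on HOST:PORT within 2 seconds, fail otherwise.
# Uses bash's /dev/tcp pseudo-device, so it needs bash, not plain sh.
port_accepts() {
    local host=$1 port=$2
    # The child bash opens the socket on fd 3 and exits immediately;
    # its exit status tells us whether the connect succeeded.
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example use before starting the PBX (address taken from the post above):
#   port_accepts 127.0.0.1 25 || echo "warning: nothing is listening on 127.0.0.1:25"
```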
  11. Can you also remove the requirement for moderator approval from this thread?
  12. Unfortunately the 5.3.2a .dat file fixed this neither in the development version, which has no customization, nor in the production version, which has light customization. What now? BTW when I'm typing this reply I see above the editor box a green warning "This post will need approval from a moderator before this post is shown."
  13. Can you please look at this issue? We don't mind waiting for the WebUI issue but the emails that are sent every day to our customers keep reminding them of the issue. We can handle the low level of calls to our support centre but the fact that we are unable to solve this issue that is perceived as "easy" by our customers is hurting our image.
  14. We've upgraded from 5.2.4 to 5.3.2a and in the Web UI and in emails from the PBX we see empty space in the place where you would expect to see the caption of a field that was introduced after 5.2.4. An image is worth a thousand words, here are two (one from the web UI and one from an email report about an ACD queue). Yellow lines are marking the empty spaces:
  15. Sorry but I can't tell for sure which statement of your comment corresponds to which one of the 3 errors I've mentioned. I guess that the first paragraph is about the "call-id already exists" error. Am I right?
      Anyway, during the last 3 days I had ~10,000 unique call-ids recorded in my logs and 384 "call-id already exists" errors. The call-ids are like random@IP. I noticed that 80% of the errors originate from only 6.5% of the IPs. The top offender (@192.168.10.237) appears in 11% of the errors. So it appears to be something that has to do with only a limited number of phones.
      If however I understand your comment correctly, you suggest that a reboot of the VoIP phone is required to trigger the behaviour that you think causes the event in the logs. It's highly unlikely that about 10-15 phones get rebooted so often. What will be the result for the caller? Am I right in guessing that his call will be rejected and he'll have to try again?
      I can't guess what the 2nd paragraph is about. Please clarify.
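A per-phone breakdown like the one described in the post above can be produced with a short pipeline. This is a sketch under assumptions: the function name `count_callid_errors_by_host` is made up for illustration, and the pattern assumes log lines of the form `Call-ID <random>@<IP> already used before`, pieced together from the error text and the random@IP description in this thread. Adjust the pattern if the real lines differ.

```shell
#!/bin/bash
# Hypothetical helper: read PBX log lines on stdin and print
# "<count> <host>" for every host whose Call-ID was reported as
# "already used before", busiest host first.
count_callid_errors_by_host() {
    # Keep only the host part after the "@" of each offending Call-ID
    # (assumed line format, see the note above), then count duplicates.
    sed -n 's/.*Call-ID [^ @]*@\([0-9.]*\) already used before.*/\1/p' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ print $1, $2 }'
}

# Example (log path as used elsewhere in this thread):
#   count_callid_errors_by_host < /opt/pbxnsip/pbx/log/2016-01-08.txt | head
```

Sorting the counts in descending order puts the top offenders first, which is what matters when only a handful of phones cause most of the errors.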
  16. To refresh your memory, you asked me to install 5.3.2a in order to investigate the errors "Writing to row that does not exist, ignoring request" that appear in the logs. I did so and I still got one error per day on busy (non-weekend) days. What's next?
          grep -B5 -A5 "Writing to row that does not exist" /opt/pbxnsip/pbx/log/2016-01-08.txt
          [7] 20160108150939: Port 305: Determine pass-through mode after receiving response
          [7] 20160108150939: Port 305: RTP pass-through mode
          [7] 20160108150939: Port 304: RTP pass-through mode
          [6] 20160108150939: Port 305: Different Codecs (local G729/8000, remote PCMA/8000), falling back to transcoding
          [6] 20160108150939: Port 304: Different Codecs (local PCMA/8000, remote G729/8000), falling back to transcoding
          [2] 20160108150940: Writing to row that does not exist, ignoring request
          [2] 20160108150940: Last message repeated 2 times
          [3] 20160108150940: lookup_index error: Call port index is -1
          [3] 20160108150940: Last message repeated 2 times
          [5] 20160108150940: Not setting dialog state of non-existing call port (call-id=)
          [5] 20160108150941: set codec: codec PCMA/8000 is set to call-leg 307
      And by the way, on busy days I also get hundreds of lines with the errors "[1]...Call-ID xxxxx already used before" and "[2] No billing email available for domain xxxxx". I don't know if this is serious and/or related.
  17. OK, after testing yesterday we finally felt confident enough to upgrade the production server to 5.3.2a. What will you need to proceed? Anyway, here are today's errors:
          grep -h "^\[[0-2]" /opt/pbxnsip/pbx/log/2016-01-07.txt|grep -v 'Reject call because\|connection refused\|Hangup reason:\|expired call\|already used before\|No billing email available for domain'
          [2] 20160107123707: Writing to row that does not exist, ignoring request
          [2] 20160107123707: Last message repeated 2 times
          [0] 20160107162723: SMTP: Illegal state
      and here are a few lines before and after the "Writing to row that does not exist":
          grep -B5 -A5 "Writing to row that does not exist" /opt/pbxnsip/pbx/log/2016-01-07.txt
          [7] 20160107123706: Port 432: Clear last request
          [5] 20160107123706: BYE Response: Terminate 28df04006923-568e3d9e-434d6383-2e953778-1f7428@127.0.0.1
          [6] 20160107123706: 4 more requests pending to 10.13.37.102:8080
          [6] 20160107123706: Port 279: Sending RTP to 91.138.139.181:45962, codec G729/8000
          [3] 20160107123707: lookup_index error: Call port index is -1
          [2] 20160107123707: Writing to row that does not exist, ignoring request
          [2] 20160107123707: Last message repeated 2 times
          [3] 20160107123707: lookup_index error: Call port index is -1
          [5] 20160107123707: Not setting dialog state of non-existing call port (call-id=)
          [6] 20160107123707: 5 more requests pending to 10.13.37.102:8080
          [7] 20160107123707: Port 244: Media-aware pass-through mode
      By the way, I also had 157 lines with the error "[1]...Call-ID xxxxx already used before" and 332 with "[2] No billing email available for domain xxxxx".
  18. @Vodia: We're running CentOS 6, 32bit.
      @ahennis: I think that your issue is unrelated to the subject of this thread. If it really is unrelated, then please begin another thread to make it easier for me, Vodia and other interested readers to follow the discussion on both issues.
  19. Regarding the loss of CDRs issue: I don't quite understand your comment (English is not my native language). Am I right in my understanding that there is nothing to worry about?
      Regarding "writing to a row that does not exist": I'll have to discuss it with management and get back to you on this. To help me make my point I'd like to know a) what it is that we worry about here and b) whether you feel confident running the suggested version on a heavily loaded production system that currently runs 5.2.4 (we have a second server standing by in case of trouble, but the trouble and the switch-over will not go unnoticed by our clients).
      Regarding the SMTP logs: by "get us a separate log for the email client" do you mean you want me to set "Log email events:" to level #9 and send you the resulting log events when I hit the error message?
  20. Looking at directories cdrt and cdri, the numbering of files is not continuous.
          cd /opt/pbxnsip/pbx/cdrt
          ls | grep "337114..xml"
          3371140.xml
          3371141.xml
          3371142.xml <---|___where's 3,4?
          3371145.xml <---|
          3371146.xml
          3371147.xml
          3371148.xml
          3371149.xml
      It's not very common of course:
          ls | grep "336.....xml"|wc -l
          9986
      that's two CDRs out of 10000... I've checked my backups for the last two months and I have the same rare but recurring jumps in numbering. I feel like cold water is running down my spine...
      P.S. Checked the logs of the last 3 days for severity levels [0]...[2] and I don't see anything that I can correlate:
          grep -h "^\[[0-2]" /opt/pbxnsip/pbx/log/2015-1[12]-*|grep -v 'Reject call because\|connection refused\|Hangup reason:\|expired call'
          [0] 20151130102621: SMTP: Illegal state
          [2] 20151130134339: Writing to row that does not exist, ignoring request
          [1] 20151130225655: Timeout: Call 153 not found
          [2] 20151201095035: Writing to row that does not exist, ignoring request
          [2] 20151201095608: Writing to row that does not exist, ignoring request
          [0] 20151201112834: SMTP: Illegal state
          [2] 20151201120229: Writing to row that does not exist, ignoring request
          [2] 20151201120229: Last message repeated 2 times
          [0] 20151201131824: SMTP: Illegal state
          [0] 20151201142546: SMTP: Illegal state
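Rather than eyeballing `ls` output, the missing numbers can be listed directly. The following is a minimal sketch, assuming the CDR files are named `<number>.xml` as in the listing above and that GNU `find` is available (its `-printf` option is used). The function name `find_cdr_gaps` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print every number that is missing between the
# lowest and highest <number>.xml file found directly in DIRECTORY.
# Usage: find_cdr_gaps DIRECTORY
find_cdr_gaps() {
    local dir=$1
    # Collect the bare numbers, sort them numerically, then let awk
    # print whatever is skipped between two consecutive entries.
    find "$dir" -maxdepth 1 -type f -name '[0-9]*.xml' -printf '%f\n' \
        | sed 's/\.xml$//' \
        | grep -E '^[0-9]+$' \
        | sort -n \
        | awk 'NR > 1 { for (i = prev + 1; i < $1; i++) print i } { prev = $1 }'
}

# Example (directory taken from the post above):
#   find_cdr_gaps /opt/pbxnsip/pbx/cdrt | wc -l
```

Piping the result through `wc -l` gives the total number of gaps, which can be compared against the file count to see how rare they are.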
  21. Thanks Admin. I didn't know about gcore. I've tested it and
          kill -STOP `pgrep pbxctrl` && gcore `pgrep pbxctrl`
      did the job on CentOS (after a yum install gdb, of course). I'd still like to hear your opinion about post #7.
  22. Also, regarding your recommendation: as far as I know, I should give these commands to force pbxctrl to crash and generate a coredump:
          ulimit -c unlimited
          killall -SIGSEGV pbxctrl
      Am I correct or do you need something else?
  23. Thanks for the detailed explanation. I only have 1GB of RAM so I've plenty of room to grow before moving to 64bit. I've installed atop with the default logging options and I'll keep an eye on memory usage. It already seems that memory is tight: one observation of the system during not so high load shows that pbxctrl has allocated 610M of virtual memory, with 480M of it resident in real memory. Adding everything else (the kernel slab being considerable) I only have about 250MB of free memory. Unfortunately I didn't have atop or something similar before to check the logs, and the default graphs the system sends don't include the RAM usage. You could consider it as a nice-to-have addition. What's your opinion based on this preliminary data? (I'm attaching the output of atop and free below)
          $ atop
          PRC | sys 0.41s | user 1.02s | #proc 100 | #tslpi 109 | #tslpu 0 | #zombie 0 | #exit 0 |
          CPU | sys 4% | user 11% | irq 0% | idle 82% | wait 2% | curf 3.09GHz | curscal ?% |
          CPL | avg1 0.46 | avg5 0.38 | avg15 0.15 | csw 24571 | intr 13795 | | numcpu 1 |
          MEM | tot 1.0G | free 31.2M | cache 129.1M | dirty 0.8M | buff 90.0M | slab 250.9M | |
          SWP | tot 1.0G | free 991.7M | | | | vmcom 367.1M | vmlim 1.5G |
          LVM | vgpbx-lvpbx1 | busy 2% | read 0 | write 114 | MBr/s 0.00 | MBw/s 0.04 | avio 1.46 ms |
          LVM | pbxrec-lvrec | busy 1% | read 0 | write 251 | MBr/s 0.00 | MBw/s 0.10 | avio 0.54 ms |
          LVM | xcdrt-lvcdrt | busy 1% | read 0 | write 19 | MBr/s 0.00 | MBw/s 0.01 | avio 4.53 ms |
          LVM | roup-lv_root | busy 0% | read 0 | write 29 | MBr/s 0.00 | MBw/s 0.01 | avio 1.24 ms |
          LVM | pbxlog-lvlog | busy 0% | read 0 | write 4 | MBr/s 0.00 | MBw/s 0.00 | avio 6.25 ms |
          DSK | vdb | busy 2% | read 0 | write 99 | MBr/s 0.00 | MBw/s 0.04 | avio 1.69 ms |
          DSK | vde | busy 1% | read 0 | write 11 | MBr/s 0.00 | MBw/s 0.10 | avio 12.3 ms |
          DSK | vdd | busy 1% | read 0 | write 9 | MBr/s 0.00 | MBw/s 0.01 | avio 9.56 ms |
          DSK | vda | busy 0% | read 0 | write 26 | MBr/s 0.00 | MBw/s 0.01 | avio 1.38 ms |
          DSK | vdc | busy 0% | read 0 | write 4 | MBr/s 0.00 | MBw/s 0.00 | avio 6.25 ms |
          NET | transport | tcpi 150 | tcpo 113 | udpi 12333 | udpo 12347 | tcpao 2 | tcppo 0 |
          NET | network | ipi 12504 | ipo 12465 | ipfrw 0 | deliv 12488 | icmpi 2 | icmpo 3 |
          NET | eth0 ---- | pcki 12406 | pcko 11962 | si 1173 Kbps | so 986 Kbps | erri 0 | erro 0 |
          NET | eth1 ---- | pcki 100 | pcko 504 | si 15 Kbps | so 85 Kbps | erri 0 | erro 0 |
            PID TID MINFLT MAJFLT VSTEXT VSLIBS  VDATA VSTACK  VSIZE  RSIZE VGROW RGROW SWAPSZ RUID AMEM CMD 1/1
          11463   -      0      0  6316K  3008K 600.1M    88K 609.4M 478.4M    0K    0K 25852K root  48% pbxctrl
          26864   -    465      0   184K  2236K  2392K    88K  4980K  4972K    0K    0K     0K root   0% atop
          $ free -m
                       total       used       free     shared    buffers     cached
          Mem:          1006        987         18          0         90        142
          -/+ buffers/cache:        754        251
          Swap:         1023         34        989
  24. The above script is nice indeed and we have used it for many years. Here it is again, properly formatted and with a spelling correction, for anyone who may be interested.
          #!/bin/bash
          #
          ### BEGIN INIT INFO
          # Provides: pbx
          # Required-Start: $local_fs $network
          # Required-Stop: $local_fs $network
          # Default-Start: 3 4 5
          # Default-Stop: 0 1 2 6
          # Short-Description: start and stop pbx
          # Description: Init script for pbxnsip.
          ### END INIT INFO

          # source function library
          . /etc/init.d/functions

          RETVAL=0

          # Installation location
          INSTALLDIR=/opt/pbxnsip/pbx
          PBX=pbxctrl
          PID_FILE=/var/run/$PBX.pid
          LOCK_FILE=/var/lock/subsys/$PBX
          PBX_OPTIONS="--dir $INSTALLDIR --pidfile $PID_FILE"

          [ -x $INSTALLDIR/$PBX ] || exit 5

          start() {
              echo -n "Starting PBX: "
              daemon --pidfile $PID_FILE $INSTALLDIR/$PBX $PBX_OPTIONS
              RETVAL=$?
              echo
              [ $RETVAL -eq 0 ] && touch $LOCK_FILE
              return $RETVAL
          }

          stop() {
              echo -n "Stopping PBX: "
              killproc -p $PID_FILE $PBX
              RETVAL=$?
              echo
              [ $RETVAL -eq 0 ] && rm -f $LOCK_FILE
              return $RETVAL
          }

          case "$1" in
              start)
                  start
                  ;;
              stop)
                  stop
                  ;;
              restart)
                  stop
                  start
                  ;;
              force-reload)
                  stop
                  start
                  ;;
              status)
                  #/bin/netstat -anp|grep pbxctrl|grep LISTEN|grep -v ':::'
                  status -p $PID_FILE $PBX
                  RETVAL=$?
                  ;;
              *)
                  echo $"Usage: $0 {start|stop|restart|force-reload|status}"
                  exit 2
          esac
          exit $RETVAL