Am I missing CDRs?!!


ndemou

Looking at the cdrt and cdri directories, the numbering of the files is not continuous.

cd /opt/pbxnsip/pbx/cdrt
ls | grep "337114..xml"
3371140.xml
3371141.xml
3371142.xml <---|___where's 3,4?
3371145.xml <---|
3371146.xml
3371147.xml
3371148.xml
3371149.xml

It's not very common of course:

ls | grep "336.....xml"|wc -l
   9986

that's 14 CDRs missing out of 10000...
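For anyone who wants to list the gaps instead of spotting them by eye, here is a minimal sketch. It assumes the CDR files are named `<number>.xml`, as in the listing above; the function name `find_cdr_gaps` is made up for illustration and is not part of the PBX.

```shell
# Print every number missing from a run of <number>.xml file names.
# Reads one file name per line on stdin; names that do not match
# <digits>.xml are ignored.
find_cdr_gaps() {
    sed -n 's/^\([0-9][0-9]*\)\.xml$/\1/p' |
    sort -n |
    awk 'NR > 1 { for (n = prev + 1; n < $1; n++) print n }
         { prev = $1 }'
}
```

Usage: `ls /opt/pbxnsip/pbx/cdrt | find_cdr_gaps`. Piping the result through `wc -l` gives the number of missing files.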

 

I've checked my backups for the last two months and I see the same rare but recurring jumps in the numbering.

 

I feel like cold water is running down my spine...

 

P.S.

I checked the logs of the last 3 days for severity levels [0]...[2] and I don't see anything that I can correlate with the gaps:

grep -h "^\[[0-2]"  /opt/pbxnsip/pbx/log/2015-1[12]-*|grep -v 'Reject call because\|connection refused\|Hangup reason:\|expired call'
[0] 20151130102621: SMTP: Illegal state
[2] 20151130134339: Writing to row that does not exist, ignoring request
[1] 20151130225655: Timeout: Call 153 not found
[2] 20151201095035: Writing to row that does not exist, ignoring request
[2] 20151201095608: Writing to row that does not exist, ignoring request
[0] 20151201112834: SMTP: Illegal state
[2] 20151201120229: Writing to row that does not exist, ignoring request
[2] 20151201120229: Last message repeated 2 times
[0] 20151201131824: SMTP: Illegal state
[0] 20151201142546: SMTP: Illegal state

LOL, no, that really can happen. For example, the number of CDRs per account can be limited, and then the PBX will bite those "dents" into the index.

 

The "writing to a row that does not exist" message is a little more serious. The next version will include the table name so that we get a better idea of which table is affected. Let me know if you want a 5.3.2a build (and for which OS); at the moment it is pretty much the same as 5.3.2.

 

The SMTP log messages are also a little concerning. Maybe you can get us a separate log for the email client (attachment is okay) so that we can see if there is anything serious.


Regarding the missing CDRs: I don't quite understand your comment (English is not my native language). Am I right in understanding that there is nothing to worry about?

 

Regarding "writing to a row that does not exist": I'll have to discuss it with management and get back to you on this. To help me make my point, I'd like to know a) what it is that we are worried about here, and b) whether you feel confident running the suggested version on a heavily loaded production system that currently runs 5.2.4 (we have a second server standing by in case of trouble, but the trouble and the switch-over will not go unnoticed by our clients).

 

Regarding the SMTP logs: by "get us a separate log for the email client", do you mean that you want me to set "Log email events:" to level 9 and send you the resulting log events when I hit the error message?


The missing records are generally no reason for concern. We have already added the new log message to the latest version. It would be great if you could upgrade to 5.3.2a, but before you do that, please let us know what OS you are running.

 

For the SMTP issue, you can just send me a private message with the SMTP log (level 9). In that message you can also include the OS version.


What does 5.3.2a include?

 

So far not much. There was a problem with conference rooms not showing the dial plan in the web interface, a problem with the new snom provisioning server, and some changes to the web interface for PnP. But 5.3.2a will be a moving target as we work on features and fixes.


  • 1 month later...

OK, after testing yesterday we finally felt confident enough to upgrade the production server to 5.3.2a. What will you need to proceed?

 

Anyway here are today's errors:

grep -h "^\[[0-2]"  /opt/pbxnsip/pbx/log/2016-01-07.txt|grep -v 'Reject call because\|connection refused\|Hangup reason:\|expired call\|already used before\|No billing email available for domain'
[2] 20160107123707: Writing to row that does not exist, ignoring request
[2] 20160107123707: Last message repeated 2 times
[0] 20160107162723: SMTP: Illegal state

and here are a few lines before and after the "Writing to row that does not exist" message:

grep -B5 -A5 "Writing to row that does not exist"  /opt/pbxnsip/pbx/log/2016-01-07.txt
[7] 20160107123706: Port 432: Clear last request
[5] 20160107123706: BYE Response: Terminate 28df04006923-568e3d9e-434d6383-2e953778-1f7428@127.0.0.1
[6] 20160107123706: 4 more requests pending to 10.13.37.102:8080
[6] 20160107123706: Port 279: Sending RTP to 91.138.139.181:45962, codec G729/8000
[3] 20160107123707: lookup_index error: Call port index is -1
[2] 20160107123707: Writing to row that does not exist, ignoring request
[2] 20160107123707: Last message repeated 2 times
[3] 20160107123707: lookup_index error: Call port index is -1
[5] 20160107123707: Not setting dialog state of non-existing call port (call-id=)
[6] 20160107123707: 5 more requests pending to 10.13.37.102:8080
[7] 20160107123707: Port 244: Media-aware pass-through mode

By the way, I also had 157 lines with the error "[1]...Call-ID xxxxx already used before" and 332 "[2] No billing email available for domain xxxxx"


To refresh your memory: you asked me to install 5.3.2a in order to investigate the "Writing to row that does not exist, ignoring request" errors that appear in the logs. I did so, and I still get one such error per day on busy (non-weekend) days. What's next?

grep -B5 -A5 "Writing to row that does not exist"  /opt/pbxnsip/pbx/log/2016-01-08.txt
[7] 20160108150939: Port 305: Determine pass-through mode after receiving response
[7] 20160108150939: Port 305: RTP pass-through mode
[7] 20160108150939: Port 304: RTP pass-through mode
[6] 20160108150939: Port 305: Different Codecs (local G729/8000, remote PCMA/8000), falling back to transcoding
[6] 20160108150939: Port 304: Different Codecs (local PCMA/8000, remote G729/8000), falling back to transcoding
[2] 20160108150940: Writing to row that does not exist, ignoring request
[2] 20160108150940: Last message repeated 2 times
[3] 20160108150940: lookup_index error: Call port index is -1
[3] 20160108150940: Last message repeated 2 times
[5] 20160108150940: Not setting dialog state of non-existing call port (call-id=)
[5] 20160108150941: set codec: codec PCMA/8000 is set to call-leg 307

And by the way on busy days, I also get hundreds of lines with the error "[1]...Call-ID xxxxx already used before" and "[2] No billing email available for domain xxxxx". I don't know if this is serious and/or related.


Could it be that some of your phones are using the same Call-ID? For example, VoIP phones that don't have the clock set don't have any random input and always generate the same Call-ID after a reboot. In that case I don't think it would be a serious problem.

 

We recently changed something in the subsystem that should make the PBX more robust against timeouts when calls disappear. Although I have trouble seeing a direct connection, it might cause something like that.


Sorry, but I can't tell for sure which statement in your comment corresponds to which of the 3 errors I've mentioned.

 

I guess that the first paragraph is about the "call-id already exists" error -- am I right? Anyway, during the last 3 days I had ~10,000 unique call-ids recorded in my logs and 384 "call-id already exists" errors. The call-ids look like random@IP. I noticed that 80% of the errors originate from only 6.5% of the IPs. The top offender (@192.168.10.237) appears in 11% of the errors. So it appears to be something that has to do with only a limited number of phones. If, however, I understand your comment correctly, you suggest that a reboot of the VoIP phone is required to trigger the behaviour that you think causes the event in the logs. It's highly unlikely that about 10-15 phones get rebooted so often. What will be the result for the caller? Am I right in guessing that the call will be rejected and the caller will have to try again?
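For reference, a per-host breakdown like the one above can be produced roughly as follows. This is only a sketch: it assumes the log lines have the form "... Call-ID random@IP already used before" (the exact layout of that message is an assumption), and the function name `callid_hosts` is made up for illustration.

```shell
# Count "already used before" errors per host part of the Call-ID.
# Reads PBX log lines on stdin and prints "<count> <host>" lines,
# busiest host first. Lines without an @ in the Call-ID are skipped.
callid_hosts() {
    grep 'already used before' |
    sed -n 's/.*Call-ID [^ @]*@\([^ ]*\).*/\1/p' |
    sort | uniq -c | sort -rn
}
```

Usage: `cat /opt/pbxnsip/pbx/log/2016-01-0*.txt | callid_hosts | head`.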

 

I can't guess what the 2nd paragraph is about. Please clarify.


Yeah, it is sometimes hard to keep the message threads separated. But that can also be the beauty of a forum!

 

In the old days we had the same problem: the Call-IDs generated by the PBX were not random, or at least not all of them. The reason is that the C library function rand() is deterministic and always starts with the same value after a reboot unless it is seeded. That is when we discovered the function srand(). I guess that device has the same problem. If it is a softphone, it might happen every time it is started, which may happen a lot on a PC. Whoever wrote the software, send them a link to the srand() library function.
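The effect is easy to reproduce from a shell, because awk has a rand()/srand() pair that behaves like the C one. The sketch below is only an illustration and not PBX or phone code; the function name `tokens` is made up.

```shell
# Print three pseudo-random 8-digit tokens for the given seed.
# The same seed always produces the same three tokens, which is what
# happens to Call-IDs when a device starts from a fixed seed on every boot.
tokens() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (i = 0; i < 3; i++)
            printf "%08d\n", int(rand() * 100000000)
    }'
}
```

Running `tokens 1` twice prints the same three lines both times. Seeding with something that changes between starts, for example `tokens "$(date +%s)$$"`, avoids the repetition, although a time-based seed is of little help on a device whose clock is not set.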

 

The 2nd paragraph was about the question of what the PBX should do about it. The answer is that it should still work. The Call-ID is only one part of the call identifier; the PBX has problems telling calls apart only if the IP address and the To-tag are the same as well. This might happen if the softphone gets restarted quickly; in that case a couple of wrong CDR entries are the least of the problems.
