Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Section Peer with allow=!all,amr creates an Encoder with Format (nil) #8

Open
mtryfoss opened this issue Oct 24, 2016 · 49 comments
Open

Comments

@mtryfoss
Copy link
Contributor

Hello!

This is related to issue #7

Please listen to this recording done on the Asterisk transcoding AMR-WB to G.722. The same applies when transcoding to G.711 too. Seems like the destination codec is irrelevant since the noise exists when recording the caller-side in Asterisk.

In the wav file, the noise starts around 0:26 into the call. It stops, and then starts about 30 seconds after that again. Seems to happen when there's silence in the room I'm sitting in.

Attaching both recording and pcap. Source ip 185.97.84.21.
pcap-and-recording.zip

Please let me know if you need anything more.

@mtryfoss
Copy link
Contributor Author

Actually, the clicking noise seems to be there at all times. It's just not very loud. Trying to debug a little bit more myself.

@traud
Copy link
Owner

traud commented Oct 24, 2016

Please, go through this guide how to write a qualified bug report. I do not know which VoIP/SIP client is involved. I do not know its release version. I do not know its settings regarding VAD/DTX/CN. Furthermore, I do not know what was negotiated on the SDP layer. If available by your VoIP/SIP client, I recommend to force the octet-aligned mode, because then Wireshark is able to help you decoding the AMR packets.

Every 20ms, a data packet should arrive. 185.97.84.21 is not your Asterisk server, I guess(!). Looking at the PCAP, the client is sending much less packets (around factor 8 = 160ms). Furthermore, your client is sending SID packets (CMR = 2, F = 0, FT = 9). Both scenarios were not tested by me, because I was not able to create such a scenario.

Please, double-check whether amrtolin_framein is creating the right amount of samples in your case. It should be 8 times more than usual, because not 20ms but 160ms (not 320 samples but 2560 samples) were covered by those SID data packets. Furthermore, I am not sure why 185.97.84.21 sends AMR-WB mode 2 and Asterisk (158.58.154.47) sends AMR-WB mode 0. Asterisk should follow the remote request and switch over to mode 2 as well. Please double-check/debug this and report (beside the information, which AMR-WB sender you are using in this case; i.e. CounterPath Bria supports AMR-WB as well).

@mtryfoss
Copy link
Contributor Author

Thanks for your reply.
I'm still trying to debug this myself. Thanks for the input on what to check.

158.97.84.21 is our Ericsson MSC.

You're right. It's those SID packets creating this noise.
A simple if(type == 9) return 0; in amrtolin_framein removed the clicking noise, but then there's no RTP sent to the other side.

I wonder what's the best approach for this? Start some kind of CNG, send empty frames..?

Regarding the different modes. I will do some more testing here and let you know.

@traud
Copy link
Owner

traud commented Oct 25, 2016

What's the best approach for this?

First, we have to understand what D_IF_decode does in case of SID. Did you look into the source code of OpenCore AMR already?

Regarding the different modes. I will do some more testing here and let you know.

Who started that call: Ericsson MSC or Asterisk? Please, put a debug statement into lintoamr_frameout and double-check whether attr is NULL. If that is NULL, you found a bug. Is that still Asterisk 13.8 with chan_sip?

@mtryfoss
Copy link
Contributor Author

I've not looked into the source code of OpenCore AMR yet.

Asterisk started the call. It's still Asterisk 13.8 and chan_sip.
I've already debugged that attr is null in that case.

Tried putting these lines in the end of amr_getjoint:
ast_format_set_attribute_data(format1, attr_res);
ast_format_set_attribute_data(format2 attr_res);

It gives a compiler warning (warning: passing argument 1 of ‘ast_format_set_attribute_data’ discards qualifiers from pointer target type), but still works. I don't know if that would break anything else.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Oct 25, 2016

I've looked a bit at pjsip's implementation of this.

Check out https://github.com/chebur/pjsip/blob/27dc7a969434c80c95a5c86fb3a36d07bce4df74/build/pjproject/src/pjmedia/include/pjmedia-codec/amr_helper.h

They have a predecode function which handles the SID frame specially on line 812:
} else if (in_info->frame_type == SID_FT) {

This function is called before D_IF_decode.

    input_.buf = &bitstream[1];
    /* AMR max frame size */
    input_.size = (amr_data->dec_setting.amr_nb? 31: 60);
    pjmedia_codec_amr_predecode(input, &amr_data->dec_setting, &input_);
    info = (pjmedia_codec_amr_bit_info*)&input_.bit_info;

    /* VA AMR decoder requires frame info in the first byte. */
    bitstream[0] = (info->frame_type << 3) | (info->good_quality << 2);
    TRACE_((THIS_FILE, "AMR decode(): mode=%d, ft=%d, size=%d",
        info->mode, info->frame_type, input_.size));

    /* Decode */
    if (amr_data->dec_setting.amr_nb) {
#ifdef USE_AMRNB
        Decoder_Interface_Decode(amr_data->decoder, bitstream,
                                 (pj_int16_t*)output->buf, 0);
#endif
    } else {
#ifdef USE_AMRWB
        D_IF_decode(amr_data->decoder, bitstream,
                    (pj_int16_t*)output->buf, 0);
#endif
    }

@traud
Copy link
Owner

traud commented Oct 25, 2016

I've not looked into the source code of OpenCore AMR.

Please, do so. That is the first step to add SID handling (does it create silence, in-media comfort noise, how many samples). Because that feature does not tackle my scenarios, you have to drive that issue. If you have any questions, please, say so. The code and a mailing list can be found on SourceForge…

PJSIP

There were so many bugs in PJSIP, I had to fix before I could use AMR-WB. That indicated that AMR-WB was not used for years – if ever. Therefore, I would rather start with D_IF_decode and how/whether that handles SID.

I do not know if that would break anything else.

You should not change an existing format because you never know where that format is used or even re-used (for example on a subsequent different channel). For AMR, this format is used to communicate the CMR between the two call legs (incoming and outgoing RTP traffic). Now, you have to debug why amr_getjoint is called with a format which was
A) not a cached format but
B) created before the attribute module was loaded (and therefore has no attribute data).

If you documented your setup, I could try to reproduce your issue and help you in that regard. For example, how is your Ericsson MSC connected: UDP, TCP, TLS, or WebSocket.
A) Is Asterisk registered at that Ericsson MSC (SIP-REGISTER and SIP-INVITE)? Or
B) Is that Ericsson MSC just an outbound proxy (just SIP-INVITE)? Or
C) Is that Ericsson MSC used as VoIP/SIP client and registered at your Asterisk?

@mtryfoss
Copy link
Contributor Author

I'll try to dig a little more.

The setup is like this (all UDP):
Cisco SPA504 (any phone would probably do) <- (G.722) -> Asterisk 13.8 <- (AMR-WB) -> Kamailio in front of the two MSC's.
No registrations. Only a defined peer in Asterisk against Kamailio's IP, which routes the call to the MSC's.

If you would like access, I could whitelist the IP of your Asterisk on the Kamailio. The debugging is then straight forward. Just start a call and stay silent :)

@mtryfoss
Copy link
Contributor Author

One thing I notice in Wireshark. Every AMR packet coming from the MSC has CMR field set to 2, even though I'm now (for debugging purposes) forcing mode 2 that direction. I also have confirmed that in Wireshark.

Is this normal?

@mtryfoss
Copy link
Contributor Author

Just updating this ticket with things I find.

I've debugged opencore decoder and I see that all SID frames are being decoded as SID_FIRST, instead of SID_FIRST and subsequent SID_UPDATE. I notice in Wireshark that the SID packets are different:

Payload in first SID packet:
20 4C 00 00 00 00 02
Payload in second, third (etc) SID packet:
20 4C E3 DF 3D E0 12

Could there be some bug in opencore-amr related to this?

@mtryfoss
Copy link
Contributor Author

Here's the logic for decoding (mime_io.cpp):

        case MRSID:

            if (quality)
            {
                if (temp & 0x80)
                {
                    *frame_type = RX_SID_UPDATE;
                }
                else
                {
                    *frame_type = RX_SID_FIRST;
                }
            }

@mtryfoss
Copy link
Contributor Author

The generation of comfort noise seems to stop way too early, and this is causing the clicking (generator stops between each SID).

Earlier you mention that a packet should arrive every 20 ms. Do this apply to SID packets as well? Isn't the whole point to save bandwidth? When there's voice activity the rate speeds up.

@traud
Copy link
Owner

traud commented Oct 26, 2016

packets should arrive every 20 ms

Here in this repository, my module expects a frame every 20 ms. Consequently, VAD/DTX/CN and SID packets are not handled correctly. Or stated differently: In amrtolin_framein, the variable frame_size is 320 samples for AMR-WB and 160 samples for AMR always. Consequently, pvt->samples and pvt->datalen get just 20 ms even in case of SID. This is incorrect because you saw 160 ms on the wire. You found a bug for sure. SID handling is missing. However, the follow-up questions are: Does OpenCore AMR handle SID? How? Or stated differently: What/how much does D_IF_decode write to pvt->outbuf.i16 in case of a SID packet?

all SID frames are being decoded as SID_FIRST, instead of SID_FIRST and subsequent SID_UPDATE

In mime_io.cpp, do you know how that variable temp is derived? If I read TS 26.201 sub-clause 4.2.3 correctly, that should be the STI. Is there a bug in the library itself? Or perhaps am I calling the API incorrectly: Do I have to fill seven NO_DATA frames for the remaining 160 ms - 20 ms. Or should I use that parameter bfi of D_IF_decode?

I could whitelist the IP of your Asterisk on the Kamailio. The debugging is then straight forward.

I do not have a static IP. Anyway, I have found an AMR implementation, which does VAD and I am able to reproduce that SID issue. You are ahead of me right now. And this is not a use-case in my scenario. Therefore, I would be more than happy if you could debug this. At least, I am able to double-check your findings at the end.

Every AMR packet coming from the MSC has CMR field set to 2, even though I'm now (for debugging purposes) forcing mode 2 that direction. I also have confirmed that in Wireshark. Is this normal?

Yes, that is normal. My module sends a ‘do not change’. Some implementations repeat the the current state all the time. You could do the same by replacing the two occurrences of 15 << 4 with attr->mode_current << 4 in codecs/codec_amr.c.

I've already debugged that attr is null in that case.

I am not able to reproduce this issue, yet. That is a separate one. Please create a new issue for that, copying over all bits required to reproduce that. Before you go for that, are you able to update to Asterisk 13.10? That is the version, I am using right now. Or are you on the ‘certified’ branch and therefore using Asterisk 13.8?

@mtryfoss
Copy link
Contributor Author

Thanks for the reply. I'm only answering what I've been able to check right now.

Here in this repository, my module expects a frame every 20 ms. Consequently, VAD/DTX/CN and SID packets are not handled correctly. Or stated differently: In amrtolin_framein, the variable frame_size is 320 samples for AMR-WB and 160 samples for AMR always. Consequently, pvt->samples and pvt->datalen get just 20 ms even in case of SID. This is incorrect because you saw 160 ms on the wire. You found a bug for sure. SID handling is missing. However, the follow-up questions are: Does OpenCore AMR handle SID? How? Or stated differently: What/how much does D_IF_decode write to pvt->outbuf.i16 in case of a SID packet?

The same size as the in variable seems to be written to pvt->outbuf.i16 (8 bytes). Based on what I find on the net, it may seem like the SID frames are only meant to keep a timer based comfort noise generator going - not to be decoded. The payload shall contain characteristics of the background noise which is going to be generated. Correct me if I'm wrong.

traud added a commit that referenced this issue Nov 1, 2016
for Asterisk as receiver
For this, the library OpenCORE-AMR must be patched. If you cannot patch your library, please, instruct your AMR-WB senders to disable their Voice Activation Detection (VAD).
issue found via report #8
@traud
Copy link
Owner

traud commented Nov 1, 2016

all SID frames are being decoded as SID_FIRST, instead of SID_FIRST and subsequent SID_UPDATE

Excellent catch. You found a bug which existed more than 9 years in OpenCORE-AMR, not in the wrapper but the original library from PacketVideo. Up to 5 speech bits got lost per frame. I reported that issue upstream in the Android Open Source Project (AOSP). When it is accepted in AOSP, I am going to report it directly with OpenCORE-AMR.

With a SID frame, the last three bits got lost. Therefore, every SID frame was decoded as SID_FIRST. For convenience, I added that change here in this repository. Please, patch your OpenCORE-AMR. With that patch, I was not able to reproduce those noises anymore. Do you still face any noise?

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

Thanks for you reply!

I've tried this patch, and now the frames seems to be decoded correctly. However, I'm still experiencing this clicking sound. I've been hacking a lot in my local repository, so I'm going to dig a little bit more after this change.

However, I think the issue with not enough frames is still there. The decoder (or an external CNG feature) should fill the gap of 140ms.

The Opencore library seems to have some noise generation features, but I don't know if this is suitable to be repeated/duplicated (multiplied by 8?) when the rate slow down? This raises a new question - how to detect the receive rate of the SID frames? Could we try using something similar to a PLC feature which repeats the last set of bytes if no new?

What I read regarding this is the frame does not contain the noise itself, but the noise characteristics.

For example:
First SID frame -> Prepare noise generator
SID update -> start CNG with charateristics from the frame
Subsequent SID update -> keep CNG going
Normal voice frame -> stop CNG

@traud
Copy link
Owner

traud commented Nov 2, 2016

The first step was to solve that SID and speech bit corruption. We have to go through this symptom step by step.

The decoder (or an external CNG feature) should fill the gap of 140ms.

Do you have any reference for that? As of today, I am not aware of such a requirement. Instead, a VoIP/SIP user agent has to deal with DTX. I looked at the RTP packets send, and all had the expected timestamp and sequence number. I recommend to try another SIP user agent whether it gives that clicking as well.

By the way, you are about the very next step already again. First, VAD/DTX must work correctly. When that works, the next step would be Comfort Noise. Does your Ericsson MSC send Comfort Noise at all? As of today, Asterisk does not have a Comfort Noise Generator (CNG; RFC 3389), see ASTERISK-140. Olle E. Johansson added CNG in Asterisk 1.8, based on this patch. However, that work was not merged into the master branch and therefore does not exist in Asterisk 13. Or stated differently, Comfort Noise is not the direction to look at, because nothing exists in Asterisk for that.

the frame does not contain the noise itself, but the noise characteristics.

Where did you read that? According to the API of OpenCORE-AMR, 320 samples are written after each SID. If that is not correct, you have found another issue in OpenCORE-AMR. However, for that, I need a starting point to reproduce that. Can you provide another example with a clicking sound?

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

I do not think this is related to the client. Here's a recording done at the G.722-side of Asterisk. Adjust the volume up a little bit.

I know Asterisk do not have CNG. However, I'm not sure which of Opencore library or Asterisk that's supposed to do generation of this noise?

noise.wav.zip

The Ericsson switch do not send CN, other than the SID-frames at a much lower rate than when there's audio activity.

Regarding generation of noise:
https://tools.ietf.org/html/rfc3389
I'm not sure if this is how AMR-WB should work, or if it's just a generic specification of CNG.

For example, the CN packet may be sent periodically or only when there is a significant change in the background noise characteristics. The CNG algorithm at the receiver uses the information in the CN payload to update its noise generation model and then produce an appropriate amount of comfort noise.

So, Asterisk may either convert the SID to a CN frame for the G.722 client, or generate noise and send as normal RTP? Neither is supported at this time (as you wrote in the latest post).

Maybe I've just misunderstood the whole thing.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

Take a look at this (RTP sent from Asterisk to my G.722 phone while receiving SID frames):

[Nov 2 12:47:16] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000594, ts 149448, len 000160)
[Nov 2 12:47:16] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000595, ts 150728, len 000160)
[Nov 2 12:47:16] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000596, ts 152008, len 000160)
[Nov 2 12:47:17] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000597, ts 153288, len 000160)
[Nov 2 12:47:17] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000598, ts 154568, len 000160)
[Nov 2 12:47:17] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000599, ts 155848, len 000160)
[Nov 2 12:47:17] VERBOSE[10745][C-00000003] res_rtp_asterisk.c: Sent RTP packet to 158.58.154.97:58920 (type 09, seq 000600, ts 157128, len 000160)

ts increases by 1280. Is that correct when len is 160?

@traud
Copy link
Owner

traud commented Nov 2, 2016

ts increases by 1280 but each frame only contains 160ms of data?

Yes, that is how DTX works. It is the job of the receiving user agent to conceal that. What you hear in your recoding is a missing concealment. When you enable Generic PLC in Asterisk, that should go away for your recording. Please, give that a try. When you use another user agent, that should go away, too.

RFC 3389 does not cover AMR(-WB) because these codecs send Comfort Noise via SID frames. The question is whether your Ericsson MSC describes the noise correctly or uses SID just to keep the RTP stream alive.

Here, the transcoding module cannot create any packets on its own because those packets would arrive 140 ms too late at your receiver. I could discard the SID decoding but then Asterisk might not create frames for a too long time, resulting in a closure of the used port, because of a NAT or Firewall.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

Generic PLC is in fact on.

I've tested to just throw away the SID-frames and that results in complete silence. I do not think the stream will be dropped because of the rtp keepalive feature in Asterisk. I consider this as a possible workaround if I do not solve the problem before we need this functionality in production.

I will continue debugging this and let you know what I find.

@traud
Copy link
Owner

traud commented Nov 2, 2016

  1. Which RTP keep-alive are you about?
  2. Did you try without Generic PLC? Perhaps there is a bug in Asterisk PLC. Furthermore, I am not sure whether Generic PLC is done before recording. Please, double-check. By the way, how do you record?
  3. Complete silence is not the correct approach because your customers are going to suffer from that. Listening at your sample, I would prefer that above complete silence.

attr is null

I recommend to debug this first. This is a severe issue and could lead to the wildest issues on the long run. I tried a lot over the weekend, but was not able to reproduce that. Therefore, you have to create at least a step-by-step scenario, so I am able to reproduce that. Or debug this on your own, printing the pointer addresses of each ast_format whenever one is copied.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

  1. rtpkeepalive=X in sip.conf could be used.
  2. I will try without. I'm doing recording using "mixmonitor start sip/" on the console.
  3. If that's the only noise, yes. But at some interval the noise volume rises a lot, before it drops again. This could be some issue with the client. I've not looked into this yet.

I think this happens for every call here. At least when the the caller is not using AMR. Will test some more and let you know.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 2, 2016

FYI: Still the same issue with Generic PLC disabled.

@traud
Copy link
Owner

traud commented Nov 3, 2016

rtpkeepalive=X in sip.conf

Did you try without? I am not used to that feature, so I have not tested it myself and whether it interacts correctly.

at some interval the noise volume rises a lot, before it drops again.

Still, with that OpenCORE patch … mhm. I hoped at least that this is solved. Are you able to create a sound file with such a noise (with SID enabled but RTP-Keepalive and Generic PLC disabled).

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 3, 2016

I'm currently running without rtpkeealive on, but it could be a workaround if RTP is stopped for a long period. Basically I think it sends a dummy CN at interval when nothing else is transmitted.

I'm a little confused right now. I had this noise volume rise yesterday, but I can't reproduce it today. I wonder if I messed up at some point yesterday and ran Asterisk against the old (unpatched) Opencore lib. The original was installed from a package, so I had to symlink a little bit.

Will let you know if I manage to do it again.

I got a little busy here on other things, but I will continue debugging the null value of attr.

Currently I dont't quite understand how this should work. I do receive a constant CMR on the incoming RTP stream (from our MSC). This should update the current_mode for the outgoing stream as well (to complete a mode change)? I dont't understand the code 100% here.

What's the intended process for doing this (do the in- and outbound stream "see" each other)? A little explanation would help me a lot debugging this.

@traud
Copy link
Owner

traud commented Nov 3, 2016

I had to symlink a little bit.

I am on Ubuntu and created the package from scratch. Then, I installed it which essentially replaced it. Alternatively, you could add syslog message into OpenCORE. That way, you know for sure that your patched library was loaded.

I do receive a constant CMR on the incoming RTP stream (from our MSC). This should update the current_mode for the outgoing stream as well (to complete a mode change)? I dont't understand the code 100% here.

attr should never be NULL, that is the main point. If it is NULL, the wrong ast_format object was given. That is the cause. The question remains, why a wrong ast_format is flying around. The same ast_format is required for both legs of the call. That way, the remote party is able to request a change in mode (CMR). You do see such changes, when the remote party moves from LTE to GSM for example. Another scenario is a half-rate channel in GSM. Asterisk has to obey and start sending in the requested mode then. If attr is NULL, this CMR on the incoming call leg cannot be signaled to the outgoing call lag. Furthermore, if attr is NULL, you cannot change the defaults for a call originating at Asterisk, see amr_clone in res/res_format_attr_amr.c.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 3, 2016

Ok. I'm going to try a few scenarios and add some more debugging to see what's happening.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 3, 2016

And regarding the package. I've already built a custom package for CentOS, but I like to have the possibility to add debugging easily in that library as well.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 3, 2016

Exactly where in the code is attr meant to be filled with data when a new encoder/decoder is created?

My scenario:
Call from Cisco SPA504 (G722) to Asterisk (AMR-WB) to MSC

What I find:
attr is null in function lintoamr_new.
attr1 is null in function amr_getjoint (fallback to default values - "attr1 = &default_amr_attr;").
attr is null in function lintoamr_frameout.
attr is NOT null i function amrtolin_framein.

amr_clone is called two times. The first time with original false/null. The second it contains data.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 4, 2016

Exactly what is this (at Asterisk startup)?

Created encoder (16000 -> AMR) 0x1ce2280 (Format (nil))

@traud
Copy link
Owner

traud commented Nov 5, 2016

Command-line interface (CLI): core show translation

@traud
Copy link
Owner

traud commented Nov 5, 2016

A null attribute is a symptom not the cause. amr_getjoint should pick (or create) an ast_format and exactly that (same memory address) should end up both in lintoamr_frameout and amrtolin_framein. In your scenario, somehow, the wrong ast_format (the cached ast_format or a copy of that) is send to lintoamr_new.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 14, 2016

I think there is something fundamental wrong here.

I'm really struggling, but It seems like this only will work for inbound calls to Asterisk using AMR (where Asterisk got some AMR-stuff to parse or similar). When reading default values something goes wrong.

Do you got a working scenario with a call coming from a G.711/G.722 phone to Asterisk and then being transcoded to AMR/AMR-WB? The other way it seems to work for me.

@traud
Copy link
Owner

traud commented Nov 15, 2016

Yes, both directions work just fine for me. And in my case, it does not matter who starts the call. It does not matter which audio codec or framing is involved on the other leg (EVS, Opus, G.722, G.711, iLBC, G.729, AMR,…). The user accounts are ‘friends’ in sip.conf. In my Dialplan extensions.conf, I do not do anything special except exten => alex,1,Dial(SIP/alex).

I am doing the support for free, here. To proceed with this, I recommend to to go through the above guideline how to write a bug report. Then, please, partner up with a colleague to go over your report before you post it – or even better hire a qualified Computer Scientist who is experienced in writing bug reports. I bet you found a bug in my code, that for sure. However, to help you, I have to reproduce your scenario here in my Asterisk somehow. For that, I need a qualified bug report. For example, how do you call/dial your Ericsson in your sip.conf or extensions.conf – you stated neither Asterisk nor Ericsson use SIP-REGISTER – both are connected just via SIP-INVITE.

If attr is NULL in lintoamr_frameout while in a call, you should not use this codec module in your scenario until this is debugged. Again, I did not face that issue, yet. AMR(-WB) requires a CMR to be forwarded from the incoming leg to the outgoing leg of the very same call. That is broken in your scenario, because of a software bug somewhere. Normally, in a handover scenario (UMTS to GSM, or VoLTE to GSM, or GSM Full-Rate to GSM Half-Rate) a (different) CMR is sent by your Ericsson MSC. That must be honored by Asterisk otherwise you face congestion in the audio which Asterisk creates and sends back.

@mtryfoss
Copy link
Contributor Author

I'm totally aware you're doing this for free, and I really appreciate your help so far.

However, I think I found a recipe how to reproduce:

If AMR or AMR-WB is the only (or preferred) codec for the outbound peer, it fails. When I added for example opus (or some other codec the MSC will not support) before AMR, amr_getjoint is called an extra time after attributes are set for one of the formats and both the encoder/decoder gets the same attrs.

This fails when amr is the codec going to be used:
[msc]
disallow=all
allow=amr
allow=amrwb

It would probably be OK if the MSC preferred amrwb over amr.

This works:
[msc]
disallow=all
allow=opus (not supported by the MSC)
allow=amr
allow=amrwb

I don't know how to proceed and what's the correct place to fix the issue so this workaround is not required?

@hssamra
Copy link

hssamra commented Nov 29, 2016

Hi there...does the Opencore-AMR logic for SID_FIRST vs SID_UPDATE checking work for regular AMR? I've been testing w/ AMR (not AMR-WB) recently and see packets every 160ms marked as SID_FIRST.

@mtryfoss
Copy link
Contributor Author

That's a good question. I've not been able to do proper testing on every scenario yet.

Are you using octet aligned or bandwidth efficient mode?

@hssamra
Copy link

hssamra commented Nov 29, 2016 via email

@traud
Copy link
Owner

traud commented Dec 5, 2016

@hssamra, thank you for reporting an issue. Small or big, glitch or design issue, it is always good to report. However, I am not able to reproduce your issue, yet. In my case, Asterisk sends SID_FIRST and SID_UPDATE not only for AMR-WB but AMR. Similar for receiving SID: The rx_type is set correctly in opencore/codecs_v2/audio/gsm_amr/amr_nb/dec/src/amrdecode.cpp. Furthermore, this report here was about AMR-WB. You are about AMR. Therefore, please, create a new issue report and describe in detail where you see that issue (in the OpenCORE library, in my transcoding module, …). And do not forget go for a qualified bug report like described by this tutorial. That saves me a lot of time, making it possible to help even more people.

Morten, I am able to reproduce that missing attr issue thanks to your description. Normally, I do not use (dis)allow in a peer section but just in the general section. Therefore, I did not notice that software bug before. That is a bug in Asterisk. However, I have not analyzed its cause, yet. Until this is resolved, instead of Opus, you could go for my new EVS module. By the way, are you sure you want to favor narrow-band above wide-band? I would go for allow=!all,evs,amrwb,amr,gsm,alaw,ulaw (see 3GPP TS 26.101 chapter 7).

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Dec 5, 2016

Thanks for you reply. You're sure this is an Asterisk-bug and not the module? That information is very helpful :-)

I'll try to analyze more myself, but for now we currently continue testing with the workaround (another codec as the first).

Of course AMR-WB should have priority - I was just using these values during testing.

@traud
Copy link
Owner

traud commented Dec 5, 2016

Yes, it is an software bug in Asterisk, because lintoamr_new is called before amr_getjoint. Both should be called the other way around. Because of this, the transcoding module does not have its own ast_format for this call. Instead, the cached ast_format was given for which you cannot change/read its attr because that pointer is shared between other calls.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Dec 5, 2016

Aha. Then I assume this applies for Opus as well, if any information between the encoder and decoder is shared there.

@traud
Copy link
Owner

traud commented Dec 5, 2016

Well, it affects just AMR(-WB) and EVS because only with AMR the encoder and decoder exchange information in-band. With Opus (and it predecessor SILK), RTCP is used to exchange information.

@hssamra
Copy link

hssamra commented Dec 5, 2016 via email

@mtryfoss
Copy link
Contributor Author

Just a quick update on this.

Morten, I am able to reproduce that missing attr issue thanks to your description. Normally, I do not use (dis)allow in a peer section but just in the general section. Therefore, I did not notice that software bug before. That is a bug in Asterisk. However, I have not analyzed its cause, yet. Until this is resolved, instead of Opus, you could go for my new EVS module. By the way, are you sure you want to favor narrow-band above wide-band? I would go for allow=!all,evs,amrwb,amr,gsm,alaw,ulaw (see 3GPP TS 26.101 chapter 7).

I've been playing around with this again lately. My Asterisk knowledge at this point is not sufficient to debug any further.

I stated a workaround in a earlier post by adding another codec as the preferred. It will also work to add all codecs supported on the inbound peer to the outbound peer, so Asterisk will try using the codec stated in "chan_sip.c: ** Our prefcodec: (g722)" first and then switch to AMR-WB if the remote side wants that.

I guess this is the reason you never hit the bug when allow/disallowing globally in sip.conf

@traud
Copy link
Owner

traud commented Feb 15, 2017

Yes, I am able to reproduce your issue. In the meantime, I was able to debug that issue a bit further. Asterisk generates the SDP lines before it negotiates a joint format actually. Because of this, not a new+joint but the cached format is passed to the codec module, which cannot be used to picky back the CMR or the fmtp. This is severe bug in Asterisk and affects all formats with fmtp lines, like Opus Codec and iLBC 20 as well. However, I was not able to find a fix – except your workaround of course.

In such a case, you have to create a qualified bug report and raise an issue with the Asterisk team. However, in the world of Asterisk, software bugs without a fix, are ignored until someone adds a patch. Therefore, after creating that issue report, you have to offer a bug bounty on the Asterisk Developers Mailing List. Although all this might be easier when done by myself, it still takes time. Therefore, please, do it yourself.

@mtryfoss
Copy link
Contributor Author

mtryfoss commented Nov 15, 2017

Bringing up this old case again. I think I found (at least) a workaround for this issue, and why the encoder is created before a joint exists.

Take a look at this code in app_dial.c:

if (ast_channel_make_compatible(in, outgoing->chan) < 0) {
	/* If these channels can not be made compatible,
	 * there is no point in continuing.  The bridge
	 * will just fail if it gets that far.
	 */
	*to = -1;
	strcpy(pa->status, "CONGESTION");
	ast_channel_publish_dial(in, outgoing->chan, NULL, pa->status);
	return NULL;
}

Just commenting out this whole block seems to work. 'ast_set_write_format' which changes the format and creates the encoder is then called from channel.c instead, when the first packet arrives:

if (ast_format_cmp(ast_channel_writeformat(chan), fr->subclass.format) != AST_FORMAT_CMP_EQUAL) {
	struct ast_str *codec_buf = ast_str_alloca(AST_FORMAT_CAP_NAMES_LEN);

	/*
	 * We are not setup to write this frame.  Things may have changed
	 * on the peer side of the world and we try to adjust the format to
	 * make it compatible again.  However, bad things can happen if we
	 * cannot setup a new translation path.  Problems range from no
	 * audio, one-way audio, to garbled audio.  The best we can do is
	 * request the call to hangup since we could not make it compatible.
	 *
	 * Being continuously spammed by this message likely indicates a
	 * problem with the peer because it cannot make up its mind about
	 * which format to use.
	 */
	ast_debug(1, "Channel %s changing write format from %s to %s, native formats %s\n",
		ast_channel_name(chan),
		ast_format_get_name(ast_channel_writeformat(chan)),
		ast_format_get_name(fr->subclass.format),
		ast_format_cap_get_names(ast_channel_nativeformats(chan), &codec_buf));
	ast_log(LOG_WARNING, "ast_set_write_format in AST_FRAME_VOICE\n");
	if (ast_set_write_format(chan, fr->subclass.format)) {
		/* Could not handle the new write format.  Induce a hangup. */
		break;
	}
}

I can't see any reason why Asterisk needs to make the channels compatible so early. We don't know anything about what the called party will respond with yet?

My other workaround was to offer another codec than AMR-WB at first, so the same routine is called after the callee answered with AMR-WB as first priority.

@traud traud changed the title Noise on transcoding AMR-WB Section Peer with allow=!all,amr creates an Encoder with Format (nil) Aug 31, 2020
@traud
Copy link
Owner

traud commented Aug 31, 2020

Changed the topic, because that is an outstanding issue in Asterisk affecting this Asterisk module.

For passive readers: If you face noise or a clicking sound with AMR-WB (or AMR), please, create a new issue report. Then, we look into that separately. In any case, check whether you use the latest OpenCORE AMR library. Its current version is 0.1.5. That version contains the patch above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants