The latest cumulative update for Lync was recently released, and one of the fixes deals with an interesting problem to do with audio conferences. I thought I would take the opportunity to discuss a few things about how media works in a Lync audio conference, and the implications of this change in the update.
The fix that I’m referring to (details here) resolves a problem where one-way audio could occur between a client and an audio/video MCU. For background, the MCU, or multipoint control unit, is the Lync component that mixes media for the conference and distributes it to the participants. What was happening in certain cases was that an audio conference participant would be audible in the conference, but could not hear other participants.
It turns out that the reason for this was an issue with media encryption in the MCU. SIP-based communications platforms generally use a protocol called RTP (Real-time Transport Protocol) to move media between endpoints. For security, this can be encrypted, in which case it is called SRTP (the S is for “Secure”).
To save resources, the audio MCU in Lync uses a special form of SRTP, called Scale-SRTP. Essentially, instead of creating a separate encryption key for the media session it has with each conference participant, it creates a single encryption key for all participants. This way, instead of encrypting the outbound media once for every participant, it can encrypt it only once. It can then send the same encrypted media stream to all participants, since they all have the same key needed to decrypt it.
The participants, however, send back ordinary SRTP to the conference; their individual media streams are different from one another anyway.
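To make the resource saving concrete, here is a minimal sketch in Python. It is not real SRTP: `toy_cipher` is a made-up stand-in built from SHA-256, and the function names, participant names, and keys are all hypothetical. It only illustrates the difference between encrypting a frame once per participant and encrypting it once for everyone.

```python
import hashlib
import os


def toy_cipher(key: bytes, payload: bytes) -> bytes:
    """XOR the payload with a SHA-256-derived keystream.

    Toy stand-in for SRTP encryption (an assumption for illustration only).
    Applying it twice with the same key returns the original payload.
    """
    stream = b""
    counter = 0
    while len(stream) < len(payload):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(payload, stream))


def send_per_participant(keys: dict, payload: bytes) -> dict:
    """Ordinary SRTP style: one encryption pass per participant key."""
    return {name: toy_cipher(key, payload) for name, key in keys.items()}


def send_shared(shared_key: bytes, names: list, payload: bytes) -> dict:
    """Scale-SRTP style: encrypt once and hand everyone the same ciphertext."""
    ciphertext = toy_cipher(shared_key, payload)
    return {name: ciphertext for name in names}


names = ["alice", "bob", "carol"]
frame = b"mixed audio frame"

# Three participants, three keys, three encryption passes.
per_keys = {name: os.urandom(16) for name in names}
per_packets = send_per_participant(per_keys, frame)

# Three participants, one key, one encryption pass.
shared_key = os.urandom(16)
shared_packets = send_shared(shared_key, names, frame)
```

With the shared key, every participant receives the very same ciphertext object, and each can decrypt it because they all hold `shared_key`.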
Now, it’s not guaranteed that an endpoint will always support this special form of SRTP, so whether or not to use it for a given media session is decided during media negotiation, when the MCU is first establishing the audio call with a conference participant. In order for Scale-SRTP to be used, one of the parties in the call (the MCU) has to take the “server” role and the other has to take the “client” role. The “server” party sends media using Scale-SRTP, while the “client” party receives the Scale-SRTP and sends back ordinary SRTP. With me so far?
If you look at the message bodies of the SIP INVITE and the 200 OK response, you can find the a=cryptoscale attribute in the Session Description Protocol (SDP) content; this is the attribute that signals support for Scale-SRTP. The MCU’s a=cryptoscale attribute will say “server” while the client’s will say “client.” In both cases some encryption details will follow.
The a=crypto attribute, also in there, is for regular (non-Scale-SRTP) encryption.
So let’s say the MCU dials out to a client, which happens not to support receiving Scale-SRTP. The INVITE from the MCU will include both an a=crypto attribute and an a=cryptoscale attribute. The 200 OK from the client will lack an a=cryptoscale attribute, and so regular SRTP will be used for media.
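The negotiation just described can be sketched roughly as follows. This is only an illustration of the decision logic, not Lync's actual implementation: the function names are hypothetical, the SDP snippets are made up, the key material is a placeholder, and the exact layout of the attribute values (tag, role, suite, key) is an assumption.

```python
def sdp_attributes(sdp: str) -> dict:
    """Collect `a=name:value` lines from an SDP body into {name: [values]}."""
    attrs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=") and ":" in line:
            name, value = line[2:].split(":", 1)
            attrs.setdefault(name, []).append(value)
    return attrs


def has_role(attrs: dict, role: str) -> bool:
    """True if any a=cryptoscale value mentions the given role."""
    return any(role in value.split() for value in attrs.get("cryptoscale", []))


def negotiated_mode(offer_sdp: str, answer_sdp: str) -> str:
    """Decide which encryption the offerer should use for its outbound media."""
    offer = sdp_attributes(offer_sdp)
    answer = sdp_attributes(answer_sdp)
    if has_role(offer, "server") and has_role(answer, "client"):
        return "scale-srtp"
    if "crypto" in offer and "crypto" in answer:
        return "srtp"
    return "none"


# Illustrative SDP fragments; the inline keys are placeholders, not real keys.
MCU_OFFER = """\
m=audio 50000 RTP/SAVP 0
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_A
a=cryptoscale:1 server AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_B
"""

PLAIN_SRTP_ANSWER = """\
m=audio 40000 RTP/SAVP 0
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_C
"""

SCALE_CAPABLE_ANSWER = """\
m=audio 40000 RTP/SAVP 0
a=cryptoscale:1 client AES_CM_128_HMAC_SHA1_80 inline:PLACEHOLDER_KEY_D
"""

print(negotiated_mode(MCU_OFFER, PLAIN_SRTP_ANSWER))
print(negotiated_mode(MCU_OFFER, SCALE_CAPABLE_ANSWER))
```

The first call models the dial-out case above, where the answer carries only a=crypto and the session falls back to regular SRTP.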
So far so good. However, it turns out that, even after agreeing to use ordinary SRTP on a particular call, the MCU was being sulky and encrypting that media using the encryption key it had generated for Scale-SRTP. Oops! So the client, which had the non-Scale encryption key, would be unable to decrypt the media. Voilà, one-way audio.
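The effect of that key mismatch can be modeled with a small sketch. This is a simplification under stated assumptions: it leaves out encryption and key derivation entirely and only attaches an HMAC-SHA1 tag to each packet, so that a receiver holding a different key rejects it. The function names and keys are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit tag, echoing the "HMAC_SHA1_80" part of the suite name


def protect(key: bytes, payload: bytes) -> bytes:
    """Append a truncated HMAC-SHA1 tag to the payload (toy model of SRTP)."""
    tag = hmac.new(key, payload, hashlib.sha1).digest()[:TAG_LEN]
    return payload + tag


def unprotect(key: bytes, packet: bytes):
    """Return the payload if the tag verifies under `key`, otherwise None."""
    payload, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha1).digest()[:TAG_LEN]
    return payload if hmac.compare_digest(tag, expected) else None


# Hypothetical keys: the one negotiated via a=crypto, and the MCU's shared one.
regular_key = b"regular-key-0001"
scale_key = b"scale-key-000001"
frame = b"mixed audio frame"

# Before the fix: SRTP was negotiated, but the MCU used the Scale-SRTP key.
buggy_packet = protect(scale_key, frame)
heard_before_fix = unprotect(regular_key, buggy_packet)

# After the fix: the MCU uses the key that was actually negotiated.
fixed_packet = protect(regular_key, frame)
heard_after_fix = unprotect(regular_key, fixed_packet)
```

In this model `heard_before_fix` is `None`, which corresponds to the participant hearing nothing, while media sent by the participant is unaffected because it never used the shared key in the first place.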
In this update, the issue I just described has been fixed, so the MCU works with either Scale-SRTP or regular SRTP. Now, this has a couple of noteworthy implications. First, it means that non-Lync endpoints that support SRTP should now be able to exchange media with a Lync A/V MCU. (Scale-SRTP is specific to Lync.) If you, for example, use a back-to-back call to connect a non-Lync endpoint to a conference, you should get audio flowing in both directions.
Second, in theory it should now be possible to connect two Lync audio conferences to each other! Two A/V MCUs can’t use Scale-SRTP to communicate, because both will try to take the “server” role, and send the Scale-SRTP media, while neither will support receiving it. Previously, the two MCUs would fall back on regular SRTP, only to run into the encryption issue I described above. With this fixed, it should theoretically now be possible to use a back-to-back call to “chain” two A/V MCUs together and have them pass media back and forth.
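The role conflict between two MCUs can be sketched per direction of media flow. The `Endpoint` type and its two capability flags are hypothetical names introduced for illustration; they simply encode the idea that an MCU can send Scale-SRTP but not receive it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    sends_scale: bool     # can take the "server" role
    receives_scale: bool  # can take the "client" role


def transport(sender: Endpoint, receiver: Endpoint) -> str:
    """Pick the encryption used for media flowing from sender to receiver."""
    if sender.sends_scale and receiver.receives_scale:
        return "Scale-SRTP"
    return "SRTP"


MCU_A = Endpoint("mcu-a", sends_scale=True, receives_scale=False)
MCU_B = Endpoint("mcu-b", sends_scale=True, receives_scale=False)
LYNC_CLIENT = Endpoint("lync-client", sends_scale=False, receives_scale=True)

# MCU to Lync client uses Scale-SRTP; the return path is ordinary SRTP.
print(transport(MCU_A, LYNC_CLIENT), transport(LYNC_CLIENT, MCU_A))

# MCU to MCU: neither side can receive Scale-SRTP, so both directions fall back.
print(transport(MCU_A, MCU_B), transport(MCU_B, MCU_A))
```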
I should be clear that, as far as I understand, chaining together two MCUs isn’t an officially supported scenario, and as I’m traveling right now I unfortunately don’t have a couple of MCUs handy to test it out. However, in principle it is a possible setup, and in certain cases might be just the thing for the application you are trying to build.
If there’s interest, I may put together a demo application that shows this kind of conference chaining in a future post.
Best wishes to readers in the US for a happy Thanksgiving!