Microsoft.Speech and System.Speech
Posted: September 6th, 2012 | Author: Michael | Filed under: Lync Development | Tags: SAPI, speech

One thing that can be slightly confusing about the speech synthesis and recognition features in UCMA is that there are two completely separate but very similar namespaces with speech-related classes. The Microsoft.Speech and System.Speech namespaces share many of the same class names (e.g., System.Speech.Synthesis.SpeechSynthesizer vs. Microsoft.Speech.Synthesis.SpeechSynthesizer), but they belong to two different speech APIs and function differently. I’ve run into a couple of mix-ups with these two namespaces, and I’m sure that others have too, so I thought I’d write up a quick explanation of what’s going on here.
The short version of the story is that the System.Speech classes and the Microsoft.Speech classes serve different purposes. Both are based on the Speech Application Programming Interface, a.k.a. SAPI. System.Speech is for desktop applications. Microsoft.Speech, on the other hand, belongs to the Microsoft Speech Platform Server SDK, a flavour of SAPI designed for server applications. They work with separate sets of speech synthesis voices, and have some differences in capabilities. For instance, the desktop flavour (System.Speech) can adapt to the voice of a specific user, which makes sense for desktop applications, where the user will generally be the same. It also implements a dictation engine (for recognizing arbitrary words spoken by the user, rather than a specific set of words or phrases), which the server version does not.
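You can see the split for yourself with a minimal sketch like the one below. It assumes the Speech Platform Runtime (with at least one server voice) is installed and that the project references both System.Speech.dll and Microsoft.Speech.dll; fully qualified class names are used because the names collide.

    using System;

    class VoiceComparison
    {
        static void Main()
        {
            // Desktop engine: the voices that ship with Windows / .NET.
            using (var desktop = new System.Speech.Synthesis.SpeechSynthesizer())
            {
                Console.WriteLine("System.Speech (desktop) voices:");
                foreach (var voice in desktop.GetInstalledVoices())
                    Console.WriteLine("  " + voice.VoiceInfo.Name);
            }

            // Server engine: only the voices installed with the Speech Platform Runtime.
            using (var server = new Microsoft.Speech.Synthesis.SpeechSynthesizer())
            {
                Console.WriteLine("Microsoft.Speech (server) voices:");
                foreach (var voice in server.GetInstalledVoices())
                    Console.WriteLine("  " + voice.VoiceInfo.Name);
            }
        }
    }

Running it on a machine with both installed shows two lists that don’t overlap, which is exactly why the wrong namespace leaves you without your server voices.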
The two namespaces are so similar that if you switch your using statements from System.Speech to Microsoft.Speech or vice versa, most of your speech code will compile just fine with no changes. This can lead to some very confusing situations.
What often happens is this: you start writing speech code that uses a class like SpeechSynthesizer, and let Visual Studio automatically add the using statement when it helpfully offers to do so. Since System.Speech is one of the standard libraries available by default in .NET 3.5, while Microsoft.Speech requires a reference to be added to a DLL, Visual Studio of course adds a using statement for System.Speech. Your UCMA app will happily compile, and all of the speech functionality will seem to work with System.Speech. But the voices you’ve installed as part of the Speech Platform Server SDK won’t be available; instead you’ll get the default System.Speech voice, which doesn’t sound very good on phone calls.
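For reference, here is roughly what the correct setup looks like in a UCMA application. This is only a sketch: it assumes an already-established AudioVideoFlow (the audioVideoFlow parameter) and a project reference to Microsoft.Speech.dll from the Speech Platform SDK, rather than System.Speech.

    using Microsoft.Rtc.Collaboration.AudioVideo;
    using Microsoft.Speech.AudioFormat;
    using Microsoft.Speech.Synthesis;   // note: not System.Speech.Synthesis

    static class CallPrompts
    {
        // Speaks a prompt onto an established audio flow using the server-side engine.
        public static void SpeakOnCall(AudioVideoFlow audioVideoFlow, string text)
        {
            // The connector bridges the synthesizer's audio output into the call.
            var connector = new SpeechSynthesisConnector();
            connector.AttachFlow(audioVideoFlow);

            using (var synthesizer = new SpeechSynthesizer())
            {
                var format = new SpeechAudioFormatInfo(
                    16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
                synthesizer.SetOutputToAudioStream(connector, format);

                connector.Start();
                synthesizer.Speak(text);   // rendered with a Speech Platform (server) voice
                connector.Stop();
            }
        }
    }

If you swap the using statements to System.Speech, this same shape of code still compiles against the desktop API, which is exactly how the mix-up sneaks in.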
When this has happened to me, I’ve been completely baffled as to why the voices I’ve installed aren’t available to my UCMA application. If you run into this problem, and can’t seem to use your installed TTS voices in UCMA, or the quality of the synthesized speech on calls seems poor, check to make sure you are using the Microsoft.Speech namespace and not System.Speech! It’s a very easy mistake to make.
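One quick sanity check (just a throwaway diagnostic, nothing official) is to print which assembly the SpeechSynthesizer type you’re actually compiling against lives in:

    using System;
    using Microsoft.Speech.Synthesis;   // swap to System.Speech.Synthesis to compare

    class WhichEngine
    {
        static void Main()
        {
            // Prints "Microsoft.Speech" for the server engine or "System.Speech"
            // for the desktop engine, depending on which using/reference won out.
            Console.WriteLine(typeof(SpeechSynthesizer).Assembly.GetName().Name);
        }
    }

If that prints System.Speech when you expected the server engine, the using statement (or a missing Microsoft.Speech reference) is the culprit.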
The real pain is that you can’t find good-quality voices for the speech server version (SAPI 5.2). The voices Microsoft provides are barely usable in a real application. Compared to what you can find for the desktop version, it seems like they stopped improving the server TTS engine back in the ’80s.