iSpeech SDK for iOS

Introduction

This document describes the differences between the new SDK and the old one, and will help you migrate your code. It also explains implementation details of the new SDK to help you integrate it into your application.

Changes in the New SDK

This SDK has been rewritten from the ground up to be more memory efficient, more developer friendly, and less bug-prone. As a result, nearly everything has changed in some way.

The first thing you’ll probably notice is an increased number of header files, a result of splitting the SDK into smaller pieces. In older versions, a single ISpeechSDK class handled everything: configuration, speech recognition, and speech synthesis. In the new SDK, those responsibilities have been separated to make them easier to manage. The main iSpeechSDK class is now used just for configuration, with speech synthesis and speech recognition moved into ISSpeechSynthesis and ISSpeechRecognition, respectively. Also note that the main class has been renamed from ISpeechSDK to iSpeechSDK (note the capitalization).

This new SDK also has improved performance, a result of splitting the work across multiple threads and using Grand Central Dispatch for certain tasks. Instead of doing all the audio work on whichever thread the action was initiated on, the SDK spawns a separate thread to handle it. The new SDK also takes advantage of Automatic Reference Counting, which helps avoid memory management bugs and improves overall performance.

There are also changes relating to the audio session in this new SDK: the audio session is only activated when needed, as opposed to the old SDK, which activated the audio session right when the SDK was initialized. This means that any existing background audio will continue to play when you initialize the SDK. Only when you ask the SDK to either speak something or recognize speech will the background audio get shut down. Then, once the SDK is done with the audio session, it deactivates it, allowing any background audio to start playing again. This is a major user experience change that should please a lot of users. If you don’t want the SDK to deactivate the audio session because you’re using it for your own audio, you can change this behavior by setting shouldDeactivateAudioSessionWhenFinished to NO on iSpeechSDK.
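
For example, a minimal sketch of turning that deactivation off (the property name comes from this document; the surrounding method is just illustration):

#import "iSpeechSDK.h"

- (void)configureAudioSessionBehavior {
    // Keep the audio session active after speech synthesis or recognition finishes,
    // so audio we play ourselves afterwards isn't interrupted by session teardown.
    [iSpeechSDK sharedSDK].shouldDeactivateAudioSessionWhenFinished = NO;
}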

We’ve added something called the Configuration API. Several methods on both ISSpeechSynthesis and ISSpeechRecognition are marked with CONFIGURATION_METHOD. Any method marked this way can also be called on [[iSpeechSDK sharedSDK] configuration], and the value you set there becomes the new default for all objects created after that point. This makes it easy to set a default voice for all speech synthesis and a default locale for all speech recognition, while still being able to override either on a single instance.
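
As a sketch of how this might look in practice — the setVoice: and setLocale: selectors, the ISVoiceUSEnglishFemale constant, and the locale strings are assumptions for illustration; check ISSpeechSynthesis.h and ISSpeechRecognition.h for the methods actually marked CONFIGURATION_METHOD:

#import "iSpeechSDK.h"
#import "ISSpeechSynthesis.h"
#import "ISSpeechRecognition.h"

- (void)configureSDKDefaults {
    // Values set on the shared configuration become the defaults for every
    // ISSpeechSynthesis and ISSpeechRecognition object created after this point.
    [[[iSpeechSDK sharedSDK] configuration] setVoice:ISVoiceUSEnglishFemale];
    [[[iSpeechSDK sharedSDK] configuration] setLocale:@"en-US"];

    // An individual instance can still override a default just for itself.
    ISSpeechRecognition *recognition = [[ISSpeechRecognition alloc] init];
    [recognition setLocale:@"fr-FR"];
}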

Other minor changes include new audio prompts when performing speech recognition, vibration when those prompts play, and the ability to enable extra audio prompts for when speech recognition succeeds or fails.

Code Compatibility

We realize that it’s not easy for developers to replace all the code that uses the old SDK. That’s why we include a compatibility class that lets you run on the new SDK while still calling the old SDK’s APIs: it re-implements the old ISpeechSDK class as a thin wrapper around the new SDK. Note that this class is not meant for long-term use; it’s a transition aid that lets you ship bug fix updates on the new SDK without rewriting your existing code. If you have a major release in the works, you should use that chance to fully upgrade to the new SDK.

Requirements

Because the SDK uses Grand Central Dispatch, blocks, and Automatic Reference Counting internally, and needs to support armv7s for the iPhone 5, the minimum supported version is now iOS 4.3. We’ve also cut down on the number of frameworks you’re required to link against: now you only need AudioToolbox, SystemConfiguration, Security, and CFNetwork.

Getting Started

Integrating the new SDK into your application is very similar to integrating the old one.

  1. Copy libiSpeechSDK.a into your Frameworks group in your Xcode project.
  2. Copy the headers into your Xcode project.
  3. If you want to use the compatibility class, copy that into your Xcode project as well.
  4. Copy iSpeechSDK.bundle into your Supporting Files group in your Xcode project.
  5. Link against AudioToolbox.framework, SystemConfiguration.framework, Security.framework, and CFNetwork.framework.
  6. Add -ObjC to “Other Linker Flags” in your Build Settings.
    1. If you’re not using ARC in your project, add -fobjc-arc to “Other Linker Flags” in your Build Settings.
  7. Build and Go.

To see how to use the new SDK, take a look at the code samples below.

Code Samples

Setting up the SDK

#import "iSpeechSDK.h"
#import "iSpeechSDKSampleAppDelegate.h"

@implementation iSpeechSDKSampleAppDelegate

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Your other setup here

    iSpeechSDK *sdk = [iSpeechSDK sharedSDK];
    sdk.APIKey = @"YOUR_API_KEY_HERE";

    // Other possible SDK configuration.

    return YES;
}

// ---
// Your other methods here.
// ---

@end

Speech Synthesis

#import "ISSpeechSynthesis.h"

- (void)speak:(NSString *)text {
    ISSpeechSynthesis *synthesis = [[ISSpeechSynthesis alloc] initWithText:text];
    [synthesis setDelegate:self];

    NSError *error = nil;

    if(![synthesis speak:&error]) {
        [self doSomethingWith:error];
    }
}
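
Speech synthesis also reports status through its delegate. The exact protocol lives in ISSpeechSynthesis.h; the method names below are assumptions for illustration only, so check the header for the real ISSpeechSynthesisDelegate signatures:

- (void)synthesisDidStartSpeaking:(ISSpeechSynthesis *)speechSynthesis {
    // Assumed callback: audio playback of the synthesized speech has started.
}

- (void)synthesis:(ISSpeechSynthesis *)speechSynthesis didFailWithError:(NSError *)error {
    // Assumed callback: synthesis failed; handle the error like the speak: failure above.
    [self doSomethingWith:error];
}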

Speech Recognition

#import "ISSpeechRecognition.h"

- (void)recognize {
    ISSpeechRecognition *recognition = [[ISSpeechRecognition alloc] init];
    [recognition setDelegate:self];

    NSError *error = nil;

    if(![recognition listenAndRecognizeWithTimeout:10 error:&error]) {
        [self doSomethingWith:error];
    }
}

- (void)recognition:(ISSpeechRecognition *)speechRecognition didGetRecognitionResult:(ISSpeechRecognitionResult *)result {
    [self doSomethingWith:result];
}

Speech Recognition With Commands

#import "ISSpeechRecognition.h"

- (void)recognize {
    ISSpeechRecognition *recognition = [[ISSpeechRecognition alloc] init];
    [recognition setDelegate:self];

    [recognition addAlias:@"people" forItems:[NSArray arrayWithObjects:
        @"John Smith",
        @"Bilbo Baggins",
        @"River Song",
        @"Han Solo",
        nil
    ]];
    [recognition addCommand:@"Call %people%"];

    NSError *error = nil;

    if(![recognition listenAndRecognizeWithTimeout:10 error:&error]) {
        [self doSomethingWith:error];
    }
}

- (void)recognition:(ISSpeechRecognition *)speechRecognition didGetRecognitionResult:(ISSpeechRecognitionResult *)result {
    [self doSomethingWith:result];
}