
iSpeech API Specification Version 2.1

This document provides a detailed description of the iSpeech API and examples for almost every possible use case. It demonstrates how to dynamically retrieve API key properties such as voice lists, credits, and enabled key options. Wireshark and RESTClient are recommended for viewing network transactions, creating requests, and debugging code.


Last Updated: December 12, 2011.



iSpeech Developer Support

http://www.ispeech.org/developers

iSpeech Inc. Version 2.1

iSpeech Inc. (“iSpeech”) has made efforts to ensure the accuracy and completeness of the information in this document. However, iSpeech Inc. disclaims all representations, warranties and conditions, whether express or implied, arising by statute, operation of law, usage of trade, course of dealing or otherwise, with respect to the information contained herein. iSpeech Inc. assumes no liability to any party for any loss or damage, whether direct, indirect, incidental, consequential, special or exemplary, with respect to (a) the information; and/or (b) the evaluation, application or use of any product or service described herein.

iSpeech Inc. disclaims any and all representation that its products or services infringe upon any existing or future intellectual property rights. iSpeech Inc. owns and retains all right, title and interest in and to the iSpeech Inc. intellectual property, including without limitation, its patents, marks, copyrights and technology associated with the iSpeech Inc. services. No title or ownership of any of the foregoing is granted or otherwise transferred hereunder. iSpeech Inc. reserves the right to make changes to any information herein without further notice.

Revision History

PUBLISH DATE     UPDATES
Aug 8, 2011      Document created
Sept 13, 2011    Added ASR
Sept 21, 2011    Added AMR to ASR
Nov 9, 2011      Added voice command examples
Nov 10, 2011     Removed references to ASR Raw POST
Nov 11, 2011     Made output variable explicit for ASR and voice list examples
Nov 17, 2011     Specified HTTP POST/GET instead of REST, fixed /r/n typos
Nov 22, 2011     Added reference for Speex in ASR content-type example
Dec 12, 2011     Added endpadding and startpadding variables
Dec 12, 2011     Added TTS examples, added background highlighting to emphasize examples


Table of Contents

Revision History

Section 1 Introduction

   

   Minimum Requirements

      Internet Connection

      HTTP Protocol

      API Key

   Managing API Key Settings

      View/Edit Keys

   API Features

      Text to Speech

      Automated Speech Recognition

   Developer Support

      Sales

      Support/Troubleshooting

   Software Development Kits

       Availability

       API Access Pricing

Section 2 Text to Speech

   Transaction Types and URL Formats

      HTTP GET/POST: URL Encoded, XML, JSON

   Request Parameters

   Voices

      HTTP GET Example: URL Encoded  → Binary Audio

   Voice List Retrieval
     
      HTTP GET: URL Encoded, XML, JSON

   Speed

      HTTP GET Example: URL Encoded  → Binary Audio

   Formats

      HTTP GET Example: URL Encoded  → Binary Audio

   Bitrates

      HTTP GET Example: URL Encoded  → Binary Audio

   Padding

      HTTP GET Example: URL Encoded  → Binary Audio

   Example Transactions

      Summary

      HTTP POST: URL Encoded → Binary Audio

      HTTP POST: JSON → Binary Audio

      HTTP POST: XML → Binary Audio

      HTTP GET: URL Encoded → URL Encoded (this transaction purposely contains an error)

   Error Codes for Text-to-Speech and General Errors


Section 3 Automated Speech Recognition

   

   Transaction Types and URL Formats  

      HTTP GET/POST: URL Encoded, XML, JSON

   Request Parameters  

   Languages

      Standard Languages

      Custom Languages

   Speech Recognition Models

      Standard

      Custom

   Example Transactions for Freeform Speech

      Format of Examples

      HTTP POST: URL Encoded → URL Encoded

      HTTP POST: JSON → JSON

      HTTP POST: XML → XML

   Command Lists

      About

      Example Transactions for Command Lists

         Formatting of Examples

         HTTP POST: XML → XML        

         HTTP POST: URL Encoded → URL Encoded

         HTTP POST: JSON → JSON

         HTTP POST: XML → XML: Detecting multiple audio commands from multiple command lists

   Error Codes for Speech Recognition and General Errors

Section 4 Translation

  Summary

      iSpeech Translation API

Section 1

Introduction

Welcome to the iSpeech Inc. Application Programming Interface (API) Developer Guide.  This guide describes the available variables, commands, and interfaces that make up the iSpeech API.

The iSpeech API allows developers to implement Text-To-Speech (TTS) and Automated Speech Recognition (ASR) in any Internet-enabled application.

The APIs are platform-agnostic, which means any Internet-connected device that can record or play audio can use the iSpeech API.

Minimum Requirements

Below are the minimum requirements for using the iSpeech API.  The API can be used with or without an SDK.

Internet connection

iSpeech services require a connection to the internet.

HTTP Protocol

The iSpeech API follows the HTTP standard by using GET and POST.  Some web browsers limit the length of GET requests to a few thousand characters.

Request/Responses

Requests can be in URL encoded, JSON, or XML data formats.  You can specify the output data format of responses.  For TTS, binary audio data is returned if the request is successful.  For speech recognition, URL encoded text, JSON, or XML can be returned by setting the output variable.

API Key

An API key is a password that is required for access.  To obtain an API key please visit: http://www.ispeech.org/developers and register for a developer account.


Managing API Key Settings

View/Edit Keys

Manage your API keys by using the iSpeech developer website: http://www.ispeech.org/developers.  You can request additional features for your API keys on that website.

API Features

Text to Speech

You can play audio through iSpeech TTS in a variety of voices, formats, bitrates, frequencies, and playback speeds.

Automated Speech Recognition

You can convert audio to text using a variety of languages and recognition models.  We can create custom recognition models to improve recognition quality.

Developer Support

Sales

 

iSpeech sales can be contacted at the following phone number: +1-917-338-7723 from 10 AM to 6 PM Eastern Time, Monday to Friday.  You can also email us at sales@ispeech.org.

Support / Troubleshooting

Please look for the answer to your problem in the iSpeech Developer Forum: http://www.ispeech.org/forums/

Software Development Kits

iSpeech SDKs simplify the iSpeech API.  You should use iSpeech SDKs if the option is available.  Only mobile SDKs made by iSpeech allow you to use the iSpeech API for free.

Availability

iPhone, Android, BlackBerry, .NET, Java (Server), Java applet (Client) [coming soon],  PHP, Javascript/Flash [coming soon]

API Access Pricing

PLATFORMS                      PRICE
iPhone, Android, BlackBerry    Free using iSpeech SDK
.NET, Java, PHP                Between $0.05 and $0.00001 per word converted or recognized, depending on quantity

Section 2

Text to Speech

The iSpeech Text-To-Speech API allows you to create high quality spoken audio in multiple formats. The iSpeech API doesn’t use callbacks.  It’s synchronous and fast. This means you'll always receive audio data or an error message in the same HTTP request.

Transaction Types and URL Formats

TRANSACTION TYPE    INPUT FORMAT    URL
HTTP GET/POST       URL Encoded     http://api.ispeech.org/api/rest
HTTP GET/POST       XML             http://api.ispeech.org/api/xml
HTTP GET/POST       JSON            http://api.ispeech.org/api/json

Request Parameters

PARAMETER                  DATA TYPE                  EXAMPLE VALUE
apikey                     32 character hex number    abcdef1234567890abcdef1234567890
action                     String                     convert, information
text                       String                     Hello World
voice (optional)           String                     usenglishfemale
format (optional)          String                     mp3
frequency (optional)       String                     16000
bitrate (optional)         String                     64
speed (optional)           Integer                    -10 to 10
startpadding (optional)    Integer (seconds)          5
endpadding (optional)      Integer (seconds)          5

Example HTTP GET Request (Using every possible variable)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=mp3&frequency=44100&bitrate=128&speed=1&startpadding=1&endpadding=1
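
A request URL like the one above can also be assembled in code rather than by hand. The following is a minimal Python sketch, not an official client: the helper name build_tts_url is hypothetical, and it assumes the server accepts standard form encoding (spaces sent as "+"), as the hello+world example elsewhere in this document suggests.

```python
from urllib.parse import urlencode

# URL-encoded endpoint taken from the transaction table above.
API_BASE = "http://api.ispeech.org/api/rest"


def build_tts_url(apikey, text, **options):
    """Return a GET URL for action=convert (hypothetical helper).

    `options` holds the optional parameters from the request parameter
    table, e.g. voice, format, frequency, bitrate, speed.
    """
    params = {"apikey": apikey, "action": "convert", "text": text}
    # Keep only the options that were actually supplied.
    params.update({k: v for k, v in options.items() if v is not None})
    return API_BASE + "?" + urlencode(params)


url = build_tts_url(
    "developerdemokeydeveloperdemokey",
    "hello world",
    voice="usenglishfemale",
    format="mp3",
    speed=1,
)
print(url)
```

Building the query string with urlencode avoids hand-escaping characters such as spaces and ampersands in the text parameter.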

Voices

Standard Voices

Name                           Alias
US English Female (default)    usenglishfemale
US English Male                usenglishmale
UK English Female              ukenglishfemale
UK English Male                ukenglishmale
Australian English Female      auenglishfemale
US Spanish Female              usspanishfemale
US Spanish Male                usspanishmale
Chinese Female                 chchinesefemale
Chinese Male                   chchinesemale
Hong Kong Cantonese Female     hkchinesefemale
Taiwan Chinese Female          twchinesefemale
Japanese Female                jpjapanesefemale
Japanese Male                  jpjapanesemale
Korean Female                  krkoreanfemale
Korean Male                    krkoreanmale
Canadian English Female        caenglishfemale
Hungarian Female               huhungarianfemale
Brazilian Portuguese Female    brportuguesefemale
European Portuguese Female     eurportuguesefemale
European Portuguese Male       eurportuguesemale
European Spanish Female        eurspanishfemale
European Spanish Male          eurspanishmale
European Catalan Female        eurcatalanfemale
European Czech Female          eurczechfemale
European Danish Female         eurdanishfemale
European Finnish Female        eurfinnishfemale
European French Female         eurfrenchfemale
European French Male           eurfrenchmale
European Norwegian Female      eurnorwegianfemale
European Dutch Female          eurdutchfemale
European Dutch Male            eurdutchmale
European Polish Female         eurpolishfemale
European Italian Female        euritalianfemale
European Italian Male          euritalianmale
European Turkish Female        eurturkishfemale
European Turkish Male          eurturkishmale
European German Female         eurgermanfemale
European German Male           eurgermanmale
Russian Female                 rurussianfemale
Russian Male                   rurussianmale
Swedish Female                 swswedishfemale
Canadian French Female         cafrenchfemale
Canadian French Male           cafrenchmale

HTTP GET Request (Setting voice to European French Female)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&format=mp3&voice=eurfrenchfemale

Custom Voices

Custom Voices may be enabled for your account.  They can be found in the developer portal -> api key properties -> custom voices.  You can use them by setting voice to the custom alias.

Name               Alias
President Obama    obama
Custom Voice       customvoice1

Voice List Retrieval

A current list of voices that are enabled for an API key can be retrieved in URL encoded (REST), JSON, or XML format by using the following service.  HTTP GET and POST are supported.  A REST client can be used to make these HTTP requests.

HTTP GET Network Transaction to get XML voice list.

HTTP GET Request and XML Response

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=information&output=xml

<?xml version='1.0'?>

<data>

    <result>success</result>

    <voice-1>krkoreanfemale</voice-1>

    <voice-locale-1-1>ko-kr</voice-locale-1-1>

    <voice-locale-1-2>ko</voice-locale-1-2>

    <voice-gender-1>female</voice-gender-1>

    <voice-description-1>Korean Female Voice</voice-description-1>

    <voice-2>usenglishfemale</voice-2>

    <voice-locale-2-1>en-us</voice-locale-2-1>

    <voice-locale-2-2>en</voice-locale-2-2>

    <voice-gender-2>female</voice-gender-2>

    <voice-description-2>United States English Female Voice</voice-description-2>

    [... more voices ...]

</data>

HTTP GET Network Transaction to get JSON voice list.

HTTP GET URL Encoded Request and JSON Response

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=information&output=json

{"voice-gender-48":"female","voice-locale-22-1":"fr-ca","voice-locale-8-1":"pt-br","voice-description-2":"Finnish Female Voice","voice-description-3":"Hong Kong Chinese Male Voice","voice-58":"eurdanishfemale","voice-description-1":"Korean Female Voice","voice-description-6":"Chinese Female Voice","voice-description-7":"United Kingdom English Female Voice","voice-description-4", [...more voices...]}

HTTP GET Network Transaction to get URL Encoded voice list.

HTTP GET URL Encoded Request and URL Encoded Response

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=information&output=rest

result=success&voice-1=krkoreanfemale2&voice-locale-1-1=ko-kr&voice-gender-1=female&voice-description-1=Korean+Female+Voice&voice-2=eurfinnishfemale&voice-locale-2-1=fi-fi&voice-gender-2=female&voice-description-2=Finnish+Female+Voice&voice-3=chchinesemale1&voice-locale-3-1=zh&voice-locale-3-2=zh-hk[...more voices...]
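
The voice list comes back as flat, numbered fields (voice-N, voice-locale-N-M, voice-gender-N, voice-description-N), so a client usually needs to regroup them by voice. The sketch below shows one way to do that in Python for the URL encoded response. The function name parse_voice_list and the shape of the returned records are assumptions made for illustration; only the field naming pattern comes from the responses above.

```python
import re
from urllib.parse import parse_qsl

# Matches voice-N, voice-locale-N-M, voice-gender-N and voice-description-N.
FIELD = re.compile(r"voice(?:-(locale|gender|description))?-(\d+)(?:-\d+)?")


def parse_voice_list(body):
    """Group the flat voice fields of an information response by voice index."""
    voices = {}
    for key, value in parse_qsl(body):
        match = FIELD.fullmatch(key)
        if match is None:
            continue  # skip fields such as result=success
        field, index = match.group(1), int(match.group(2))
        voice = voices.setdefault(index, {"locales": []})
        if field is None:
            voice["alias"] = value          # voice-N holds the alias
        elif field == "locale":
            voice["locales"].append(value)  # a voice may list several locales
        else:
            voice[field] = value            # gender or description
    return [voices[i] for i in sorted(voices)]


sample = ("result=success&voice-1=krkoreanfemale2&voice-locale-1-1=ko-kr"
          "&voice-gender-1=female&voice-description-1=Korean+Female+Voice")
for voice in parse_voice_list(sample):
    print(voice["alias"], voice["locales"], voice["description"])
```

The same grouping logic would apply to the JSON response after json.loads, since it uses the same key names.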

Speed

Most voices support speed controls.

Speed               Value (integer)
Fastest             10
Faster              Speed > 0
Normal (default)    0
Slower              Speed < 0
Slowest             -10

HTTP GET Request (Setting speed to 5)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=mp3&speed=5

Bitrates

Note: Bitrates can only be selected for MP3s.

Valid values are 16, 24, 32, 48 (default), 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256, or 320.

Bitrates are listed in kbps (kilobits per second).

HTTP GET Request (Setting bitrate to 16 kilobits per second)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=mp3&bitrate=16

Formats

Name                             File extension
Audio Interchange File Format    aiff
MPEG Layer 3 (default)           mp3
Ogg                              ogg
Windows Media Audio              wma
Free Lossless Audio Codec        flac
Wave PCM                         wav

Example HTTP GET Request (Setting format to wav)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=wav

Frequencies

Possible values: 16000, 22000, 24000, 32000, 44100, 48000 Hertz

Example HTTP GET Request (Setting frequency to 16000 Hz)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&frequency=16000

Padding

Padding adds silence to a section of the audio file.

Start Padding

Example HTTP GET Request (Setting start padding to 3 seconds)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&startpadding=3

Adds a period of silence to the beginning of the audio file.

End Padding

Example HTTP GET Request (Setting end padding to 3 seconds)

http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&endpadding=3

Adds a period of silence to the end of the audio file.
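
The value lists given in this section for speed, bitrates, formats, and frequencies can be checked on the client before a request is sent, which avoids a round trip that would end in an error code. The following Python sketch is illustrative only: check_tts_options is a hypothetical helper, and rejecting negative padding values is an assumption, since this document does not state the allowed padding range.

```python
VALID_FORMATS = {"aiff", "mp3", "ogg", "wma", "flac", "wav"}
VALID_BITRATES = {16, 24, 32, 48, 56, 64, 80, 96, 112, 128,
                  144, 160, 192, 224, 256, 320}
VALID_FREQUENCIES = {16000, 22000, 24000, 32000, 44100, 48000}


def check_tts_options(audio_format="mp3", bitrate=None, frequency=None,
                      speed=0, startpadding=0, endpadding=0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if audio_format not in VALID_FORMATS:
        problems.append("unknown format: %s" % audio_format)
    if bitrate is not None:
        if audio_format != "mp3":
            # Bitrates can only be selected for MP3s.
            problems.append("bitrate can only be selected for mp3")
        elif bitrate not in VALID_BITRATES:
            problems.append("unsupported bitrate: %s" % bitrate)
    if frequency is not None and frequency not in VALID_FREQUENCIES:
        problems.append("unsupported frequency: %s" % frequency)
    if not -10 <= speed <= 10:
        problems.append("speed must be between -10 and 10")
    # Assumption: padding is a non-negative number of seconds.
    if startpadding < 0 or endpadding < 0:
        problems.append("padding must not be negative")
    return problems


print(check_tts_options(audio_format="wav", bitrate=64))
```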

Example Transactions

Summary

The following examples are packet captures from TCP connections that used the HTTP protocol.  You can compare your network traffic to these in order to debug code.  Wireshark can be used to analyze network connections.  A REST client can be used to make these HTTP requests.

HTTP POST URL encoded request for Text to Speech

HTTP POST Request and Reply

POST /api/rest HTTP/1.1

Content-Length: 71

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

apikey=developerdemokeydeveloperdemokey&action=convert&text=hello+world

HTTP/1.0 200 OK

Connection: close

Server: iSpeech Cloud/1.2

Accept-Ranges: none

X-Time-Length: 3853

X-Content-Hash: e969ef3dd0dc0e9c417f31f7ffbd10ed

Content-Length: 23760

Content-Type: audio/mpeg

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

[mp3 binary audio data]
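
For comparison with the capture above, here is a hedged Python sketch that builds the same POST with the standard library. The helper name build_tts_post is hypothetical. The Content-Type value mirrors the capture; whether the server also accepts application/x-www-form-urlencoded is not stated in this document. Nothing is sent until the request is passed to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tts_post(apikey, text):
    """Build (but do not send) a URL encoded TTS POST request."""
    body = urlencode({"apikey": apikey, "action": "convert", "text": text})
    return Request(
        "http://api.ispeech.org/api/rest",
        data=body.encode("utf-8"),
        # Header value copied from the packet capture above.
        headers={"Content-Type": "text/plain; charset=UTF-8"},
        method="POST",
    )


request = build_tts_post("developerdemokeydeveloperdemokey", "hello world")
# urllib derives the Content-Length header from this body when sending.
print(request.get_method(), len(request.data))
```

Calling urllib.request.urlopen(request) would perform the transaction; on success the response body is the binary audio.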

HTTP POST JSON request for Text to Speech

HTTP POST, JSON Request and Reply

POST /api/json HTTP/1.1

Content-Length: 111

Content-Type: application/json; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

{"apikey":"developerdemokeydeveloperdemokey","action":"convert","text":"hello world","voice":"usenglishfemale"}

HTTP/1.0 200 OK

Connection: close

Server: iSpeech Cloud/1.2

Accept-Ranges: none

X-Time-Length: 3853

X-Content-Hash: e969ef3dd0dc0e9c417f31f7ffbd10ed

Content-Length: 23760

Content-Type: audio/mpeg

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

[mp3 audio binary data]
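
The JSON request body can be produced with a standard JSON encoder. This is a small Python sketch under the assumption that the server accepts any valid JSON object with these keys; the compact separators are used only to reproduce the body shown in the capture byte for byte.

```python
import json

payload = {
    "apikey": "developerdemokeydeveloperdemokey",
    "action": "convert",
    "text": "hello world",
    "voice": "usenglishfemale",
}

# Compact separators: no spaces after "," or ":", as in the capture.
body = json.dumps(payload, separators=(",", ":")).encode("utf-8")

headers = {
    "Content-Type": "application/json; charset=UTF-8",
    # Content-Length is the number of bytes in the encoded body.
    "Content-Length": str(len(body)),
}
print(headers["Content-Length"])
```

Computing Content-Length from the encoded bytes, rather than from the character count, keeps the header correct when the text contains non-ASCII characters.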

HTTP POST XML request for Text to Speech

HTTP POST, XML Request and Reply

POST /api/xml HTTP/1.1

Content-Length: 150

Content-Type: application/xml; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

<data>

<apikey>developerdemokeydeveloperdemokey</apikey>

<action>convert</action>

<text>hello world</text>

<voice>usenglishfemale</voice>

</data>

HTTP/1.0 200 OK

Connection: close

Server: iSpeech Cloud/1.2

Accept-Ranges: none

X-Time-Length: 3853

X-Content-Hash: 4affe15913fccd851ebf08a7e2650955

Content-Length: 23760

Content-Type: audio/mpeg

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

[mp3 audio binary data]
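
An XML request body follows a simple pattern: one child element per parameter under the data root node. A minimal Python sketch of that pattern is shown below; build_xml_request is a hypothetical helper, and the output has no line breaks or XML declaration, which is assumed (not confirmed by this document) to be acceptable to the server.

```python
import xml.etree.ElementTree as ET


def build_xml_request(fields):
    """Wrap each parameter in its own element under the <data> root node."""
    root = ET.Element("data")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    # ElementTree escapes characters such as & and < in the text values.
    return ET.tostring(root, encoding="unicode").encode("utf-8")


body = build_xml_request({
    "apikey": "developerdemokeydeveloperdemokey",
    "action": "convert",
    "text": "hello world",
    "voice": "usenglishfemale",
})
print(body.decode("utf-8"))
```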

Example of a network transaction with an error

Responses with text errors instead of audio data return “HTTP/1.0 202 Accepted”.

HTTP GET, URL Encoded Request and Reply (misspelled variable)

GET /api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale HTTP/1.1

Host: api.ispeech.org

Connection: keep-alive

User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.58 Safari/535.2

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Accept-Encoding: gzip,deflate,sdch

Accept-Language: en-US,en;q=0.8

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

HTTP/1.0 202 Accepted

Server: iSpeech Cloud/1.2

Connection: close

Content-Length: 41

Content-Type: text/plain

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

result=error&code=8&message=Invalid+voice
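
Because errors arrive with status 202 and a URL encoded text body instead of audio, client code has to branch on the status before treating the body as audio. The Python sketch below illustrates that check. ISpeechError and audio_or_error are hypothetical names, and the sketch assumes the error body is URL encoded, as in the capture above (a different output setting may change that).

```python
from urllib.parse import parse_qs


class ISpeechError(Exception):
    """Hypothetical exception carrying the numeric error code and message."""

    def __init__(self, code, message):
        super().__init__("%d: %s" % (code, message))
        self.code = code
        self.message = message


def audio_or_error(status, body):
    """Return the audio bytes for a 200 reply; raise for a 202 error reply."""
    if status == 200:
        return body
    fields = parse_qs(body.decode("utf-8"))
    if fields.get("result") == ["error"]:
        raise ISpeechError(int(fields["code"][0]), fields["message"][0])
    raise ValueError("unexpected HTTP status %d" % status)


try:
    audio_or_error(202, b"result=error&code=8&message=Invalid+voice")
except ISpeechError as err:
    print(err.code, err.message)
```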

Error codes for Text-to-Speech and General Errors

Code    Summary
1       Invalid API key
2       Could not convert text
3       Not enough credits
4       No action specified
5       Invalid text
6       Too many words
7       Invalid text entry
8       Invalid voice
12      Invalid file format
13      Invalid speed
14      Invalid dictionary
15      Invalid bitrate
16      Invalid frequency
30      Option not enabled for your account. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license.
100     This evaluation account has exceeded its trial period. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to upgrade your license.
101     Your key has been disabled. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license.
997     No api access
998     Unsupported output type
999     Invalid request
1000    Invalid Request Method POST Required

Section 3

Automated Speech Recognition

Transaction Types and URL Formats

There are currently three transaction types available for use with the iSpeech API.  All transactions must be sent to the appropriate URL:

TRANSACTION TYPE    INPUT TYPE     URL FORMAT
HTTP GET/POST       URL Encoded    http://api.ispeech.org/api/rest
HTTP GET/POST       XML            http://api.ispeech.org/api/xml
HTTP GET/POST       JSON           http://api.ispeech.org/api/json

Request Parameters

PARAMETER       VALUE                           EXAMPLE
apikey          32 character hex number         abcdef1234567890abcdef1234567890
locale          String                          en-US (see list)
action          String                          recognize, information
content-type    String                          audio/x-wav, audio/amr, audio/speex
audio           String (base64, remove \r\n)    (the audio data base64 encoded)
output          String                          xml, json, rest
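
The audio parameter must be base64 text with no \r\n characters, and some base64 encoders insert line breaks by default. The short Python sketch below shows an encoding that stays on a single line; encode_audio is a hypothetical helper name.

```python
import base64


def encode_audio(audio_bytes):
    """Base64-encode audio as one unbroken line for the audio parameter."""
    # b64encode never inserts line breaks; base64.encodebytes() would add
    # a newline after every 76 characters of output.
    return base64.b64encode(audio_bytes).decode("ascii")


encoded = encode_audio(b"\x00" * 100)
print("\n" in encoded, len(encoded))
```

In practice audio_bytes would be the raw contents of the recorded file, for example the result of reading a .wav file opened in binary mode.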

Locales Supported

Standard Languages

Name                                    Alias    Support
English (United States)                 en-US    freeform & command list
English (Canada)                        en-CA    freeform & command list
English (United Kingdom)                en-GB    freeform & command list
English (Australia)                     en-AU    command list
Spanish (Spain)                         es-ES    freeform & command list
Spanish (Mexico)                        es-MX    command list
Italian (Italy)                         it-IT    freeform & command list
French (France)                         fr-FR    freeform & command list
French (Canada)                         fr-CA    command list
Polish (Poland)                         pl-PL    freeform & command list
Portuguese (Portugal)                   pt-PT    freeform & command list
Catalan (Catalan)                       ca-ES    command list
Chinese (Taiwan)                        zh-TW    command list
Danish (Denmark)                        da-DK    command list
German (Germany)                        de-DE    command list
Finnish (Finland)                       fi-FI    command list
Japanese (Japan)                        ja-JP    command list
Korean (Korea)                          ko-KR    command list
Dutch (Netherlands)                     nl-NL    command list
Norwegian (Norway)                      nb-NO    command list
Portuguese (Brazil)                     pt-BR    command list
Russian (Russia)                        ru-RU    command list
Swedish (Sweden)                        sv-SE    command list
Chinese (People's Republic of China)    zh-CN    command list
Chinese (Hong Kong S.A.R.)              zh-HK    command list

Custom Languages

Contact sales@ispeech.org for details.

Speech Recognition Models

Statistical speech recognition models are used to influence the result by probability.  Models with fewer word choices are faster and more accurate than the freeform models.  For example, a food model would recognize the words "7 up" as "7up", and it would recognize the audio for "ice cream" as "ice cream" instead of "I scream".

Standard Models

Name                        Value    Use Case
SMS                         1        Text Messages
Voice mail                  2        Voice Mail
Dictation                   3        Normal speech
Message                     4        Email
Instant Message             5        Instant Message
Transcript (coming soon)    6
Memo (coming soon)          7        Memorandum

Custom Models

Call iSpeech sales and support to inquire about custom speech recognition models.  

Example Transactions for Freeform Speech

Format of Examples

The following examples are packet captures from TCP connections that used the HTTP protocol.  You can compare your network traffic to these in order to debug code.  Wireshark can be used to analyze network connections.

HTTP REST transaction for Speech Recognition

HTTP REST Request and Response

POST /api/rest HTTP/1.1

Content-Length: 34875

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

apikey=developerdemokeydeveloperdemokey&action=recognize&freeform=1&content-type=audio/x-wav&output=rest&audio=[base64 encoded something.wav without \r\n characters]

HTTP/1.0 200 OK

Connection: close

Content-Length: 59

Content-Type: text/plain

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

text=something&confidence=0.0216270890086889&result=success
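
The request and reply in this transaction can be sketched in Python as a pair of helpers: one that builds the URL encoded body and one that reads the text and confidence out of the reply. Both function names are hypothetical. Note one assumption: urlencode percent-escapes the "/" in content-type and the "+", "/" and "=" characters in the base64 audio, whereas the capture above shows content-type unescaped; this document does not say which forms the server requires.

```python
import base64
from urllib.parse import parse_qs, urlencode


def build_recognize_body(apikey, wav_bytes):
    """Build the URL encoded body of a freeform recognize request."""
    return urlencode({
        "apikey": apikey,
        "action": "recognize",
        "freeform": "1",
        "content-type": "audio/x-wav",
        "output": "rest",
        # Single-line base64, as the audio parameter requires.
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }).encode("ascii")


def parse_recognition(body):
    """Return (text, confidence) from a URL encoded recognition reply."""
    fields = parse_qs(body)
    if fields.get("result") != ["success"]:
        raise ValueError("recognition failed: %r" % fields)
    return fields["text"][0], float(fields["confidence"][0])


text, confidence = parse_recognition(
    "text=something&confidence=0.0216270890086889&result=success")
print(text, confidence)
```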

HTTP JSON transaction for Speech Recognition

HTTP JSON Request and REST Reply

POST /api/json HTTP/1.1

Content-Length: 34897

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

{"apikey":"developerdemokeydeveloperdemokey","action":"recognize","freeform":"1","content-type":"audio/x-wav", "output":"rest", "audio":"[base64 encoded something.wav without \r\n characters]"}

HTTP/1.0 200 OK

Connection: close

Content-Length: 59

Content-Type: application/json

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

text=something&confidence=0.0134419081732631&result=success

HTTP XML network transaction for Speech Recognition

HTTP XML Request and Reply

POST /api/xml HTTP/1.1

Content-Length: 34953

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

User-Agent: Apache-HttpClient/4.0.1 (java 1.5)

Expect: 100-Continue

<data>

<apikey>developerdemokeydeveloperdemokey</apikey>

<action>recognize</action>

<freeform>1</freeform>

<content-type>audio/x-wav</content-type>

<output>xml</output>

<audio>[base64 encoded something.wav without \r\n characters]</audio>

</data>

HTTP/1.0 200 OK

Connection: close

Content-Length: 140

Content-Type: text/xml

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

<?xml version="1.0" encoding="UTF-8"?>

<data>

<text>something</text>

<confidence>0.0216270890086889</confidence>

<result>success</result>

</data>
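
When output is set to xml, the reply can be read with any XML parser. The following Python sketch uses the standard library and handles only the success layout shown above; parse_recognition_xml is a hypothetical name, and the layout of an XML error reply is not shown in this document, so the sketch simply raises if the result is not success.

```python
import xml.etree.ElementTree as ET


def parse_recognition_xml(body):
    """Return (text, confidence) from an XML recognition reply given as bytes."""
    root = ET.fromstring(body)
    # Each child of <data> is one response field, e.g. <text>, <confidence>.
    fields = {child.tag: child.text for child in root}
    if fields.get("result") != "success":
        raise ValueError("recognition failed: %r" % fields)
    return fields["text"], float(fields["confidence"])


reply = (b'<?xml version="1.0" encoding="UTF-8"?>'
         b"<data><text>something</text>"
         b"<confidence>0.0216270890086889</confidence>"
         b"<result>success</result></data>")
print(parse_recognition_xml(reply))
```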

Command Lists

About

Command lists are used to limit the possible values returned during speech recognition.  For example, if the command list contains only “yes” and “no”, the result will be either “yes” or “no”.

Example Transactions for Command Lists

Formatting of Examples

 

The following examples are packet captures of TCP connections that use the HTTP protocol.  You can compare your network traffic with these to debug code.  Wireshark can be used to analyze network connections.  A REST client can be used to make these HTTP requests.

HTTP XML network transaction to detect commands from a list.

HTTP XML Request and Response

POST /api/xml HTTP/1.1

Content-Length: 80941

Content-Type: text/xml; charset=UTF-8

Host: api.ispeech.org

Expect: 100-Continue

<data>

<apikey>developerdemokeydeveloperdemokey</apikey>

<action>recognize</action>

<output>xml</output>

<alias>command1|YESNOMAYBE</alias>

<YESNOMAYBE>yes|no|maybe</YESNOMAYBE>

<command1>say %YESNOMAYBE%</command1>

<content-type>audio/x-wav</content-type>

<audio>[base64 encoded say_yes.wav without \r\n characters]</audio>

</data>

HTTP/1.0 200 OK

Connection: close

Content-Length: 137

Content-Type: text/xml

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

<?xml version="1.0" encoding="UTF-8"?>

<data>

<text>say yes</text>

<confidence>0.726751327514648</confidence>

<result>success</result>

</data>

If a user says "say yes", "say maybe", or "say no", it will be recognized successfully.
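
The command list fields in the request above follow a pattern: alias names every command and word list separated by "|", each word list holds its choices separated by "|", and each command refers to a list as %LISTNAME%. The Python sketch below generates those fields from ordinary data structures. The helper command_list_fields is hypothetical, and the pattern is inferred from the examples in this section rather than from a formal grammar definition.

```python
def command_list_fields(commands, lists):
    """Return the alias, word-list and command fields for a recognize request.

    commands maps a command name to its phrase, e.g. "say %YESNOMAYBE%".
    lists maps a list name to the words it may expand to.
    """
    # alias names every command and list, separated by "|".
    fields = {"alias": "|".join(list(commands) + list(lists))}
    for name, words in lists.items():
        fields[name] = "|".join(words)
    fields.update(commands)
    return fields


fields = command_list_fields(
    commands={"command1": "say %YESNOMAYBE%"},
    lists={"YESNOMAYBE": ["yes", "no", "maybe"]},
)
for name, value in fields.items():
    print(name, "=", value)
```

The resulting dictionary can be merged with apikey, action, content-type, output, and audio and then serialized as URL encoded, JSON, or XML data.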

HTTP REST network transaction to detect commands from a list.

HTTP REST Request and Response

POST /api/rest/ HTTP/1.1

Content-Length: 72682

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Expect: 100-Continue

apikey=developerdemokeydeveloperdemokey&action=recognize&content-type=audio%2Fwav&output=rest&alias=command1|NAMES&NAMES=john|mary|anna&command1=call%20%25NAMES%25&audio=[base64 encoded wav without \r\n characters]

HTTP/1.0 200 OK

Connection: close

Content-Length: 58

Content-Type: text/plain

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

text=call+mary&confidence=0.672464966773987&result=success

If a user says "call john", "call anna", or "call mary", it will be recognized successfully.

HTTP POST JSON request to detect commands from a list.

HTTP POST JSON Request and REST Response

POST /api/json/ HTTP/1.1

Content-Length: 22788

Content-Type: text/plain; charset=UTF-8

Host: api.ispeech.org

Expect: 100-Continue

{"apikey":"developerdemokeydeveloperdemokey","action":"recognize",

"alias":"command1|YESNOMAYBE","YESNOMAYBE":"yes|no|maybe","command1":"say %YESNOMAYBE%","content-type":"audio/x-wav","output":"rest","audio":"[base64 encoded say_yes.wav without \r\n characters]"}

HTTP/1.0 200 OK

Connection: close

Content-Length: 56

Content-Type: application/json

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

text=say+yes&confidence=0.726751327514648&result=success

If a user says "say yes", "say maybe", or "say no", it will be recognized successfully.

Advanced Example, HTTP POST XML request to detect multiple audio commands from multiple lists.

HTTP XML Request and Response

POST /api/xml HTTP/1.1

Content-Length: 91393

Content-Type: text/xml; charset=UTF-8

Host: api.ispeech.org

Connection: Keep-Alive

Expect: 100-Continue

<data>

<apikey>developerdemokeydeveloperdemokey</apikey>

<action>recognize</action>

<content-type>audio/x-wav</content-type>

<output>xml</output>

<alias>command1|command2|MONITORACTIONS|COLORLIST|DYNAMITEACTIONS|OBJECTLIST</alias>

<MONITORACTIONS>on|off|reset</MONITORACTIONS>

<COLORLIST>blue|green|red|yellow|purple|orange|black|white|cyan</COLORLIST>

<DYNAMITEACTIONS>explode|fizzle out</DYNAMITEACTIONS>

<OBJECTLIST>monitor %MONITORACTIONS%|color %COLORLIST%|dynamite  %DYNAMITEACTIONS%</OBJECTLIST>

<command1>set %OBJECTLIST%</command1>

<command2>quit menu</command2>

<audio>[base64 encoded set_dynamite_explode.wav without \r\n characters]</audio>

</data>

HTTP/1.0 200 OK

Connection: close

Content-Length: 150

Content-Type: text/xml

Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform

Pragma: no-cache

<?xml version="1.0" encoding="UTF-8"?><data><text>set dynamite explode</text><confidence>0.589247465133667</confidence>

<result>success</result></data>

If a user says "set monitor on", "set monitor off", "set dynamite explode", and so on, it will be recognized successfully.
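
With nested lists it can be hard to see which phrases a command list accepts. The Python sketch below expands the %NAME% placeholders on the client to enumerate every phrase, which is useful for testing a command list before sending audio. It reflects one reading of the examples above (each placeholder is replaced by one entry of the named list, recursively); the server's actual matching rules are not specified in this document, and expand is a hypothetical helper.

```python
import re


def expand(phrase, lists):
    """Yield every phrase obtained by substituting %NAME% placeholders."""
    match = re.search(r"%(\w+)%", phrase)
    if match is None:
        yield phrase  # nothing left to substitute
        return
    for choice in lists[match.group(1)]:
        # Replace the first placeholder, then expand any that remain.
        replaced = phrase[:match.start()] + choice + phrase[match.end():]
        yield from expand(replaced, lists)


lists = {
    "MONITORACTIONS": ["on", "off", "reset"],
    "COLORLIST": ["blue", "green", "red", "yellow", "purple",
                  "orange", "black", "white", "cyan"],
    "DYNAMITEACTIONS": ["explode", "fizzle out"],
    "OBJECTLIST": ["monitor %MONITORACTIONS%", "color %COLORLIST%",
                   "dynamite %DYNAMITEACTIONS%"],
}
commands = ["set %OBJECTLIST%", "quit menu"]
phrases = [p for command in commands for p in expand(command, lists)]
print(len(phrases), phrases[:3])
```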

Error Codes for Speech Recognition and General Errors

Code    Summary
1       Invalid API key
3       Not enough credits
4       No action specified
12      Invalid file format
14      Invalid dictionary
17      Invalid alias list
18      Alias missing
19      Invalid content type
20      Alias list too complex
21      Could not recognize
30      Option not enabled for your account. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license.
100     This evaluation account has exceeded its trial period. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to upgrade your license.
101     Your key has been disabled. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license.
997     No api access
998     Unsupported output type
999     Invalid request
1000    Invalid Request Method POST Required

Section 4

Translation

Summary

iSpeech Translation API

Documentation on the iSpeech translation API is only available on request.  Please send inquiries to sales@ispeech.org.

iSpeech API Specification Version 2.0

The iSpeech API allows developers to implement Text-To-Speech (TTS) and Automated Speech Recognition (ASR) in any Internet-enabled application.

The APIs are platform-agnostic, which means any device that can record or play audio and is connected to the Internet can use the iSpeech API.

General Working Notes

The API accepts standard HTTP requests in XML, JSON or URL-encoded formats for easy integration in your favorite development language.

Every request must have a valid API key. To obtain an API key please visit our sign up page.

All requests must be sent to:

The root node for XML requests is "data".

All API calls are synchronous.

HTTP POST must be used for XML or JSON requests. A GET request will fail with an error code.

An action must be specified with every request.

The API may be accessed via HTTP or HTTPS (SSL) connections.

You can make a request in XML and receive a response in JSON or URL encoding by setting the output parameter to json, xml, or rest.

Text-To-Speech API

The iSpeech Text-To-Speech API allows you to create high quality speech audio in multiple formats, including mp3, wav, wma, mp4, ogg, and flac. The iSpeech API is fast enough that it does not require callback URLs. This means you'll always receive audio data or an error message in the same request.

Since the Text-To-Speech API is so easy to use, we provide this very simple one-line quick start. Simply change the API key in the URL to your API key, and you'll receive an mp3 containing the text you specified: http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=Your+url+encoded+text

Actions

All actions return a result value of either success or error. If the result value is error, an appropriate error code and message will be returned in the response. A "202 Accepted" response will also be issued along with the error result.

The following actions are available for the text to speech API.

Information

Required Parameters:

Optional Parameters:

Response Values:

Example Response:

Convert

Required Parameters:

Optional Parameters:

Response Values:

Returns a HTTP "200 OK" containing the binary audio data on success. If you receive a "202 Accepted", parse the response for an error message.

Speech Recognition API

The iSpeech Automated Speech Recognition API allows you to specify simple word list based grammars or freeform dictation.

Due to the complexity of the Speech Recognition API, we highly recommend using the appropriate language SDK instead of writing your own custom implementation.

Note: You must specify the audio content type in your request. URL encoded requests must pass only audio data in their POST. Your requested URL must contain your API key and any other parameters.

The following actions are available for the speech recognition API.

Recognize

Required Parameters:

Required Grammar/Commands Parameters:

Response Values:

Example Response:

result=success&text=say+yes&confidence=0.89235

Error Codes

If something goes wrong, you'll receive an error message along with an error code. Below is a chart containing the possible error codes.

Example error response:

result=error&code=1&message=Invalid+api+key
© 2009-2012 iSpeech, Inc. All Rights Reserved. iSpeech and the iSpeech logo are registered trademarks of iSpeech, Inc.