This document provides a detailed description of the iSpeech API and examples for almost every possible use case. It demonstrates how to dynamically retrieve API key properties such as voice lists, credits, and enabled key options. Wireshark and RESTClient are recommended for viewing network transactions, creating requests, and debugging code. Last Updated: December 12, 2011.
iSpeech Developer Support
http://www.ispeech.org/developers
iSpeech Inc. Version 2.1
iSpeech Inc. (“iSpeech”) has made efforts to ensure the accuracy and
completeness of the information in this document. However, iSpeech Inc. disclaims all
representations, warranties and conditions, whether express or implied, arising by
statute, operation of law, usage of trade, course of dealing or otherwise, with respect to
the information contained herein. iSpeech Inc. assumes no liability to any party for any
loss or damage, whether direct, indirect, incidental, consequential, special or exemplary,
with respect to (a) the information; and/or (b) the evaluation, application or use of any
product or service described herein.
iSpeech Inc. disclaims any and all representation that its products or services infringe
upon any existing or future intellectual property rights. iSpeech Inc. owns and retains all
right, title and interest in and to the iSpeech Inc. intellectual property, including without
limitation, its patents, marks, copyrights and technology associated with the
iSpeech Inc. services. No title or ownership of any of the foregoing is granted or
otherwise transferred hereunder. iSpeech Inc. reserves the right to make changes to
any information herein without further notice.
Revision History
| PUBLISH DATE | UPDATES |
| Aug 8, 2011 | Document created |
| Sept 13, 2011 | Added ASR |
| Sept 21, 2011 | Added AMR to ASR |
| Nov 9, 2011 | Added voice command examples |
| Nov 10, 2011 | Removed references to ASR Raw POST |
| Nov 11, 2011 | Made output variable explicit for ASR and voice list examples |
| Nov 17, 2011 | Specified HTTP POST/GET instead of REST, fixed /r/n typos |
| Nov 22, 2011 | Added reference for Speex in ASR content-type example |
| Dec 12, 2011 | Added endpadding and startpadding variables |
| Dec 12, 2011 | Added TTS examples, added background highlighting to emphasize examples |
Revision History
Section 1 Introduction
Minimum Requirements
Internet Connection
HTTP Protocol
API Key
Managing API Key Settings
View/Edit Keys
API Features
Text to Speech
Automated Speech Recognition
Developer Support
Sales
Support/Troubleshooting
Software Development Kits
Availability
API Access Pricing
Section 2 Text to Speech
Transaction Types and URL Formats
HTTP GET/POST: URL Encoded, XML, JSON
Request Parameters
Voices
HTTP GET Example: URL Encoded → Binary Audio
Voice List Retrieval
HTTP GET: URL Encoded, XML, JSON
Speed
HTTP GET Example: URL Encoded → Binary Audio
Formats
HTTP GET Example: URL Encoded → Binary Audio
Bitrates
HTTP GET Example: URL Encoded → Binary Audio
Padding
HTTP GET Example: URL Encoded → Binary Audio
Example Transactions
Summary
HTTP POST: URL Encoded → Binary Audio
HTTP POST: JSON → Binary Audio
HTTP POST: XML → Binary Audio
HTTP GET: URL Encoded → URL Encoded (this transaction purposely contains an error)
Error Codes for Text-to-Speech and General Errors
Section 3 Automated Speech Recognition
Transaction Types and URL Formats
HTTP GET/POST: URL Encoded, XML, JSON
Request Parameters
Languages
Standard Languages
Custom Languages
Speech Recognition Models
Standard
Custom
Example Transactions for Freeform Speech
Format of Examples
HTTP POST: URL Encoded → URL Encoded
HTTP POST: JSON → JSON
HTTP POST: XML → XML
Command Lists
About
Example Transactions for Command Lists
Formatting of Examples
HTTP POST: XML → XML
HTTP POST: URL Encoded → URL Encoded
HTTP POST: JSON → JSON
HTTP POST: XML → XML -- Detecting multiple audio commands from multiple command lists
Error Codes for Speech Recognition and General Errors
Section 4 Translation
Summary
iSpeech Translation API
Section 1
Introduction
Welcome to the iSpeech Inc. Application Programming Interface (API) Developer Guide. This guide describes the available variables, commands, and interfaces that make up the iSpeech API.
The iSpeech API allows developers to implement Text-To-Speech (TTS) and Automated Speech Recognition (ASR) in any Internet-enabled application.
The APIs are platform agnostic, which means any Internet-connected device that can record or play audio can use the iSpeech API.
Minimum Requirements
Below are the minimum requirements needed to use the iSpeech API. The API can be used with or without an SDK.
Internet connection
iSpeech services require a connection to the internet.
HTTP Protocol
The iSpeech API follows the HTTP standard by using GET and POST. Some web browsers limit the length of GET requests to a few thousand characters.
Request/Responses
Requests can be in URL encoded, JSON, or XML data formats. You can specify the output data format of responses. For TTS, binary data is usually returned if the request is successful. For speech recognition, URL encoded text, JSON, or XML can be returned by setting the output variable.
API Key
An API key is a password that is required for access. To obtain an API key please visit: http://www.ispeech.org/developers and register for a developer account.
Managing API Key Settings
View/Edit Keys
Manage your API keys by using the iSpeech developer website: http://www.ispeech.org/developers. You can request additional features for your API keys on that website.
API Features
Text to Speech
You can play audio through iSpeech TTS in a variety of voices, formats, bitrates, frequencies, and playback speeds.
Automated Speech Recognition
You can convert audio to text using a variety of languages and recognition models. We can create custom recognition models to improve recognition quality.
Developer Support
Sales
iSpeech sales can be contacted at the following phone number: +1-917-338-7723 from 10 AM to 6 PM Eastern Time, Monday to Friday. You can also email us at sales@ispeech.org.
Support / Troubleshooting
Please look for the answer to your problem in the iSpeech Developer Forum: http://www.ispeech.org/forums/
Software Development Kits
iSpeech SDKs simplify the iSpeech API. You should
use iSpeech SDKs if the option is available. Only mobile SDKs made by iSpeech
allow you to use the iSpeech API for free.
Availability
iPhone, Android, BlackBerry, .NET, Java (Server), Java applet (Client) [coming soon], PHP, Javascript/Flash [coming soon]
API Access Pricing
| PLATFORMS | PRICE |
| iPhone, Android, BlackBerry | Free using iSpeech SDK |
| .NET, Java, PHP | Between $0.00001 and $0.05 per word converted or recognized, depending on quantity |
Section 2
Text to Speech
The iSpeech Text-To-Speech API allows you to create high quality spoken audio in multiple formats. The iSpeech API doesn’t use callbacks. It’s synchronous and fast. This means you'll always receive audio data or an error message in the same HTTP request.
Transaction Types and URL Formats
| TRANSACTION TYPE | INPUT FORMAT | URL |
| HTTP GET/POST | URL Encoded | http://api.ispeech.org/api/rest |
| HTTP GET/POST | XML | http://api.ispeech.org/api/xml |
| HTTP GET/POST | JSON | http://api.ispeech.org/api/json |
Request Parameters
| PARAMETER | DATA TYPE | EXAMPLE VALUE |
| apikey | 32 character hex number | abcdef1234567890abcdef1234567890 |
| action | String | convert, information |
| text | String | Hello World |
| voice (optional) | String | usenglishfemale |
| format (optional) | String | mp3 |
| frequency (optional) | String | 16000 |
| bitrate (optional) | String | 64 |
| speed (optional) | Integer | -10 to 10 |
| startpadding (optional) | Integer (seconds) | 5 |
| endpadding (optional) | Integer (seconds) | 5 |
|
Example HTTP GET Request (Using every possible variable) |
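As a minimal sketch, a request URL that sets every parameter from the table could be assembled with Python's standard library. The helper name build_tts_url is hypothetical, the parameter names are written in lowercase as in the example URLs elsewhere in this guide, and the values are the example values from the table.

```python
from urllib.parse import urlencode

API_URL = "http://api.ispeech.org/api/rest"  # URL-encoded endpoint listed in this guide


def build_tts_url(apikey, text, **optional):
    """Return a GET URL for action=convert (hypothetical helper, not an iSpeech SDK call)."""
    params = {"apikey": apikey, "action": "convert", "text": text}
    # Optional parameters (voice, format, frequency, ...) are added only when set.
    params.update({name: value for name, value in optional.items() if value is not None})
    # urlencode percent-encodes values and turns spaces into "+".
    return API_URL + "?" + urlencode(params)


url = build_tts_url(
    "developerdemokeydeveloperdemokey",
    "Hello World",
    voice="usenglishfemale",
    format="mp3",
    frequency=16000,
    bitrate=64,
    speed=0,
    startpadding=5,
    endpadding=5,
)
print(url)
```

Building the query string with urlencode avoids hand-escaping the text, which matters once the text contains characters such as "&" or "=".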
Voices
Standard Voices
| Name | Alias |
| US English Female (default) | usenglishfemale |
| US English Male | usenglishmale |
| UK English Female | ukenglishfemale |
| UK English Male | ukenglishmale |
| Australian English Female | auenglishfemale |
| US Spanish Female | usspanishfemale |
| US Spanish Male | usspanishmale |
| Chinese Female | chchinesefemale |
| Chinese Male | chchinesemale |
| Hong Kong Cantonese Female | hkchinesefemale |
| Taiwan Chinese Female | twchinesefemale |
| Japanese Female | jpjapanesefemale |
| Japanese Male | jpjapanesemale |
| Korean Female | krkoreanfemale |
| Korean Male | krkoreanmale |
| Canadian English Female | caenglishfemale |
| Hungarian Female | huhungarianfemale |
| Brazilian Portuguese Female | brportuguesefemale |
| European Portuguese Female | eurportuguesefemale |
| European Portuguese Male | eurportuguesemale |
| European Spanish Female | eurspanishfemale |
| European Spanish Male | eurspanishmale |
| European Catalan Female | eurcatalanfemale |
| European Czech Female | eurczechfemale |
| European Danish Female | eurdanishfemale |
| European Finnish Female | eurfinnishfemale |
| European French Female | eurfrenchfemale |
| European French Male | eurfrenchmale |
| European Norwegian Female | eurnorwegianfemale |
| European Dutch Female | eurdutchfemale |
| European Dutch Male | eurdutchmale |
| European Polish Female | eurpolishfemale |
| European Italian Female | euritalianfemale |
| European Italian Male | euritalianmale |
| European Turkish Female | eurturkishfemale |
| European Turkish Male | eurturkishmale |
| European German Female | eurgermanfemale |
| European German Male | eurgermanmale |
| Russian Female | rurussianfemale |
| Russian Male | rurussianmale |
| Swedish Female | swswedishfemale |
| Canadian French Female | cafrenchfemale |
| Canadian French Male | cafrenchmale |
|
HTTP GET Request (Setting voice to European French Female) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&format=mp3&voice=eurfrenchfemale |
Custom Voices
Custom Voices may be enabled for your account. They can be found in the developer portal -> api key properties -> custom voices. You can use them by setting voice to the custom alias.
| Name | Alias |
| President Obama | obama |
| Custom Voice | customvoice1 |
Voice List Retrieval
A current list of voices that are enabled for an API key can be retrieved in REST, JSON, and XML format by using the following service. HTTP GET and POST are supported. A REST client can be used to make these HTTP requests.
HTTP GET Network Transaction to get XML voice list.
|
HTTP GET Request and XML Response |
|
<?xml version='1.0'?> <data> <result>success</result> <voice-1>krkoreanfemale</voice-1> <voice-locale-1-1>ko-kr</voice-locale-1-1> <voice-locale-1-2>ko</voice-locale-1-2> <voice-gender-1>female</voice-gender-1> <voice-description-1>Korean Female Voice</voice-description-1> <voice-2>usenglishfemale</voice-2> <voice-locale-2-1>en-us</voice-locale-2-1> <voice-locale-2-2>en</voice-locale-2-2> <voice-gender-2>female</voice-gender-2> <voice-description-2>United States English Female Voice</voice-description-2> [... more voices ...] </data> |
HTTP GET Network Transaction to get JSON voice list.
|
HTTP GET URL Encoded Request and JSON Response |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=information&output=json |
|
{"voice-gender-48":"female","voice-locale-22-1":"fr-ca","voice-locale-8-1":"pt-br","voice-description-2":"Finnish Female Voice","voice-description-3":"Hong Kong Chinese Male Voice","voice-58":"eurdanishfemale","voice-description-1":"Korean Female Voice","voice-description-6":"Chinese Female Voice","voice-description-7":"United Kingdom English Female Voice","voice-description-4", [...more voices...]} |
HTTP GET Network Transaction to get URL Encoded voice list.
|
HTTP GET URL Encoded Request and URL Encoded Response |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=information&output=rest |
|
result=success&voice-1=krkoreanfemale2&voice-locale-1-1=ko-kr&voice-gender-1=female&voice-description-1=Korean+Female+Voice&voice-2=eurfinnishfemale&voice-locale-2-1=fi-fi&voice-gender-2=female&voice-description-2=Finnish+Female+Voice&voice-3=chchinesemale1&voice-locale-3-1=zh&voice-locale-3-2=zh-hk[...more voices...] |
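The URL-encoded voice list can be regrouped by voice number. The sketch below assumes the key patterns visible in the sample response (voice-N, voice-locale-N-M, voice-gender-N, voice-description-N); parse_voice_list is a hypothetical helper, not part of any iSpeech SDK.

```python
import re
from urllib.parse import parse_qs

# Matches voice-1, voice-locale-1-2, voice-gender-1 and voice-description-1.
VOICE_KEY = re.compile(r"voice-(?:([a-z]+)-)?(\d+)(?:-\d+)?")


def parse_voice_list(body):
    """Return (result, voices) from a URL-encoded action=information response."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    voices = {}
    for key, value in fields.items():
        match = VOICE_KEY.fullmatch(key)
        if match is None:
            continue  # not a voice field, for example "result"
        attribute, index = match.group(1), int(match.group(2))
        voice = voices.setdefault(index, {"locales": []})
        if attribute is None:
            voice["alias"] = value          # voice-N holds the alias
        elif attribute == "locale":
            voice["locales"].append(value)  # a voice may list several locales
        else:
            voice[attribute] = value        # gender, description
    return fields.get("result"), [voices[i] for i in sorted(voices)]


sample = (
    "result=success&voice-1=krkoreanfemale2&voice-locale-1-1=ko-kr"
    "&voice-gender-1=female&voice-description-1=Korean+Female+Voice"
    "&voice-2=eurfinnishfemale&voice-locale-2-1=fi-fi"
    "&voice-gender-2=female&voice-description-2=Finnish+Female+Voice"
)
result, voice_list = parse_voice_list(sample)
for voice in voice_list:
    print(voice["alias"], voice["locales"], voice["description"])
```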
Speed
Most voices support speed controls.
| Speed | Value (integer) |
| Fastest | 10 |
| Faster | Speed > 0 |
| Normal (default) | 0 |
| Slower | Speed < 0 |
| Slowest | -10 |
|
HTTP GET Request (Setting speed to 5) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=mp3&speed=5 |
Bitrates
Note: Bitrates can only be selected for MP3s.
Valid values are 16, 24, 32, 48 (default), 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256, or 320.
Bitrates are listed in kbps (kilobits per second).
|
HTTP GET Request (Setting bitrate to 16 kilobits per second) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=mp3&bitrate=16 |
Formats
| Name | File extension |
| Audio Interchange File Format | aiff |
| MPEG Layer 3 (default) | mp3 |
| Ogg | ogg |
| Windows Media Audio | wma |
| Free Lossless Audio Codec | flac |
| Wave PCM | wav |
|
Example HTTP GET Request (Setting format to wav) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&format=wav |
Frequencies
Possible values: 16000, 22000, 24000, 32000, 44100, 48000 Hertz
|
Example HTTP GET Request (Setting frequency to 16000 Hz) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&frequency=16000 |
Padding
Padding adds silence to a section of the audio file.
Start Padding
|
Example HTTP GET Request (Setting start padding to 3 seconds) |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&startpadding=3 |
Adds a period of silence to the beginning of the audio file.
End Padding
|
Example |
|
http://api.ispeech.org/api/rest?apikey=developerdemokeydeveloperdemokey&action=convert&text=something&voice=usenglishfemale&endpadding=3 |
Adds a period of silence to the end of the audio file.
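The value lists in the Speed, Bitrates, Formats, Frequencies, and Padding sections can be checked on the client before a request is sent. This is only a sketch: check_tts_options is a hypothetical helper, and the rule that padding must not be negative is an assumption that this guide does not state.

```python
VALID_FORMATS = {"aiff", "mp3", "ogg", "wma", "flac", "wav"}
VALID_BITRATES = {16, 24, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256, 320}
VALID_FREQUENCIES = {16000, 22000, 24000, 32000, 44100, 48000}


def check_tts_options(format="mp3", bitrate=None, frequency=None, speed=0,
                      startpadding=0, endpadding=0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if format not in VALID_FORMATS:
        problems.append(f"unknown format: {format}")
    if bitrate is not None:
        if format != "mp3":
            # The guide notes that bitrates can only be selected for MP3s.
            problems.append("bitrate can only be selected for mp3")
        elif bitrate not in VALID_BITRATES:
            problems.append(f"unsupported bitrate: {bitrate}")
    if frequency is not None and frequency not in VALID_FREQUENCIES:
        problems.append(f"unsupported frequency: {frequency}")
    if not -10 <= speed <= 10:
        problems.append(f"speed out of range: {speed}")
    if startpadding < 0 or endpadding < 0:
        # Assumption: negative padding is treated as invalid.
        problems.append("padding must not be negative")
    return problems


print(check_tts_options(format="wav", bitrate=16))
print(check_tts_options(format="mp3", bitrate=64, frequency=16000, speed=5))
```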
Example Transactions
Summary
The following examples are packet captures from TCP connections that used the HTTP protocol. You can compare your network traffic to these in order to debug code. Wireshark can be used to analyze network connections. A REST client can be used to make these HTTP requests.
HTTP POST URL encoded request for Text to Speech
|
HTTP POST Request and Reply |
|
POST /api/rest HTTP/1.1 Content-Length: 71 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive apikey=developerdemokeydeveloperdemokey&action=convert&text=hello+world |
|
HTTP/1.0 200 OK Connection: close Server: iSpeech Cloud/1.2 Accept-Ranges: none X-Time-Length: 3853 X-Content-Hash: e969ef3dd0dc0e9c417f31f7ffbd10ed Content-Length: 23760 Content-Type: audio/mpeg Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache [mp3 binary audio data] |
HTTP POST JSON request for Text to Speech
|
HTTP POST, JSON Request and Reply |
|
POST /api/json HTTP/1.1 Content-Length: 111 Content-Type: application/json; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive {"apikey":"developerdemokeydeveloperdemokey","action":"convert","text":"hello world","voice":"usenglishfemale"} |
|
HTTP/1.0 200 OK Connection: close Server: iSpeech Cloud/1.2 Accept-Ranges: none X-Time-Length: 3853 X-Content-Hash: e969ef3dd0dc0e9c417f31f7ffbd10ed Content-Length: 23760 Content-Type: audio/mpeg Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache [mp3 audio binary data] |
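A JSON request like the one captured above could be prepared with Python's standard library as sketched here. The function name build_json_tts_request is hypothetical, and the request is only constructed, not sent. Content-Length is the byte length of the body, which urllib fills in automatically when the request is opened.

```python
import json
import urllib.request


def build_json_tts_request(apikey, text, voice="usenglishfemale"):
    """Build (but do not send) an HTTP POST request for the JSON endpoint."""
    payload = {"apikey": apikey, "action": "convert", "text": text, "voice": voice}
    # Compact separators reproduce the body layout shown in the capture.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        "http://api.ispeech.org/api/json",
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


request = build_json_tts_request("developerdemokeydeveloperdemokey", "hello world")
print(request.get_method(), request.full_url)
print(len(request.data), "bytes in body")
```

Passing the request to urllib.request.urlopen would send it; on success the response body would be the binary audio.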
HTTP POST XML request for Text to Speech
|
HTTP POST, XML Request and Reply |
|
POST /api/xml HTTP/1.1 Content-Length: 150 Content-Type: application/xml; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive <data> <apikey>developerdemokeydeveloperdemokey</apikey> <action>convert</action> <text>hello world</text> <voice>usenglishfemale</voice> </data> |
|
HTTP/1.0 200 OK Connection: close Server: iSpeech Cloud/1.2 Accept-Ranges: none X-Time-Length: 3853 X-Content-Hash: 4affe15913fccd851ebf08a7e2650955 Content-Length: 23760 Content-Type: audio/mpeg Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache [mp3 audio binary data] |
Example of a network transaction with an error
Responses with text errors instead of audio data return “HTTP/1.0 202 Accepted”.
|
HTTP GET, URL Encoded Request and Reply (misspelled variable) |
|
GET /api/rest?apikey=developerdemokeydeveloperdemokey&action=convert& text=something&voice=usenglishfemale HTTP/1.1 Host: api.ispeech.org Connection: keep-alive User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.58 Safari/535.2 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 Accept-Encoding: gzip,deflate,sdch Accept-Language: en-US,en;q=0.8 Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3 |
|
HTTP/1.0 202 Accepted Server: iSpeech Cloud/1.2 Connection: close Content-Length: 41 Content-Type: text/plain Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache result=error&code=8&message=Invalid+voice |
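Because a failed conversion arrives as "202 Accepted" with a URL-encoded text body rather than audio, client code needs to branch on the status code. The sketch below shows one way to do that; ISpeechError and audio_or_error are hypothetical names, and treating any other status as unexpected is an assumption.

```python
from urllib.parse import parse_qs


class ISpeechError(Exception):
    """Carries the numeric code and message from an error response."""

    def __init__(self, code, message):
        super().__init__(f"iSpeech error {code}: {message}")
        self.code = code
        self.message = message


def audio_or_error(status, body):
    """Return the audio bytes for a 200 response, or raise for a 202 error response."""
    if status == 200:
        return body  # binary audio data
    fields = {key: values[0] for key, values in parse_qs(body.decode("utf-8")).items()}
    if status == 202 and fields.get("result") == "error":
        raise ISpeechError(int(fields["code"]), fields.get("message", ""))
    # Assumption: any other combination is reported as unexpected.
    raise ValueError(f"unexpected HTTP status {status}")


try:
    audio_or_error(202, b"result=error&code=8&message=Invalid+voice")
except ISpeechError as error:
    print(error.code, error.message)
```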
Error codes for Text-to-Speech and General Errors
| Code | Summary |
| 1 | Invalid API key |
| 2 | Could not convert text |
| 3 | Not enough credits |
| 4 | No action specified |
| 5 | Invalid text |
| 6 | Too many words |
| 7 | Invalid text entry |
| 8 | Invalid voice |
| 12 | Invalid file format |
| 13 | Invalid speed |
| 14 | Invalid dictionary |
| 15 | Invalid bitrate |
| 16 | Invalid frequency |
| 30 | Option not enabled for your account. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 100 | This evaluation account has exceeded its trial period. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to upgrade your license. |
| 101 | Your key has been disabled. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 997 | No api access |
| 998 | Unsupported output type |
| 999 | Invalid request |
| 1000 | Invalid Request Method POST Required |
Section 3
Automated Speech Recognition
Transaction Types and URL Formats
There are currently three transaction types available for use with the iSpeech API. All transactions must be posted to the appropriate URL:
| TRANSACTION TYPE | INPUT TYPE | URL FORMAT |
| HTTP GET/POST | URL Encoded | http://api.ispeech.org/api/rest |
| HTTP GET/POST | XML | http://api.ispeech.org/api/xml |
| HTTP GET/POST | JSON | http://api.ispeech.org/api/json |
Request Parameters
| PARAMETER | VALUE | EXAMPLE |
| apikey | 32 character hex number | abcdef1234567890abcdef1234567890 |
| locale | String | en-US (see list) |
| action | String | recognize, information |
| content-type | String | audio/x-wav, audio/amr, audio/speex |
| audio | String (base64, remove \r\n) | (the audio data base64 encoded) |
| output | String | xml, json, rest |
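The audio parameter has to be base64 text without line breaks. As a sketch, a URL-encoded request body could be prepared as follows; build_asr_body is a hypothetical helper, and the extra argument is there for additional fields such as the freeform flag used in the example transactions.

```python
import base64
from urllib.parse import urlencode


def build_asr_body(apikey, audio_bytes, content_type="audio/x-wav",
                   output="rest", locale="en-US", extra=None):
    """Return a URL-encoded body for action=recognize (hypothetical helper)."""
    # b64encode returns a single line, so there are no \r\n characters to strip.
    audio = base64.b64encode(audio_bytes).decode("ascii")
    params = {
        "apikey": apikey,
        "action": "recognize",
        "locale": locale,
        "content-type": content_type,
        "output": output,
    }
    params.update(extra or {})
    params["audio"] = audio  # urlencode escapes "+", "/" and "=" in the base64 text
    return urlencode(params)


fake_audio = b"RIFF" + bytes(range(256))  # placeholder bytes, not a real recording
body = build_asr_body("developerdemokeydeveloperdemokey", fake_audio,
                      extra={"freeform": "1"})
print(body[:120])
```

Note that base64.encodebytes, unlike b64encode, inserts newlines every 76 characters, which is the kind of output the table warns against.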
Locales Supported
Standard Languages
| Name | Alias | Support |
| English (United States) | en-US | freeform & command list |
| English (Canada) | en-CA | freeform & command list |
| English (United Kingdom) | en-GB | freeform & command list |
| English (Australia) | en-AU | command list |
| Spanish (Spain) | es-ES | freeform & command list |
| Spanish (Mexico) | es-MX | command list |
| Italian (Italy) | it-IT | freeform & command list |
| French (France) | fr-FR | freeform & command list |
| French (Canada) | fr-CA | command list |
| Polish (Poland) | pl-PL | freeform & command list |
| Portuguese (Portugal) | pt-PT | freeform & command list |
| Catalan (Catalan) | ca-ES | command list |
| Chinese (Taiwan) | zh-TW | command list |
| Danish (Denmark) | da-DK | command list |
| German (Germany) | de-DE | command list |
| Finnish (Finland) | fi-FI | command list |
| Japanese (Japan) | ja-JP | command list |
| Korean (Korea) | ko-KR | command list |
| Dutch (Netherlands) | nl-NL | command list |
| Norwegian (Norway) | nb-NO | command list |
| Portuguese (Brazil) | pt-BR | command list |
| Russian (Russia) | ru-RU | command list |
| Swedish (Sweden) | sv-SE | command list |
| Chinese (People's Republic of China) | zh-CN | command list |
| Chinese (Hong Kong S.A.R.) | zh-HK | command list |
Custom Languages
Contact sales@ispeech.org for details.
Speech Recognition Models
Statistical speech recognition models are used to influence the result by probability. Models with fewer word choices are faster and more accurate than the freeform models. For example, a food model would recognize the words “7 up” as “7up”, and would recognize the audio for “ice cream” as “ice cream” instead of “i scream”.
Standard Models
| Name | Value | Use Case |
| SMS | 1 | Text Messages |
| Voice mail | 2 | Voice Mail |
| Dictation | 3 | Normal speech |
| Message | 4 | |
| Instant Message | 5 | Instant Message |
| Transcript (coming soon) | 6 | |
| Memo (coming soon) | 7 | Memorandum |
Custom Models
Call iSpeech sales and support to inquire about custom speech recognition models.
Example Transactions for Freeform Speech
Format of Examples
The following examples are packet captures from TCP connections that used the HTTP protocol. You can compare your network traffic to these in order to debug code. Wireshark can be used to analyze network connections.
HTTP REST transaction for Speech Recognition
|
HTTP REST Request and Response |
|
POST /api/rest HTTP/1.1 Content-Length: 34875 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive apikey=developerdemokeydeveloperdemokey&action=recognize&freeform=1&content-type=audio/x-wav&output=rest&audio=[base64 encoded something.wav without \r\n characters] |
|
HTTP/1.0 200 OK Connection: close Content-Length: 59 Content-Type: text/plain Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache text=something&confidence=0.0216270890086889&result=success |
HTTP JSON transaction for Speech Recognition
|
HTTP JSON Request and REST Reply |
|
POST /api/json HTTP/1.1 Content-Length: 34897 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive {"apikey":"developerdemokeydeveloperdemokey","action":"recognize", "freeform":"1","content-type":"audio/x-wav", "output":"rest", "audio":"[base64 encoded something.wav without \r\n characters]"} |
|
HTTP/1.0 200 OK Connection: close Content-Length: 59 Content-Type: application/json Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache text=something&confidence=0.0134419081732631&result=success |
HTTP XML network transaction for Speech Recognition
|
HTTP XML Request and Reply |
|
POST /api/xml HTTP/1.1 Content-Length: 34953 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive User-Agent: Apache-HttpClient/4.0.1 (java 1.5) Expect: 100-Continue <data> <apikey>developerdemokeydeveloperdemokey</apikey> <action>recognize</action> <freeform>1</freeform> <content-type>audio/x-wav</content-type> <output>xml</output> <audio>[base64 encoded something.wav without \r\n characters]</audio> </data> |
|
HTTP/1.0 200 OK Connection: close Content-Length: 140 Content-Type: text/xml Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache <?xml version="1.0" encoding="UTF-8"?> <data> <text>something</text> <confidence>0.0216270890086889</confidence> <result>success</result> </data> |
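An XML recognition response of this shape can be read with Python's built-in ElementTree module. The sketch below uses the response body from the capture above; parse_asr_xml is a hypothetical helper, and raising ValueError for any result other than success is an assumption about how a caller might want failures surfaced.

```python
import xml.etree.ElementTree as ET


def parse_asr_xml(body):
    """Return (text, confidence) from an XML recognition response given as bytes."""
    root = ET.fromstring(body)
    # Every child of <data> becomes one entry, e.g. {"text": "something", ...}.
    fields = {child.tag: (child.text or "").strip() for child in root}
    if fields.get("result") != "success":
        # Assumption: anything other than success is reported to the caller.
        raise ValueError(f"recognition failed: {fields}")
    return fields["text"], float(fields["confidence"])


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?> <data> <text>something</text> '
    b'<confidence>0.0216270890086889</confidence> <result>success</result> </data>'
)
text, confidence = parse_asr_xml(sample)
print(text, confidence)
```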
Command Lists
About
Command lists are used to limit the possible values returned during speech recognition. For example, if the command list contains only “yes” and “no”, the result will be either “yes” or “no”.
Example Transactions for Command Lists
Formatting of Examples
The following examples are packet captures of TCP connections that use the HTTP protocol. You can compare your network traffic with these to debug code. Wireshark can be used to analyze network connections. A REST client can be used to make these HTTP requests.
HTTP XML network transaction to detect commands from a list.
|
HTTP XML Request and Response |
|
POST /api/xml HTTP/1.1 Content-Length: 80941 Content-Type: text/xml; charset=UTF-8 Host: api.ispeech.org Expect: 100-Continue <data> <apikey>developerdemokeydeveloperdemokey</apikey> <action>recognize</action> <output>xml</output> <alias>command1|YESNOMAYBE</alias> <YESNOMAYBE>yes|no|maybe</YESNOMAYBE> <command1>say %YESNOMAYBE%</command1> <content-type>audio/x-wav</content-type> <audio>[base64 encoded say_yes.wav without \r\n characters]</audio> </data> |
|
HTTP/1.0 200 OK Connection: close Content-Length: 137 Content-Type: text/xml Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache <?xml version="1.0" encoding="UTF-8"?> <data> <text>say yes</text> <confidence>0.726751327514648</confidence> <result>success</result> </data> |
If a user speaks "say yes", "say maybe", or "say no", it will be successfully recognized.
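An XML body with the same elements as the request above could be generated rather than written by hand. This is a sketch under assumptions: build_command_xml is a hypothetical helper, the alias element lists command names first and list names second as in the capture, and the audio value is a short placeholder instead of a real recording.

```python
import xml.etree.ElementTree as ET


def build_command_xml(apikey, commands, lists, audio_base64,
                      content_type="audio/x-wav"):
    """Return the XML request body (bytes) for a command list recognition."""
    data = ET.Element("data")

    def add(tag, text):
        ET.SubElement(data, tag).text = text

    add("apikey", apikey)
    add("action", "recognize")
    add("output", "xml")
    # The alias element names every command and list defined in the request.
    add("alias", "|".join(list(commands) + list(lists)))
    for name, values in lists.items():
        add(name, "|".join(values))
    for name, template in commands.items():
        add(name, template)
    add("content-type", content_type)
    add("audio", audio_base64)
    return ET.tostring(data, encoding="utf-8")


xml_body = build_command_xml(
    "developerdemokeydeveloperdemokey",
    commands={"command1": "say %YESNOMAYBE%"},
    lists={"YESNOMAYBE": ["yes", "no", "maybe"]},
    audio_base64="QUJD",  # placeholder base64 text, not real audio
)
print(xml_body.decode("utf-8"))
```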
HTTP REST network transaction to detect commands from a list.
|
HTTP REST Request and Response |
|
POST /api/rest/ HTTP/1.1 Content-Length: 72682 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Expect: 100-Continue apikey=developerdemokeydeveloperdemokey&action=recognize&content-type=audio%2Fwav&output=rest&alias=command1|NAMES &NAMES=john|mary|anna&command1=call%20%25NAMES%25&audio=[base64 encoded wav without \r\n characters] |
|
HTTP/1.0 200 OK Connection: close Content-Length: 58 Content-Type: text/plain Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache text=call+mary&confidence=0.672464966773987&result=success |
If a user speaks "call john", "call anna", or "call mary", it will be successfully recognized.
HTTP POST JSON request to detect commands from a list.
|
HTTP POST JSON Request and REST Response |
|
POST /api/json/ HTTP/1.1 Content-Length: 22788 Content-Type: text/plain; charset=UTF-8 Host: api.ispeech.org Expect: 100-Continue {"apikey":"developerdemokeydeveloperdemokey","action":"recognize", "alias":"command1|YESNOMAYBE","YESNOMAYBE":"yes|no|maybe","command1":"say %YESNOMAYBE%","content-type":"audio/x-wav","output":"rest","audio":"[base64 encoded say_yes.wav without \r\n characters]"} |
|
HTTP/1.0 200 OK Connection: close Content-Length: 56 Content-Type: application/json Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache text=say+yes&confidence=0.726751327514648&result=success |
If a user speaks "say yes", "say maybe", or "say no", it will be successfully recognized.
Advanced Example, HTTP POST XML request to detect multiple audio commands from multiple lists.
|
HTTP XML Request and Response |
|
POST /api/xml HTTP/1.1 Content-Length: 91393 Content-Type: text/xml; charset=UTF-8 Host: api.ispeech.org Connection: Keep-Alive Expect: 100-Continue <data> <apikey>developerdemokeydeveloperdemokey</apikey> <action>recognize</action> <content-type>audio/x-wav</content-type> <output>xml</output> <alias>command1|command2|MONITORACTIONS|COLORLIST| DYNAMITEACTIONS|OBJECTLIST</alias> <MONITORACTIONS>on|off|reset</MONITORACTIONS> <COLORLIST>blue|green|red|yellow|purple|orange|black|white|cyan</COLORLIST> <DYNAMITEACTIONS>explode|fizzle out</DYNAMITEACTIONS> <OBJECTLIST>monitor %MONITORACTIONS%|color %COLORLIST%|dynamite %DYNAMITEACTIONS%</OBJECTLIST> <command1>set %OBJECTLIST%</command1> <command2>quit menu</command2> <audio>[base64 encoded set_dynamite_explode.wav without \r\n characters]</audio> </data> |
|
HTTP/1.0 200 OK Connection: close Content-Length: 150 Content-Type: text/xml Cache-Control: no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, no-transform Pragma: no-cache <?xml version="1.0" encoding="UTF-8"?><data><text>set dynamite explode</text><confidence>0.589247465133667</confidence> <result>success</result></data> |
If a user speaks "set monitor on", "set monitor off", "set dynamite explode", etc., it will be successfully recognized.
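To see which phrases a set of nested lists allows, the %NAME% references can be expanded locally. The sketch below is an illustration of how the references appear to combine, based on the request above, and is not the server's actual algorithm; expand is a hypothetical helper and it does not guard against lists that refer to each other in a cycle.

```python
import re


def expand(template, definitions):
    """Return every phrase a template can produce once %NAME% references are resolved."""
    match = re.search(r"%(\w+)%", template)
    if match is None:
        return [template]  # nothing left to substitute
    phrases = []
    # Each pipe-separated option replaces the reference in turn.
    for option in definitions[match.group(1)].split("|"):
        replaced = template[:match.start()] + option + template[match.end():]
        phrases.extend(expand(replaced, definitions))
    return phrases


definitions = {
    "MONITORACTIONS": "on|off|reset",
    "COLORLIST": "blue|green|red|yellow|purple|orange|black|white|cyan",
    "DYNAMITEACTIONS": "explode|fizzle out",
    "OBJECTLIST": "monitor %MONITORACTIONS%|color %COLORLIST%|dynamite %DYNAMITEACTIONS%",
}
phrases = expand("set %OBJECTLIST%", definitions) + expand("quit menu", definitions)
for phrase in phrases:
    print(phrase)
```

Listing the phrases this way can help spot unintended combinations before audio is submitted.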
Error Codes for Speech Recognition and General Errors
| Code | Summary |
| 1 | Invalid API key |
| 3 | Not enough credits |
| 4 | No action specified |
| 12 | Invalid file format |
| 14 | Invalid dictionary |
| 17 | Invalid alias list |
| 18 | Alias missing |
| 19 | Invalid content type |
| 20 | Alias list too complex |
| 21 | Could not recognize |
| 30 | Option not enabled for your account. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 100 | This evaluation account has exceeded its trial period. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to upgrade your license. |
| 101 | Your key has been disabled. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 997 | No api access |
| 998 | Unsupported output type |
| 999 | Invalid request |
| 1000 | Invalid Request Method POST Required |
Section 4
Translation
Summary
iSpeech Translation API
Documentation on the iSpeech translation API is only available on request. Please send inquiries to sales@ispeech.org.
The API accepts standard HTTP requests in XML, JSON or URL-encoded formats for easy integration in your favorite development language.
Every request must have a valid API key. To obtain an API key please visit our sign up page.
All requests must be sent to:
| URL Encoded: | api.ispeech.org/api/rest |
| XML: | api.ispeech.org/api/xml |
| JSON: | api.ispeech.org/api/json |
The root node for XML requests is "data".
All API calls are synchronous.
HTTP POST must be used for XML or JSON requests; GET requests will fail with an error code.
An action must be specified with every request.
The API may be accessed via HTTP or HTTPS (SSL) connections.
You can make a request in one format and receive the response in another, for example send XML and receive JSON or URL encoding, by setting the output parameter to json, xml, or rest.
The iSpeech Text-To-Speech API allows you to create high quality speech audio in multiple formats, including mp3, wav, wma, mp4, ogg, and flac. The iSpeech API is fast enough to not require callback URLs. This means you'll always receive audio data or an error message in the same request.
Since the Text-To-Speech API is so easy to use, we provide this very simple one-line quick start. Simply change the API key in the URL to your API key, and you'll receive an mp3 containing the text you specified:
http://api.ispeech.org/api/rest?apikey=YOURAPIKEYHERE&action=convert&text=Your+url+encoded+text
All actions return a result value of either success or error. If the result value is error, an appropriate error code and message will be returned in the response. A "202 Accepted" status will also be issued along with the error result.
The following actions are available for the text to speech API.
Required Parameters:
Optional Parameters:
Response Values:
| credits | integer value | Amount of credits in account |
Example Response:
result=success&credits=1234
Required Parameters:
| text | The text to convert to speech |
Optional Parameters:
| voice | The voice to use | usenglishfemale1 |
| format | The file format of the audio file | mp3 |
| speed | An integer value -10 (slowest) to 10 | 0 |
| bitrate | MP3 format Only. A valid bitrate | 32 |
| frequency | MP3 format Only. A valid frequency | 16000 |
| startpadding | Adds silence to start of audio | 0 |
| endpadding | Adds silence to end of audio | 0 |
Response Values:
Returns an HTTP "200 OK" response containing the binary audio data on success. If you receive a "202 Accepted", parse the response for an error message.
The iSpeech Automated Speech Recognition API allows you to specify simple word list based grammars or freeform dictation.
Due to the complexity of the Speech Recognition API, we highly recommend using the appropriate language SDK instead of writing your own custom implementation.
Note: You must specify the audio content type in your request. URL encoded requests must pass only audio data in their POST. Your requested URL must contain your API key and any other parameters.
The following actions are available for the speech recognition API.
Required Parameters:
Required Grammar/Commands Parameters:
alias A pipe ("|") separated list of commands and aliases.
Commands must be prefixed with the label 'command'. Any command or alias can refer to any other command or alias by referring to its name wrapped in percent symbols (%).
You can have multiple commands per alias by separating them with a pipe ("|")
Examples are in XML format for clarity.
Example 1:
<data>
<alias>command1MyCustomCommand</alias>
<command1MyCustomCommand>say yes</command1MyCustomCommand>
</data>
If a user speaks "say yes" it will be successfully recognized.
Example 2:
<data>
<alias>command1MyCustomCommand|names</alias>
<command1MyCustomCommand>call %names%</command1MyCustomCommand>
<names>john|mary|anna</names>
</data>
If a user speaks "call john" or "call anna" or "call mary" it will be successfully recognized.
Response Values:
Example Response:
result=success&text=say+yes&confidence=0.89235
If something goes wrong, you'll receive an error message along with an error code. The chart below lists the possible error codes.
Example error response:
result=error&code=1&message=Invalid+api+key
| 1 | Invalid api key |
| 2 | Could not convert text |
| 3 | Not enough credits |
| 4 | No action specified |
| 5 | Invalid text |
| 6 | Too many words |
| 7 | Invalid text entry |
| 8 | Invalid voice |
| 12 | Invalid file format |
| 13 | Invalid speed |
| 14 | Invalid dictionary |
| 15 | Invalid bitrate |
| 16 | Invalid frequency |
| 17 | Invalid alias list |
| 18 | Alias missing |
| 19 | Invalid content type |
| 20 | Alias list too complex |
| 21 | Could not recognize |
| 30 | Option not enabled for your account. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 100 | This evaluation account has exceeded its trial period. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to upgrade your license. |
| 101 | Your key has been disabled. Please contact iSpeech sales at +1 (917) 338-7723 or at sales@ispeech.org to modify your license. |
| 997 | No api access |
| 998 | Unsupported output type |
| 999 | Invalid request |
| 1000 | Invalid Request Method POST Required |