TTS: Google Translate vs Google Cloud in Home Assistant

Google Translate vs Google Cloud

This article explains the difference between the google_translate and google_cloud TTS platforms in Home Assistant and how to set up each of them.

Google Translate: Configuration

The Google Translate text-to-speech integration uses the unofficial Google Translate TTS engine to read input text in natural-sounding voices.

language string (optional, default: en)
The default speech language to use. For the full list of supported languages, see the Home Assistant Google Translate integration documentation.
cache boolean (optional, default: true)
Allow TTS to cache voice files to local storage.
cache_dir string (optional, default: tts)
Folder name or path to a folder for caching files.
time_memory integer (optional, default: 300)
Time to hold the voice data in memory for fast playback on a media player. The minimum is 60 s and the maximum is 57600 s (16 hours).
base_url string (optional, default: value of internal URL)
A base URL to use instead of the one set in the Home Assistant configuration. The tts component uses it as-is, so you must include the protocol scheme (http:// or https://) and the correct port number; they are not added automatically.
service_name string (optional)
Define the service name.
Default: <platform>_say. For example, the default service name for google_translate is google_translate_say, called as tts.google_translate_say.

IMPORTANT: If you access your Home Assistant server over SSL, you must provide a base_url for the Google Translate service to work. Google Cast devices reject self-signed certificates, and providing only the internal IP when using SSL makes them refuse the connection, so use your host name instead (e.g. my-hostname.duckdns.org).

If you are not using SSL, simply provide the internal IP, because the Cast device then does not have to resolve a host name. In either case, the address goes under base_url.
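The rules above can be turned into a quick sanity check for a base_url value before restarting Home Assistant. This is a minimal Python sketch under stated assumptions: check_base_url is a hypothetical helper, not part of Home Assistant, and it only encodes the rules of thumb from this article (explicit scheme, host name with SSL, internal IP without SSL, no automatic port).

```python
import ipaddress
from urllib.parse import urlparse


def check_base_url(base_url: str) -> list[str]:
    """Return warnings for a tts base_url value; an empty list means it looks OK."""
    warnings = []
    parsed = urlparse(base_url)

    # The tts component uses base_url as-is, so the scheme must be written out.
    if parsed.scheme not in ("http", "https"):
        warnings.append("base_url must start with http:// or https://")
        return warnings

    # Decide whether the host part is a bare IP address or a host name.
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # With SSL, Cast devices refuse the connection when given only the internal IP.
    if parsed.scheme == "https" and is_ip:
        warnings.append("https with a bare IP: use your host name, e.g. my-hostname.duckdns.org")

    # Without SSL, the internal IP avoids host name resolution on the Cast device.
    if parsed.scheme == "http" and not is_ip:
        warnings.append("http with a host name: consider using the internal IP instead")

    # The port is not added for you; without one, the scheme's default (80/443) is used.
    if parsed.port is None:
        warnings.append("no explicit port: check that the default port for the scheme is correct")

    return warnings


if __name__ == "__main__":
    for url in ("http://192.168.1.10:8123", "https://192.168.1.10:8123", "my-hostname.duckdns.org"):
        print(url, check_base_url(url))
```

The port warning is only a reminder: a URL without a port (like the example configuration) is fine as long as Home Assistant is actually reachable on the scheme's default port.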

Example of a full configuration

# Add this to your configuration.yaml
tts:
  - platform: google_translate
    language: "en"
    service_name: google_say
    cache: true
    cache_dir: /tmp/tts
    time_memory: 300
    base_url: https://yourdomain.duckdns.org

Google Cloud: Configuration

The google_cloud platform lets you use Google Cloud Platform's Text-to-Speech API from Home Assistant. Before we can configure the google_cloud TTS service, we need to obtain an API key (a service account key file in JSON format) from Google's Cloud Resource Manager.

IMPORTANT: Google requires a billing account to be set up before you can use its APIs. This does not mean you will be charged for using the API, as long as you stay within the free character quota.

Google Cloud: Setting up a billing account

1. Visit Google’s Billing Account site
2. Click Add a Billing Account
3. Select your country and organization needs (e.g. Personal Project) and agree to the Terms of Use
4. Add your phone number and verify it via SMS
5. In Account Type, select Individual
6. Set up your payment method (country dependent) and click Start my Free Trial
7. Done

Google Cloud: Obtaining an API key

1. Visit Google’s Cloud Resource Manager site
2. Click Create New Project, specify a name and click Create
3. Visit Google’s APIs Library
4. Search for “text-to-speech” and click on the Cloud Text-to-Speech API
4.1 Alternatively, use the direct link to the Cloud Text-to-Speech API page to enable it
5. Select your project from the dropdown at the top and click Enable
5.1 If you get a “Billing required” message, you need to set up a billing account first (see above)
6. Set up authentication:
6.1 Visit Google’s Cloud Resource Manager site
6.2 Click the hamburger menu on the left and choose IAM & Admin
6.3 Click Service Accounts in the left menu, then click Create new service account at the top
6.4 Name your account (the ID field is populated automatically), then click Create and continue
6.5 Click Continue again without selecting a role (not needed for TTS), then click Done
6.6 Open the account by clicking its address in the Email column
6.7 Go to the KEYS tab and click ADD KEY, then Create new key
6.8 Select JSON; this downloads the JSON file containing your key to your PC
7. Upload the file to the config folder of your Home Assistant server (the folder that contains configuration.yaml)
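Before restarting Home Assistant, it can help to sanity-check the downloaded JSON file. This sketch assumes the standard layout of a Google Cloud service account key (fields such as type, project_id, private_key and client_email); check_key_file is a hypothetical helper, and it only checks the file's shape, not whether the key is valid or the API is enabled for the project.

```python
import json
from pathlib import Path

# Fields found in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path: str) -> list[str]:
    """Return problems found in a downloaded key file; an empty list means it looks OK."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"type is {data['type']!r}, expected 'service_account'")
    return problems


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass your own file name as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/config/my-apikey-cloud.json"
    print(check_key_file(target) or "key file looks OK")
```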

Google Cloud: Configuration Variables

key_file string (optional)
The API key file to use with Google Cloud Platform. If not specified, the path in os.environ['GOOGLE_APPLICATION_CREDENTIALS'] is used.
language string (optional, default: en-US)
Default language of the voice, e.g., en-US. Supported languages, genders and voices are listed in Google's Cloud Text-to-Speech documentation. Some additional languages are supported but not documented there (see the language dropdown on Google's Text-to-Speech page).
gender string (optional, default: neutral)
Default gender of the voice, e.g., male. Supported languages, genders and voices are listed in Google's Cloud Text-to-Speech documentation.
voice string (optional)
Default voice name, e.g., en-US-Wavenet-F. Supported languages, genders and voices are listed in Google's Cloud Text-to-Speech documentation. Important! If set, this parameter overrides the language and gender parameters.
encoding string (optional, default: mp3)
Default audio encoder. Supported encodings are ogg_opus, mp3 and linear16.
speed float (optional, default: 1.0)
Default rate/speed of the voice, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice; 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), it defaults to the native 1.0 speed.
pitch float (optional, default: 0.0)
Default pitch of the voice, in the range [-20.0, 20.0]. 20 means an increase of 20 semitones from the original pitch; -20 means a decrease of 20 semitones.
gain float (optional, default: 0.0)
Default volume gain (in dB) of the voice, in the range [-96.0, 16.0]. If unset, or set to 0.0 dB, the voice plays at the normal native signal amplitude. A value of -6.0 dB plays at approximately half the normal native amplitude, and +6.0 dB at approximately twice the normal native amplitude. It is strongly recommended not to exceed +10 dB, as there is usually no effective increase in loudness above that.
profiles list (optional, default: [])
Identifiers that select ‘audio effects’ profiles applied to the synthesized speech. Effects are applied on top of each other in the order they are given. Supported profile IDs are listed in Google's Cloud Text-to-Speech documentation.
text_type string (optional, default: text)
Default text type. Supported text types are text and ssml. See Google's SSML documentation for what SSML is and how to use it.
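The ranges and allowed values above can be checked before restarting Home Assistant. This is a minimal Python sketch; validate_google_cloud_options and gain_to_amplitude_ratio are hypothetical helpers, not part of Home Assistant, and they encode only the limits listed in this article. The dB conversion uses the standard amplitude convention ratio = 10^(dB/20), which is why +6 dB is roughly twice and -6 dB roughly half the amplitude.

```python
# Limits as documented for the google_cloud platform in this article.
ENCODINGS = {"ogg_opus", "mp3", "linear16"}
TEXT_TYPES = {"text", "ssml"}
RANGES = {"speed": (0.25, 4.0), "pitch": (-20.0, 20.0), "gain": (-96.0, 16.0)}


def validate_google_cloud_options(options: dict) -> list[str]:
    """Return errors (and 'warning:' notes) for a google_cloud options dict."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key in options and not low <= float(options[key]) <= high:
            problems.append(f"{key}={options[key]} is outside [{low}, {high}]")
    if options.get("encoding", "mp3") not in ENCODINGS:
        problems.append(f"unsupported encoding: {options['encoding']}")
    if options.get("text_type", "text") not in TEXT_TYPES:
        problems.append(f"unsupported text_type: {options['text_type']}")
    # Allowed, but above +10 dB there is usually no effective increase in loudness.
    if 10.0 < float(options.get("gain", 0.0)) <= 16.0:
        problems.append("warning: gain above +10 dB rarely makes the voice louder")
    return problems


def gain_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in dB to an amplitude multiplier: ratio = 10 ** (dB / 20)."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    print(validate_google_cloud_options({"speed": 1.1, "pitch": -2, "gain": 0.0}))
    print(round(gain_to_amplitude_ratio(6.0), 2), round(gain_to_amplitude_ratio(-6.0), 2))
```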

Example of a full configuration

# Add this to your configuration.yaml
tts:
  - platform: google_cloud
    key_file: my-apikey-cloud.json
    service_name: google_cloud
    language: en-US
    gender: male
    voice: en-GB-Wavenet-D  # overrides language and gender when set
    speed: 1.1
    pitch: -2
    gain: 0.0
    text_type: text
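To see how these options line up with Google's API, here is a sketch that turns the example configuration into the JSON body of a Cloud Text-to-Speech v1 text:synthesize REST request. It is an illustration under assumptions, not how Home Assistant's integration is implemented internally: build_synthesize_body is hypothetical, and deriving the language code from the voice name (en-GB-Wavenet-D to en-GB) is an assumption.

```python
import json

# Home Assistant option values mapped to Cloud Text-to-Speech REST enum names.
GENDER_ENUMS = {"male": "MALE", "female": "FEMALE", "neutral": "NEUTRAL"}
ENCODING_ENUMS = {"mp3": "MP3", "ogg_opus": "OGG_OPUS", "linear16": "LINEAR16"}

# Mirrors the YAML example above (key_file and service_name are not part of the request).
EXAMPLE_CONFIG = {
    "language": "en-US",
    "gender": "male",
    "voice": "en-GB-Wavenet-D",
    "speed": 1.1,
    "pitch": -2,
    "gain": 0.0,
    "text_type": "text",
}


def build_synthesize_body(message: str, config: dict) -> dict:
    """Build a v1 text:synthesize JSON body from google_cloud-style options."""
    if config.get("voice"):
        # Assumption: the voice name starts with its language code, and the voice
        # overrides language and gender, as described for the voice option.
        name = config["voice"]
        voice = {"languageCode": "-".join(name.split("-")[:2]), "name": name}
    else:
        voice = {
            "languageCode": config.get("language", "en-US"),
            "ssmlGender": GENDER_ENUMS[config.get("gender", "neutral")],
        }

    audio_config = {
        "audioEncoding": ENCODING_ENUMS[config.get("encoding", "mp3")],
        "speakingRate": config.get("speed", 1.0),
        "pitch": config.get("pitch", 0.0),
        "volumeGainDb": config.get("gain", 0.0),
    }
    if config.get("profiles"):
        # Effects profiles are applied in the order given.
        audio_config["effectsProfileId"] = list(config["profiles"])

    input_key = "ssml" if config.get("text_type", "text") == "ssml" else "text"
    return {"input": {input_key: message}, "voice": voice, "audioConfig": audio_config}


if __name__ == "__main__":
    print(json.dumps(build_synthesize_body("Hello from Home Assistant", EXAMPLE_CONFIG), indent=2))
```

This also makes the effect of the voice line visible: with voice set, the en-US language and male gender from the example are not used.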

Pricing: Standard vs WaveNet voices

Text-to-Speech is priced based on the number of characters sent to the service to be synthesized into audio each month. All characters in the input string are counted for billing purposes, including spaces.

The full list of available voices can be found in Google's Cloud Text-to-Speech documentation.

Standard

Free Tier
- 0-4 million characters per month (approx. 600.000-800.000 words)
Price after exceeding the limit
- $0.000004 per character ($4 USD per 1 million characters)

WaveNet

Free Tier
- 0-1 million characters per month (approx. 150.000-200.000 words)
Price after exceeding the limit
- $0.000016 per character ($16 USD per 1 million characters)
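Since billing is per character, including spaces, a rough monthly estimate is simple arithmetic. This Python sketch uses the free tier and prices listed above (check Google's current pricing before relying on them); billable_characters and monthly_cost_usd are hypothetical helpers. It treats every call as billed, so it is an upper bound if Home Assistant's TTS cache serves repeated identical messages locally.

```python
# Free tier and per-character prices as listed in this article (USD, per month).
FREE_CHARS_PER_MONTH = {"standard": 4_000_000, "wavenet": 1_000_000}
USD_PER_CHAR = {"standard": 0.000004, "wavenet": 0.000016}


def billable_characters(message: str) -> int:
    """Every character counts for billing, including spaces."""
    return len(message)


def monthly_cost_usd(chars_per_month: int, voice_type: str) -> float:
    """Charge only the characters above the free tier, rounded to cents."""
    over_quota = max(0, chars_per_month - FREE_CHARS_PER_MONTH[voice_type])
    return round(over_quota * USD_PER_CHAR[voice_type], 2)


if __name__ == "__main__":
    message = "The washing machine has finished."
    chars = billable_characters(message) * 50 * 30  # 50 announcements a day for 30 days
    print(chars, monthly_cost_usd(chars, "wavenet"))
```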

Comparison

The google_translate and google_cloud integrations in Home Assistant serve the same purpose: converting text to speech via their respective services. Below are the pros and cons of each so you can decide which one suits you.

Google Translate

Pros

Easier to set up
No billing account required
No Google Cloud Platform project required
More languages supported – 100+
No character limit

Cons

Robotic-sounding voice
Voice cannot be changed in Home Assistant
Static speed, pitch & gain
base_url must be set when using SSL, because Google Cast devices reject self-signed certificates

Google Cloud

Pros

More natural sounding voices
Higher quality speech synthesis
SSML support for customization of audio response
Audio profiles can be applied on post-synthesized TTS

Cons

More difficult to set up
Google Cloud Platform project required
Billing account required
Character quota on the free tier; usage above it is billed
Fewer languages available

Summary

Both integrations work well in Home Assistant once set up correctly, so we recommend trying both and seeing which one suits you. If you are a power user with a lot of audible notifications and announcements playing through your smart speakers all the time, and cost is a concern, Google Translate may be the better fit: it is easier to set up and has no character limit.

That said, the free tier of Google Cloud TTS covers a large number of characters each month at no charge (4 million for Standard voices and 1 million for WaveNet voices). That is roughly 600.000 and 150.000 words respectively, which is a lot.

You can also use both integrations at the same time and call each one through its own service.
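Because each platform registers its own service, using both is just a matter of calling two services. The sketch below uses Home Assistant's REST API (POST /api/services/<domain>/<service> with a long-lived access token). The host, token and media player entity are placeholders, and the service names assume the service_name values from the two example configurations (tts.google_say and tts.google_cloud).

```python
import json
import urllib.request

# Placeholders: replace with your Home Assistant address and a long-lived access token.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def build_tts_request(service: str, entity_id: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/services/tts/<service>."""
    body = json.dumps({"entity_id": entity_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{HA_URL}/api/services/tts/{service}",
        data=body,
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
        method="POST",
    )


def call_tts(service: str, entity_id: str, message: str) -> int:
    """Send the request to Home Assistant and return the HTTP status code."""
    with urllib.request.urlopen(build_tts_request(service, entity_id, message), timeout=10) as resp:
        return resp.status
```

For example, call_tts("google_say", "media_player.living_room", "Hello") followed by call_tts("google_cloud", "media_player.living_room", "Hello") would speak the same message through each engine, which makes a side-by-side comparison easy; media_player.living_room is a hypothetical entity ID.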


3 thoughts on “TTS: Google Translate vs Google Cloud in Home Assistant”

Bismarck
June 14, 2022 at 6:24 am
Thanks for the very detailed explanation. But I am getting an error while testing it. I have the feeling that I placed the JSON file in the wrong directory. Can you help me to clarify what is the default directory or how we can specify the path to the JSON file? In my case, I added it to the folder where the config file is.
“key_file string (optional)
The API key file to use with Google Cloud Platform. If not specified os.environ[‘GOOGLE_APPLICATION_CREDENTIALS’] path will be used.”
SHS
June 20, 2022 at 7:57 am
Yes, you need to move the file to /config/my-file.json. Another important step people miss is to enable “Test” environment within your project in Google’s service account, otherwise HA can't communicate with Google’s servers