FAQ Text-to-Speech based on Google Cloud – Knowledgebase

Q: What is Text-to-Speech based on Google Cloud?

A: Text-to-Speech based on Google Cloud is a Visiolink feature that uses Google's high-quality speech synthesis service to enrich your articles with narration. We use Google's WaveNet voices, which are constantly improved through AI technology. You choose the share of articles to make available with narration, whether you want to focus on the top stories only or have all articles read aloud. The share is defined as a percentage of the longest articles (most characters) in each section. You can choose between a male and a female narrator. The feature is available for your ePaper readers on iOS, Android and Web (on Web, the Desktop Web App is required).
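The share rule described above can be illustrated with a minimal sketch. It is not Visiolink's actual implementation: the function name, the dictionary keys ("id", "section", "text") and the decision to round up are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def select_for_narration(articles, share_percent):
    """Pick the longest articles of each section, up to the given share.

    `articles` is a list of dicts with the hypothetical keys
    "id", "section" and "text".
    """
    # Group the articles by the section they belong to.
    by_section = defaultdict(list)
    for article in articles:
        by_section[article["section"]].append(article)

    selected = []
    for section_articles in by_section.values():
        # Rounding up is an assumption; the FAQ does not state the rounding rule.
        count = math.ceil(len(section_articles) * share_percent / 100)
        # "Longest" is measured in characters, as described in the answer above.
        longest_first = sorted(
            section_articles, key=lambda a: len(a["text"]), reverse=True
        )
        selected.extend(longest_first[:count])
    return selected
```

With a share of 50, a section containing four articles would contribute its two longest articles under these assumptions.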

Q: What are the benefits of the feature?

A: Some media companies have successfully given their readers the choice of either reading or listening to news stories. The choice clearly expands the time and attention span in which the content is relevant. Readers can now consume news stories not only at the breakfast table or on the train, but also while driving, exercising or doing household chores. This way a media company can reach a wider audience, or keep existing subscribers on board even when they feel pressed for time and cannot sit down to read the paper.

Other benefits include:

Cheap and time-efficient way of enriching your articles with narration
Fully automated process that starts the minute you deliver your ePaper to Visiolink
Based on a constantly improving service from Google Cloud
Corrections implemented in one language (e.g. mispronunciations, abbreviations) benefit all titles in the same language
Integrated with the Visiolink podcast player, giving users convenient controls for listening to their favorite news content

Q: How does it work in the ePaper app?

A: In the iOS News Modules app, the feature is combined with the Article Teasers Module on the front screen. On Android, the articles are displayed in their own card with a full view available in the top bar. Furthermore, we have made the feature much more visible in the article view. From the front screen, it is possible to play all articles at once or to play them individually by building a queue of articles. Playing an article activates the audio player (known from the Visiolink podcast feature), so users get a familiar listening experience.

Playback is validated against the corresponding ePaper, meaning that readers who have access to the ePaper also have access to the narrated articles. Articles are downloaded temporarily to the device and are available offline.

Q: How can we test the product?

A: We have built a test system which in a few minutes allows us to create an audio file of one of your latest articles. You don’t have to create a Google Cloud account to test your own content, and you can listen to the file as many times as you like, when you have the time.

When going live, we have to enable the audio in the article view of your existing app, as the feature works server side. This means that your existing users will see the play button in the article view and will be able to listen to the articles even before the app is upgraded. The front screen module will not go live on existing versions, only on the new version of the app.

Q: How can we get hold of the audio files?

A: After the audio files are generated, we make them available on the same FTP server where you normally deliver your ePaper content for processing. If you prefer, we can send them to another FTP server of your choice.

Q: How does processing of the ePaper in combination with processing of audio files work?

A: When you deliver your ePaper content to Visiolink, we start processing it for the app. As soon as the articles have been processed, the share you have chosen to enrich is converted into so-called SSML files and sent to Google's service. Google returns an MP3 file for each article, which is matched with the corresponding article and made available to the app. Everything is fully automated.

Q: How much can we expect to pay Google for the service?

A: Google charges 16 USD per 1 million characters and bills every month. Every month the first 1.000.000 characters are free. Below we have put up a pricing example giving you an idea of the expenses towards Google.

The example is based on a medium/large daily newspaper processing the top 50 % of its articles. On average, it enriches 36 articles a day with audio, averaging 5.000 characters per article (ranging from 1.200 to 20.000 characters). This makes 180.000 characters per day, leading to a monthly cost of:

180.000 characters x 31 days
- 1.000.000 free characters
x 16 USD per 1.000.000 characters

= USD 73 per month
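The same arithmetic can be reproduced with a small sketch, which may be handy for estimating your own volumes. The price and the free allowance are the figures quoted in this FAQ, not values fetched from Google; the function and constant names are illustrative.

```python
# Figures quoted in this FAQ; check Google's current price list before relying on them.
PRICE_PER_MILLION_USD = 16
FREE_CHARACTERS_PER_MONTH = 1_000_000


def monthly_cost_usd(characters_per_day, days_in_month=31):
    """Estimate the monthly cost for a given daily character volume."""
    total = characters_per_day * days_in_month
    # The first million characters each month are free, so the billable
    # amount can never go below zero.
    billable = max(0, total - FREE_CHARACTERS_PER_MONTH)
    return billable * PRICE_PER_MILLION_USD / 1_000_000


# 36 articles a day with an average of 5,000 characters each.
print(monthly_cost_usd(36 * 5_000))  # 73.28
```

A title producing fewer than 1.000.000 characters in a month would pay nothing under these figures.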

Q: What happens if we change article content and reprocess the ePaper?

A: Reprocessing the ePaper with updated content or changed articles will also reprocess the narrated articles. We will scan and compare each article with the previous version and fetch new audio if anything has changed. This way, we ensure that important changes are included and at the same time we avoid fetching audio unnecessarily.
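One common way to detect such changes is to compare fingerprints of the article text between runs. The sketch below illustrates that idea only; the FAQ does not describe how the comparison is actually done, and the function names and the id-to-text dictionaries are assumptions.

```python
import hashlib


def fingerprint(text):
    """Return a stable fingerprint of an article's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def articles_needing_new_audio(previous, current):
    """Return the ids of articles that are new or whose text has changed.

    `previous` and `current` are hypothetical dicts mapping article id to text.
    """
    previous_hashes = {aid: fingerprint(text) for aid, text in previous.items()}
    return [
        aid
        for aid, text in current.items()
        # A missing id (new article) or a different hash both trigger new audio.
        if previous_hashes.get(aid) != fingerprint(text)
    ]
```

Unchanged articles are skipped, so only new or edited articles would lead to a new request for audio.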

Q: How do you handle content that is shared between regional editions?

A: Shared or repeated content is not sent to Google multiple times. If your regional editions share content, we use the same audio files across regions. This also works across apps and completely different titles.
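Reuse of this kind is typically achieved by keying audio on the content itself rather than on the edition or title. The following is a minimal sketch of that technique, not a description of Visiolink's system; the class name and the `synthesize` callable are hypothetical.

```python
import hashlib


class AudioCache:
    """Reuse audio for identical text, regardless of edition or title."""

    def __init__(self, synthesize):
        # `synthesize` is a hypothetical callable: text -> audio bytes.
        self._synthesize = synthesize
        self._audio_by_hash = {}

    def audio_for(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Only call the speech service the first time this exact text is seen.
        if key not in self._audio_by_hash:
            self._audio_by_hash[key] = self._synthesize(text)
        return self._audio_by_hash[key]
```

Two regional editions carrying the same article text would resolve to the same key, so the text would be synthesized once.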

Q: What are the requirements for implementing the feature?

A: You will need to set up a Google Cloud account and send us an API key, so we can match your content to your account. Follow our guide to create the account and send the API key. Furthermore, your article content must be delivered as XML files to our FTP server. On web, the Desktop Web App is a requirement.

Guide for creating a Google Cloud Account

Q: How can we track the use of the articles?

A: Tracking of narrated articles is hooked up to the metrics of the Engagement Event (available through Google Analytics), providing you with both the number of times an article has been played and the duration of each session.

Q: How can we report mispronounced words?

A: Machine narration will have flaws and mispronunciations. Some words, e.g. local place names and personal names, will need correction. Visiolink has the advantage of having many customers who share the same language. Corrections are added to a common database, so they benefit all titles in that language. You can report mispronounced words by filling out this form: https://podio.com/webforms/26051131/1955259
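To give an idea of what a per-language correction can look like, the sketch below wraps known words in the standard SSML `<sub alias="...">` element, which tells the voice to speak the alias instead of the written form. This is an assumption about one possible approach, not a description of Visiolink's corrections database; the `CORRECTIONS` table and its entries are made up for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical corrections per language: written form -> spoken form.
CORRECTIONS = {
    "da": {"km/t": "kilometer i timen", "bl.a.": "blandt andet"},
}


def apply_corrections(text, language):
    """Return SSML-safe text with known words wrapped in <sub alias="...">."""
    corrections = CORRECTIONS.get(language, {})
    if not corrections:
        return escape(text)

    # Try longer entries first and only match whole words.
    words = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(w) for w in words) + r")(?!\w)"
    )

    parts, position = [], 0
    for match in pattern.finditer(text):
        # Escape the untouched text so it stays valid inside SSML.
        parts.append(escape(text[position:match.start()]))
        alias = quoteattr(corrections[match.group()])
        parts.append(f"<sub alias={alias}>{escape(match.group())}</sub>")
        position = match.end()
    parts.append(escape(text[position:]))
    return "".join(parts)
```

Because the table is keyed by language, a correction added for one title would apply to every title processed with the same language code.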

Q: How do you treat the articles before sending them to Google?

A: We normalize article content when we receive it from you. This means that we have the same article elements available across all titles, so we can enrich the articles on a general level with pauses, rules and corrections (corrections apply per language). General settings we apply to all articles:

Pauses: We have set 1.200 ms (1,2 seconds) after the headline and between paragraphs
Removed content: By default, we remove intermediate headings and bylines, as these elements can be very confusing if you only listen to the article without having the text in front of you. If you wish, we can turn blurbs, intermediate headings and bylines on or off for individual titles.

Please note: The better the content we receive from you, the better the quality of the narration. For instance, we can only remove intermediate headings if they are tagged as such and are not just part of the body text.
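The pause setting above maps naturally onto the standard SSML `<break>` element. The following minimal sketch shows how a headline and paragraphs could be turned into SSML with a 1.200 ms pause after the headline and between paragraphs. The function name and the exact markup are assumptions; the SSML files Visiolink actually generates may look different.

```python
from xml.sax.saxutils import escape

# Pause length taken from the general settings described above.
PAUSE_MS = 1200


def article_to_ssml(headline, paragraphs, pause_ms=PAUSE_MS):
    """Build an SSML document with a pause after the headline and between paragraphs."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape the text so characters such as & and < stay valid inside SSML.
    parts = [f"<p>{escape(headline)}</p>"]
    parts += [f"<p>{escape(paragraph)}</p>" for paragraph in paragraphs]
    # Joining with the pause places it after the headline and between paragraphs.
    return "<speak>" + pause.join(parts) + "</speak>"
```

Elements that should not be narrated, such as bylines or intermediate headings, would simply be left out of `paragraphs` before this step, which is only possible when they are tagged separately in the delivered XML.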

Q: What is written in the fine print?

A: Google is a third-party provider over which we have no control and for which we can give no guarantees. If the Google service does not respond or Google changes its technical setup, the audio articles risk being unavailable. Visiolink cannot be held responsible for interruptions caused by the Google service. We consider article content processed on Google's servers to be non-personal, and thus not subject to the GDPR.
