AI speech synthesis advances at speed... 'Customers can't tell if it's a human or a robot'

Photo: KT restored the voice of Hae-chul Shin, a popular music artist, and recreated his radio content.

KT will apply its artificial intelligence (AI) speech synthesis technology to the audiobook company 'Millie' within this year. Naver and Kakao, having already commercialized AI speech synthesis, are moving quickly to advance the technology. As the number of service applications grows, user feedback is expected to be reflected and the quality of the technology to improve.

KT plans to apply 'Personalized Text To Speech (P-TTS)' within this year to Millie, which was recently acquired by its subsidiary Genie Music. P-TTS is a technology that uses AI to modulate and synthesize voices. Existing audiobooks could only be heard in the voices of the narrators who recorded them (voice actors, celebrities), but with this technology customers can choose from a variety of voices and listen in the one they want.

Customers can listen to a children's book in their father's voice or switch to the voice of a favorite celebrity. They can also listen to an audiobook in different tones to suit the weather, time, place, and their mood: a calm tone that does not interfere with sleep before bedtime, or a lively tone on a gloomy day.

KT expects to offer a distinctive audiobook service by applying its technology. A KT official said, "Production costs can be reduced. In the future, related services could grow into a 'YouTube of the audio industry', for example by sharing revenue from AI audio content made with one's own voice."

KT is speeding up the commercialization of AI speech synthesis technology. Ahead of applying P-TTS to Millie, it restored and released the voice of Hae-chul Shin, who passed away in 2014. The voice was restored by training on radio broadcast data from 'Hae-chul Shin's Ghost Station', which aired for 11 years from 2001 to 2012.

Naver and Kakao, which have already applied AI speech synthesis technology to their services, are working to improve it further.

Naver's strength is its 'NES (Natural End-to-end Speech Synthesis System)' technology. It produces a synthesized voice that is difficult to distinguish from a real person using only 40 minutes of recording and 400 sentences of data. Naver has commercialized the speech synthesis API as a paid product through Naver Cloud. It is working to expand the service by offering it free of charge only for personal content production.

Kakao Enterprise signed a contract with Hyundai Department Store on the 8th for the use of its 'AI voice bot'. Before and after the Chuseok holiday, some branches of Hyundai Department Store provided an automated AI voice service to gift recipients.

Kakao's STT (Speech To Text), TTS (Text To Speech), and NLU (Natural Language Understanding) technologies were applied to the AI phone voice bot. Unless one is paying close attention, it is hard to tell that the other party on the call is an AI. The technology recognizes dates and detailed address elements such as building, unit number, and floor. When a customer asks to change the delivery address, it asks "Where can I send the delivery, then?" to obtain the additional information.

The industry expects that, as practical use cases increase and data accumulates, the quality of AI speech synthesis will improve and services will diversify.

By Staff Reporter Si-so Kim (siso@etnews.com)