Friday, April 26, 2024

AI to the Rescue

Three weeks ago, I happened to catch a discussion about the paucity of audiobooks compared to their readable brethren, eBooks and print books. The participants deplored the disparity and lamented its cause: the high cost of hiring a narrator, plus the cost of editing and mastering the recorded audio files. That, they said, would condemn audiobooks to limited numbers, and therefore a limited readership, a situation disadvantaging folks with vision issues.

A week later, I was updating one of my titles on Google Play when I spotted an addition to their interface: they were offering to convert my eBook to an audiobook, using, you guessed it, a digital voice. In other words, AI.

I couldn’t think of a good reason to NOT test this free offer. Nevertheless, being familiar with the digital voices used by Google Maps and iMaps, I thought I would probably be disappointed. Could their process handle dialog and tricky intonation? I wasn’t optimistic. Boy, was I wrong… The sample voices I listened to were amazing and the choice of voices—male and female, American and British, old and young—was astounding.

The conversion took less than thirteen seconds. My son, who’s into AI, says the amount of processing power used in all AI applications (the conversion of eBooks to audiobooks being only one of them) is also astounding. He says AI is not only a toolset with an incredible future, it is an investment bonanza right now.

Shares of AI stocks

As soon as the file was converted, Google Play opened a studio interface for editing and correcting pronunciation. In my opinion, this interface was not quite ready for prime time. The whole setup smacked of Beta.

Shortly after this experience, I discovered a joint venture between Apple and D2D offering eBook-to-audiobook conversion. At the moment, their process takes place behind closed doors. Their digital voice samples, however, were indistinguishable from human voices.

Apple quality digital voices

This from the D2D website:

“Once your request is submitted, it takes one to two months to process the book and conduct quality checks that include file quality, content compatibility (i.e. no complex formatting elements, limited non-English words and phrases), and editorial review. Pre-orders are not currently supported.”

Even more recently, I discovered that Amazon now offers a conversion process similar to Google’s, only more stable. Although it is still in Beta, their editing interface is impressive, as are their voice samples.

Amazon Beta audiobook production

This impending proliferation of audiobooks will benefit 80-something readers and writers with impaired vision. More than the general readership? Who knows? It may take a while for cost and quality to normalize.

An interesting sidebar to the whole thing—purely a subjective impression on my part—is that many existing audiobooks have been recorded with inferior human narration. They can now be re-recorded with a much-improved narration. Yes, I’m saying that many of the human voices offered on the large audiobook sites are not up to standard, mainly because there is no standard. Ironically, AI narration will set a standard.

