Text to Speech with Natural Language Processing in SwiftUI

DevTechie Inc
Jul 5, 2023

Synthesizing Speech with Natural Language Processing in SwiftUI

The NaturalLanguage framework in iOS provides a convenient way to add intelligence to our text-based apps, letting us leverage the power of natural language processing without any third-party dependencies.

One such feature of the framework is the ability to identify the dominant language in a given string. Once we know which language the text is written in, we can combine that information with other frameworks, such as the speech synthesizer in AVFoundation (also available through an AVKit import).
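As a quick aside, if all you need is one-shot detection, NLLanguageRecognizer also exposes a class method for exactly this. A minimal sketch, with an illustrative French string:

import NaturalLanguage

// One-shot detection: returns an optional NLLanguage such as .french
let language = NLLanguageRecognizer.dominantLanguage(for: "Bonjour tout le monde")
print(language?.rawValue ?? "unknown") // prints "fr"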

AVSpeechSynthesizer gives apps the ability to produce synthesized speech from text utterances, and it also lets us monitor and control the ongoing speech. Combined with NaturalLanguage, we can make our apps speak content written in different languages.
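For reference, monitoring happens through the AVSpeechSynthesizerDelegate protocol. Here is a minimal sketch of a delegate that logs lifecycle events (the SpeechMonitor class name is my own, not part of the framework):

import AVFoundation

final class SpeechMonitor: NSObject, AVSpeechSynthesizerDelegate {
    // Called when the synthesizer begins speaking an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        print("Started speaking: \(utterance.speechString)")
    }
    
    // Called when the synthesizer finishes an utterance
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        print("Finished speaking")
    }
}

Keep in mind the synthesizer holds its delegate weakly, so the monitor needs to be retained somewhere for as long as you want the callbacks.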

Today, we will build an example that combines the power of NaturalLanguage and AVSpeechSynthesizer to speak content written in different languages.

Note: you will need a physical device to run this example, as the simulator failed on me numerous times while I was putting it together.

Let’s start with the setup. We will have a TextEditor view for the user to type or paste a text string, and a button that, when tapped, starts reading the content of the TextEditor aloud.

struct DevTechieNLSpeechSynthesis: View {
    @State private var text = ""
    
    var body: some View {
        VStack {
            TextEditor(text: $text)
                .padding()
                .overlay(RoundedRectangle(cornerRadius: 10).stroke(Color.gray.opacity(0.5), lineWidth: 2))
                .padding()
            Button("Speak") {
                // Speech synthesis logic will be added below.
            }
        }
    }
}

We will start with the import statements, so let’s import AVKit (which pulls in AVFoundation, where AVSpeechSynthesizer lives) and NaturalLanguage, where the NLP-related types are located. We also need SwiftUI for the view itself.

import AVKit
import NaturalLanguage
import SwiftUI

We will create instances of NLLanguageRecognizer and AVSpeechSynthesizer next.

@State private var text = ""
let recognizer = NLLanguageRecognizer()
let speechSynthesizer = AVSpeechSynthesizer()

Inside the Button’s action, we first want to process the input text in order to recognize the dominant language. Since dominantLanguage is nil when nothing can be detected (for example, for empty input), we bail out early instead of force-unwrapping:

recognizer.reset() // clear any state left over from a previous tap
recognizer.processString(text)
guard let lang = recognizer.dominantLanguage?.rawValue else { return }
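If you also want to guard against low-confidence guesses, the recognizer can report per-language probabilities. A minimal sketch (the two-hypothesis limit is an arbitrary choice of mine):

recognizer.reset()
recognizer.processString(text)

// Top candidate languages with their probabilities (0.0 through 1.0)
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
for (language, confidence) in hypotheses {
    print("\(language.rawValue): \(confidence)")
}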

Once we know the dominant language, we will start working on our text to speech part.

We will create an instance of AVSpeechUtterance, an object that encapsulates the text for speech synthesis and the parameters that affect the speech.

let utterance = AVSpeechUtterance(string: text)
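Among the parameters the utterance exposes are rate, pitch, and volume. A quick sketch with illustrative values (the defaults are fine for this example, so we won’t set these below):

utterance.rate = AVSpeechUtteranceDefaultSpeechRate * 0.9 // slightly slower than default
utterance.pitchMultiplier = 1.1    // allowed range is 0.5 through 2.0
utterance.volume = 0.8             // 0.0 (silent) through 1.0 (loudest)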

AVSpeechUtterance has a voice property, which is the voice the speech synthesizer uses when speaking the utterance. We can set the language for that voice by assigning an AVSpeechSynthesisVoice instance, which represents a distinct voice to use with speech synthesis.

For the language parameter of AVSpeechSynthesisVoice, we will pass the dominant language we detected in the input string.

utterance.voice = AVSpeechSynthesisVoice(language: lang)
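One caveat: AVSpeechSynthesisVoice(language:) returns nil when no installed voice matches the language code, in which case the utterance falls back to the system default voice. If you’d rather check availability up front, a sketch:

// Find an installed voice whose BCP-47 code starts with the detected language
if let voice = AVSpeechSynthesisVoice.speechVoices().first(where: { $0.language.hasPrefix(lang) }) {
    utterance.voice = voice
} else {
    print("No installed voice for \(lang); using the system default")
}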

Before we ask the speech synthesizer to start speaking, we will set the shared AVAudioSession’s category to playback, so the speech stays audible even when the device’s silent switch is on:

do {
    try AVAudioSession.sharedInstance().setCategory(.playback)
    try AVAudioSession.sharedInstance().setActive(true)
} catch {
    print(error.localizedDescription)
}
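If your app shares the device with other audio apps, you may also want to deactivate the session once speech finishes (for example, from the delegate’s didFinish callback shown earlier). A sketch:

do {
    // Let other apps resume their audio after we are done speaking
    try AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
} catch {
    print(error.localizedDescription)
}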

Last but not least, we will ask the AVSpeechSynthesizer to speak the utterance.

speechSynthesizer.speak(utterance)
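The same synthesizer instance can pause, resume, or stop ongoing speech, which is handy if you want to add a stop button later. A sketch (the button wiring is left out):

// Stop immediately and discard the rest of the utterance
speechSynthesizer.stopSpeaking(at: .immediate)

// Or pause at the next word boundary and resume later
speechSynthesizer.pauseSpeaking(at: .word)
speechSynthesizer.continueSpeaking()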

Our complete code will look like this:

import AVKit
import NaturalLanguage
import SwiftUI

struct DevTechieNLSpeechSynthesis: View {
    @State private var text = ""
    
    let recognizer = NLLanguageRecognizer()
    let speechSynthesizer = AVSpeechSynthesizer()
    
    var body: some View {
        VStack {
            TextEditor(text: $text)
                .padding()
                .overlay(RoundedRectangle(cornerRadius: 10).stroke(Color.gray.opacity(0.5), lineWidth: 2))
                .padding()
            Button("Speak") {
                // Detect the dominant language of the typed text.
                recognizer.reset()
                recognizer.processString(text)
                guard let lang = recognizer.dominantLanguage?.rawValue else { return }
                
                // Build the utterance and match its voice to the detected language.
                let utterance = AVSpeechUtterance(string: text)
                utterance.voice = AVSpeechSynthesisVoice(language: lang)
                
                // Configure the audio session for playback.
                do {
                    try AVAudioSession.sharedInstance().setCategory(.playback)
                    try AVAudioSession.sharedInstance().setActive(true)
                } catch {
                    print(error.localizedDescription)
                }
                
                speechSynthesizer.speak(utterance)
            }
        }
    }
}

Build and run on a device, type or paste some text, and tap Speak: