• Sep 9, 2025

Building a Text-to-Speech API with Azure AI and .NET Core

In this article, we’ll walk through how to build a simple yet powerful Text-to-Speech (TTS) API using Azure AI Speech Services and .NET Core. This API accepts text input and returns synthesized speech as a WAV audio stream — perfect for accessibility, voice interfaces, or media automation.

Step 1: Azure Setup — Provisioning the Speech Service

Before writing any code, you need to set up the Azure Speech resource:

Create the Speech Resource
1. Go to the Azure Portal.
2. Click Create a resource → Search for Speech → Select Speech under AI Services.
3. Choose your subscription, resource group, and region (e.g., eastus).
4. Click Review + Create, then Create.
🔑 Get Your Credentials
Once deployed:
• Go to the resource.
• Navigate to Keys and Endpoint.
• Copy one of the API keys and the region (e.g., eastus).
You’ll use these in your .NET Core app to authenticate with Azure.
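Rather than hard-coding the key and region, you may prefer to read them from the environment. The following is a minimal sketch, not the article's own code: the variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings and SpeechSettingsLoader types are assumptions introduced here for illustration.

```csharp
using System;

// Hypothetical holder for the two values copied from "Keys and Endpoint".
public sealed record SpeechSettings(string Key, string Region);

public static class SpeechSettingsLoader
{
    // Reads the key and region from environment variables.
    // The variable names are an assumption; use whatever your deployment defines.
    public static SpeechSettings FromEnvironment()
    {
        var key = Require("SPEECH_KEY");
        var region = Require("SPEECH_REGION");
        return new SpeechSettings(key, region);
    }

    // Fails fast with a clear message when a variable is missing or blank.
    private static string Require(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        return value.Trim();
    }
}
```

The loaded values can then be passed to SpeechConfig.FromSubscription in place of a constant, which keeps the key out of source control.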

Step 2: .NET Core Wrapper API — Code Breakdown

[HttpPost("speak")]
public async Task<IActionResult> Speak(string speech)
{
    if (string.IsNullOrWhiteSpace(speech))
        return BadRequest("Text is required.");

    var audioBytes = await ConvertTextToSpeechAsync(speech);
    return File(audioBytes, "audio/wav");
}

This endpoint receives a string (speech) and returns a WAV audio stream. If the input is null, empty, or whitespace, it returns a 400 Bad Request.

Speech Conversion Logic

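// Requires: using Microsoft.CognitiveServices.Speech;
private async Task<byte[]> ConvertTextToSpeechAsync(string text)
{
    // Authenticate with the key and region copied from the Azure Portal.
    var config = SpeechConfig.FromSubscription(
        AzureAIConstants.SpeeechApiKey, "eastus"
    );

    // Passing null as the audio config keeps the audio in memory instead of
    // sending it to the server's default speaker.
    using var synthesizer = new SpeechSynthesizer(config, null);
    var result = await synthesizer.SpeakTextAsync(text);

    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
    {
        // AudioData holds the synthesized audio (RIFF/WAV by default).
        return result.AudioData;
    }

    throw new Exception($"Speech synthesis failed: {result.Reason}");
}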

This method:

  • Initializes the SpeechConfig using your Azure key and region.

  • Uses SpeechSynthesizer to convert text to audio.

  • Returns the audio as a byte array if successful.
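Because the endpoint labels the response as audio/wav, it can be useful to sanity-check the returned byte array before sending it. The sketch below is an illustrative helper, not part of the Speech SDK; the WavCheck class and LooksLikeWav method are hypothetical names. It only inspects the RIFF and WAVE markers at the start of the container and does not validate the audio itself.

```csharp
using System;
using System.Text;

public static class WavCheck
{
    // A WAV file is a RIFF container: bytes 0-3 spell "RIFF",
    // bytes 4-7 hold the chunk size, and bytes 8-11 spell "WAVE".
    public static bool LooksLikeWav(byte[] data)
    {
        if (data is null || data.Length < 12)
            return false;

        return Encoding.ASCII.GetString(data, 0, 4) == "RIFF"
            && Encoding.ASCII.GetString(data, 8, 4) == "WAVE";
    }
}
```

A check like this could run on result.AudioData in ConvertTextToSpeechAsync, or on a downloaded response while testing.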

Dependencies

Install the required NuGet package:

dotnet add package Microsoft.CognitiveServices.Speech

Step 3: Testing the API with Postman

Here’s how to test your API using Postman:
🔹 Request Setup
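• Method: POST
• URL: The speak route of your running API (host, port, and controller prefix depend on your project).
• Body: Select the body type that matches how your endpoint binds the speech parameter.
• Content: Enter any sentence as the text to synthesize.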
🔹 Response
• You’ll receive a WAV audio stream.
• Postman will prompt to download the file or play it depending on your setup.

Below is the screenshot.
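As an alternative to Postman, the same request can be sent from a small C# client. This is a sketch under stated assumptions: the SpeakClient class is a hypothetical name, the endpoint address is whatever your project exposes, and the text is sent as a speech query parameter, which assumes the API binds that parameter from the query string.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SpeakClient
{
    // Appends the text as a URL-encoded "speech" query parameter.
    // Assumption: the API binds the speech parameter from the query string.
    public static Uri BuildSpeakUri(Uri speakEndpoint, string text) =>
        new Uri($"{speakEndpoint.AbsoluteUri}?speech={Uri.EscapeDataString(text)}");

    // Sends the POST request and returns the response body (the WAV bytes).
    public static async Task<byte[]> SpeakAsync(HttpClient client, Uri speakEndpoint, string text)
    {
        using var response = await client.PostAsync(BuildSpeakUri(speakEndpoint, text), null);
        response.EnsureSuccessStatusCode(); // throws on 400 Bad Request and other failures
        return await response.Content.ReadAsByteArrayAsync();
    }
}
```

Calling SpeakAsync with your endpoint address (a placeholder such as http://localhost:5000/api/tts/speak) and saving the result with File.WriteAllBytesAsync should give a playable WAV file.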

By following these steps, you can build a working text-to-speech API on top of Azure AI Speech and .NET Core.