Send a recognition request with model adaptation
You can improve the accuracy of the transcription results you
get from Speech-to-Text by using model adaptation. The model
adaptation feature lets you specify words and/or phrases that
Speech-to-Text must recognize more frequently in your audio data than
other alternatives that might otherwise be suggested. Model adaptation is
particularly useful for improving transcription accuracy in the following use
cases:
1. Your audio contains words or phrases that are likely to occur frequently.
2. Your audio is likely to contain words that are rare (such as proper names) or words that do not exist in general use.
3. Your audio contains noise or is otherwise not very clear.
Speech Adaptation is an optional Speech-to-Text configuration that you
can use to customize your transcription results according to your needs. See the
RecognitionConfig
documentation for more information about configuring the recognition request
body.
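For illustration only, the adaptation-related fields of a recognition request body can be sketched as a plain dictionary before it is serialized into the REST call. The field names below mirror the RecognitionConfig JSON representation; the audio URI, phrase values, and boost are hypothetical examples:

```python
# Sketch of a REST-style recognition request body with inline speech
# adaptation. The phrase values and boost here are hypothetical.
def build_recognition_request(audio_uri: str) -> dict:
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 24000,
            "languageCode": "en-US",
            "adaptation": {
                "phraseSets": [
                    {
                        "phrases": [{"value": "sushido"}, {"value": "altura"}],
                        "boost": 10,
                    }
                ]
            },
        },
        "audio": {"uri": audio_uri},
    }


request = build_recognition_request("gs://my-bucket/my-audio.wav")
```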
The following code sample shows how to improve transcription accuracy using a
SpeechAdaptation resource that combines a PhraseSet, a CustomClass,
and model adaptation boost.
To use a PhraseSet or CustomClass in future requests, make a note of its
resource name, returned in the response when you create the resource.
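As a sketch of that reuse pattern (the project ID, location, and phrase set ID below are placeholders), a saved resource name can be referenced in a later request instead of recreating the resource:

```python
# Sketch: reference a previously created PhraseSet by its full resource
# name in a later recognition request, rather than recreating it.
# The project ID, location, and phrase set ID are placeholder values.
def phrase_set_resource_name(project_id: str, location: str, phrase_set_id: str) -> str:
    return f"projects/{project_id}/locations/{location}/phraseSets/{phrase_set_id}"


name = phrase_set_resource_name("my-project", "global", "my-phrase-set")

# Pass the saved name in the adaptation section of a new request:
adaptation = {"phraseSetReferences": [name]}
```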
For a list of the pre-built classes available for your language, see
Supported class tokens.
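Pre-built class tokens can be embedded directly in phrase values. The sketch below uses `$OOV_CLASS_DIGIT_SEQUENCE` as an example token; token availability varies by language, so verify any token against the supported class tokens list before relying on it:

```python
# Sketch: embed a pre-built class token in a phrase value so the
# recognizer favors digit sequences after the word "extension".
# $OOV_CLASS_DIGIT_SEQUENCE is one example token; check the supported
# class tokens list for your language before using a specific token.
phrase = {"value": "dial extension $OOV_CLASS_DIGIT_SEQUENCE", "boost": 10}
phrase_set = {"phrases": [phrase]}
```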
import os

from google.cloud import speech_v1p1beta1 as speech

PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT")


def transcribe_with_model_adaptation(
    audio_uri: str,
    custom_class_id: str,
    phrase_set_id: str,
) -> str:
    """Create `PhraseSet` and `CustomClass` resources for custom item lists in input data.
    Args:
        audio_uri (str): The Cloud Storage URI of the input audio. e.g. gs://[BUCKET]/[FILE]
        custom_class_id (str): The unique ID of the custom class to create.
        phrase_set_id (str): The unique ID of the PhraseSet to create.
    Returns:
        The transcript of the input audio.
    """
    # Specifies the location where the Speech API will be accessed.
    location = "global"

    # Audio object
    audio = speech.RecognitionAudio(uri=audio_uri)

    # Create the adaptation client
    adaptation_client = speech.AdaptationClient()

    # The parent resource where the custom class and phrase set will be created.
    parent = f"projects/{PROJECT_ID}/locations/{location}"

    # Create the custom class resource
    adaptation_client.create_custom_class(
        {
            "parent": parent,
            "custom_class_id": custom_class_id,
            "custom_class": {
                "items": [
                    {"value": "sushido"},
                    {"value": "altura"},
                    {"value": "taneda"},
                ]
            },
        }
    )
    custom_class_name = (
        f"projects/{PROJECT_ID}/locations/{location}/customClasses/{custom_class_id}"
    )

    # Create the phrase set resource
    phrase_set_response = adaptation_client.create_phrase_set(
        {
            "parent": parent,
            "phrase_set_id": phrase_set_id,
            "phrase_set": {
                "boost": 10,
                "phrases": [
                    {"value": f"Visit restaurants like ${{{custom_class_name}}}"}
                ],
            },
        }
    )
    phrase_set_name = phrase_set_response.name

    # The next section shows how to use the newly created custom class and
    # phrase set to send a transcription request with speech adaptation.

    # Speech adaptation configuration
    speech_adaptation = speech.SpeechAdaptation(phrase_set_references=[phrase_set_name])

    # Speech configuration object
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=24000,
        language_code="en-US",
        adaptation=speech_adaptation,
    )

    # Create the speech client
    speech_client = speech.SpeechClient()

    response = speech_client.recognize(config=config, audio=audio)

    transcript = ""
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(f"Transcript: {result.alternatives[0].transcript}")
        transcript += result.alternatives[0].transcript

    return transcript
Last updated 2025-08-29 UTC.