
from typing import Any, Union, overload

from ..generation import GenerationConfig
from ..utils import is_torch_available
from .base import Pipeline


if is_torch_available():
    import torch

    from ..models.auto.modeling_auto import MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING
    from ..models.speecht5.modeling_speecht5 import SpeechT5HifiGan

DEFAULT_VOCODER_ID = "microsoft/speecht5_hifigan"


class TextToAudioPipeline(Pipeline):
    """
Text-to-audio generation pipeline using any `AutoModelForTextToWaveform` or `AutoModelForTextToSpectrogram`. This
pipeline generates an audio file from an input text and optional other conditional inputs.

Unless the model you're using explicitly sets these generation parameters in its configuration files
(`generation_config.json`), the following default values will be used:
- max_new_tokens: 256

Example:

```python
>>> from transformers import pipeline

>>> pipe = pipeline(model="suno/bark-small")
>>> output = pipe("Hey it's HuggingFace on the phone!")

>>> audio = output["audio"]
>>> sampling_rate = output["sampling_rate"]
```
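
The returned `audio` is a NumPy waveform, so it can be written straight to disk. For example with `scipy` (a
minimal sketch; depending on the model, the array may need to be squeezed to 1-D first):

```python
>>> import numpy as np
>>> import scipy.io.wavfile

>>> scipy.io.wavfile.write("speech.wav", rate=sampling_rate, data=np.squeeze(audio))
```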

Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)

<Tip>

You can specify parameters passed to the model by using [`TextToAudioPipeline.__call__.forward_params`] or
[`TextToAudioPipeline.__call__.generate_kwargs`].

Example:

```python
>>> from transformers import pipeline

>>> music_generator = pipeline(task="text-to-audio", model="facebook/musicgen-small", framework="pt")

>>> # diversify the music generation by adding randomness with a high temperature and set a maximum music length
>>> generate_kwargs = {
...     "do_sample": True,
...     "temperature": 0.7,
...     "max_new_tokens": 35,
... }

>>> outputs = music_generator("Techno music with high melodic riffs", generate_kwargs=generate_kwargs)
```
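
`forward_params` can likewise carry extra conditional inputs. A sketch with SpeechT5, which expects a speaker
embedding at inference time (taken here from the `Matthijs/cmu-arctic-xvectors` dataset):

```python
>>> import torch
>>> from datasets import load_dataset
>>> from transformers import pipeline

>>> synthesizer = pipeline(task="text-to-speech", model="microsoft/speecht5_tts")

>>> embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
>>> speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

>>> output = synthesizer("Hello, my dog is cooler than you!", forward_params={"speaker_embeddings": speaker_embedding})
```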

</Tip>

This pipeline can currently be loaded from [`pipeline`] using the following task identifiers: `"text-to-speech"` or
`"text-to-audio"`.

See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=text-to-speech).
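
When no `vocoder` is passed for a spectrogram model, the pipeline falls back to `microsoft/speecht5_hifigan`.
A sketch of supplying one explicitly instead (any `SpeechT5HifiGan` checkpoint works here):

```python
>>> from transformers import SpeechT5HifiGan, pipeline

>>> vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
>>> pipe = pipeline(task="text-to-speech", model="microsoft/speecht5_tts", vocoder=vocoder)
```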
    """

    _load_processor = True
    _pipeline_calls_generate = True
    _load_image_processor = False
    _load_feature_extractor = False
    _load_tokenizer = True

    # Make sure the docstring above is updated when the default generation config is changed
    _default_generation_config = GenerationConfig(
        max_new_tokens=256,
    )

    def __init__(self, *args, vocoder=None, sampling_rate=None, no_processor=True, **kwargs):
        super().__init__(*args, **kwargs)

        self.no_processor = no_processor

        if self.framework == "tf":
            raise ValueError("The TextToAudioPipeline is only available in PyTorch.")

        self.vocoder = None
        if self.model.__class__ in MODEL_FOR_TEXT_TO_SPECTROGRAM_MAPPING.values():
            # spectrogram models need a vocoder to turn the spectrogram into a waveform
            self.vocoder = (
                SpeechT5HifiGan.from_pretrained(DEFAULT_VOCODER_ID).to(self.model.device)
                if vocoder is None
                else vocoder
            )

        self.sampling_rate = sampling_rate
        if self.vocoder is not None:
            self.sampling_rate = self.vocoder.config.sampling_rate

        if self.sampling_rate is None:
            # get sampling_rate from the model config and generation config
            config = self.model.config
            gen_config = self.model.__dict__.get("generation_config", None)
            if gen_config is not None:
                config.update(gen_config.to_dict())

            for sampling_rate_name in ["sample_rate", "sampling_rate"]:
                sampling_rate = getattr(config, sampling_rate_name, None)
                if sampling_rate is not None:
                    self.sampling_rate = sampling_rate

        # last resort: fall back on the feature extractor's sampling rate, when a processor is used
        if self.sampling_rate is None and not self.no_processor and hasattr(self.processor, "feature_extractor"):
            self.sampling_rate = self.processor.feature_extractor.sampling_rate

    def preprocess(self, text, **kwargs):
        if isinstance(text, str):
            text = [text]

        if self.model.config.model_type == "bark":
            # bark Tokenizer is called with BarkProcessor which uses those kwargs
            new_kwargs = {
                "max_length": self.model.generation_config.semantic_config.get("max_input_semantic_length", 256),
                "add_special_tokens": False,
                "return_attention_mask": True,
                "return_token_type_ids": False,
                "padding": "max_length",
            }

            # priority is given to kwargs
            new_kwargs.update(kwargs)
            kwargs = new_kwargs

        preprocessor = self.tokenizer if self.no_processor else self.processor
        output = preprocessor(text, **kwargs, return_tensors="pt")

        return output

    def _forward(self, model_inputs, **kwargs):
        model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
        forward_params = kwargs["forward_params"]
        generate_kwargs = kwargs["generate_kwargs"]

        if self.model.can_generate():
            # we expect some kwargs to be additional tensors which need to be on the right device
            generate_kwargs = self._ensure_tensor_on_device(generate_kwargs, device=self.device)

            # a user-defined `generation_config` passed to the pipeline call takes precedence
            if "generation_config" not in generate_kwargs:
                generate_kwargs["generation_config"] = self.generation_config

            # generate_kwargs get priority over forward_params
            forward_params.update(generate_kwargs)
            output = self.model.generate(**model_inputs, **forward_params)
        else:
            if len(generate_kwargs):
                raise ValueError(
                    "You're using the `TextToAudioPipeline` with a forward-only model, but `generate_kwargs` is "
                    "non empty. For forward-only TTA models, please use `forward_params` instead of "
                    f"`generate_kwargs`. For reference, the `generate_kwargs` used here are: {generate_kwargs.keys()}"
                )
            output = self.model(**model_inputs, **forward_params)[0]

        if self.vocoder is not None:
            # in that case, the output is a spectrogram that needs to be converted into a waveform
            output = self.vocoder(output)

        return output

    @overload
    def __call__(self, text_inputs: str, **forward_params: Any) -> dict[str, Any]: ...

    @overload
    def __call__(self, text_inputs: list[str], **forward_params: Any) -> list[dict[str, Any]]: ...

    def __call__(
        self, text_inputs: Union[str, list[str]], **forward_params
    ) -> Union[dict[str, Any], list[dict[str, Any]]]:
        """
Generates speech/audio from the inputs. See the [`TextToAudioPipeline`] documentation for more information.

Args:
    text_inputs (`str` or `list[str]`):
        The text(s) to generate.
    forward_params (`dict`, *optional*):
        Parameters passed to the model generation/forward method. `forward_params` are always passed to the
        underlying model.
    generate_kwargs (`dict`, *optional*):
        The dictionary of ad-hoc parametrization of `generation_config` to be used for the generation call. For a
        complete overview of generate, check the [following
        guide](https://huggingface.co/docs/transformers/en/main_classes/text_generation). `generate_kwargs` are
        only passed to the underlying model if the latter is a generative model.

Return:
    A `dict` or a list of `dict`: The dictionaries have two keys:

    - **audio** (`np.ndarray` of shape `(nb_channels, audio_length)`) -- The generated audio waveform.
    - **sampling_rate** (`int`) -- The sampling rate of the generated audio waveform.
)r   r[   )r-   rV   rK   r   s      r2   r[   r\      s    0 w>~>>r4   c                     [        U SS 5      b  U R                  US'   [        U SS 5      b  U R                  US'   U R                  US'   U(       a  UO0 U(       a  UO0 S.nUc  0 n0 nXU4$ )Nassistant_modelassistant_tokenizerrC   )rK   rL   )r*   r`   rC   ra   )r-   preprocess_paramsrK   rL   paramspostprocess_paramss         r2   _sanitize_parameters(TextToAudioPipeline._sanitize_parameters   s     4*D1=151E1EO-.4.5A+/>>OK(595M5MO12 1?nB2Ar

 $ " *<<<r4   c                 P   0 nU R                   (       a9  [        U[        5      (       a  US   nO9[        U[        5      (       a  US   nOUnOU R                  R                  U5      nUR                  S[        R                  S9R                  5       US'   U R                  US'   U$ )Nwaveformr   cpu)r$   dtypeaudior   )r   r?   dicttupler,   decoder#   torchfloatnumpyr   )r-   rk   output_dictrh   s       r2   postprocessTextToAudioPipeline.postprocess   s     %&& ,E5)) 8  ~~,,U3H'{{%u{{{KQQSG'+'9'9O$r4   )r   r   r   )NNN)__name__
__module____qualname____firstlineno____doc___load_processor_pipeline_calls_generate_load_image_processor_load_feature_extractor_load_tokenizerr   _default_generation_configr   rH   rT   r   r@   r   rl   r[   listr   re   rs   __static_attributes____classcell__)r   s   @r2   r   r      s   2j O#O!#O "2" '+$T $P $PL0B VCV3V4S>V VbDIbbdSVX[S[nI]b b? d3i0?	tCH~tDcN33	4?8 	=. r4   r   )typingr   r   r   
generationr   utilsr   baser
   ro   models.auto.modeling_autor   !models.speecht5.modeling_speecht5r   r"   r   rM   r4   r2   <module>r      s>    ( ' ) &  QC1 k( kr4   