"""Tokenization class for Dia."""

from typing import Optional

from ...tokenization_utils import AddedToken, PreTrainedTokenizer
from ...utils import logging


logger = logging.get_logger(__name__)


class DiaTokenizer(PreTrainedTokenizer):
    """
    Construct a Dia tokenizer. Dia simply uses raw bytes utf-8 encoding except for special tokens `[S1]` and `[S2]`.

    This tokenizer inherits from [`PreTrainedTokenizer`], which contains most of the main methods. Users should
    refer to this superclass for more information regarding those methods.

    Args:
        pad_token (`str`, *optional*, defaults to `"<pad>"`):
            The token used for padding, for example when batching sequences of different lengths.
        unk_token (`str`, *optional*, defaults to `"<pad>"`):
            The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
            token instead.
        max_length (`int`, *optional*, defaults to 1024):
            The maximum length of the sequences when encoding. Sequences longer than this will be truncated.
        offset (`int`, *optional*, defaults to 0):
            The offset of the tokenizer.
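
    Example (illustrative; the ids shown assume the default `offset=0`, under which each character maps to
    its raw UTF-8 byte value and `[S1]`/`[S2]` are matched as whole added tokens):

    ```python
    >>> tokenizer = DiaTokenizer()
    >>> tokenizer("[S1] Hi")["input_ids"]
    [1, 32, 72, 105]
    ```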
    """

    model_input_names = ["input_ids", "attention_mask"]

    def __init__(
        self,
        pad_token: Optional[str] = "<pad>",
        unk_token: Optional[str] = "<pad>",
        max_length: Optional[int] = 1024,
        offset: int = 0,
        **kwargs,
    ):
        pad_token = AddedToken(pad_token) if isinstance(pad_token, str) else pad_token
        unk_token = AddedToken(unk_token) if isinstance(unk_token, str) else unk_token

        self._utf_vocab_size = 2**8  # utf is 8 bits

        # The pad token and the two speaker tags get fixed ids in the decoder
        self._added_tokens_decoder = {0: pad_token, 1: AddedToken("[S1]"), 2: AddedToken("[S2]")}
        self.offset = offset
        super().__init__(
            pad_token=pad_token,
            unk_token=unk_token,
            max_length=max_length,
            **kwargs,
        )
    @property
    def vocab_size(self):
        return self._utf_vocab_size

    def get_vocab(self):
        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size + self.offset)}
        vocab.update(self.added_tokens_encoder)
        return vocab

    def _tokenize(self, text: str) -> list[str]:
        """Take as input a string and return a list of strings (tokens) for words/sub-words"""
        tokens = [chr(i) for i in text.encode("utf-8")]
        return tokens

    def _convert_token_to_id(self, token):
        """Converts a token (str) in an id using the vocab."""
        if len(token) != 1:
            token_id = None
        else:
            token_id = ord(token) + self.offset
        return token_id

    def _convert_id_to_token(self, index):
        """Converts an index (integer) in a token (str) using the vocab."""
        token = chr(index - self.offset)
        return token

    def convert_tokens_to_string(self, tokens: list[str]) -> str:
        """Converts a sequence of tokens (string) in a single string."""
        bstring = b""
        for token in tokens:
            if token in self.added_tokens_decoder:
                added_token_obj = self.added_tokens_decoder[token]
                tok_string = str(added_token_obj).encode("utf-8")
            elif token in self.added_tokens_encoder:
                tok_string = token.encode("utf-8")
            else:
                tok_string = token.encode("utf-8")
            bstring += tok_string
        string = bstring.decode("utf-8", errors="ignore")
        return string

    # DiaTokenizer has no vocabulary file, so there is nothing to write out
    def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> tuple[str]:
        return ()


__all__ = ["DiaTokenizer"]
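

# A minimal usage sketch (illustrative, not part of the library module itself); it assumes the
# default `offset=0`, under which every non-special character encodes to its raw UTF-8 byte value.
if __name__ == "__main__":
    tokenizer = DiaTokenizer()

    # "[S1]" is matched as a whole added token; the rest of the text is byte-level encoded.
    encoded = tokenizer("[S1] Hi")["input_ids"]
    print(encoded)  # expected: [1, 32, 72, 105]

    # Walking back through the token strings reconstructs the original text.
    tokens = tokenizer.convert_ids_to_tokens(encoded)
    print(tokenizer.convert_tokens_to_string(tokens))  # expected: "[S1] Hi"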