"""
Processor class for Llava.
"""

from typing import Union

import numpy as np

from ...feature_extraction_utils import BatchFeature
from ...image_utils import ImageInput, get_image_size, to_numpy_array
from ...processing_utils import MultiModalData, ProcessingKwargs, ProcessorMixin, Unpack
from ...tokenization_utils_base import PreTokenizedInput, TextInput
from ...utils import logging


logger = logging.get_logger(__name__)


class LlavaProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "text_kwargs": {"padding": False, "return_mm_token_type_ids": False},
        "images_kwargs": {},
    }


class LlavaProcessor(ProcessorMixin):
    """
    Constructs a LLaVa processor which wraps a LLaVa image processor and a LLaMa tokenizer into a single processor.

    [`LlavaProcessor`] offers all the functionalities of [`LlavaImageProcessor`] and [`LlamaTokenizerFast`]. See
    [`~LlavaProcessor.__call__`] and [`~LlavaProcessor.decode`] for more information.

    Args:
        image_processor ([`LlavaImageProcessor`], *optional*):
            The image processor is a required input.
        tokenizer ([`LlamaTokenizerFast`], *optional*):
            The tokenizer is a required input.
        patch_size (`int`, *optional*):
            Patch size from the vision tower.
        vision_feature_select_strategy (`str`, *optional*):
            The feature selection strategy used to select the vision feature from the vision backbone.
            Should be the same as in the model's config.
        chat_template (`str`, *optional*):
            A Jinja template which will be used to convert lists of messages in a chat into a tokenizable string.
        image_token (`str`, *optional*, defaults to `"<image>"`):
            Special token used to denote image location.
        num_additional_image_tokens (`int`, *optional*, defaults to 0):
            Number of additional tokens added to the image embeddings, such as CLS (+1). If the backbone has no CLS
            or other extra tokens appended, there is no need to set this argument.
    """

    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "AutoImageProcessor"
    tokenizer_class = "AutoTokenizer"

    def __init__(
        self,
        image_processor=None,
        tokenizer=None,
        patch_size=None,
        vision_feature_select_strategy=None,
        chat_template=None,
        image_token="<image>",
        num_additional_image_tokens=0,
        **kwargs,
    ):
        self.patch_size = patch_size
        self.num_additional_image_tokens = num_additional_image_tokens
        self.vision_feature_select_strategy = vision_feature_select_strategy
        # Prefer the tokenizer's own image token when it defines one.
        self.image_token = tokenizer.image_token if hasattr(tokenizer, "image_token") else image_token
        self.image_token_id = tokenizer.encode(self.image_token, add_special_tokens=False)[0]
        super().__init__(image_processor, tokenizer, chat_template=chat_template)

    def __call__(
        self,
        images: ImageInput = None,
        text: Union[TextInput, PreTokenizedInput, list[TextInput], list[PreTokenizedInput]] = None,
        audio=None,
        videos=None,
        **kwargs: Unpack[LlavaProcessorKwargs],
    ) -> BatchFeature:
        """
        Main method to prepare one or several sequence(s) and image(s) for the model. This method forwards the `text`
        and `kwargs` arguments to LlamaTokenizerFast's [`~LlamaTokenizerFast.__call__`] if `text` is not `None` to
        encode the text. To prepare the image(s), this method forwards the `images` and `kwargs` arguments to
        CLIPImageProcessor's [`~CLIPImageProcessor.__call__`] if `images` is not `None`. Please refer to the docstrings
        of these two methods for more information.

        Args:
            images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `list[PIL.Image.Image]`, `list[np.ndarray]`, `list[torch.Tensor]`):
                The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch
                tensor. Both channels-first and channels-last formats are supported.
            text (`str`, `list[str]`, `list[list[str]]`):
                The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings
                (pretokenized string). If the sequences are provided as lists of strings (pretokenized), you must set
                `is_split_into_words=True` (to lift the ambiguity with a batch of sequences).
            return_tensors (`str` or [`~utils.TensorType`], *optional*):
                If set, will return tensors of a particular framework. Acceptable values are:

                - `'tf'`: Return TensorFlow `tf.constant` objects.
                - `'pt'`: Return PyTorch `torch.Tensor` objects.
                - `'np'`: Return NumPy `np.ndarray` objects.
                - `'jax'`: Return JAX `jnp.ndarray` objects.

        Returns:
            [`BatchFeature`]: A [`BatchFeature`] with the following fields:

            - **input_ids** -- List of token ids to be fed to a model. Returned when `text` is not `None`.
            - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when
              `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not
              `None`).
            - **pixel_values** -- Pixel values to be fed to a model. Returned when `images` is not `None`.
            - **mm_token_type_ids** -- 1 at image-token positions and 0 elsewhere. Returned when
              `return_mm_token_type_ids=True`.
        """
        if images is None and text is None:
            raise ValueError("You have to specify at least one of `images` or `text`.")

        output_kwargs = self._merge_kwargs(
            LlavaProcessorKwargs,
            tokenizer_init_kwargs=self.tokenizer.init_kwargs,
            **kwargs,
        )
        if images is not None:
            image_inputs = self.image_processor(images, **output_kwargs["images_kwargs"])
        else:
            image_inputs = {}

        if isinstance(text, str):
            text = [text]
        elif not isinstance(text, list) and not isinstance(text[0], str):
            raise TypeError("Invalid input text. Please provide a string, or a list of strings")

        # Expand every image token into as many placeholders as the vision tower yields features.
        prompt_strings = text
        if image_inputs.get("pixel_values") is not None:
            pixel_values = image_inputs["pixel_values"]
            height, width = get_image_size(to_numpy_array(pixel_values[0]))
            num_image_tokens = (height // self.patch_size) * (
                width // self.patch_size
            ) + self.num_additional_image_tokens
            if self.vision_feature_select_strategy == "default":
                num_image_tokens -= 1

            prompt_strings = []
            for sample in text:
                sample = sample.replace(self.image_token, self.image_token * num_image_tokens)
                prompt_strings.append(sample)

        return_tensors = output_kwargs["text_kwargs"].pop("return_tensors", None)
        return_mm_token_type_ids = output_kwargs["text_kwargs"].pop("return_mm_token_type_ids", False)
        text_inputs = self.tokenizer(prompt_strings, **output_kwargs["text_kwargs"], return_tensors=None)
        self._check_special_mm_tokens(prompt_strings, text_inputs, modalities=["image"])

        if return_mm_token_type_ids:
            # Mark image-token positions with 1 and text positions with 0.
            array_ids = np.array(text_inputs["input_ids"])
            mm_token_type_ids = np.zeros_like(text_inputs["input_ids"])
            mm_token_type_ids[array_ids == self.image_token_id] = 1
            text_inputs["mm_token_type_ids"] = mm_token_type_ids.tolist()

        return BatchFeature(data={**text_inputs, **image_inputs}, tensor_type=return_tensors)

    def _get_num_multimodal_tokens(self, image_sizes=None, **kwargs):
        """
        Computes the number of placeholder tokens needed for multimodal inputs with the given sizes.

        Args:
            image_sizes (`list[list[int]]`, *optional*):
                The input sizes formatted as (height, width) for each image. Because every image is resized to
                `crop_size`, each image yields the same number of tokens.

        Returns:
            `MultiModalData`: A `MultiModalData` object holding the number of tokens for each of the provided input
            modalities, along with other useful data.
        """
        vision_data = {}
        if image_sizes is not None:
            images_kwargs = LlavaProcessorKwargs._defaults.get("images_kwargs", {})
            images_kwargs.update(kwargs)
            crop_size = images_kwargs.get("crop_size", None) or self.image_processor.crop_size
            resized_height, resized_width = crop_size["height"], crop_size["width"]

            num_image_tokens = (resized_height // self.patch_size) * (resized_width // self.patch_size)
            num_image_tokens += self.num_additional_image_tokens
            if self.vision_feature_select_strategy == "default":
                num_image_tokens -= 1

            num_image_tokens = [num_image_tokens] * len(image_sizes)
            num_image_patches = [1] * len(image_sizes)
            vision_data.update({"num_image_tokens": num_image_tokens, "num_image_patches": num_image_patches})

        return MultiModalData(**vision_data)

    def batch_decode(self, *args, **kwargs):
        """
        This method forwards all its arguments to LlamaTokenizerFast's [`~PreTrainedTokenizer.batch_decode`]. Please
        refer to the docstring of this method for more information.
        """
        return self.tokenizer.batch_decode(*args, **kwargs)

    def decode(self, *args, **kwargs):
        """
        This method forwards all its arguments to LlamaTokenizerFast's [`~PreTrainedTokenizer.decode`]. Please refer
        to the docstring of this method for more information.
        """
        return self.tokenizer.decode(*args, **kwargs)

    @property
    def model_input_names(self):
        tokenizer_input_names = self.tokenizer.model_input_names
        image_processor_input_names = self.image_processor.model_input_names
        # Concatenate both lists and drop duplicates while preserving order.
        return list(dict.fromkeys(tokenizer_input_names + image_processor_input_names))


__all__ = ["LlavaProcessor"]
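

# Illustrative sketch, not part of the transformers API: the two helpers below
# are hypothetical and are not exported in `__all__`. They restate, as plain
# functions, the arithmetic that `LlavaProcessor.__call__` uses to expand each
# image token and to build `mm_token_type_ids`, so the numbers can be checked
# without loading a tokenizer or an image processor. The sizes and token ids in
# the usage note are assumed example values.
import numpy as np  # already imported above; repeated so this sketch stands alone


def _sketch_num_image_tokens(
    height, width, patch_size, num_additional_image_tokens=0, vision_feature_select_strategy="default"
):
    """Hypothetical helper: number of placeholder tokens a single image expands to."""
    # One token per non-overlapping patch of the (already resized) image.
    num_tokens = (height // patch_size) * (width // patch_size)
    # Extra tokens the vision backbone adds, e.g. +1 for a CLS token.
    num_tokens += num_additional_image_tokens
    # Under the "default" strategy the processor reserves one token fewer.
    if vision_feature_select_strategy == "default":
        num_tokens -= 1
    return num_tokens


def _sketch_mm_token_type_ids(input_ids, image_token_id):
    """Hypothetical helper: 1 where a token is the image token, 0 elsewhere."""
    array_ids = np.array(input_ids)
    mm_token_type_ids = np.zeros_like(array_ids)
    mm_token_type_ids[array_ids == image_token_id] = 1
    return mm_token_type_ids.tolist()


# Usage (assumed CLIP-style settings): a 336x336 image with 14-pixel patches and
# one CLS token gives 24 * 24 + 1 - 1 = 576 placeholders under "default" and 577
# under "full".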