ó <±h&QãóP•SSKJrJr SSKJr SSKJrJrJr SSK J r JrJrJ r SSKJrJrJr SSKJr \"5(aSSKr"S S \SS9rS r\"S5Vs/sH nSUSS3PM sn\"S5Vs/sH nSUSS3PM sn-rSr"SS\5rS/rgs snfs snf)é)ÚOptionalÚUnioné)ÚBatchFeature)Ú ImageInputÚis_valid_imageÚmake_flat_list_of_images)ÚMultiModalDataÚProcessingKwargsÚProcessorMixinÚUnpack)Ú AddedTokenÚPreTokenizedInputÚ TextInput)Úis_torch_availableNcó.•\rSrSrSS0SSS.SS0S .rS rg)ÚColPaliProcessorKwargsé$ÚpaddingÚlongestÚchannels_firstT)Údata_formatÚdo_convert_rgbÚreturn_tensorsÚpt)Útext_kwargsÚ images_kwargsÚ common_kwargs©N)Ú__name__Ú __module__Ú__qualname__Ú__firstlineno__Ú _defaultsÚ__static_attributes__róÚf/var/www/html/shao/venv/lib/python3.13/site-packages/transformers/models/colpali/processing_colpali.pyrr$s,†ð yð ð,Ø"ñ ð+¨DÐ1ñ ƒIr&rF)Útotalziz4Ú>é€z3có•X2-U-UUS3$)a Builds a string from the input prompt and image tokens. For example, for the call: build_string_from_input( prompt="Prefix str" bos_token="", image_seq_len=3, image_token="", ) The output will be: "Initial str" Args: prompt (`list[Union[str, ImageInput]]`): The input prompt. bos_token (`str`): The beginning of sentence token. image_seq_len (`int`): The length of the image sequence. image_token (`str`): The image token. num_images (`int`): Number of images in the prompt. Ú r©ÚpromptÚ bos_tokenÚ image_seq_lenÚimage_tokenÚ num_imagess r'Úbuild_string_from_inputr35s"€ð&Ñ)¨JÑ6Ð 7¸ °{À6À(È"ÐMÐMr&c ó„^•\rSrSrSrSS/rSrSrS S\S\4U4S jjjr S!S \ S\\\ \\\\ 4S\\S \4SjjrS"SjrSrSr\S5r\S \4Sj5rS"S \ S\\S \4SjjrS\\\\4S\\S \4SjrS#S\S\S4S\S\S4S\S\SS\S\4S S4SjjrSrU=r$)$ÚColPaliProcessoréKa¯ Constructs a ColPali processor which wraps a PaliGemmaProcessor and special methods to process images and queries, as well as to compute the late-interaction retrieval score. [`ColPaliProcessor`] offers all the functionalities of [`PaliGemmaProcessor`]. See the [`~PaliGemmaProcessor.__call__`] for more information. Args: image_processor ([`SiglipImageProcessor`], *optional*): The image processor is a required input. tokenizer ([`LlamaTokenizerFast`], *optional*): The tokenizer is a required input. chat_template (`str`, *optional*): A Jinja template which will be used to convert lists of messages in a chat into a tokenizable string. visual_prompt_prefix (`str`, *optional*, defaults to `"Describe the image."`): A string that gets tokenized and prepended to the image tokens. query_prefix (`str`, *optional*, defaults to `"Question: "`): A prefix to be used for the query. Úimage_processorÚ tokenizer)ÚSiglipImageProcessorÚSiglipImageProcessorFast)ÚGemmaTokenizerÚGemmaTokenizerFastÚvisual_prompt_prefixÚquery_prefixcó>•[TU]XUS9 Uc[S5eUc[S5e[US5(d[S5eURUl[US5(dK[[SSS 9nS U/0nURU5 UR[5Ul [Ul O"URUl URUl UR[5 SUl SUlX@lXPlg)N)Ú chat_templatez)You need to specify an `image_processor`.z"You need to specify a `tokenizer`.Úimage_seq_lengthz;Image processor is missing an `image_seq_length` attribute.r1FT)Ú normalizedÚspecialÚadditional_special_tokens)ÚsuperÚ__init__Ú ValueErrorÚhasattrrArÚIMAGE_TOKENÚadd_special_tokensÚconvert_tokens_to_idsÚimage_token_idr1Ú add_tokensÚEXTRA_TOKENSÚ add_bos_tokenÚ add_eos_tokenr=r>) Úselfr7r8r@r=r>r1Ú tokens_to_addÚ __class__s €r'rFÚColPaliProcessor.__init__dsùø€ô ‰Ñ˜À=ÐÑQØÑ"ÜÐHÓIÐIØÑÜÐAÓBÐBÜÐ(:×;Ñ;ÜÐZÓ[Ð[à /× @Ñ @ˆÔäy -×0Ñ0Ü$¤[¸UÈDÑQˆKØ8¸;¸-ÐHˆMØ×(Ñ(¨Ô7Ø"+×"AÑ"AÄ+Ó"NˆDÔÜ*ˆDÕà"+×":Ñ":ˆDÔØ(×4Ñ4ˆDÔà×Ñœ\Ô*Ø"'ˆ ÔØ"'ˆ ÔØ$8Ô!Ø(Õr&ÚimagesÚtextÚkwargsÚreturncóx•UR"[4SURR0UD6nUSR SS5nUSLnUcUc[S5eUbUb[S5eUGbà[ U5(aU/nOw[U[5(a[ US5(aON[U[5(a.[US[5(a[ USS5(d[S5eUR/[U5-n UV s/sHoªRS 5PM nn [X‘5VVs/sHTup¼[UURRUR[ [U[5(a[U5OS S9PMV n nn[#U5nUR$"U40USD6S nUSR'SS5bUSS==UR- ss'UR"U 4SS0USD6n0UES U0EnU(a.USR)USS:HS5nUR+SU05 [-US9$Ubà[U[.5(aU/nO8[U[5(a[US[.5(d[S5eUcUR0S-n/nUH@nURRUR2-U-U-S-nUR5U5 MB USR'SS5USS'UR"U4SS0USD6nU$gs sn fs snnf)añ Main method to prepare for the model either (1) one or several texts, either (2) one or several image(s). This method is a custom wrapper around the PaliGemmaProcessor's [`~PaliGemmaProcessor.__call__`] method adapted for the ColPali model. It cannot process both text and images at the same time. When preparing the text(s), this method forwards the `text` and `kwargs` arguments to LlamaTokenizerFast's [`~LlamaTokenizerFast.__call__`]. When preparing the image(s), this method forwards the `images` and `kwargs` arguments to SiglipImageProcessor's [`~SiglipImageProcessor.__call__`]. Please refer to the docstring of the above two methods for more information. Args: images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `list[PIL.Image.Image]`, `list[np.ndarray]`, `list[torch.Tensor]`): The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch tensor. In case of a NumPy array/PyTorch tensor, each image should be of shape (C, H, W), where C is a number of channels, H and W are image height and width. text (`str`, `list[str]`, `list[list[str]]`): The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set `is_split_into_words=True` (to lift the ambiguity with a batch of sequences). return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Acceptable values are: - `'tf'`: Return TensorFlow `tf.constant` objects. - `'pt'`: Return PyTorch `torch.Tensor` objects. - `'np'`: Return NumPy `np.ndarray` objects. - `'jax'`: Return JAX `jnp.ndarray` objects. Returns: [`BatchFeature`]: A [`BatchFeature`] with the following fields: - **input_ids** -- List of token ids to be fed to a model. - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not `None`). - **pixel_values** -- Pixel values to be fed to a model. Returned when `images` is not `None`. Útokenizer_init_kwargsrÚsuffixNz&Either text or images must be providedz5Only one of text or images can be processed at a timerzAimages must be an image, list of images or list of list of imagesÚRGBér-rÚpixel_valuesÚ max_lengthÚreturn_token_type_idsFÚ input_idsÚtoken_type_idsiœÿÿÿÚlabels)Údataz*Text must be a string or a list of stringsé r,é2)Ú _merge_kwargsrr8Úinit_kwargsÚpoprGrÚ isinstanceÚlistr=ÚlenÚconvertÚzipr3r/rArIr r7ÚgetÚmasked_fillÚupdaterÚstrÚquery_augmentation_tokenr>Úappend)rQrUrVÚaudioÚvideosrWÚ output_kwargsr[r`Ú texts_docÚimager.Ú image_listÚ input_stringsr^ÚinputsÚreturn_datarcÚtexts_queryÚqueryÚbatch_querys r'Ú__call__ÚColPaliProcessor.__call__†s|€ðZ×*Ò*Ü"ñ à"&§.¡.×"<Ñ"<ð ðñ ˆ ð ˜}Ñ-×1Ñ1°(¸DÓAˆà &¨dÐ 2Ðà‰<˜F™NÜÐEÓFÐFØÑ Ñ 2ÜÐTÓUÐUàÒÜ˜f×%Ñ%Ø ˜‘Ü˜F¤D×)Ñ)¬n¸VÀA¹Y×.GÑ.GØÜ ¬×.Ñ.´:¸fÀQ¹iÌ×3NÑ3NÔSaÐbhÐijÑbkÐlmÑbn×SoÑSoÜ Ð!dÓeÐeà×2Ñ2Ð3´c¸&³kÑAˆIÙ8>Ó?¹¨u—m‘m EÖ*¹ˆFÐ?ô+.¨iÔ*@ô ñ+AÑ&Fô(Ø!Ø"Ÿn™n×6Ñ6Ø"&×"7Ñ"7Ü +Ü2<¸ZÌ×2NÑ2Nœs :œÐTUôñ+Að ñ ô.¨fÓ5ˆFØ×/Ò/°ÑY¸-ÈÑ:XÑYÐZhÑiˆLð˜]Ñ+×/Ñ/°¸dÓCÑOØ˜mÑ,¨\Ó:¸d×>SÑ>SÑSÓ:à—^’^Øñà&+ðð Ñ.ñˆFðC˜VÐB ^°\ÑBˆKæ$Ø Ñ,×8Ñ8¸Ð@PÑ9QÐUVÑ9VÐX\Ó]Ø×"Ñ" H¨fÐ#5Ô6ä [Ñ1Ð1à Ñ Ü˜$¤×$Ñ$Øv‘Ü ¤t×,Ñ,´¸DÀ¹GÄS×1IÑ1IÜ Ð!MÓNÐNà‰~Ø×6Ñ6¸Ñ;à%'ˆKÛØŸ™×0Ñ0°4×3DÑ3DÑDÀuÑLÈvÑUÐX\Ñ\Ø×"Ñ" 5Ö)ñð:GÀ}Ñ9U×9YÑ9YÐZfÐhjÓ9kˆM˜-Ñ(¨Ñ6àŸ.š.Øñà&+ðð Ñ.ñˆKðÐð-ùòC@ùó s ÄL1Ä>AL6có˜•0nUb;UR/[U5-nS/[U5-nURXES.5 [S0UD6$)ax Computes the number of placeholder tokens needed for multimodal inputs with the given sizes. Args: image_sizes (list[list[str]], *optional*): The input sizes formatted as (height, width) per each image. Returns: `MultiModalData`: A `MultiModalData` object holding number of tokens per each of the provided input modalities, along with other useful data. r])Únum_image_tokensÚnum_image_patchesr)rArlrqr )rQÚimage_sizesrWÚvision_datar„r…s r'Ú_get_num_multimodal_tokensÚ+ColPaliProcessor._get_num_multimodal_tokenssZ€ðˆØÑ"Ø $× 5Ñ 5Ð6¼¸[Ó9IÑIÐØ!" ¤c¨+Ó&6Ñ 6ÐØ×ÑÐ4DÑmÔnÜÑ, Ñ,Ð,r&có:•URR"U0UD6$)zª This method forwards all its arguments to GemmaTokenizerFast's [`~PreTrainedTokenizer.batch_decode`]. Please refer to the docstring of this method for more information. )r8Úbatch_decode©rQÚargsrWs r'r‹ÚColPaliProcessor.batch_decodes€ð ~‰~×*Ò*¨DÐ;°FÑ;Ð;r&có:•URR"U0UD6$)z¤ This method forwards all its arguments to GemmaTokenizerFast's [`~PreTrainedTokenizer.decode`]. Please refer to the docstring of this method for more information. )r8ÚdecoderŒs r'rÚColPaliProcessor.decodes€ð ~‰~×$Ò$ dÐ5¨fÑ5Ð5r&cóš•URRnURRn[[RX-55$©N)r8Úmodel_input_namesr7rkÚdictÚfromkeys)rQÚtokenizer_input_namesÚimage_processor_input_namess r'r”Ú"ColPaliProcessor.model_input_names#s<€à $§¡× @Ñ @ÐØ&*×&:Ñ&:×&LÑ&LÐ#Ü”D—M‘MÐ"7Ñ"UÓVÓWÐWr&có.•URR$)zr Return the query augmentation token. Query augmentation buffers are used as reasoning buffers during inference. )r8Ú pad_token)rQs r'rsÚ)ColPaliProcessor.query_augmentation_token)s€ð~‰~×'Ñ'Ð'r&có*•UR"SSU0UD6$)aæ Prepare for the model one or several image(s). This method is a wrapper around the `__call__` method of the ColPaliProcessor's [`ColPaliProcessor.__call__`]. This method forwards the `images` and `kwargs` arguments to the image processor. Args: images (`PIL.Image.Image`, `np.ndarray`, `torch.Tensor`, `list[PIL.Image.Image]`, `list[np.ndarray]`, `list[torch.Tensor]`): The image or batch of images to be prepared. Each image can be a PIL image, NumPy array or PyTorch tensor. In case of a NumPy array/PyTorch tensor, each image should be of shape (C, H, W), where C is a number of channels, H and W are image height and width. return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Acceptable values are: - `'tf'`: Return TensorFlow `tf.constant` objects. - `'pt'`: Return PyTorch `torch.Tensor` objects. - `'np'`: Return NumPy `np.ndarray` objects. - `'jax'`: Return JAX `jnp.ndarray` objects. Returns: [`BatchFeature`]: A [`BatchFeature`] with the following fields: - **input_ids** -- List of token ids to be fed to a model. - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not `None`). - **pixel_values** -- Pixel values to be fed to a model. Returned when `images` is not `None`. rUr©r)rQrUrWs r'Úprocess_imagesÚColPaliProcessor.process_images2s€ðB}Š}Ñ5 FÐ5¨fÑ5Ð5r&có*•UR"SSU0UD6$)a? Prepare for the model one or several texts. This method is a wrapper around the `__call__` method of the ColPaliProcessor's [`ColPaliProcessor.__call__`]. This method forwards the `text` and `kwargs` arguments to the tokenizer. Args: text (`str`, `list[str]`, `list[list[str]]`): The sequence or batch of sequences to be encoded. Each sequence can be a string or a list of strings (pretokenized string). If the sequences are provided as list of strings (pretokenized), you must set `is_split_into_words=True` (to lift the ambiguity with a batch of sequences). return_tensors (`str` or [`~utils.TensorType`], *optional*): If set, will return tensors of a particular framework. Acceptable values are: - `'tf'`: Return TensorFlow `tf.constant` objects. - `'pt'`: Return PyTorch `torch.Tensor` objects. - `'np'`: Return NumPy `np.ndarray` objects. - `'jax'`: Return JAX `jnp.ndarray` objects. Returns: [`BatchFeature`]: A [`BatchFeature`] with the following fields: - **input_ids** -- List of token ids to be fed to a model. - **attention_mask** -- List of indices specifying which tokens should be attended to by the model (when `return_attention_mask=True` or if *"attention_mask"* is in `self.model_input_names` and if `text` is not `None`). rVrrž)rQrVrWs r'Úprocess_queriesÚ ColPaliProcessor.process_queriesUs€ð@}Š}Ñ1 $Ð1¨&Ñ1Ð1r&Úquery_embeddingsztorch.TensorÚpassage_embeddingsÚ batch_sizeÚoutput_dtypeztorch.dtypeÚ output_deviceztorch.devicec óÊ•[U5S:Xa[S5e[U5S:Xa[S5eUSRUSR:wa[S5eUSRUSR:wa[S5eUcUSRn/n[ S[U5U5GHn/n[ RRRRXXs-SSS9n [ S[U5U5H}n [ RRRRX*X£-SSS9nUR[ R"SX›5RS S 9SRSS 95 M UR[ R"USS 9RU5RU55 GM [ R"USS 9$) aÂ Compute the late-interaction/MaxSim score (ColBERT-like) for the given multi-vector query embeddings (`qs`) and passage embeddings (`ps`). For ColPali, a passage is the image of a document page. Because the embedding tensors are multi-vector and can thus have different shapes, they should be fed as: (1) a list of tensors, where the i-th tensor is of shape (sequence_length_i, embedding_dim) (2) a single tensor of shape (n_passages, max_sequence_length, embedding_dim) -> usually obtained by padding the list of tensors. Args: query_embeddings (`Union[torch.Tensor, list[torch.Tensor]`): Query embeddings. passage_embeddings (`Union[torch.Tensor, list[torch.Tensor]`): Passage embeddings. batch_size (`int`, *optional*, defaults to 128): Batch size for computing scores. output_dtype (`torch.dtype`, *optional*, defaults to `torch.float32`): The dtype of the output tensor. If `None`, the dtype of the input embeddings is used. output_device (`torch.device` or `str`, *optional*, defaults to "cpu"): The device of the output tensor. Returns: `torch.Tensor`: A tensor of shape `(n_queries, n_passages)` containing the scores. The score tensor is saved on the "cpu" device. rzNo queries providedzNo passages providedz/Queries and passages must be on the same devicez-Queries and passages must have the same dtypeT)Úbatch_firstÚ padding_valuez bnd,csd->bcnsr)Údimér])rlrGÚdeviceÚdtypeÚrangeÚtorchÚnnÚutilsÚrnnÚpad_sequencertÚeinsumÚmaxÚsumÚcatÚto)rQr¤r¥r¦r§r¨ÚscoresÚiÚbatch_scoresÚ batch_queriesÚjÚbatch_passagess r'Úscore_retrievalÚ ColPaliProcessor.score_retrievalwsÍ€ô@ÐÓ AÓ%ÜÐ2Ó3Ð3ÜÐ!Ó" aÓ'ÜÐ3Ó4Ð4à˜AÑ×%Ñ%Ð);¸AÑ)>×)EÑ)EÓEÜÐNÓOÐOà˜AÑ×$Ñ$Ð(:¸1Ñ(=×(CÑ(CÓCÜÐLÓMÐMàÑØ+¨AÑ.×4Ñ4ˆLà%'ˆäqœ#Ð.Ó/°×<ˆAØ/1ˆLÜ!ŸH™HŸN™N×.Ñ.×;Ñ;Ø Q¡^Ð4À$ÐVWð<ðˆMô˜1œcÐ"4Ó5°zÖBÜ!&§¡§¡×!3Ñ!3×!@Ñ!@Ø&¨1©>Ð:ÈÐ\]ð"Að"ð×#Ñ#Ü—L’L °-ÓP×TÑTÐYZÐTÐ[Ð\]Ñ^×bÑbÐghÐbÐiöñ Cð M‰Mœ%Ÿ)š) L°aÑ8×;Ñ;¸LÓI×LÑLÈ]Ó[×\ñ=ôyŠy˜ QÑ'Ð'r&)rAr1rLr>r=)NNNzDescribe the image.z Question: )NNNNr“)r*NÚcpu) r r!r"r#Ú__doc__Ú attributesÚimage_processor_classÚtokenizer_classrrrFrrrrrkr rrrrˆr‹rÚpropertyr”rsrŸr¢ÚintrrÁr%Ú __classcell__)rSs@r'r5r5KsÌø†ñð($ [Ð1€JØPÐØ>€OðØØØ$9Ø(ñ )ð "ð )ð÷ )ð )ðH"Ø^bØØñ{àð{ðIÐ0°$°y±/À4ÐHYÑCZÐZÑ[ð{ðÐ/Ñ0ð {ð õ{ôz-ò$<ò6ðñXóðXð ð(¨#ó(óð(ð"ñ!6àð!6ðÐ/Ñ0ð!6ð õ !6ðF 2àI˜t I™Ð.Ñ/ð 2ðÐ/Ñ0ð 2ð ô 2ðLØ04Ø49ñ >(à °°^Ñ0DÐ DÑEð>(ð" .°$°~Ñ2FÐ"FÑGð>(ðð >(ð ˜}Ñ-ð>(ð˜^¨SÐ0Ñ1ð >(ð ÷>(ó>(r&r5)ÚtypingrrÚfeature_extraction_utilsrÚimage_utilsrrr Úprocessing_utilsr rrr Útokenization_utils_baserrrr³rr±rrIr°rNr3r5Ú__all__)r¼s0r'ÚrÑs·ð÷.#å4ßOÑOßXÓXßOÑOÝ'ñ×ÑÛô Ð-°Uò ð€Ù).¨t¬Ó5© A$q˜g˜Q“©Ñ5ÑRWÐX[ÔR\Ó8]ÑR\ÈQ¸4ÀÀ#¸wÀa»ÑR\Ñ8]Ñ]€òNô,j(~ôj(ðZÐ ùòM6ùÒ8]sÁBÁ7B#