HEX

File: //usr/local/lib/python3.10/dist-packages/charset_normalizer/__pycache__/utils.cpython-310.pyc
o

;��g�.�@srddlmZddlZddlZddlZddlmZddlmZddl	m
Z
ddlmZddl
mZddlmZd	d
lmZmZmZmZmZmZe
ed�dgdd��Ze
ed�dhdd��Ze
ed�didd��Ze
ed�dgdd��Ze
ed�dgdd��Ze
ed�dgdd��Ze
ed�dgdd��Ze
ed�dgdd ��Ze
ed�dgd!d"��Z e
ed�dgd#d$��Z!e
ed�dgd%d&��Z"e
ed�dgd'd(��Z#e
ed�dgd)d*��Z$e
ed�dgd+d,��Z%e
ed�dgd-d.��Z&e
ed�dgd/d0��Z'e
e(e�d�djd2d3��Z)e
ed�dgd4d5��Z*dkdld;d<�Z+e
d=d�dmd?d@��Z,dndBdC�Z-dodEdF�Z.dpdqdJdK�Z/drdOdP�Z0dsdQdR�Z1dSej2dTfdtdXdY�Z3	dudvdedf�Z4dS)w�)�annotationsN)�IncrementalDecoder)�aliases)�	lru_cache)�findall)�	Generator)�MultibyteIncrementalDecoder�)�ENCODING_MARKS�IANA_SUPPORTED_SIMILAR�RE_POSSIBLE_ENCODING_INDICATION�UNICODE_RANGES_COMBINED�UNICODE_SECONDARY_RANGE_KEYWORD�UTF8_MAXIMAL_ALLOCATION)�maxsize�	character�str�return�boolcCsdzt�|�}Wn
tyYdSwd|vp1d|vp1d|vp1d|vp1d|vp1d|vp1d|vp1d	|vS)
NFz
WITH GRAVEz
WITH ACUTEzWITH CEDILLAzWITH DIAERESISzWITH CIRCUMFLEXz
WITH TILDEzWITH MACRONzWITH RING ABOVE��unicodedata�name�
ValueError�r�description�r�C/usr/local/lib/python3.10/dist-packages/charset_normalizer/utils.py�is_accentuateds(��������rcCs.t�|�}|s	|S|�d�}tt|dd��S)N� r�)r�
decomposition�split�chr�int)r�
decomposed�codesrrr�
remove_accent,s


r&�
str | NonecCs.t|�}t��D]\}}||vr|SqdS)zK
    Retrieve the Unicode range official name from a single character.
    N)�ordr
�items)r�
character_ord�
range_name�	ord_rangerrr�
unicode_range7s�r-cC�*z
t�|�}Wd|vStyYdSw)NF�LATINrrrrr�is_latinEs��r0cCs2t�|�}d|vrdSt|�}|durdSd|vS)N�PTF�Punctuation�r�categoryr-�r�character_category�character_rangerrr�is_punctuationNs
r8cCsBt�|�}d|vs
d|vrdSt|�}|durdSd|vo |dkS)N�S�NTF�Forms�Lor3r5rrr�	is_symbol]s
r=cCs$t|�}|dur
dSd|vpd|vS)NF�	Emoticons�Pictographs)r-)rr7rrr�is_emoticonlsr@cCs.|��s|dvr
dSt�|�}d|vp|dvS)N>�｜�+�<�>T�Z>�Pc�Pd�Po)�isspacerr4)rr6rrr�is_separatorvs
rJcCs|��|��kS�N)�islower�isupper�rrrr�is_case_variable�srOcCr.)NF�CJKr�r�character_namerrr�is_cjk����rScCr.)NF�HIRAGANArrQrrr�is_hiragana�rTrVcCr.)NF�KATAKANArrQrrr�is_katakana�rTrXcCr.)NF�HANGULrrQrrr�	is_hangul�rTrZcCr.)NF�THAIrrQrrr�is_thai�rTr\cCr.)NF�ARABICrrQrrr�	is_arabic�rTr^cCs4zt�|�}Wn
tyYdSwd|vod|vS)NFr]z
ISOLATED FORMrrQrrr�is_arabic_isolated_form�s�r_r+cst�fdd�tD��S)Nc3s�|]}|�vVqdSrKr)�.0�keyword�r+rr�	<genexpr>�s�z-is_unicode_range_secondary.<locals>.<genexpr>)�anyrrbrrbr�is_unicode_range_secondary�srecCs(|��duo|��duo|dko|dkS)NF�u)rI�isprintablerNrrr�is_unprintable�s
���rh� �sequence�bytes�search_zoner#cCs�t|t�st�t|�}tt|dt||��jddd��}t|�dkr$dS|D]'}|���	dd�}t
��D]\}}||krB|S||krL|Sq4q&dS)zW
    Extract using ASCII-only decoder any specified encoding in the first n-bytes.
    N�ascii�ignore��errorsr�-�_)�
isinstancerk�	TypeError�lenrr�min�decode�lower�replacerr))rjrl�seq_len�results�specified_encoding�encoding_alias�
encoding_ianarrr�any_specified_encoding�s&
���r�rcCs |dvptt�d|���jt�S)zQ
    Verify is a specific encoding is a multi byte one based on it IANA name
    >	�utf_7�utf_8�utf_16�utf_32�	utf_16_be�	utf_16_le�	utf_32_be�	utf_32_le�	utf_8_sig�
encodings.)�
issubclass�	importlib�
import_modulerr)rrrr�is_multi_byte_encoding�s
��r��tuple[str | None, bytes]cCsJtD] }t|}t|t�r|g}|D]}|�|�r!||fSqqdS)z9
    Identify and extract SIG/BOM in given sequence.
    )N�)r
rsrk�
startswith)rj�
iana_encoding�marks�markrrr�identify_sig_or_boms

��r�r�cCs|dvS)N>r�r�r)r�rrr�should_strip_sig_or_bom"sr�T�cp_name�strictcCsN|���dd�}t��D]\}}|||fvr|Sq|r%td|�d���|S)zIReturns the Python normalized encoding name (Not the IANA official name).rqrrzUnable to retrieve IANA for '�')rxryrr)r)r�r�r}r~rrr�	iana_name&s�r��iana_name_a�iana_name_b�floatc	Cs�t|�st|�r
dSt�d|���j}t�d|���j}|dd�}|dd�}d}td�D]}t|g�}|�|�|�|�krA|d7}q,|dS)	Ngr�rnror�r	�)r�r�r�r�rangerkrw)	r�r��	decoder_a�	decoder_b�id_a�id_b�character_match_count�i�
to_be_decodedrrr�
cp_similarity7s


�r�cCs|tvo	|t|vS)z�
    Determine if two code page are at least 80% similar. IANA_SUPPORTED_SIMILAR dict was generated using
    the function cp_similarity.
    )r)r�r�rrr�
is_cp_similarKs
�r��charset_normalizerz)%(asctime)s | %(levelname)s | %(message)s�level�
format_string�NonecCs:t�|�}|�|�t��}|�t�|��|�|�dSrK)�logging�	getLogger�setLevel�
StreamHandler�setFormatter�	Formatter�
addHandler)rr�r��logger�handlerrrr�set_logging_handlerVs


r��	sequencesr~�offsetsr��
chunk_size�bom_or_sig_available�strip_sig_or_bom�sig_payload�is_multi_byte_decoder�decoded_payload�Generator[str, None, None]c	cs&�|r|dur|D]}	||	|	|�}
|
sdS|
Vq	dS|D]p}	|	|}|t|�dkr/q ||	|	|�}|rA|durA||}|j||rHdndd�}
|r�|	dkr�t|d�}
|r�|
d|
�|vr�t|	|	dd	�D]#}|||�}|r{|dur{||}|j|dd�}
|
d|
�|vr�nqi|
Vq dS)
NF�rnr�rorr����)rurwrvr�)r�r~r�r�r�r�r�r�r�r��chunk�	chunk_end�cut_sequence�chunk_partial_size_chk�jrrr�cut_sequence_chunkscsD��
�
���r�)rrrr)rrrr)rrrr')r+rrr)ri)rjrkrlr#rr')rrrr)rjrkrr�)r�rrr)T)r�rr�rrr)r�rr�rrr�)r�rr�rrr)rrr�r#r�rrr�rK)r�rkr~rr�r�r�r#r�rr�rr�rkr�rr�r'rr�)5�
__future__rr�r�r�codecsr�encodings.aliasesr�	functoolsr�rer�typingr�_multibytecodecr�constantr
rrr
rrrr&r-r0r8r=r@rJrOrSrVrXrZr\r^r_rurerhrr�r�r�r�r�r��INFOr�r�rrrr�<module>sz 


									
 



��