File: //usr/local/lib/python3.10/dist-packages/charset_normalizer/__pycache__/api.cpython-310.pyc
Signature of from_bytes as recorded in the compiled module (module logger: "charset_normalizer"; explain format: "%(asctime)s | %(levelname)s | %(message)s"):

from_bytes(sequences: bytes | bytearray, steps: int = 5, chunk_size: int = 512, threshold: float = 0.2, cp_isolation: list[str] | None = None, cp_exclusion: list[str] | None = None, preemptive_behaviour: bool = True, explain: bool = False, language_threshold: float = 0.1, enable_fallback: bool = True) -> CharsetMatches

Docstring:
Given a raw byte sequence, return the best possible charset(s) usable to render str objects.
If there are no results, that is a strong indicator that the source is binary rather than text.
By default, the process extracts 5 blocks of 512 bytes each to assess the mess and coherence of a given sequence,
and it gives up on a particular code page once the measured mess reaches 20%. These criteria can be customized through the steps, chunk_size and threshold parameters.
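The sampling-and-threshold idea can be sketched with the standard library alone. This is a toy illustration, not charset_normalizer's implementation: `sample_chunks`, `toy_mess_ratio` and `probe` are hypothetical names, and `toy_mess_ratio` is a stand-in metric that only counts U+FFFD replacement characters and non-whitespace control characters, whereas the library's real mess measure is much richer. The sketch also assumes that a payload too small for `steps * chunk_size` is probed whole as a single chunk.

```python
from __future__ import annotations


def sample_chunks(payload: bytes, steps: int = 5, chunk_size: int = 512) -> list[bytes]:
    """Pick up to `steps` evenly spaced chunks of at most `chunk_size` bytes."""
    length = len(payload)
    if length == 0:
        return []
    if length <= steps * chunk_size:
        # Assumption of this sketch: a payload too small for the plan is probed whole.
        return [payload]
    stride = length // steps
    return [payload[offset : offset + chunk_size] for offset in range(0, length, stride)][:steps]


def toy_mess_ratio(text: str) -> float:
    """Hypothetical metric: share of replacement or non-whitespace control characters."""
    if not text:
        return 0.0
    bad = sum(1 for ch in text if ch == "\ufffd" or (not ch.isprintable() and ch not in "\r\n\t"))
    return bad / len(text)


def probe(payload: bytes, encoding: str, steps: int = 5, chunk_size: int = 512, threshold: float = 0.2) -> bool:
    """Return True if `encoding` survives probing, False if it is given up."""
    ratios = [
        toy_mess_ratio(chunk.decode(encoding, errors="replace"))
        for chunk in sample_chunks(payload, steps, chunk_size)
    ]
    mean = sum(ratios) / len(ratios) if ratios else 0.0
    return mean < threshold  # a mean mess at or above the threshold means "give up"
```

A code page whose mean toy mess reaches the threshold is rejected, mirroring the 20% default; an empty result across all candidates would then point toward binary content.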
The preemptive behavior DOES NOT replace the traditional detection workflow: it prioritizes a particular code page (one declared within the payload)
but never takes it for granted. It can improve performance.
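A minimal sketch of "prioritize but never take for granted", assuming a simplified declaration scan: the regular expression `DECLARATION`, the `search_zone` size and the function `prioritize_declared` are hypothetical, not the library's own detection of declarative marks. A declared encoding that is already a candidate moves to the front of the order; every other candidate is still tried afterwards.

```python
from __future__ import annotations

import re

# Hypothetical, simplified pattern: matches declarations such as
# <meta charset="utf-8">, encoding="cp1252" or "# -*- coding: latin-1 -*-".
DECLARATION = re.compile(rb"(?:charset|encoding|coding)\s*[=:]\s*[\"']?([A-Za-z0-9_.-]+)", re.IGNORECASE)


def prioritize_declared(payload: bytes, candidates: list[str], search_zone: int = 4096) -> list[str]:
    """Move a declared encoding to the front of `candidates`; never drop the others."""
    match = DECLARATION.search(payload[:search_zone])
    if match is None:
        return list(candidates)
    declared = match.group(1).decode("ascii").lower().replace("-", "_")
    if declared not in candidates:
        return list(candidates)  # unknown name: keep the normal order
    return [declared] + [cp for cp in candidates if cp != declared]
```

Because the declared code page is only reordered, a wrong declaration costs one extra failed probe rather than a wrong answer.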
To focus on some code pages and/or leave out others, use cp_isolation and cp_exclusion.
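The effect of cp_isolation and cp_exclusion on a candidate list might look like the following sketch. Assumption: names are normalized with the stdlib `codecs.lookup(name).name`, which stands in for whatever normalization charset_normalizer applies; `normalize` and `filter_candidates` are hypothetical helpers.

```python
from __future__ import annotations

import codecs


def normalize(name: str) -> str:
    """Canonical codec name from the stdlib registry, e.g. 'UTF-8' -> 'utf-8'."""
    return codecs.lookup(name).name


def filter_candidates(
    candidates: list[str],
    cp_isolation: list[str] | None = None,
    cp_exclusion: list[str] | None = None,
) -> list[str]:
    isolation = {normalize(cp) for cp in cp_isolation} if cp_isolation else None
    exclusion = {normalize(cp) for cp in cp_exclusion} if cp_exclusion else set()
    kept = []
    for cp in candidates:
        canonical = normalize(cp)
        if isolation is not None and canonical not in isolation:
            continue  # isolation: only the listed code pages survive
        if canonical in exclusion:
            continue  # exclusion: listed code pages are skipped
        kept.append(cp)
    return kept
```

Normalizing both sides means aliases such as "windows-1252" and "cp1252" refer to the same code page.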
This function strips the SIG (BOM) from the payload every time, except for UTF-16 and UTF-32.
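A small stdlib sketch of the SIG/BOM rule, assuming the five Unicode BOMs exposed by the `codecs` module are the marks of interest; `split_sig` and `KEEP_SIG` are hypothetical names. The UTF-16 and UTF-32 marks are left in place because Python's `utf_16` and `utf_32` codecs read the BOM to learn the byte order and consume it while decoding.

```python
from __future__ import annotations

import codecs

# UTF-32 marks come first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf_32"),
    (codecs.BOM_UTF32_BE, "utf_32"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_LE, "utf_16"),
    (codecs.BOM_UTF16_BE, "utf_16"),
]
KEEP_SIG = {"utf_16", "utf_32"}  # encodings whose mark is left in the payload


def split_sig(payload: bytes) -> tuple[str | None, bytes]:
    """Return (encoding implied by a BOM or None, payload to decode)."""
    for mark, encoding in BOMS:
        if payload.startswith(mark):
            if encoding in KEEP_SIG:
                return encoding, payload  # the codec itself consumes the BOM
            return encoding, payload[len(mark):]
    return None, payload
```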
By default, the library does not set up any handler other than the NullHandler. Setting the 'explain'
toggle to True alters the logger configuration to add a StreamHandler suitable for debugging.
A custom logging format and handler can be set manually.
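The logging behaviour can be approximated with the stdlib `logging` module. The logger name "charset_normalizer" and the format string below appear among the compiled module's constants; `run_detection` and `attach_custom_handler` are hypothetical stand-ins. The sketch logs at DEBUG for simplicity, although the module also imports its own TRACE level.

```python
import io
import logging

# Both values below appear among the compiled module's constants.
LOGGER_NAME = "charset_normalizer"
EXPLAIN_FORMAT = "%(asctime)s | %(levelname)s | %(message)s"

logger = logging.getLogger(LOGGER_NAME)
logger.addHandler(logging.NullHandler())  # library-style default: records go nowhere


def run_detection(explain: bool = False, stream=None) -> None:
    """Hypothetical stand-in for a detection call with an `explain` toggle."""
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(EXPLAIN_FORMAT))
    previous_level = logger.level
    if explain:
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        logger.debug("Trying candidate encodings...")  # real work would happen here
    finally:
        if explain:
            # Undo the temporary configuration so later calls stay quiet.
            logger.removeHandler(handler)
            logger.setLevel(previous_level)


def attach_custom_handler(stream: io.StringIO) -> logging.Handler:
    """Manual alternative: install your own handler and format on the same logger."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler
```

Attaching the explain handler only for the duration of one call keeps the NullHandler default intact for every other caller.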

String constants (error and log messages) recovered from the from_bytes code object, source file /usr/local/lib/python3.10/dist-packages/charset_normalizer/api.py:

- Expected object of type bytes or bytearray, got: {}
- Encoding detection on empty bytes, assuming utf_8 intention.
- cp_isolation is set. use this flag for debugging purpose. limited list of encoding allowed : %s.
- cp_exclusion is set. use this flag for debugging purpose. limited list of encoding excluded : %s.
- override steps (%i) and chunk_size (%i) as content does not fit (%i byte(s) given) parameters.
- Trying to detect encoding from a tiny portion of ({}) byte(s).
- Using lazy str decoding because the payload is quite large, ({}) byte(s).
- Detected declarative mark in sequence. Priority +1 given for %s.
- Detected a SIG or BOM mark on first %i byte(s). Priority +1 given for %s.
- Encoding %s won't be tested as-is because it require a BOM. Will try some sub-encoder LE/BE.
- Encoding %s won't be tested as-is because detection is unreliable without BOM/SIG.
- Encoding %s does not provide an IncrementalDecoder
- Code page %s does not fit given bytes sequence at ALL. %s
- %s is deemed too similar to code page %s and was consider unsuited already. Continuing!
- Code page %s is a multi byte encoding table and it appear that at least one character was encoded using n-bytes.
- LazyStr Loading: After MD chunk decode, code page %s does not fit given bytes sequence at ALL. %s
- LazyStr Loading: After final lookup, code page %s does not fit given bytes sequence at ALL. %s
- %s was excluded because of initial chaos probing. Gave up %i time(s). Computed mean chaos is %f %%.
- %s passed initial chaos probing. Mean measured chaos is %f %%
- {} should target any language(s) of {}
- We detected language {} using {}
- Encoding detection: %s is most likely the one.
- Encoding detection: %s is most likely the one as we detected a BOM or SIG within the beginning of the sequence.
- Nothing got out of the detection process. Using ASCII/UTF-8/Specified fallback.
- Encoding detection: %s will be used as a fallback match
- Encoding detection: utf_8 will be used as a fallback match
- Encoding detection: ascii will be used as a fallback match
- Encoding detection: Found %s as plausible (best-candidate) for content. With %i alternatives.
- Encoding detection: Unable to determine any suitable charset.