File: //usr/lib/python3/dist-packages/chardet/__pycache__/universaldetector.cpython-312.pyc

Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco

Imports:
    codecs, logging, re
    CharSetGroupProber                        (from .charsetgroupprober)
    InputState, LanguageFilter, ProbingState  (from .enums)
    EscCharSetProber                          (from .escprober)
    Latin1Prober                              (from .latin1prober)
    MBCSGroupProber                           (from .mbcsgroupprober)
    SBCSGroupProber                           (from .sbcsgroupprober)

class UniversalDetector(object):
    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.

    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

            u = UniversalDetector()
            u.feed(some_bytes)
            u.close()
            detected = u.result

    g�������?s[�-�]s(|~{)s[�-�]zWindows-1252zWindows-1250zWindows-1251zWindows-1256zWindows-1253zWindows-1255zWindows-1254zWindows-1257)z
iso-8859-1z
iso-8859-2z
iso-8859-5z
iso-8859-6z
iso-8859-7z
iso-8859-8z
iso-8859-9ziso-8859-13c���d|_g|_d|_d|_d|_d|_d|_||_tjt�|_d|_|j�y)N)�_esc_charset_prober�_charset_probers�result�done�	_got_data�_input_state�
_last_char�lang_filter�logging�	getLogger�__name__�logger�_has_win_bytes�reset)�selfrs  �;/usr/lib/python3/dist-packages/chardet/universaldetector.py�__init__zUniversalDetector.__init__Qsa��#'�� � "��������	���� ������&����'�'��1���"����
�
��c�
�dddd�|_d|_d|_d|_tj
|_d|_|jr|jj�|jD]}|j��y)z�
        Reset the UniversalDetector and all of its probers back to their
        initial states.  This is called by ``__init__``, so you only need to
        call this directly in between analyses of different documents.

        Body: sets result to {'encoding': None, 'confidence': 0.0,
        'language': None}; clears done, _got_data and _has_win_bytes; sets
        _input_state to InputState.PURE_ASCII and _last_char to b''; then
        calls reset() on _esc_charset_prober (if set) and on every prober in
        _charset_probers.

    feed(self, byte_str)
        Takes a chunk of a document and feeds it through all of the relevant
        charset probers.

        After calling ``feed``, you can check the value of the ``done``
        attribute to see if you need to continue feeding the
        ``UniversalDetector`` more data, or if it has made a prediction
        (in the ``result`` attribute).

        .. note::
           You should always call ``close`` when you're done feeding in your
           document if ``done`` is not already ``True``.
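
The byte-order-mark check that ``feed`` runs on the first chunk can be sketched with the standard library alone. This is a minimal illustration, not chardet's own code: ``sniff_bom`` is a hypothetical helper name, the encoding labels are the strings visible in the dump, and the two UCS-4 byte sequences are unreadable in the dump and are filled in here as an assumption.

```python
import codecs


def sniff_bom(first_chunk: bytes):
    """Return an encoding label if first_chunk starts with a known BOM, else None.

    Hypothetical helper for illustration; labels are the strings seen in the dump.
    """
    if first_chunk.startswith(codecs.BOM_UTF8):
        return "UTF-8-SIG"
    # UTF-32 has to be tested before UTF-16, because the UTF-32 LE BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    if first_chunk.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "UTF-32"
    # Assumed byte sequences for the two unusual UCS-4 octet orders.
    if first_chunk.startswith(b"\xFE\xFF\x00\x00"):
        return "X-ISO-10646-UCS-4-3412"
    if first_chunk.startswith(b"\x00\x00\xFF\xFE"):
        return "X-ISO-10646-UCS-4-2143"
    if first_chunk.startswith((codecs.BOM_LE, codecs.BOM_BE)):
        return "UTF-16"
    return None
```

The ordering matters more than the table itself: testing the two-byte UTF-16 marks first would misreport every little-endian UTF-32 stream as UTF-16.
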

        Body, in order:
          1. Return immediately if done is set or byte_str is empty; wrap
             byte_str in a bytearray if it is not one already.
          2. On the first chunk (_got_data not yet set), check for a
             byte-order mark:
                 codecs.BOM_UTF8                     ->  UTF-8-SIG
                 codecs.BOM_UTF32_LE / BOM_UTF32_BE  ->  UTF-32
                 FE FF 00 00                         ->  X-ISO-10646-UCS-4-3412
                 00 00 FF FE                         ->  X-ISO-10646-UCS-4-2143
                 codecs.BOM_LE / BOM_BE              ->  UTF-16
             A match fills result (confidence 1.0, language '') and sets
             done = True. _got_data is set either way.
          3. While _input_state is PURE_ASCII: switch to HIGH_BYTE if
             HIGH_BYTE_DETECTOR matches byte_str, otherwise to ESC_ASCII if
             ESC_DETECTOR matches _last_char + byte_str. The final byte of
             the chunk is kept in _last_char.
          4. In ESC_ASCII: feed byte_str to an EscCharSetProber (created on
             demand with lang_filter). On ProbingState.FOUND_IT, copy its
             charset_name, get_confidence() and language into result and
             set done.
          5. In HIGH_BYTE: create the probers on demand
             (MBCSGroupProber(lang_filter), SBCSGroupProber() when
             lang_filter includes LanguageFilter.NON_CJK, and
             Latin1Prober()), then feed each one; the first FOUND_IT fills
             result and sets done. If WIN_BYTE_DETECTOR matches byte_str,
             set _has_win_bytes.

    close(self)
        Stop analyzing the current document and come up with a final
        prediction.

        :returns:  The ``result`` attribute, a ``dict`` with the keys
                   `encoding`, `confidence`, and `language`.
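
How the final prediction is chosen can be sketched without chardet installed. This is an illustration under stated assumptions: ``pick_result`` is a hypothetical function, and probers are stood in for by plain ``(charset_name, confidence, language)`` tuples. The threshold value and the ISO-to-Windows table are the ones recoverable from the dump.

```python
MINIMUM_THRESHOLD = 0.2

ISO_WIN_MAP = {
    "iso-8859-1": "Windows-1252",
    "iso-8859-2": "Windows-1250",
    "iso-8859-5": "Windows-1251",
    "iso-8859-6": "Windows-1256",
    "iso-8859-7": "Windows-1253",
    "iso-8859-8": "Windows-1255",
    "iso-8859-9": "Windows-1254",
    "iso-8859-13": "Windows-1257",
}


def pick_result(candidates, has_win_bytes):
    """Pick the most confident candidate, mirroring the close() logic.

    candidates: list of (charset_name, confidence, language) tuples,
    a hypothetical stand-in for the real prober objects.
    """
    result = {"encoding": None, "confidence": 0.0, "language": None}
    best = max(candidates, key=lambda c: c[1], default=None)
    if best is not None and best[1] > MINIMUM_THRESHOLD:
        name, confidence, language = best
        # Bytes in 0x80-0x9F are printable in the Windows code pages but not
        # in the ISO-8859 family, so their presence tips the label.
        if name.lower().startswith("iso-8859") and has_win_bytes:
            name = ISO_WIN_MAP.get(name.lower(), name)
        result = {"encoding": name, "confidence": confidence, "language": language}
    return result
```

If no candidate clears the threshold, the result keeps ``encoding`` set to ``None``, which is the case the debug logging in ``close`` reports.
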

        Body, in order:
          1. If done is already set, return result; otherwise set
             done = True.
          2. If no data was ever fed, log "no data received!" at debug
             level.
          3. If _input_state is PURE_ASCII, result becomes encoding 'ascii',
             confidence 1.0, language ''.
          4. If _input_state is HIGH_BYTE, take the prober with the highest
             get_confidence(). If that value exceeds MINIMUM_THRESHOLD, its
             charset_name, confidence and language become result. A name
             whose lowercase form starts with 'iso-8859' is replaced through
             ISO_WIN_MAP when _has_win_bytes is set.
          5. If the logger's effective level is DEBUG or lower and
             result['encoding'] is still None, log
             "no probers hit minimum threshold", then one
             "%s %s confidence = %s" line per prober (one per member prober
             for a CharSetGroupProber).
          6. Return result.
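
The input-state tracking that ``feed`` performs before any prober runs can be sketched as follows. This is a minimal stand-in, not chardet's code: the ``InputState`` enum is redefined locally, ``next_state`` is a hypothetical function name, and the two patterns are written out on the assumption that they match the garbled regex constants in the dump.

```python
import re
from enum import Enum


class InputState(Enum):
    """Local stand-in for chardet's enum of the same name."""
    PURE_ASCII = 0
    ESC_ASCII = 1
    HIGH_BYTE = 2


HIGH_BYTE_DETECTOR = re.compile(b"[\x80-\xFF]")  # any byte outside 7-bit ASCII
ESC_DETECTOR = re.compile(b"(\033|~{)")          # ESC byte, or the HZ "~{" marker


def next_state(state, last_char, chunk):
    """Return the input state after seeing chunk.

    last_char is the final byte of the previous chunk (b"" at the start), so
    that a two-byte marker split across chunks is still noticed.
    """
    if state is InputState.PURE_ASCII:
        if HIGH_BYTE_DETECTOR.search(chunk):
            return InputState.HIGH_BYTE
        if ESC_DETECTOR.search(last_char + chunk):
            return InputState.ESC_ASCII
    # Once the state has left PURE_ASCII it never changes again.
    return state
```

Carrying ``last_char`` forward is the design point: without it, a ``~`` at the end of one chunk and a ``{`` at the start of the next would leave the stream classified as pure ASCII.
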