File: /usr/lib/python3/dist-packages/chardet/__pycache__/charsetprober.cpython-312.pyc
Source: /usr/lib/python3/dist-packages/chardet/charsetprober.py
import logging
import re

from .enums import ProbingState


class CharSetProber(object):

    SHORTCUT_THRESHOLD = 0.95

    def __init__(self, lang_filter=None):
        self._state = None
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)

    def reset(self):
        self._state = ProbingState.DETECTING

    @property
    def charset_name(self):
        return None

    def feed(self, buf):
        # The base class does nothing; concrete probers override this.
        pass

    @property
    def state(self):
        return self._state

    def get_confidence(self):
        return 0.0

    @staticmethod
    def filter_high_byte_only(buf):
        # Collapse each run of ASCII bytes (0x00-0x7F) into a single space.
        buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
        return buf

    @staticmethod
    def filter_international_words(buf):
        """
        We define three types of bytes:
        alphabet: English letters [a-zA-Z]
        international: international characters [\x80-\xFF]
        marker: everything else [^a-zA-Z\x80-\xFF]

        The input buffer can be thought of as a series of words delimited by
        markers. This function keeps only the words that contain at least one
        international character. Each contiguous sequence of markers is
        replaced by a single ASCII space character.

        This filter applies to all scripts which do not use English characters.
        """
        filtered = bytearray()

        # Match only words that contain at least one international byte; a
        # match may end with a single marker byte.
        words = re.findall(b'[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?',
                           buf)

        for word in words:
            filtered.extend(word[:-1])

            # A trailing marker is replaced with a space, since markers are
            # not language-specific.
            last_char = word[-1:]
            if not last_char.isalpha() and last_char < b'\x80':
                last_char = b' '
            filtered.extend(last_char)

        return filtered

    @staticmethod
    def filter_with_english_letters(buf):
        """
        Returns a copy of ``buf`` that retains only the sequences of English
        alphabet and high byte characters that are not between <> characters.
        Also retains English alphabet and high byte characters immediately
        before occurrences of >.

        This filter can be applied to all scripts which contain both English
        characters and extended ASCII characters, but is currently only used by
        ``Latin1Prober``.
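        """
        filtered = bytearray()
        in_tag = False
        prev = 0

        for curr in range(len(buf)):
            # Slice to get a one-byte bytes object instead of an int.
            buf_char = buf[curr:curr + 1]
            # Track whether we are leaving or entering an HTML tag.
            if buf_char == b'>':
                in_tag = False
            elif buf_char == b'<':
                in_tag = True

            # An ASCII byte that is not a letter ends the current run.
            if buf_char < b'\x80' and not buf_char.isalpha():
                # Keep a non-empty run outside a tag, followed by a space.
                if curr > prev and not in_tag:
                    filtered.extend(buf[prev:curr])
                    filtered.extend(b' ')
                prev = curr + 1

        # Keep the trailing run unless the buffer ends inside a tag.
        if not in_tag:
            filtered.extend(buf[prev:])

        return filtered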