HEX

File: //home/arjun/projects/env/lib/python3.10/site-packages/fuzzywuzzy/__pycache__/fuzz.cpython-310.pyc
o

wew%�@s ddlmZddlZddlZzddlmZWney0e��dkr(e�d�ddl	mZYnwddl
mZejej
ejd	d
����Zejej
ejdd����Zd$dd�Zejd%dd��Zd&dd�Zd&dd�Zejd%dd��Zd&dd�Zd&dd�Zd&dd�Zd$dd�Zd&d d!�Zd$d"d#�ZdS)'�)�unicode_literalsN�)�
StringMatcher�PyPyzYUsing slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning)�SequenceMatcher)�utilscCs.t�||�\}}td||�}t�d|���S)N�d)r�make_type_consistentr�intr�ratio)�s1�s2�m�r�H/home/arjun/projects/env/lib/python3.10/site-packages/fuzzywuzzy/fuzz.pyrsrc
Cs�t�||�\}}t|�t|�kr|}|}n|}|}td||�}|��}g}|D]8}|d|ddkr;|d|dnd}|t|�}	|||	�}
td||
�}|��}|dkrZdS|�|�q't�dt|��S)zR"Return the ratio of the most similar substring
    as a number between 0 and 100.Nrrgףp=
��?r)	rr	�lenr�get_matching_blocksr�appendr
�max)
rr
�shorter�longerr�blocks�scores�block�
long_start�long_end�long_substr�m2�rrrr�
partial_ratios&(rTcCs4|r	tj||d�n|}|��}d�t|��}|��S)z*Return a cleaned string with token sorted.��force_ascii� )r�full_process�split�join�sorted�strip)�sr!r#�ts�tokens�
sorted_stringrrr�_process_and_sortKsr,cCs4t|||d�}t|||d�}|rt||�St||�S)N�r#)r,rr)rr
�partialr!r#�sorted1�sorted2rrr�_token_sortZs


r1cC�t||d||d�S)zpReturn a measure of the sequences' similarity between 0 and 100
    but sorting the tokens before comparing.
    F�r.r!r#�r1�rr
r!r#rrr�token_sort_ratioe�r6cCr2)z}Return the ratio of the most similar substring as a number between
    0 and 100 but sorting the tokens before comparing.
    Tr3r4r5rrr�partial_token_sort_ratiolr7r8cCs|s||krdS|rtj||d�n|}|rtj||d�n|}t�|�s%dSt�|�s,dSt|���}t|���}|�|�}	|�|�}
|�|�}d�t|	��}d�t|
��}
d�t|��}|d|
}|d|}|�	�}|�	�}|�	�}|ryt
}nt}|||�|||�|||�g}t|�S)a	Find all alphanumeric tokens in each string...
        - treat them as a set
        - construct two strings of the form:
            <sorted_intersection><sorted_remainder>
        - take ratios of those two strings
        - controls for unordered partial matchesrr rr")
rr#�validate_string�setr$�intersection�
differencer%r&r'rrr)rr
r.r!r#�p1�p2�tokens1�tokens2r;�diff1to2�diff2to1�sorted_sect�sorted_1to2�sorted_2to1�
combined_1to2�
combined_2to1�
ratio_func�pairwiserrr�
_token_setss:	




�rJcCr2)NFr3�rJr5rrr�token_set_ratio��rLcCr2)NTr3rKr5rrr�partial_token_set_ratio�rMrNcCsP|rtj||d�}tj||d�}n|}|}t�|�sdSt�|�s#dSt||�S)a�
    Quick ratio comparison between two strings.

    Runs full_process from utils on both strings
    Short circuits if either of the strings is empty after processing.

    :param s1:
    :param s2:
    :param force_ascii: Allow only ASCII characters (Default: True)
    :param full_process: Process inputs; used here to avoid double processing in extract functions (Default: True)
    :return: similarity ratio
    r r)rr#r9r)rr
r!r#r=r>rrr�QRatio�s


rOcC�t||d|d�S)z�
    Unicode quick ratio

    Calls QRatio with force_ascii set to False

    :param s1:
    :param s2:
    :return: similarity ratio
    F�r!r#)rO�rr
r#rrr�UQRatio�s
rScCs$|rtj||d�}tj||d�}n|}|}t�|�sdSt�|�s#dSd}d}d}t||�}	ttt|�t|���tt|�t|��}
|
dkrHd}|
dkrNd	}|rwt||�|}t	||dd
�||}t
||dd
�||}
t�t|	|||
��St||dd
�|}t
||dd
�|}t�t|	||��S)aj
    Return a measure of the sequences' similarity between 0 and 100, using different algorithms.

    **Steps in the order they occur**

    #. Run full_process from utils on both strings
    #. Short circuit if this makes either string empty
    #. Take the ratio of the two processed strings (fuzz.ratio)
    #. Run checks to compare the length of the strings
        * If one of the strings is more than 1.5 times as long as the other
          use partial_ratio comparisons - scale partial results by 0.9
          (this makes sure only full results can return 100)
        * If one of the strings is over 8 times as long as the other
          instead scale by 0.6

    #. Run the other ratio functions
        * if using partial ratio functions call partial_ratio,
          partial_token_sort_ratio and partial_token_set_ratio
          scale all of these by the ratio based on length
        * otherwise call token_sort_ratio and token_set_ratio
        * all token based comparisons are scaled by 0.95
          (on top of any partial scalars)

    #. Take the highest value from these results
       round it and return it as an integer.

    :param s1:
    :param s2:
    :param force_ascii: Allow only ASCII characters
    :type force_ascii: bool
    :full_process: Process inputs; used here to avoid double processing in extract functions (Default: True)
    :return:
    r rTgffffff�?g�������?g�?F�g333333�?r-)rr#r9r�floatrr�minrr8rNr
r6rL)rr
r!r#r=r>�try_partial�unbase_scale�
partial_scale�base�	len_ratior.�ptsor�ptser�tsor�tserrrr�WRatio�sD#


(����r`cCrP)z�Return a measure of the sequences' similarity between 0 and 100,
    using different algorithms. Same as WRatio but preserving unicode.
    FrQ)r`rRrrr�UWRatio.sra)T)TTT)TT)�
__future__r�platform�warningsrr�ImportError�python_implementation�warn�difflib�r�check_for_none�check_for_equivalence�check_empty_stringrrr,r1r6r8rJrLrNrOrSr`rarrrr�<module>s@
�
)



4

	

N
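
The WRatio docstring embedded above lists its steps: take a base ratio, choose a partial scale from the length ratio (0.9 above 1.5x, 0.6 above 8x), scale token-based scores by 0.95, and return the rounded maximum. The following is a minimal sketch of those steps, not the library's code. It assumes `difflib.SequenceMatcher` as the matcher and replaces `utils.full_process` with a plain lowercase-and-strip; the names `weighted_ratio_sketch`, `_sorted_tokens`, and the simplified helpers are hypothetical.

```python
from difflib import SequenceMatcher


def ratio(s1, s2):
    """Similarity of two strings as an integer between 0 and 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio(s1, s2):
    """Best score of the shorter string against same-length windows of the longer."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    blocks = SequenceMatcher(None, shorter, longer).get_matching_blocks()
    scores = []
    for a, b, _size in blocks:
        start = max(b - a, 0)  # align the window with the matching block
        window = longer[start:start + len(shorter)]
        r = SequenceMatcher(None, shorter, window).ratio()
        if r > 0.995:
            return 100
        scores.append(r)
    return int(round(100 * max(scores)))


def _sorted_tokens(tokens):
    return " ".join(sorted(tokens))


def token_sort(s1, s2, partial=False):
    """Compare the strings after sorting their whitespace-separated tokens."""
    func = partial_ratio if partial else ratio
    return func(_sorted_tokens(s1.split()), _sorted_tokens(s2.split()))


def token_set(s1, s2, partial=False):
    """Compare <intersection> and <intersection + remainder> strings pairwise."""
    t1, t2 = set(s1.split()), set(s2.split())
    sect = _sorted_tokens(t1 & t2)
    c1 = (sect + " " + _sorted_tokens(t1 - t2)).strip()
    c2 = (sect + " " + _sorted_tokens(t2 - t1)).strip()
    func = partial_ratio if partial else ratio
    return max(func(sect, c1), func(sect, c2), func(c1, c2))


def weighted_ratio_sketch(s1, s2):
    # Assumption: lowercase + strip stands in for utils.full_process.
    s1, s2 = s1.lower().strip(), s2.lower().strip()
    if not s1 or not s2:
        return 0  # short circuit on empty input
    unbase_scale = 0.95  # applied to all token-based scores
    base = ratio(s1, s2)
    len_ratio = max(len(s1), len(s2)) / min(len(s1), len(s2))
    if len_ratio < 1.5:
        tsor = token_sort(s1, s2) * unbase_scale
        tser = token_set(s1, s2) * unbase_scale
        return int(round(max(base, tsor, tser)))
    partial_scale = 0.6 if len_ratio > 8 else 0.9
    part = partial_ratio(s1, s2) * partial_scale
    ptsor = token_sort(s1, s2, partial=True) * unbase_scale * partial_scale
    ptser = token_set(s1, s2, partial=True) * unbase_scale * partial_scale
    return int(round(max(base, part, ptsor, ptser)))
```

Because every partial or token-based score is scaled below 1.0, only the unscaled base ratio can reach 100 in this sketch, which matches the docstring's note that only full results can return 100.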