File: //home/arjun/projects/env/lib64/python3.10/site-packages/lxml/html/__pycache__/diff.cpython-310.pyc
from __future__ import absolute_import

import difflib
from lxml import etree
from lxml.html import fragment_fromstring
import re

__all__ = ['html_annotate', 'htmldiff']

try:
    from html import escape as html_escape
except ImportError:
    from cgi import escape as html_escape
try:
    _unicode = unicode
except NameError:
    # Python 3
    _unicode = str
try:
    basestring
except NameError:
    # Python 3
    basestring = str

def default_markup(text, version):
    # Wrap the text in a span whose title names the version; the version
    # label is HTML-escaped (quotes included) because it goes in an attribute.
    return '<span title="%s">%s</span>' % (
        html_escape(_unicode(version), 1), text)

def html_annotate(doclist, markup=default_markup):
    """
    doclist should be ordered from oldest to newest, like::

        >>> version1 = 'Hello World'
        >>> version2 = 'Goodbye World'
        >>> print(html_annotate([(version1, 'version 1'),
        ...                      (version2, 'version 2')]))
        <span title="version 2">Goodbye</span> <span title="version 1">World</span>

    The documents must be *fragments* (str/UTF8 or unicode), not
    complete documents.

    The markup argument is a function used to mark up the spans of words.
    This function is called like markup('Hello', 'version 2'), and
    returns HTML.  The first argument is text and never includes any
    markup.  The default uses a span with a title:

        >>> print(default_markup('Some Text', 'by Joe'))
        <span title="by Joe">Some Text</span>
    """
    # Tokenize every version and label each token with that version.
    tokenlist = [tokenize_annotated(doc, version)
                 for doc, version in doclist]
    # Walk from oldest to newest; tokens that survive unchanged into the
    # next version keep the (older) annotation they already carry.
    cur_tokens = tokenlist[0]
    for tokens in tokenlist[1:]:
        html_annotate_merge_annotations(cur_tokens, tokens)
        cur_tokens = tokens
    # Join adjacent tokens that share an annotation, then render them.
    cur_tokens = compress_tokens(cur_tokens)
    result = markup_serialize_tokens(cur_tokens, markup)
    return ''.join(result).strip()
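The strategy `html_annotate` follows can be sketched without lxml: tokenize every version, carry annotations forward through unchanged runs, merge neighbours, then render. The sketch below works on words only. It assumes plain-text input with no tags and uses the standard-library `difflib.SequenceMatcher` in place of lxml's `InsensitiveSequenceMatcher`. `annotate_words` is a hypothetical name and is not part of lxml.

```python
import difflib
from html import escape


def annotate_words(doclist):
    """Hypothetical word-level sketch of the html_annotate strategy.

    doclist is a list of (text, version) pairs ordered oldest to newest.
    HTML markup is not handled; input is treated as plain words.
    """
    # Each token is a [word, annotation] pair, initially labelled with
    # the version it belongs to.
    tokenlist = [[[word, version] for word in text.split()]
                 for text, version in doclist]
    cur = tokenlist[0]
    for new in tokenlist[1:]:
        matcher = difflib.SequenceMatcher(
            a=[w for w, _ in cur], b=[w for w, _ in new], autojunk=False)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == 'equal':
                # Unchanged words inherit the older annotation.
                for old_tok, new_tok in zip(cur[i1:i2], new[j1:j2]):
                    new_tok[1] = old_tok[1]
        cur = new
    # Merge runs of adjacent words that share an annotation.
    runs = []
    for word, version in cur:
        if runs and runs[-1][1] == version:
            runs[-1][0] += ' ' + word
        else:
            runs.append([word, version])
    # Render each run as a titled span like default_markup; unlike
    # default_markup, the word text is escaped too because it is plain text.
    return ' '.join('<span title="%s">%s</span>'
                    % (escape(version, quote=True), escape(text))
                    for text, version in runs)


print(annotate_words([('Hello World', 'version 1'),
                      ('Goodbye World', 'version 2')]))
# <span title="version 2">Goodbye</span> <span title="version 1">World</span>
```

With the same two versions as the docstring, the sketch reproduces the documented output. "World" is unchanged between the versions, so it keeps the "version 1" label.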

def tokenize_annotated(doc, annotation):
    """Tokenize a document and add an annotation attribute to each token.
    """
    tokens = tokenize(doc, include_hrefs=False)
    for tok in tokens:
        tok.annotation = annotation
    return tokens

def html_annotate_merge_annotations(tokens_old, tokens_new):
    """Merge the annotations from tokens_old into tokens_new, when the
    tokens in the new document already existed in the old document.
    """
    s = InsensitiveSequenceMatcher(a=tokens_old, b=tokens_new)
    commands = s.get_opcodes()

    for command, i1, i2, j1, j2 in commands:
        # Only runs that are unchanged between the two versions take the
        # older annotation; inserted or replaced tokens keep their own.
        if command == 'equal':
            eq_old = tokens_old[i1:i2]
            eq_new = tokens_new[j1:j2]
            copy_annotations(eq_old, eq_new)

def copy_annotations(src, dest):
    """
    Copy annotations from the tokens listed in src to the tokens in dest.
    """
    assert len(src) == len(dest)
    for src_tok, dest_tok in zip(src, dest):
        dest_tok.annotation = src_tok.annotation
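`html_annotate_merge_annotations` copies annotations only inside the `equal` opcodes between *consecutive* versions. A word that is deleted and later re-added is therefore credited to the version that re-added it. The sketch below reproduces that behaviour with the standard-library `difflib.SequenceMatcher` instead of lxml's `InsensitiveSequenceMatcher`. `Tok`, `annotate` and `merge_annotations` are hypothetical names used only for illustration.

```python
import difflib


class Tok(str):
    """Hypothetical stand-in for a token: a str that can carry an annotation."""
    annotation = None


def annotate(text, version):
    # Like tokenize_annotated: split into tokens and label each with the version.
    tokens = [Tok(word) for word in text.split()]
    for tok in tokens:
        tok.annotation = version
    return tokens


def merge_annotations(tokens_old, tokens_new):
    # Like html_annotate_merge_annotations: only 'equal' runs inherit the old label.
    matcher = difflib.SequenceMatcher(a=tokens_old, b=tokens_new, autojunk=False)
    for command, i1, i2, j1, j2 in matcher.get_opcodes():
        if command == 'equal':
            for src_tok, dest_tok in zip(tokens_old[i1:i2], tokens_new[j1:j2]):
                dest_tok.annotation = src_tok.annotation


versions = [annotate('red green blue', 'v1'),
            annotate('red blue', 'v2'),         # 'green' deleted
            annotate('red green blue', 'v3')]   # 'green' re-added
for old, new in zip(versions, versions[1:]):
    merge_annotations(old, new)
print([(str(t), t.annotation) for t in versions[-1]])
# [('red', 'v1'), ('green', 'v3'), ('blue', 'v1')]
```

Only adjacent versions are diffed, so "green" has no link back to "v1" once "v2" drops it.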

def compress_tokens(tokens):
    """
    Combine adjacent tokens when there is no HTML between the tokens,
    and they share an annotation.
    """
    result = [tokens[0]]
    for tok in tokens[1:]:
        if (not result[-1].post_tags and
                not tok.pre_tags and
                result[-1].annotation == tok.annotation):
            compress_merge_back(result, tok)
        else:
            result.append(tok)
    return result
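`compress_tokens` merges a token into its predecessor only when three conditions hold. The predecessor has no markup after it (`post_tags`), the token has no markup before it (`pre_tags`), and both carry the same annotation. Below is a minimal sketch of that rule. It assumes a hypothetical `Tok` dataclass in place of lxml's token class and uses plain space-joining as a simplified merge step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tok:
    """Hypothetical token: text plus surrounding markup and an annotation."""
    text: str
    annotation: str
    pre_tags: List[str] = field(default_factory=list)
    post_tags: List[str] = field(default_factory=list)


def compress(tokens):
    # Same test as compress_tokens: no markup between the two tokens and
    # equal annotations.
    result = [tokens[0]]
    for tok in tokens[1:]:
        last = result[-1]
        if not last.post_tags and not tok.pre_tags and last.annotation == tok.annotation:
            # Simplified merge: join with a space, keep the run's outer markup.
            result[-1] = Tok(last.text + ' ' + tok.text, last.annotation,
                             pre_tags=last.pre_tags, post_tags=tok.post_tags)
        else:
            result.append(tok)
    return result


tokens = [Tok('one', 'v1', pre_tags=['<p>']),
          Tok('two', 'v1', post_tags=['</p>']),
          Tok('three', 'v1', pre_tags=['<p>'], post_tags=['</p>'])]
print([t.text for t in compress(tokens)])
# ['one two', 'three']
```

"three" shares the annotation but is not merged, because a `</p>` closes the previous run before it.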
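
def compress_merge_back(tokens, tok):
    last = tokens[-1]
    if type(last) is not token or type(tok) is not token:
        tokens.append(tok)
    else:
        # Rebuild the text as: previous token + its trailing whitespace + tok.
        text = _unicode(last)
        if last.trailing_whitespace:
            text += last.trailing_whitespace
        text += tok
        merged = token(text,
                       pre_tags=last.pre_tags,
                       post_tags=tok.post_tags,
                       trailing_whitespace=tok.trailing_whitespace)
        merged.annotation = last.annotation
        tokens[-1] = merged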
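`compress_merge_back` rebuilds the merged text as the previous token, then that token's trailing whitespace, then the new token. It merges only instances of exactly the base token type (`type(last) is not token`), so subclasses are appended rather than merged. The sketch below imitates this with a hypothetical `Word` str subclass and a hypothetical `Special` subclass. Neither class is lxml's, and `merge_back` is an illustrative name.

```python
class Word(str):
    """Hypothetical token type: a str carrying markup and whitespace details."""

    def __new__(cls, text, pre_tags=None, post_tags=None, trailing_whitespace=''):
        obj = str.__new__(cls, text)
        obj.pre_tags = pre_tags or []
        obj.post_tags = post_tags or []
        obj.trailing_whitespace = trailing_whitespace
        obj.annotation = None
        return obj


class Special(Word):
    """Hypothetical subclass standing in for tokens that must never be merged."""


def merge_back(tokens, tok):
    # Merge tok into tokens[-1] in place, modelled on compress_merge_back.
    last = tokens[-1]
    if type(last) is not Word or type(tok) is not Word:
        # Exact type check: subclasses such as Special are appended, not merged.
        tokens.append(tok)
        return
    text = str(last) + last.trailing_whitespace + tok
    merged = Word(text, pre_tags=last.pre_tags, post_tags=tok.post_tags,
                  trailing_whitespace=tok.trailing_whitespace)
    merged.annotation = last.annotation
    tokens[-1] = merged


tokens = [Word('Hello', trailing_whitespace='  ')]
merge_back(tokens, Word('World', trailing_whitespace='\n'))
merge_back(tokens, Special('<img>'))
print([(str(t), t.trailing_whitespace) for t in tokens])
# [('Hello  World', '\n'), ('<img>', '')]
```

The merged token takes its leading markup from the first token and its trailing markup and whitespace from the last. The original spacing between words (two spaces here) is preserved.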