File: //proc/1233/cwd/usr/local/lib/python3.10/dist-packages/tiktoken/__pycache__/core.cpython-310.pyc
o
;��g~C � @ s� d dl mZ d dlZd dlmZ d dlmZmZmZm Z m
Z
mZ d dlZd dl
mZ er6d dlZd dlmZ G dd� d�Zejdd �ddd��Zddd�ZdS )� )�annotationsN)�ThreadPoolExecutor)�
TYPE_CHECKING�AbstractSet�
Collection�Literal�NoReturn�Sequence)� _tiktokenc @ sd e Zd Zdd�dddd
�Zdedd�Zdfdd�Ze� dd�dgdd�Ze� dd�dhdd�Zd d!�did&d'�Z d e� dd(�djd)d*�Z
e� dd�dkd,d-�Zdld0d1�Zdmd5d6�Z
dndod9d:�Zdpd<d=�Zdqd?d@�ZdrdBdC�Zd7d dD�dsdGdH�Zd d!�dtdIdJ�ZdudKdL�ZedvdMdN��ZejdwdPdQ��ZdxdSdT�ZedvdUdV��ZdydWdX�ZdfdYdZ�Zdzd[d\�Zd{d^d_�Zd|dbdc�Z dS )}�EncodingN)�explicit_n_vocab�name�str�pat_str�mergeable_ranks�dict[bytes, int]�special_tokens�dict[str, int]r �
int | Nonec C sz || _ || _|| _|| _tt|�� �t|�� dd��| _|r3t|�t|� |ks*J �| j|d ks3J �t� |||�| _
Creates an Encoding object.
See openai_public.py for examples of how to construct an Encoding object.
Args:
name: The name of the encoding. It should be clear from the name of the encoding
what behaviour to expect, in particular, encodings with different special tokens
should have different names.
pat_str: A regex pattern string that is used to split the input text.
mergeable_ranks: A dictionary mapping mergeable token bytes to their ranks. The ranks
must correspond to merge priority.
special_tokens: A dictionary mapping special token strings to their token values.
explicit_n_vocab: The number of tokens in the vocabulary. If provided, it is checked
that the number of mergeable tokens and special tokens is equal to this number.
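
The `explicit_n_vocab` check described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `check_vocab` is a hypothetical helper, the toy vocabulary is invented, and the second condition (that the largest token value equals `explicit_n_vocab - 1`) is an assumption about what the constructor enforces beyond the count check the docstring mentions.

```python
from __future__ import annotations


def check_vocab(
    mergeable_ranks: dict[bytes, int],
    special_tokens: dict[str, int],
    explicit_n_vocab: int | None = None,
) -> int:
    """Return the largest token value, validating explicit_n_vocab if given.

    Hypothetical helper for illustration only; it is not part of tiktoken.
    """
    # Largest value across both tables; special_tokens may be empty.
    max_token_value = max(
        max(mergeable_ranks.values()),
        max(special_tokens.values(), default=0),
    )
    if explicit_n_vocab:
        # Count check described in the docstring.
        n_tokens = len(mergeable_ranks) + len(special_tokens)
        if n_tokens != explicit_n_vocab:
            raise ValueError(
                f"expected {explicit_n_vocab} tokens, found {n_tokens}"
            )
        # Assumed extra check: token values fill 0 .. explicit_n_vocab - 1.
        if max_token_value != explicit_n_vocab - 1:
            raise ValueError(
                f"largest token value {max_token_value} does not equal "
                f"explicit_n_vocab - 1"
            )
    return max_token_value


# Toy vocabulary (invented): three mergeable tokens and one special token.
toy_ranks = {b"a": 0, b"b": 1, b"ab": 2}
toy_special = {"<|endoftext|>": 3}
max_value = check_vocab(toy_ranks, toy_special, explicit_n_vocab=4)
```

Passing an `explicit_n_vocab` that disagrees with the two tables makes the sketch raise `ValueError`. Omitting it skips validation and only computes the largest token value.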
r )�default� N)r
�_pat_str�_mergeable_ranks�_special_tokens�max�values�max_token_value�lenr
�CoreBPE� _core_bpe)�selfr
r r r r � r! �8/usr/local/lib/python3.10/dist-packages/tiktoken/core.py�__init__ s �zEncoding.__init__�returnc C s d| j �d�S )Nz
<Encoding �>)r
�r r! r! r"