HEX

File: //home/arjun/.local/lib/python3.10/site-packages/langsmith/beta/__pycache__/_evals.cpython-310.pyc
o

���g��@s�dZddlZddlZddlZddlZddlmZmZmZm	Z	m
Z
mZddlm
ZddlmZddlmZddlmZddlmZdedefd	d
�Zdejded
eefdd�Zeddddd�de	ejdedeedeededed
ejfdd��Z deded
eejfdd�Z!ed�Z"ed�Z#dee"d ee#d
ee
e"e#ffd!d"�Z$ed#dd$�ded%e%d&ee&deed
df
d'd(��Z'dS))zfBeta utility functions to assist in common eval workflows.

These functions may change in the future.
�N)�DefaultDict�List�Optional�Sequence�Tuple�TypeVar)�
evaluation)�	warn_beta)�Client�run_dict�id_mapcCsf|d}|��D]\}}|�t|�t|��}q||d<|�d�r(||d|d<|�d�s1i|d<|S)aConvert the IDs in the run dictionary using the provided ID map.

    Parameters:
    - run_dict (dict): The dictionary representing a run.
    - id_map (dict): The dictionary mapping old IDs to new IDs.

    Returns:
    - dict: The updated run dictionary.
    �dotted_order�
parent_run_id�extra)�items�replace�str�get)rr�do�k�v�r�H/home/arjun/.local/lib/python3.10/site-packages/langsmith/beta/_evals.py�_convert_idss


r�root�run_to_example_map�returncs�|g}t��}|j|i�g}|rJ|��}|jhd�d�}��|dt����|d<�|d|d<�|d|d<|jrC|�|j�|�|�|s�fdd�|D�}||j	|dd<|S)	a&Convert the root run and its child runs to a list of dictionaries.

    Parameters:
    - root (ls_schemas.Run): The root run to convert.
    - run_to_example_map (dict): The dictionary mapping run IDs to example IDs.

    Returns:
    - List[dict]: The list of converted run dictionaries.
    >�
session_id�
child_run_ids�parent_run_ids)�exclude�id�trace_idcsg|]}t|���qSr)r��.0�r�rrr�
<listcomp>@�z%_convert_root_run.<locals>.<listcomp>r�reference_example_id)
�uuid�uuid4r"�pop�dictr�
child_runs�extend�appendr!)rr�runs_r"�results�src�src_dict�resultrr&r�_convert_root_run)s"


�	r6F)�test_project_name�client�load_child_runs�include_outputs�runs�dataset_namer7r8r9r:cst|s	td|�����pt����j|d�}|rdd�|D�nd}�jdd�|D�|dd�|D�|jd��s9|}n
��fd	d�|D�}|pPd
t��jdd���}t	�j
|d��}	dd
�|	D��|	djrj|	djn|	dj}
�fdd�|D�}�j
||jd|
��d�d�}|D])}
|
d|
d}tjjtjjd�|
d<|
d||
d<�jdi|
�d|i��q���|j�}|S)a�Convert the following runs to a dataset + test.

    This makes it easy to sample prod runs into a new regression testing
    workflow and compare against a candidate system.

    Internally, this function does the following:
        1. Create a dataset from the provided production run inputs.
        2. Create a new test project.
        3. Clone the production runs and re-upload against the dataset.

    Parameters:
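    - runs (Sequence[ls_schemas.Run]): A non-empty sequence of runs to clone into
        the dataset and the test project.
    - dataset_name (str): The name of the dataset to create from the runs.
    - test_project_name (Optional[str]): The name of the test project to create.
        If not provided, a name beginning with "prod-baseline-" is generated.
    - client (Optional[Client]): An optional LangSmith client instance. If not provided,
        a new client will be created.
    - load_child_runs (bool): Whether to load child runs when copying runs.
        Defaults to False.
    - include_outputs (bool): Whether to store each run's outputs as the outputs
        of the corresponding dataset example. Defaults to False.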

    Returns:
    - ls_schemas.TracerSession: The project containing the cloned runs.

    Examples:
    --------
    .. code-block:: python

        import random

        import langsmith
        from langsmith.beta import convert_runs_to_test

        client = langsmith.Client()

        # Randomly sample 100 runs from a prod project
        runs = list(client.list_runs(project_name="My Project", execution_order=1))
        sampled_runs = random.sample(runs, min(len(runs), 100))

        convert_runs_to_test(sampled_runs, dataset_name="Random Runs")

        # Select runs named "extractor" whose root traces received good feedback
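        # list_runs returns an iterator; materialize it, since the runs are
        # traversed more than once
        runs = list(
            client.list_runs(
                project_name="<your_project>",
                filter='eq(name, "extractor")',
                trace_filter='and(eq(feedback_key, "user_score"), eq(feedback_score, 1))',
            )
        )
        convert_runs_to_test(runs, dataset_name="Extraction Good")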
    z1Expected a non-empty sequence of runs. Received: )r<cS�g|]}|j�qSr)�outputsr#rrrr'�z(convert_runs_to_test.<locals>.<listcomp>NcSr=r)�inputsr#rrrr'�r?cSr=r)r!r#rrrr'�r?)r@r>�source_run_ids�
dataset_idcsg|]
}�j|j�d��qS))r9)�read_runr!r#)r8r9rrr'�s�zprod-baseline-�cSsi|]}|j|j�qSr)�
source_run_idr!)r$�errr�
<dictcomp>�r(z(convert_runs_to_test.<locals>.<dictcomp>rcs g|]}t|��D]}|�q	qSr)r6)r$�root_runr)rrrr'�s���z
prod-baseline)�which�dataset_version)�project_name�reference_dataset_id�metadata�end_time�
start_time)�tzrKr)�
ValueError�rt�get_cached_client�create_dataset�create_examplesr!r*r+�hex�list�
list_examples�modified_at�
created_at�create_project�	isoformat�datetime�now�timezone�utc�
create_run�update_project)r;r<r7r8r9r:�dsr>�runs_to_copy�examplesrJ�	to_create�project�new_run�latency�_r)r8r9rr�convert_runs_to_testEsP6���
���	�rkrKc	Cs�|j|d�}t�t�}g}i}|D]}|jdur!||j�|�n|�|�|||j<q|��D]\}}t|dd�d�||_	q0|S)N)rKcSs|jS�N)r
)r%rrr�<lambda>�sz%_load_nested_traces.<locals>.<lambda>)�key)
�	list_runs�collections�defaultdictrWrr0r!r�sortedr.)	rKr8r;�treemapr2�all_runs�run�run_idr.rrr�_load_nested_traces�s�

rw�T�U�list1�list2cCstt�||��Srl)rW�	itertools�product)rzr{rrr�_outer_product�sr~�
)�max_concurrencyr8�
evaluatorsr�cCs�ddlm}g}|D]#}t|tj�r|�|�q
t|�r%|�t�|��q
tdt	|�����|p3t
��}t||�}||d��}|j
|jgtt||���R�}	Wd�n1sXwY|	D]}
q_dS)aCompute test metrics for a given test name using a list of evaluators.

    Args:
        project_name (str): The name of the test project to evaluate.
        evaluators (list): The evaluators to compute metrics with. Each item may be
            a RunEvaluator instance or a callable; callables are wrapped with
            run_evaluator.
        max_concurrency (Optional[int], optional): The maximum number of concurrent
            evaluations. Defaults to 10.
        client (Optional[Client], optional): The client to use for evaluations.
            Defaults to None.

    Returns:
        None: This function does not return any value.
    r)�ContextThreadPoolExecutorz5Evaluation not yet implemented for evaluator of type )�max_workersN)�	langsmithr��
isinstance�ls_eval�RunEvaluatorr0�callable�
run_evaluator�NotImplementedError�typerRrSrw�map�evaluate_run�zipr~)rKr�r�r8r��evaluators_�func�traces�executorr2rjrrr�compute_test_metrics�s,�
�
���r�)(�__doc__rpr]r|r*�typingrrrrrr�langsmith.run_trees�	run_treesrR�langsmith.schemas�schemas�
ls_schemasr�rr��#langsmith._internal._beta_decoratorr	�langsmith.clientr
r-r�Runr6r�bool�
TracerSessionrkrwrxryr~rW�intr�rrrr�<module>sf ��������k*������