HEX
Server: Apache/2.4.52 (Ubuntu)
System: Linux spn-python 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023 x86_64
User: arjun (1000)
PHP: 8.1.2-1ubuntu2.20
Disabled: NONE
File: //usr/local/lib/python3.10/dist-packages/langsmith/evaluation/__pycache__/evaluator.cpython-310.pyc
[Binary data: CPython 3.10 bytecode. The raw .pyc contents are not reproducible as text; what follows is the structure recoverable from the readable strings embedded in the file, which identify it as the compiled form of langsmith/evaluation/evaluator.py ("This module contains the evaluator classes for evaluating runs.").

Classes:
  Category - "A category for categorical feedback." (fields: value, label)
  FeedbackConfig - "Configuration to define a type of feedback. Applied on the first creation of a feedback_key." (fields: type, min, max, categories)
  EvaluationResult - "Evaluation result." (fields: key, score, value, comment, correction, evaluator_info, feedback_config, source_run_id, target_run_id, extra; a validator warns that numeric values belong in 'score', not 'value')
  EvaluationResults - "Batch evaluation results. This makes it easy for your evaluator to return multiple metrics at once." (field: results)
  RunEvaluator - "Evaluator interface class." (methods: evaluate_run, aevaluate_run; the async method delegates to the sync one via run_in_executor)
  ComparisonEvaluationResult - "Feedback scores for the results of comparative evaluations. These are generated by functions that compare two or more runs, returning a ranking or other feedback." (fields: key, scores, source_run_id, comment)
  DynamicRunEvaluator - "A dynamic evaluator that wraps a function and transforms it into a RunEvaluator." Designed for use with the @run_evaluator decorator.
  DynamicComparisonRunEvaluator - "Compare predictions (as traces) from 2 or more runs." (methods: compare_runs, acompare_runs)

Functions:
  run_evaluator(func) - "Create a run evaluator from a function. Decorator that transforms a function into a RunEvaluator."
  comparison_evaluator(func) - "Create a comparison evaluator from a function."
  _normalize_evaluator_func / _normalize_comparison_evaluator_func / _normalize_summary_evaluator - adapt user functions whose positional arguments are drawn from (run, example, inputs, outputs, reference_outputs, attachments) (or runs/examples for comparison and summary evaluators) to the internal calling convention
  _format_evaluator_result - coerces loose dict/str/int/bool/float/list return values into EvaluationResult / EvaluationResults form
  _serialize_inputs / _maxsize_repr - truncate run and example reprs for tracing

Imports visible in the file: asyncio, inspect, uuid, logging, functools.wraps, abc.abstractmethod, typing/typing_extensions, pydantic (with a v1 fallback), and langsmith.schemas (SCORE_TYPE, VALUE_TYPE, Example, Run).]
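The strings in the dump show that `RunEvaluator` pairs a sync `evaluate_run` with an async `aevaluate_run` that offloads to `run_in_executor`. A minimal pure-Python sketch of that pattern (the names `RunEvaluatorSketch` and `ExactMatch` are hypothetical stand-ins, not the library's classes, and no langsmith install is assumed):

```python
import asyncio
from abc import ABC, abstractmethod


class RunEvaluatorSketch(ABC):
    """Stand-in for the RunEvaluator interface visible in the bytecode."""

    @abstractmethod
    def evaluate_run(self, run, example=None):
        """Evaluate an example (sync)."""

    async def aevaluate_run(self, run, example=None):
        # Default async path: run the sync evaluator in the default
        # executor, mirroring the run_in_executor call in the dump.
        return await asyncio.get_running_loop().run_in_executor(
            None, self.evaluate_run, run, example
        )


class ExactMatch(RunEvaluatorSketch):
    """Toy evaluator: score 1 when run output equals the reference."""

    def evaluate_run(self, run, example=None):
        return {"key": "exact_match", "score": int(run == example)}


result = asyncio.run(ExactMatch().aevaluate_run("hello", "hello"))
print(result)  # {'key': 'exact_match', 'score': 1}
```

Subclasses only need the sync method; the async variant comes for free, which matches how the dumped `aevaluate_run` docstring ("Evaluate an example asynchronously.") wraps the sync call.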
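The error messages embedded in the file ("Expected a dict, str, bool, int, float, list, EvaluationResult, or EvaluationResults...") indicate that `_format_evaluator_result` normalizes loose evaluator return values. A rough sketch of that normalization under stated assumptions (function name, branch order, and the `{"score": ...}` wrapping are inferred, not the library's exact code):

```python
from typing import Union


def format_evaluator_result(result: Union[dict, str, int, bool, float, list]) -> dict:
    """Sketch: coerce a loose evaluator return value into dict form."""
    # Bare scalars are treated as a single metric score.
    if isinstance(result, (bool, int, float, str)):
        return {"score": result}
    if not result:
        raise ValueError(f"Expected a non-empty result, got: {result!r}")
    if isinstance(result, list):
        # A list is taken as a batch of per-metric dicts.
        if not all(isinstance(x, dict) for x in result):
            raise ValueError(f"Expected a list of dicts. Received: {result!r}")
        return {"results": result}
    if isinstance(result, dict):
        return result
    raise ValueError(f"Unsupported result type: {type(result)}")


print(format_evaluator_result(0.5))  # {'score': 0.5}
print(format_evaluator_result([{"key": "f1", "score": 1}]))
```

This explains why, per the recovered docstrings, `@run_evaluator`-decorated functions may return plain dicts or scalars and still be used wherever an `EvaluationResult` is expected.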