annotation_spec

Annotation specs.

Note

Higher-level documentation on how library annotations work in lineapy, and how to contribute new ones, is available here.

Note

Developer Note:

  • All the classes in the ValuePointer union follow an unusual structure in which the field name duplicates the class name. This is so that, when we load the YAML files, the loader can tell the classes apart based on the field names alone.

  • The string values for AllPositionalArgs, BoundSelfOfFunction, and Result carry no information either; they are there only so that we abide by the YAML structure. It's not very elegant, and we can refactor this later.
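The field-name convention described in the note can be illustrated with a minimal sketch. It uses standard-library dataclasses as a stand-in for the pydantic models, and the `POINTER_TYPES` table and `load_value_pointer` helper are hypothetical names that are not part of lineapy. The point it shows is that each class has a distinct field name, so the key of a YAML mapping is enough to choose the class.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class PositionalArg:
    positional_argument_index: int


@dataclass(frozen=True)
class KeywordArgument:
    argument_keyword: str


@dataclass(frozen=True)
class Result:
    result: str  # placeholder such as "RESULT"; only the key is informative


@dataclass(frozen=True)
class BoundSelfOfFunction:
    self_ref: str  # placeholder such as "SELF_REF"


# Hypothetical lookup table; lineapy relies on pydantic to do this matching.
POINTER_TYPES = (PositionalArg, KeywordArgument, Result, BoundSelfOfFunction)


def load_value_pointer(entry: Dict[str, Any]):
    """Pick the class whose field names exactly match the mapping's keys."""
    keys = set(entry)
    for cls in POINTER_TYPES:
        if {f.name for f in fields(cls)} == keys:
            return cls(**entry)
    # Unknown or misspelled keys are rejected instead of silently ignored.
    raise ValueError(f"no value pointer type matches keys {sorted(keys)}")


pointers = [
    load_value_pointer({"positional_argument_index": 0}),
    load_value_pointer({"result": "RESULT"}),
    load_value_pointer({"self_ref": "SELF_REF"}),
]
```

Because the match is on the full key set, a misspelled key fails loudly, which mirrors the intent of forbidding extra fields on the base model.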

AllPositionalArgs

Bases: BaseModel

References all positional arguments. E.g., in foo(a, b), a and b.

The string is expected to be set to the value "ALL_POSITIONAL_ARGUMENTS"; see Result for an explanation.

Source code in lineapy/instrumentation/annotation_spec.py
class AllPositionalArgs(BaseModel):
    """
    References all positional arguments. E.g., in `foo(a, b)`, `a` and `b`.

    Expecting the string to be set the value "ALL_POSITIONAL_ARGUMENTS"---see [Result][lineapy.instrumentation.annotation_spec.Result] for an explanation
    """

    all_positional_arguments: str

Annotation

Bases: BaseModel

An annotation contains a single criteria for the function call, and the corresponding side_effects of the function call.

The supported criteria types, each explained in its respective class, are:

  • KeywordArgumentCriteria

  • FunctionNames

  • ClassMethodNames

  • FunctionName

  • ClassMethodName

There are currently three types of side_effects:

  • ViewOfValues

  • MutatedValue

  • ImplicitDependencyValue

Source code in lineapy/instrumentation/annotation_spec.py
class Annotation(BaseModel):
    """
    An annotation contains a single criteria for the function call,
    and the corresponding `side_effects` of the function call.

    There are currently six types of criteria, all of which are explained in their respective classes:

    - [KeywordArgumentCriteria][lineapy.instrumentation.annotation_spec.KeywordArgumentCriteria]
    - [FunctionNames][lineapy.instrumentation.annotation_spec.FunctionNames]
    - [ClassMethodNames][lineapy.instrumentation.annotation_spec.ClassMethodNames]
    - [FunctionName][lineapy.instrumentation.annotation_spec.FunctionName]
    - [ClassMethodName][lineapy.instrumentation.annotation_spec.ClassMethodName]

    There are currently three types of side_effects:

    - [ViewOfValues][lineapy.instrumentation.annotation_spec.ViewOfValues]
    - [MutatedValue][lineapy.instrumentation.annotation_spec.MutatedValue]
    - [ImplicitDependencyValue][lineapy.instrumentation.annotation_spec.ImplicitDependencyValue]
    """

    criteria: Criteria
    side_effects: List[InspectFunctionSideEffect]

BaseModel

Bases: pydantic.BaseModel

Forbids extra fields on the base class so that typos raise an error.

Source code in lineapy/instrumentation/annotation_spec.py
class BaseModel(pydantic.BaseModel):
    """
    Forbid extras on baseclass so typos will raise an error
    """

    class Config:
        extra = "forbid"

BoundSelfOfFunction

Bases: BaseModel

References the bound self of a function. E.g., in foo.test(a, b), foo would be the bound self.

If the function is a bound method, this refers to the instance that the method is bound to.

We are expecting "SELF_REF"---see Result for an explanation.

Source code in lineapy/instrumentation/annotation_spec.py
class BoundSelfOfFunction(BaseModel):
    """

    References the bound self of a function. E.g., in ``foo.test(a, b)``,
    `foo` would be the bound self.

    If the function is a bound method, this refers to the instance that was
    bound of the method.

    We are expecting "SELF_REF"---see [Result][lineapy.instrumentation.annotation_spec.Result] for an explanation.
    """

    self_ref: str

ClassMethodName

Bases: BaseModel

Specifies a class method name (as opposed to a function). An example is df.to_sql:

    - criteria:
        class_method_name: to_sql
        class_instance: DataFrame
Source code in lineapy/instrumentation/annotation_spec.py
class ClassMethodName(BaseModel):
    """
    Specifies a **class** method name (as opposed to a function). An example is `df.to_sql`:

    ``` yaml

        - criteria:
            class_method_name: to_sql
            class_instance: DataFrame
    ```
    """

    class_instance: str
    class_method_name: str

ClassMethodNames

Bases: BaseModel

A shorthand for a list of class method names (vs. a single one as in ClassMethodName).

    - criteria:
        class_method_names:
        - to_csv
        - to_parquet
        class_instance: DataFrame
Source code in lineapy/instrumentation/annotation_spec.py
class ClassMethodNames(BaseModel):
    """
    A shorthand for a list of class method names (vs. a single one
    as in [ClassMethodName][lineapy.instrumentation.annotation_spec.ClassMethodName]).

    ``` yaml

        - criteria:
            class_method_names:
            - to_csv
            - to_parquet
            class_instance: DataFrame
    ```

    """

    class_instance: str
    class_method_names: List[str]

ExternalState

Bases: BaseModel

Represents some reference to some state outside of the Python program. The two types of external state supported are DB and file_system.

If we ever make a reference to an external state instance, we assume that it depends on any mutations of previous references.

Source code in lineapy/instrumentation/annotation_spec.py
class ExternalState(BaseModel):
    """
    Represents some reference to some state outside of the Python program. The two types of external state supported are ``DB`` and ``file_system``.

    If we ever make a reference to an external state instance, we assume
    that it depends on any mutations of previous references.

    """

    external_state: str

    @property
    def __name__(self):
        return self.external_state

    def __hash__(self):
        """
        Elsewhere we need ``ExternalState`` to be hashable, it was pretty easy
        with Dataclass (frozen option), but with Pydantic, we have to add an
        extra hash function
        [link][https://github.com/samuelcolvin/pydantic/issues/1303]
        """
        return hash((type(self),) + tuple(self.__dict__.values()))

__hash__()

Elsewhere we need ExternalState to be hashable. This was easy with a dataclass (the frozen option), but with Pydantic we have to add an explicit hash function (see https://github.com/samuelcolvin/pydantic/issues/1303).

Source code in lineapy/instrumentation/annotation_spec.py
def __hash__(self):
    """
    Elsewhere we need ``ExternalState`` to be hashable, it was pretty easy
    with Dataclass (frozen option), but with Pydantic, we have to add an
    extra hash function
    [link][https://github.com/samuelcolvin/pydantic/issues/1303]
    """
    return hash((type(self),) + tuple(self.__dict__.values()))
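As background for why an explicit hash is needed, Python sets `__hash__` to `None` on any class that defines `__eq__` without also defining `__hash__`. The sketch below reproduces that situation with plain classes; `UnhashableState` and `HashableState` are hypothetical stand-ins and are not lineapy or pydantic classes. The hash formula is the one shown in the source above.

```python
class UnhashableState:
    """Defines __eq__ only, so Python sets __hash__ to None for this class."""

    def __init__(self, external_state: str):
        self.external_state = external_state

    def __eq__(self, other):
        return type(self) is type(other) and self.__dict__ == other.__dict__


class HashableState(UnhashableState):
    """Adds a hash built from the type and the field values."""

    def __hash__(self):
        return hash((type(self),) + tuple(self.__dict__.values()))


try:
    hash(UnhashableState("DB"))
    unhashable_failed = False
except TypeError:
    unhashable_failed = True

# Equal instances hash equally, so a set keeps only one of them.
states = {
    HashableState("DB"),
    HashableState("DB"),
    HashableState("file_system"),
}
```

Hashing on field values is only safe while instances are not mutated after being placed in a set or used as dictionary keys.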

FunctionName

Bases: BaseModel

A single function name (vs. a list in FunctionNames).

Source code in lineapy/instrumentation/annotation_spec.py
class FunctionName(BaseModel):
    """
    A single function name (vs. a list in [FunctionNames][lineapy.instrumentation.annotation_spec.FunctionNames]).
    """

    function_name: str

FunctionNames

Bases: BaseModel

References a list of function names (vs. a single one in FunctionName).

One example is for the module boto3 (you can find all the annotations at https://github.com/LineaLabs/lineapy/blob/main/lineapy/annotations/external):

    - criteria:
        function_names:
        - upload_file
        - upload_fileobj
Source code in lineapy/instrumentation/annotation_spec.py
class FunctionNames(BaseModel):
    """
    References a list of function names (vs. a single one in [FunctionName][lineapy.instrumentation.annotation_spec.FunctionName]).

    One example is for the module `boto3` (you can find all the annotations [here][https://github.com/LineaLabs/lineapy/blob/main/lineapy/annotations/external]):

    ``` yaml

        - criteria:
            function_names:
            - upload_file
            - upload_fileobj
    ```
    """

    function_names: List[str]

ImplicitDependencyValue

Bases: BaseModel

References state that is implicitly depended on by the function. Currently it's used for external state like db + filesystem.

Source code in lineapy/instrumentation/annotation_spec.py
class ImplicitDependencyValue(BaseModel):
    """
    References state that is implicitly depended on by the function.
    Currently it's used for external state like db + filesystem.
    """

    dependency: ValuePointer

KeywordArgument

Bases: BaseModel

References a keyword argument. E.g., in foo(a=1, b=2), the first argument would be referenced by the keyword a.

Source code in lineapy/instrumentation/annotation_spec.py
class KeywordArgument(BaseModel):
    """
    References a keyword argument. E.g., in `foo(a=1, b=2)`, `a` would have a
    keyword argument of `a`.
    """

    argument_keyword: str

KeywordArgumentCriteria

Bases: BaseModel

Currently only used for the pandas in-place argument. We might need to augment how we parse it in the future for other inputs.

Source code in lineapy/instrumentation/annotation_spec.py
class KeywordArgumentCriteria(BaseModel):
    """
    Currently only used for the pandas in-place argument.
    We might need to augment how we parse it in the future for other inputs.
    """

    keyword_arg_name: str
    keyword_arg_value: int
    class_instance: str
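To make the three fields concrete, here is a minimal sketch of how such a criteria could be checked against a call. The `matches` function, the class-name comparison, and the value `1` for `inplace` are assumptions for illustration; they are not taken from lineapy, and a small `DataFrame` stand-in is used so that pandas is not required.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class KeywordArgumentCriteria:
    keyword_arg_name: str
    keyword_arg_value: int
    class_instance: str


def matches(
    criteria: KeywordArgumentCriteria, bound_self: Any, kwargs: Dict[str, Any]
) -> bool:
    """Hypothetical check: right receiver class and right keyword value."""
    if type(bound_self).__name__ != criteria.class_instance:
        return False
    # In Python, True == 1, so an int field can match a boolean flag.
    return kwargs.get(criteria.keyword_arg_name) == criteria.keyword_arg_value


class DataFrame:
    """Tiny stand-in for pandas.DataFrame."""


inplace = KeywordArgumentCriteria(
    keyword_arg_name="inplace", keyword_arg_value=1, class_instance="DataFrame"
)

hit = matches(inplace, DataFrame(), {"inplace": True})
miss = matches(inplace, DataFrame(), {"inplace": False})
absent = matches(inplace, DataFrame(), {})
```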

ModuleAnnotation

Bases: BaseModel

An annotation yaml file is composed of a list of ModuleAnnotation, which is to say that the annotations are hierarchically organized by what module the annotation is associated with, such as pandas and boto3.

Source code in lineapy/instrumentation/annotation_spec.py
class ModuleAnnotation(BaseModel):
    """
    An annotation yaml file is composed of a list of [ModuleAnnotation][lineapy.instrumentation.annotation_spec.ModuleAnnotation], which is to say that the annotations are hierarchically organized
    by what module the annotation is associated with, such as ``pandas`` and ``boto3``.
    """

    module: str
    annotations: List[Annotation]

    class Config:
        allow_mutation = False
        extra = "forbid"

MutatedValue

Bases: BaseModel

A value that is mutated when the function is called. Consider the example of the dump function in joblib. It mutates the file_system, which is represented by ExternalState.

    - module: joblib
      annotations:
      - criteria:
          function_name: dump
        side_effects:
        - mutated_value:
            external_state: file_system
Source code in lineapy/instrumentation/annotation_spec.py
class MutatedValue(BaseModel):
    """
    A value that is mutated when the function is called. Consider the example
    of the ``dump`` function in ``joblib``. It mutates the file_system, which
    is represented by [ExternalState][lineapy.instrumentation.annotation_spec.ExternalState].

    ``` yaml

        - module: joblib
          annotations:
          - criteria:
              function_name: dump
            side_effects:
            - mutated_value:
                external_state: file_system
    ```
    """

    mutated_value: ValuePointer
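The nesting in the joblib example can be easier to see as plain Python data. The sketch below writes out the structure that a YAML loader would return for that example and walks it; the `mutated_external_states` helper is hypothetical and is not part of lineapy.

```python
from typing import Any, Dict, List, Tuple

# Python equivalent of the joblib YAML example: a list of module annotations,
# each holding a list of annotations with criteria and side_effects.
joblib_spec: List[Dict[str, Any]] = [
    {
        "module": "joblib",
        "annotations": [
            {
                "criteria": {"function_name": "dump"},
                "side_effects": [
                    {"mutated_value": {"external_state": "file_system"}},
                ],
            }
        ],
    }
]


def mutated_external_states(
    spec: List[Dict[str, Any]]
) -> Dict[Tuple[str, str], List[str]]:
    """Map (module, function_name) to the external states it mutates."""
    found: Dict[Tuple[str, str], List[str]] = {}
    for module_annotation in spec:
        module = module_annotation["module"]
        for annotation in module_annotation["annotations"]:
            name = annotation["criteria"].get("function_name")
            for effect in annotation["side_effects"]:
                pointer = effect.get("mutated_value", {})
                if "external_state" in pointer:
                    found.setdefault((module, name), []).append(
                        pointer["external_state"]
                    )
    return found


summary = mutated_external_states(joblib_spec)
```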

PositionalArg

Bases: BaseModel

References a positional argument. E.g., in foo(a, b), a would have a positional argument index of 0.

Source code in lineapy/instrumentation/annotation_spec.py
class PositionalArg(BaseModel):
    """
    References a positional argument. E.g., in `foo(a, b)`, `a` would have a
    positional argument of 0.
    """

    positional_argument_index: int

Result

Bases: BaseModel

References the result of a function. E.g., in bar = foo(a, b), bar would be the result of the function call.

We are expecting "RESULT" for the field result---though it's not needed for the python class, it is needed for yaml, and setting a default value makes the loader we use, pydantic, confused.

Source code in lineapy/instrumentation/annotation_spec.py
class Result(BaseModel):
    """
    References the result of a function. E.g., in ``bar = foo(a, b)``, ``bar`` would
    be the result of the function call.

    We are expecting "RESULT" for the field ``result``---though it's not needed
    for the python class, it is needed for yaml, and setting a default value
    makes the loader we use, pydantic, confused.
    """

    result: str
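Taken together, the value pointer classes describe which values of a call an annotation refers to. The sketch below shows one way a pointer mapping could be resolved against a concrete call; the `resolve` function and its parameters are hypothetical and do not reflect how lineapy implements this.

```python
from typing import Any, Dict, List, Sequence


def resolve(
    pointer: Dict[str, Any],
    args: Sequence[Any],
    kwargs: Dict[str, Any],
    bound_self: Any,
    result: Any,
) -> List[Any]:
    """Return the values of one call that a pointer mapping refers to."""
    if "positional_argument_index" in pointer:
        return [args[pointer["positional_argument_index"]]]
    if "argument_keyword" in pointer:
        return [kwargs[pointer["argument_keyword"]]]
    if "all_positional_arguments" in pointer:
        return list(args)
    if "self_ref" in pointer:
        return [bound_self]
    if "result" in pointer:
        return [result]
    raise ValueError(f"unknown pointer keys: {sorted(pointer)}")


# Model the call `out = target.extend(extra)` on plain lists.
target = [1, 2]
extra = [3]
out = target.extend(extra)  # list.extend mutates target and returns None
call = dict(args=(extra,), kwargs={}, bound_self=target, result=out)

mutated = resolve({"self_ref": "SELF_REF"}, **call)
first_arg = resolve({"positional_argument_index": 0}, **call)
returned = resolve({"result": "RESULT"}, **call)
```

Note that the string values ("SELF_REF", "RESULT") are never inspected; only the keys decide what is resolved.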

ViewOfValues

Bases: BaseModel

A set of values that all potentially refer to shared pointers, so that if one is mutated, the rest might be as well. They are unique, like a set, but ordered for deterministic behavior, hence a list.

Take the fit method in scikit-learn: if its result is assigned to a new variable, then that variable is aliased to the original variable. So we have the following annotation:

      - base_module: sklearn.base
        annotations:
        - criteria:
            base_class: BaseEstimator
            class_method_name: fit
          side_effects:
          - mutated_value:
              self_ref: SELF_REF # self is a keyword...
          - views:
            - self_ref: SELF_REF
            - result: RESULT
Source code in lineapy/instrumentation/annotation_spec.py
class ViewOfValues(BaseModel):
    """
    A set of values that all potentially refer to shared pointers,
    so that if one is mutated, the rest might be as well.
    They are unique, like a set, but ordered for deterministic behavior,
    hence a list.

    Take the ``fit`` method in scikit-learn: if its result is assigned to a
    new variable, then that variable is aliased to the original variable.
    So we have the following annotation:

    ``` yaml
          - base_module: sklearn.base
            annotations:
            - criteria:
                base_class: BaseEstimator
                class_method_name: fit
              side_effects:
              - mutated_value:
                  self_ref: SELF_REF # self is a keyword...
              - views:
                - self_ref: SELF_REF
                - result: RESULT
    ```

    """

    views: List[ValuePointer]
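The aliasing that this annotation captures can be reproduced in a few lines. The sketch below uses a hypothetical `Estimator` stand-in rather than scikit-learn itself, relying only on the convention that `fit` returns `self`; the `ordered_unique` helper is also an assumption, shown to illustrate "unique but ordered".

```python
from typing import List


class Estimator:
    """Stand-in for an estimator whose fit() mutates and returns self."""

    def __init__(self) -> None:
        self.fitted = False

    def fit(self) -> "Estimator":
        self.fitted = True  # mutates the bound self
        return self  # the result is the very same object


model = Estimator()
fitted_model = model.fit()

# Both names are views of one object: a later mutation through either name
# is visible through the other.
fitted_model.fitted = False
seen_through_model = model.fitted


def ordered_unique(names: List[str]) -> List[str]:
    """Drop duplicates while keeping first-seen order (dicts keep order)."""
    return list(dict.fromkeys(names))


views = ordered_unique(["model", "fitted_model", "model"])
```

A plain `set` would also remove the duplicate, but its iteration order is not something to rely on, which is the reason the docstring gives for using a list.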
