lineapy.instrumentation package

Submodules

lineapy.instrumentation.annotation_spec module

Note

Higher-level documentation about how library annotations work in lineapy, and how to contribute, is available here.

class lineapy.instrumentation.annotation_spec.AllPositionalArgs(*, all_positional_arguments: str)[source]

References all positional arguments. E.g., in foo(a, b), a and b.

The string is expected to be set to the value “ALL_POSITIONAL_ARGUMENTS”; see Result for an explanation.

class lineapy.instrumentation.annotation_spec.Annotation(*, criteria: Union[KeywordArgumentCriteria, FunctionNames, ClassMethodNames, FunctionName, ClassMethodName], side_effects: List[Union[ViewOfValues, MutatedValue, ImplicitDependencyValue]])[source]

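An annotation contains a single criteria entry for the function call and the corresponding side_effects of that call.

There are currently five types of criteria, matching the signature above, all of which are explained in their respective classes: KeywordArgumentCriteria, FunctionNames, ClassMethodNames, FunctionName, and ClassMethodName.

There are currently three types of side_effects: ViewOfValues, MutatedValue, and ImplicitDependencyValue.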

class lineapy.instrumentation.annotation_spec.BaseModel[source]

Forbids extra fields on the base class so that typos raise an error.
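
To illustrate what forbidding extras buys, here is a minimal stdlib-only sketch. It does not use pydantic, the loader named in the Result entry on this page; StrictSpec, FunctionNameSpec and from_dict are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class StrictSpec:
    """Hypothetical base class: rejects keys that are not declared fields."""

    @classmethod
    def from_dict(cls, data: dict):
        allowed = {f.name for f in fields(cls)}
        extras = sorted(set(data) - allowed)
        if extras:
            # A misspelled key ends up here instead of being silently dropped.
            raise ValueError(f"unexpected keys for {cls.__name__}: {extras}")
        return cls(**data)


@dataclass
class FunctionNameSpec(StrictSpec):
    """Hypothetical stand-in for a criteria with one function_name field."""

    function_name: str


ok = FunctionNameSpec.from_dict({"function_name": "dump"})

try:
    FunctionNameSpec.from_dict({"function_nme": "dump"})  # typo in the key
except ValueError as error:
    typo_error = str(error)
```

Without the check, the misspelled key would be ignored and the annotation would silently fail to match anything.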

class lineapy.instrumentation.annotation_spec.BoundSelfOfFunction(*, self_ref: str)[source]

References the bound self of a function. E.g., in foo.test(a, b), foo would be the bound self.

If the function is a bound method, this refers to the instance that the method is bound to.

We are expecting “SELF_REF”—see Result for an explanation.

class lineapy.instrumentation.annotation_spec.ClassMethodName(*, class_instance: str, class_method_name: str)[source]

Specifies a class method name (as opposed to a function). An example is df.to_sql:

- criteria:
    class_method_name: to_sql
    class_instance: DataFrame

class lineapy.instrumentation.annotation_spec.ClassMethodNames(*, class_instance: str, class_method_names: List[str])[source]

A shorthand for a list of class method names (vs. a single one as in ClassMethodName).

- criteria:
    class_method_names:
    - to_csv
    - to_parquet
    class_instance: DataFrame

class lineapy.instrumentation.annotation_spec.ExternalState(*, external_state: str)[source]

Represents a reference to state outside of the Python program. The two supported types of external state are DB and file_system.

If we ever make a reference to an external state instance, we assume that it depends on any mutations of previous references.

class lineapy.instrumentation.annotation_spec.FunctionName(*, function_name: str)[source]

A single function name (vs. a list in FunctionNames).

class lineapy.instrumentation.annotation_spec.FunctionNames(*, function_names: List[str])[source]

References a list of function names (vs. a single one in FunctionName).

One example is for the module boto3 (you can find the code here):

- criteria:
    function_names:
    - upload_file
    - upload_fileobj

class lineapy.instrumentation.annotation_spec.ImplicitDependencyValue(*, dependency: Union[PositionalArg, KeywordArgument, Result, BoundSelfOfFunction, ExternalState, AllPositionalArgs])[source]

References state that the function implicitly depends on. Currently it is used for external state such as the database and the file system.

class lineapy.instrumentation.annotation_spec.KeywordArgument(*, argument_keyword: str)[source]

References a keyword argument. E.g., in foo(a=1, b=2), a would have a keyword argument of a.

class lineapy.instrumentation.annotation_spec.KeywordArgumentCriteria(*, keyword_arg_name: str, keyword_arg_value: int, class_instance: str)[source]

Currently only used for the pandas in-place argument. We might need to augment how we parse it in the future for other inputs.

class lineapy.instrumentation.annotation_spec.ModuleAnnotation(*, module: str, annotations: List[Annotation])[source]

An annotation YAML file is composed of a list of ModuleAnnotations (this class). In other words, annotations are organized hierarchically by the module they are associated with, such as pandas or boto3.
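
The hierarchy can be pictured with plain Python data. This is only a sketch of the shape a parsed annotation file might take: the joblib entry mirrors the MutatedValue example on this page, while side_effects_for is a hypothetical helper and not part of lineapy.

```python
from typing import List

# Parsed form of an annotation file, written as plain Python data.
module_annotations: List[dict] = [
    {
        "module": "joblib",
        "annotations": [
            {
                "criteria": {"function_name": "dump"},
                "side_effects": [
                    {"mutated_value": {"external_state": "file_system"}}
                ],
            }
        ],
    }
]


def side_effects_for(module: str, function_name: str) -> List[dict]:
    """Hypothetical helper: walk module -> annotations -> criteria."""
    found: List[dict] = []
    for module_annotation in module_annotations:
        if module_annotation["module"] != module:
            continue
        for annotation in module_annotation["annotations"]:
            if annotation["criteria"].get("function_name") == function_name:
                found.extend(annotation["side_effects"])
    return found


dump_effects = side_effects_for("joblib", "dump")
```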

class lineapy.instrumentation.annotation_spec.MutatedValue(*, mutated_value: Union[PositionalArg, KeywordArgument, Result, BoundSelfOfFunction, ExternalState, AllPositionalArgs])[source]

A value that is mutated when the function is called. Consider the example of the dump function in joblib. It mutates the file_system, which is represented by ExternalState.

- module: joblib
  annotations:
  - criteria:
      function_name: dump
    side_effects:
    - mutated_value:
        external_state: file_system

class lineapy.instrumentation.annotation_spec.PositionalArg(*, positional_argument_index: int)[source]

References a positional argument. E.g., in foo(a, b), a would have a positional argument of 0.

class lineapy.instrumentation.annotation_spec.Result(*, result: str)[source]

References the result of a function. E.g., in bar = foo(a, b), bar would be the result of the function call.

The field result is expected to be “RESULT”. Although the value is not needed for the Python class, it is needed for YAML, and setting a default value confuses pydantic, the loader we use.

class lineapy.instrumentation.annotation_spec.ViewOfValues(*, views: List[Union[PositionalArg, KeywordArgument, Result, BoundSelfOfFunction, ExternalState, AllPositionalArgs]])[source]

A set of values that all potentially refer to shared pointers, so that if one is mutated, the rest might be as well. They are unique, like a set, but ordered for deterministic behavior, hence a list.

Take the fit function in scikit-learn: if its result is assigned to a new variable, then that variable is aliased to the original variable. So we have the following annotation:

- base_module: sklearn.base
  annotations:
  - criteria:
      base_class: BaseEstimator
      class_method_name: fit
    side_effects:
    - mutated_value:
        self_ref: SELF_REF # self is a keyword...
    - views:
      - self_ref: SELF_REF
      - result: RESULT
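
The aliasing that this annotation describes can be reproduced without scikit-learn. The sketch below uses a hypothetical ToyEstimator whose fit returns self, which is the convention scikit-learn estimators follow.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator whose fit() returns self."""

    def __init__(self) -> None:
        self.fitted = False

    def fit(self, data):
        self.fitted = True  # mutates the bound self (mutated_value: SELF_REF)
        return self  # the result aliases self (views: SELF_REF, RESULT)


model = ToyEstimator()
fitted_model = model.fit([1, 2, 3])

# Both names point at the same object, so a change made through one
# name is visible through the other.
fitted_model.extra = "set through the result"
```

Because model and fitted_model are views of each other, a later mutation of either one has to be treated as a mutation of both.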

lineapy.instrumentation.mutation_tracker module

class lineapy.instrumentation.mutation_tracker.MutationTracker(source_to_mutate: Dict[lineapy.data.types.LineaID, lineapy.data.types.LineaID] = <factory>, viewers: Dict[lineapy.data.types.LineaID, List[lineapy.data.types.LineaID]] = <factory>)[source]
get_latest_mutate_node(node_id: LineaID) LineaID[source]

Finds the latest mutate node that refers to the given node. Call this before creating an object based on another node.

set_as_mutated(source_id: LineaID) Iterable[Tuple[LineaID, LineaID]][source]

To process mutating a node, we create a new mutate node for each view of this node and update the source-to-view mapping to point to the new nodes.
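
A simplified sketch of how the two methods above could fit together. It is an assumption-laden toy rather than lineapy's implementation: node IDs are plain strings, new IDs are named mutate_N, and ToyMutationTracker is a hypothetical class.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple


class ToyMutationTracker:
    """Simplified sketch; node IDs are plain strings here."""

    def __init__(self) -> None:
        self.source_to_mutate: Dict[str, str] = {}
        self.viewers: Dict[str, List[str]] = {}
        self._counter = count(1)

    def get_latest_mutate_node(self, node_id: str) -> str:
        # Follow the chain source -> mutate node -> newer mutate node.
        while node_id in self.source_to_mutate:
            node_id = self.source_to_mutate[node_id]
        return node_id

    def set_as_mutated(self, source_id: str) -> Iterator[Tuple[str, str]]:
        # Resolve every affected node (the views plus the node itself)
        # before creating anything, so no node is handled twice.
        latest: List[str] = []
        for node_id in [*self.viewers.get(source_id, []), source_id]:
            resolved = self.get_latest_mutate_node(node_id)
            if resolved not in latest:
                latest.append(resolved)
        for node_id in latest:
            mutate_id = f"mutate_{next(self._counter)}"
            self.source_to_mutate[node_id] = mutate_id
            yield mutate_id, node_id


tracker = ToyMutationTracker()
tracker.viewers = {"a": ["b"], "b": ["a"]}
pairs = list(tracker.set_as_mutated("a"))
```

Since set_as_mutated is a generator in this sketch, nothing is recorded until the result is iterated, which is why the usage wraps it in list().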

set_as_viewers_of_each_other(*ids: LineaID) None[source]

To process adding views between nodes, update the viewers data structure with all new viewers.

lineapy.instrumentation.mutation_tracker.set_as_viewers_generic(ids: List[T], viewers: Dict[T, List[T]]) None[source]

Generic version of the set_as_viewers_of_each_other method, so that we can use it in the settrace tracker as well.
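
One way such a generic helper could work is sketched below, assuming that viewing is transitive and that an ID is never listed as a viewer of itself. set_as_viewers_sketch is a hypothetical name, and the body is not taken from lineapy.

```python
from typing import Dict, Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def set_as_viewers_sketch(ids: List[T], viewers: Dict[T, List[T]]) -> None:
    """Make every id in ids, plus their existing viewers, view each other."""
    # Collect the ids and everything they already view, keeping first-seen
    # order and dropping duplicates (dict keys preserve insertion order).
    complete: Dict[T, None] = {}
    for id_ in ids:
        complete[id_] = None
    for id_ in ids:
        for existing in viewers.get(id_, []):
            complete[existing] = None

    # Each member views all other members, but not itself.
    for id_ in complete:
        viewers[id_] = [other for other in complete if other != id_]


viewers: Dict[str, List[str]] = {}
set_as_viewers_sketch(["a", "b"], viewers)
set_as_viewers_sketch(["b", "c"], viewers)
```

After the second call, a also views c even though the two were never passed together, because both are views of b.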

lineapy.instrumentation.tracer module

class lineapy.instrumentation.tracer.Tracer(db: lineapy.db.db.RelationalLineaDB, session_type: dataclasses.InitVar[SessionType], session_name: dataclasses.InitVar[typing.Optional[str]] = None, globals_: dataclasses.InitVar[typing.Optional[typing.Dict[str, object]]] = None, variable_name_to_node: Dict[str, Union[lineapy.data.types.ImportNode, lineapy.data.types.CallNode, lineapy.data.types.LiteralNode, lineapy.data.types.LookupNode, lineapy.data.types.MutateNode, lineapy.data.types.GlobalNode]] = <factory>, mutation_tracker: lineapy.instrumentation.mutation_tracker.MutationTracker = <factory>)[source]
assign(variable_name: str, value_node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) None[source]

Assign updates a local mapping of variable nodes.

It doesn’t save this to the graph, and currently the source location for the assignment is discarded. In the future, if we need to trace where in some code a node is assigned, we can record that again.

call(function_node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode], source_location: Optional[SourceLocation], *arguments: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode, Tuple[bool, Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]]], **keyword_arguments: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) CallNode[source]
Parameters
  • function_node – the function node to call/execute

  • source_location – the source info from user code

  • arguments – positional arguments. Each is passed either as a Node (named node, constant, etc.) or as a tuple (starred, node), where starred is a boolean that indicates whether the argument should be splatted before being passed to the function (as in a call like foo(1, *[2, 3])). The tuple form is optional so that legacy callers in node_transformer do not have to pass tuples in every case.

  • keyword_arguments – keyword arguments. These are passed as a dictionary of keyword arguments to the function. As with starred positional arguments, keyword arguments can also be splatted by naming the key unpack_<index>, where <index> is the index of the argument. In this case, the dictionary is unpacked and passed as keyword arguments to the function. Keyword arguments are processed in the order they are passed, so when keywords conflict, the last value wins.

Returns

a call node

Note

  • It’s important for the call to return the call node so that we can programmatically chain the nodes together, e.g., for the assignment call to modify the previous call node.

  • The call looks up if it’s a locally defined function. We decided that this is better for program slicing.
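
The argument conventions described for call can be sketched with plain values standing in for nodes. expand_arguments is a hypothetical helper, and treating any 2-tuple whose first element is a boolean as a (starred, value) pair is an assumption made only for this illustration.

```python
from typing import Any, Dict, List, Tuple


def expand_arguments(
    arguments: List[Any], keyword_arguments: Dict[str, Any]
) -> Tuple[List[Any], Dict[str, Any]]:
    """Hypothetical helper: resolve starred args and unpack_<index> kwargs."""
    positional: List[Any] = []
    for argument in arguments:
        is_pair = (
            isinstance(argument, tuple)
            and len(argument) == 2
            and isinstance(argument[0], bool)
        )
        starred, value = argument if is_pair else (False, argument)
        if starred:
            positional.extend(value)  # splat, as in foo(1, *[2, 3])
        else:
            positional.append(value)

    keywords: Dict[str, Any] = {}
    for key, value in keyword_arguments.items():
        if key.startswith("unpack_"):
            keywords.update(value)  # unpack the dictionary into keywords
        else:
            keywords[key] = value  # later entries overwrite earlier ones
    return positional, keywords


def foo(*args, **kwargs):
    return args, kwargs


# a=1 is followed by an unpacked dictionary that also sets a,
# so the later value is the one that is kept.
args, kwargs = expand_arguments(
    [1, (True, [2, 3])],
    {"a": 1, "unpack_0": {"a": 2, "b": 3}},
)
result = foo(*args, **kwargs)
```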

import_module(name: str, source_location: Optional[SourceLocation] = None) Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode][source]

Import a module. If we have already imported it, just return its ID. Otherwise, create new module nodes for each submodule in its parents and return it.
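
One reading of this behavior, sketched with integer IDs in place of nodes: parent packages are registered before the requested module, and a repeated import reuses the stored ID. import_module_sketch and the created dictionary are hypothetical.

```python
from typing import Dict

created: Dict[str, int] = {}  # module name -> hypothetical node ID


def import_module_sketch(name: str) -> int:
    """Return the ID for name, creating IDs for parent packages first."""
    if name in created:
        return created[name]  # already imported: reuse the ID
    parent, _, _ = name.rpartition(".")
    if parent:
        import_module_sketch(parent)  # e.g. "os" before "os.path"
    created[name] = len(created)
    return created[name]


first = import_module_sketch("os.path")
second = import_module_sketch("os.path")
```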

lookup_node(variable_name: str, source_location: Optional[SourceLocation] = None) Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode][source]

Cases for the node that we are looking up:

  • user defined variable & function definitions

  • imported libs

  • unknown runtime magic functions—special case to LookupNode

    • builtin functions, e.g., min

    • custom runtime, e.g., get_ipython

process_node(node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) None[source]

Executes a node and adds it to the database.

trace_import(name: str, source_location: Optional[SourceLocation] = None, alias: Optional[str] = None, attributes: Optional[Dict[str, str]] = None) None[source]
  • name: the name of the module

  • alias: the module could be aliased, e.g., import pandas as pd

  • attributes: the functions imported from the library, given as a dictionary that maps each aliased name to the original name.

Note

  • The input args have either alias or attributes, but not both.

  • The function is not named import because import is a reserved keyword in Python.

Note that version and path are introspected at runtime.
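
To make the alias and attributes parameters concrete, the sketch below uses the standard ast module to derive the values that different import statements would correspond to. import_calls is a hypothetical helper; how lineapy itself extracts these values is not shown on this page.

```python
import ast
from typing import Dict, List, Optional, Tuple

ImportCall = Tuple[str, Optional[str], Optional[Dict[str, str]]]


def import_calls(source: str) -> List[ImportCall]:
    """Hypothetical helper: (name, alias, attributes) per imported module."""
    calls: List[ImportCall] = []
    for statement in ast.parse(source).body:
        if isinstance(statement, ast.Import):
            # import pandas as pd -> name="pandas", alias="pd"
            for module in statement.names:
                calls.append((module.name, module.asname, None))
        elif isinstance(statement, ast.ImportFrom) and statement.module:
            # from math import sqrt as s -> attributes={"s": "sqrt"}
            attributes = {
                item.asname or item.name: item.name for item in statement.names
            }
            calls.append((statement.module, None, attributes))
    return calls


calls = import_calls("import pandas as pd\nfrom math import sqrt as s, floor\n")
```

The source is only parsed, never executed, so the modules named in it do not need to be installed.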

property values: Dict[str, object]

Returns a mapping of variable names to their values, by joining the scoping information with the executor values.

lineapy.instrumentation.tracer_context module

class lineapy.instrumentation.tracer_context.TracerContext(db: RelationalLineaDB, session_context: SessionContext)[source]

This context is used by the tracer to store any data that is needed after the tracer has exited. It holds references to any internal dicts that are used outside the tracer and its session.

property artifacts: Dict[str, str]

Returns a mapping of artifact names to their sliced code.

property graph: Graph

Creates a graph by fetching all the nodes about this session from the DB.

Module contents