lineapy package
Subpackages
- lineapy.api package
- lineapy.cli package
- lineapy.data package
- lineapy.db package
- lineapy.editors package
- lineapy.exceptions package
- lineapy.execution package
- lineapy.graph_reader package
- lineapy.instrumentation package
- lineapy.plugins package
- lineapy.system_tracing package
- lineapy.transformer package
- lineapy.utils package
- lineapy.visualizer package
Module contents
- class lineapy.SessionType(value)[source]
Session types allow the tracer to know what to expect:
- JUPYTER: the tracer needs to progressively add more nodes to the graph
- SCRIPT: the easiest case, run everything until the end
- class lineapy.Tracer(db: lineapy.db.db.RelationalLineaDB, session_type: dataclasses.InitVar[SessionType], session_name: dataclasses.InitVar[typing.Optional[str]] = None, globals_: dataclasses.InitVar[typing.Optional[typing.Dict[str, object]]] = None, variable_name_to_node: Dict[str, Union[lineapy.data.types.ImportNode, lineapy.data.types.CallNode, lineapy.data.types.LiteralNode, lineapy.data.types.LookupNode, lineapy.data.types.MutateNode, lineapy.data.types.GlobalNode]] = <factory>, mutation_tracker: lineapy.instrumentation.mutation_tracker.MutationTracker = <factory>)[source]
- assign(variable_name: str, value_node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) None [source]
Assign updates a local mapping of variable nodes.
It doesn’t save this to the graph, and currently the source location for the assignment is discarded. In the future, if we need to trace where in some code a node is assigned, we can record that again.
- call(function_node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode], source_location: Optional[SourceLocation], *arguments: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode, Tuple[bool, Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]]], **keyword_arguments: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) CallNode [source]
- Parameters
function_node – the function node to call/execute
source_location – the source info from user code
arguments – positional arguments. These are passed either as Nodes (named nodes, constants, etc.) or as tuples (starred, node), where starred is a boolean indicating whether the argument should be splatted before being passed to the function. This covers calls such as foo(1, *[2, 3]). The boolean is optional simply to support the legacy way of calling this function, so that node_transformer does not have to pass tuples in every case.
keyword_arguments – keyword arguments. These are passed as a dictionary of keyword arguments to the function. Similar to *positional_arguments, the keyword arguments can also be splatted by naming the key unpack_<index>, where <index> is the index of the argument. In this case, the dictionary will be unpacked and passed as keyword arguments to the function. The keyword arguments are processed in the order they are passed, so on a keyword conflict the last value is the one accepted for that keyword.
- Returns
a call node
Note
It’s important for the call to return the call node so that we can programmatically chain the nodes together, e.g., for the assignment call to modify the previous call node.
The call looks up if it’s a locally defined function. We decided that this is better for program slicing.
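The argument conventions described for call() can be illustrated with a small, self-contained sketch. It is not lineapy's implementation: resolve_arguments is a hypothetical helper, and plain Python values stand in for nodes. It only shows how (starred, node) tuples and unpack_<index> keys would flatten into ordinary positional and keyword arguments under the rules stated above.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical stand-in: a positional argument is either a bare value or a
# (starred, value) tuple, mirroring the convention described for call().
PositionalArg = Union[Any, Tuple[bool, Any]]


def resolve_arguments(
    positional: List[PositionalArg], keyword: Dict[str, Any]
) -> Tuple[List[Any], Dict[str, Any]]:
    """Flatten starred positionals and unpack_<index> keywords (illustrative only)."""
    args: List[Any] = []
    for item in positional:
        # With plain values a real 2-tuple starting with a bool would be
        # ambiguous; in the API the bare items are nodes, so it is not.
        is_pair = (
            isinstance(item, tuple) and len(item) == 2 and isinstance(item[0], bool)
        )
        starred, value = item if is_pair else (False, item)
        if starred:
            args.extend(value)  # splat, as in foo(1, *[2, 3])
        else:
            args.append(value)

    kwargs: Dict[str, Any] = {}
    for key, value in keyword.items():  # dicts preserve insertion order
        if key.startswith("unpack_"):
            kwargs.update(value)  # splat, as in foo(**mapping)
        else:
            kwargs[key] = value  # a later duplicate overwrites an earlier one
    return args, kwargs


args, kwargs = resolve_arguments(
    [1, (True, [2, 3])],
    {"a": 1, "unpack_1": {"a": 2, "b": 3}},
)
```

Here the keyword a is supplied twice, once directly and once through the unpacked dictionary; because processing follows the order of passing, the value from the later entry is the one kept.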
- import_module(name: str, source_location: Optional[SourceLocation] = None) Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode] [source]
Import a module. If we have already imported it, just return its ID. Otherwise, create new module nodes for each submodule in its parents and return it.
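As a rough illustration of "create new module nodes for each submodule in its parents", the sketch below enumerates the parent chain of a dotted module name and caches an ID per module. ModuleRegistry and module_chain are hypothetical names, and integer IDs are an assumption made for the example; the actual tracer creates graph nodes instead.

```python
from typing import Dict, List


def module_chain(name: str) -> List[str]:
    """Return every parent of a dotted module name, outermost first."""
    parts = name.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


class ModuleRegistry:
    """Hypothetical cache mapping module names to integer IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def import_module(self, name: str) -> int:
        # Already imported: just return the known ID.
        if name in self._ids:
            return self._ids[name]
        # Otherwise register each parent in turn, then the module itself.
        for qualified in module_chain(name):
            self._ids.setdefault(qualified, len(self._ids))
        return self._ids[name]


registry = ModuleRegistry()
first = registry.import_module("os.path")
second = registry.import_module("os.path")
```

The second call finds os.path in the cache and returns the same ID without registering anything new.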
- lookup_node(variable_name: str, source_location: Optional[SourceLocation] = None) Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode] [source]
Cases for the node that we are looking up:
user defined variable & function definitions
imported libs
unknown runtime magic functions—special case to LookupNode
builtin functions, e.g., min
custom runtime, e.g., get_ipython
- process_node(node: Union[ImportNode, CallNode, LiteralNode, LookupNode, MutateNode, GlobalNode]) None [source]
Executes a node and adds it to the database.
- trace_import(name: str, source_location: Optional[SourceLocation] = None, alias: Optional[str] = None, attributes: Optional[Dict[str, str]] = None) None [source]
name: the name of the module
alias: the module could be aliased, e.g., import pandas as pd
attributes: the functions imported from the library, given as a mapping that keys the aliased name to the original name.
Note
The input args have either alias or attributes, but not both.
The function is not called import because that is a reserved keyword in Python.
Version and path will be introspected at runtime.
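To make the relationship between import statements and the name, alias, and attributes parameters concrete, here is a sketch that derives those three values from source text using the standard ast module. trace_import_args is a hypothetical helper, not part of lineapy, and the real transformer may differ in how it handles unaliased or relative imports.

```python
import ast
from typing import Dict, List, Optional, Tuple

# (name, alias, attributes), following the trace_import parameters.
ImportArgs = Tuple[str, Optional[str], Optional[Dict[str, str]]]


def trace_import_args(source: str) -> List[ImportArgs]:
    """Derive (name, alias, attributes) for each import in the source."""
    calls: List[ImportArgs] = []
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Import):
            # "import pandas as pd" -> alias set, no attributes
            for item in stmt.names:
                calls.append((item.name, item.asname, None))
        elif isinstance(stmt, ast.ImportFrom):
            # "from math import sqrt as root" -> attributes keyed by alias
            attributes = {(item.asname or item.name): item.name for item in stmt.names}
            # stmt.module is None for "from . import x"; use "" in that case.
            calls.append((stmt.module or "", None, attributes))
    return calls


calls = trace_import_args("import pandas as pd\nfrom math import sqrt as root, pi")
```

Each resulting tuple carries either an alias or an attributes mapping, never both, which matches the note above.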
- property values: Dict[str, object]
Returns a mapping of variable names to their values, by joining the scoping information with the executor values.
- class lineapy.ValueType(value)[source]
Lower case because the API with the frontend assumes the characters “chart” exactly as is.
Todo
FIXME
- rename (need coordination with linea-server):
really dataset is a table
value means it’s a literal (e.g., int/str)
- lineapy.catalog() LineaCatalog [source]
- Returns
An object of the class LineaCatalog that allows for printing and exporting artifacts metadata.
- Return type
LineaCatalog
- lineapy.get(artifact_name: str, version: Optional[int] = None) LineaArtifact [source]
Gets an artifact from the DB.
- Parameters
artifact_name (str) – name of the artifact. Note that if you do not remember the artifact name, you can use the catalog to browse the options
version (Optional[int]) – version of the artifact. If None, the latest version will be returned.
- Returns
returned value offers methods to access information we have stored about the artifact
- Return type
LineaArtifact
- lineapy.save(reference: object, name: str) LineaArtifact [source]
Publishes the object to the Linea DB.
- Parameters
reference (Union[object, ExternalState]) – The reference could be a variable name, in which case Linea will save the value of the variable with our default serialization mechanism. Alternatively, it could be a “side effect” reference, which currently includes either lineapy.file_system or lineapy.db. Linea will save the associated process that creates the final side effects. We are in the process of adding more side effect references, including assert statements.
name (str) – The name is used for later retrieving the artifact and creating new versions if an artifact of the name has been created before.
- Returns
returned value offers methods to access information we have stored about the artifact (value, version), and other automation capabilities, such as to_pipeline().
- Return type
LineaArtifact
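The versioning behaviour described for save() and get() (saving under an existing name creates a new version; get() with version=None returns the latest) can be modelled with a small in-memory stand-in. ArtifactStore is hypothetical and is not the Linea DB; numbering versions from 0 is an assumption made only for this sketch.

```python
from typing import Any, Dict, Optional


class ArtifactStore:
    """Hypothetical in-memory model of name-based artifact versioning."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[int, Any]] = {}

    def save(self, reference: Any, name: str) -> int:
        versions = self._versions.setdefault(name, {})
        # Assumption: versions start at 0 and increase by one per save.
        version = max(versions, default=-1) + 1
        versions[version] = reference
        return version

    def get(self, artifact_name: str, version: Optional[int] = None) -> Any:
        versions = self._versions[artifact_name]
        if version is None:
            version = max(versions)  # latest version when none is requested
        return versions[version]


store = ArtifactStore()
store.save([1, 2, 3], "cleaned_data")
store.save([1, 2, 3, 4], "cleaned_data")
latest = store.get("cleaned_data")
original = store.get("cleaned_data", version=0)
```

The second save under the same name does not overwrite the first; both remain retrievable by version number.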
- lineapy.to_pipeline(artifacts: List[str], framework: str = 'SCRIPT', pipeline_name: Optional[str] = None, dependencies: Dict[str, Set[str]] = {}, pipeline_dag_config: Optional[AirflowDagConfig] = {}, output_dir: Optional[str] = None) Path [source]
Writes the pipeline job to a path on disk.
- Parameters
artifacts – list of artifact names to be included in the DAG.
framework – ‘AIRFLOW’ or ‘SCRIPT’
pipeline_name – name of the pipeline
dependencies – task dependencies in graphlib format, e.g., {'B': {'A', 'C'}} means that tasks A and C are prerequisites for task B.
output_dir – directory in which the DAG and the Python file are saved; only used for PipelineType.AIRFLOW
- Returns
path of the DAG file that was exported.
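The graphlib format used by the dependencies parameter maps each task to the set of tasks that must run before it. The sketch below uses only the standard library's graphlib.TopologicalSorter (Python 3.9+) to show how such a mapping resolves into an execution order; it does not call to_pipeline() itself.

```python
from graphlib import TopologicalSorter

# Task B depends on tasks A and C, as in the parameter description.
dependencies = {"B": {"A", "C"}}

# static_order() yields every task after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
```

A and C have no ordering constraint between them, so either may come first, but B is always last.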
- lineapy.visualize(*, live=False) None [source]
Display a visualization of the Linea graph from this session using Graphviz.
If live=True, then this visualization will live update after cell execution. Note that this comes with a substantial performance penalty, so it is False by default.
Note: If the visualization is not live, it will print out the visualization as of the previous cell execution, not the one where visualize is executed.