lineapy.api package

Subpackages

Submodules

lineapy.api.api module

User facing APIs.

lineapy.api.api.artifact_store() → lineapy.api.models.linea_artifact_store.LineaArtifactStore
Returns

A LineaArtifactStore object that supports printing and exporting artifact metadata.

Return type

LineaArtifactStore

lineapy.api.api.delete(artifact_name: str, version: Union[int, str]) → None

Deletes an artifact from the artifact store. If no other artifact refers to the value, the value is also deleted from both the value node store and the pickle store.

Parameters
  • artifact_name – Key used when saving the artifact

  • version – Version number, or 'latest' or 'all'

Raises

ValueError – if the artifact is not found or the version is invalid
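The accepted forms of version can be sketched as a small validation helper. This is only an illustration of the contract described above (an integer, 'latest', or 'all', with ValueError otherwise); the helper name check_delete_version is hypothetical and is not part of LineaPy, whose actual validation may differ.

```python
from typing import Union


def check_delete_version(version: Union[int, str]) -> Union[int, str]:
    """Hypothetical helper: accept an int, 'latest', or 'all'; reject the rest."""
    if isinstance(version, int):
        return version
    if isinstance(version, str) and version in ("latest", "all"):
        return version
    # Mirrors the documented behaviour of raising ValueError on invalid input.
    raise ValueError(f"invalid version: {version!r}")


accepted = [check_delete_version(v) for v in (2, "latest", "all")]

try:
    check_delete_version("newest")
    rejected = False
except ValueError:
    rejected = True
```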

lineapy.api.api.get(artifact_name: str, version: Optional[int] = None) → lineapy.api.models.linea_artifact.LineaArtifact

Gets an artifact from the DB.

Parameters
  • artifact_name (str) – Name of the artifact. If you do not remember the name, you can use artifact_store() to browse the available artifacts.

  • version (Optional[int]) – Version of the artifact. If None, the latest version is returned.

Returns

The returned value offers methods to access the information stored about the artifact.

Return type

LineaArtifact

lineapy.api.api.get_function(artifacts: List[Union[str, Tuple[str, int]]], input_parameters: List[str] = [], reuse_pre_computed_artifacts: List[Union[str, Tuple[str, int]]] = []) → Callable

Extracts the process that creates the selected artifacts as a Python function.

Parameters
  • artifacts (List[Union[str, Tuple[str, int]]]) – List of artifact names (with optional version) to be included in the function return.

  • input_parameters (List[str]) – List of variable names to be used as function arguments. Currently, only variables from literal assignments, such as a = '123', are accepted. There should be only one literal assignment for each variable across all artifact calculation code. For instance, if both a = '123' and a = 'abc' exist in the code, a cannot be specified as an input variable, since it is ambiguous which literal assignment should be replaced.

  • reuse_pre_computed_artifacts (List[Union[str, Tuple[str, int]]]) – List of artifacts (name with optional version) for which pre-computed values from the artifact store are used instead of recomputing them from the original code.

Returns

A Python function that takes input_parameters as arguments and returns a dictionary with each artifact name as the key and the artifact value as the value.

Note that:

  1. If an input parameter is only used to calculate artifacts in the reuse_pre_computed_artifacts list, that input parameter will be passed around as a dummy variable. LineaPy will issue a warning.

  2. If an artifact name has been saved multiple times within a session or across multiple sessions, or the artifact has been mutated, you might want to specify a version number in artifacts or reuse_pre_computed_artifacts. The best practice to avoid searching for artifact versions is to not reuse an artifact name in different notebooks and to not save the same artifact multiple times within the same session.

Return type

Callable
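The single-literal-assignment rule for input_parameters can be illustrated with the standard ast module. The sketch below is not LineaPy's implementation; the helper name parametrizable_variables is hypothetical, and it only covers simple assignments of a constant to a single name. It reports which variables are assigned a literal exactly once and would therefore be unambiguous candidates.

```python
import ast
from collections import Counter
from typing import Set


def parametrizable_variables(source: str) -> Set[str]:
    """Hypothetical helper: names with exactly one literal assignment."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Constant)
        ):
            counts[node.targets[0].id] += 1
    # A name assigned a literal more than once is ambiguous and is excluded.
    return {name for name, n in counts.items() if n == 1}


code = "a = '123'\nb = 1\na = 'abc'\nc = b + 1\n"
candidates = parametrizable_variables(code)
```

Here a is excluded because it has two literal assignments, and c is excluded because its value is an expression rather than a literal, leaving only b.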

lineapy.api.api.get_module_definition(artifacts: List[Union[str, Tuple[str, int]]], input_parameters: List[str] = [], reuse_pre_computed_artifacts: List[Union[str, Tuple[str, int]]] = []) → str

Creates a Python module that includes the definition of get_function().

Parameters
  • artifacts (List[Union[str, Tuple[str, int]]]) – same as get_function()

  • input_parameters (List[str]) – same as get_function()

  • reuse_pre_computed_artifacts (List[Union[str, Tuple[str, int]]]) – same as get_function()

Returns

A Python module that includes the definition of get_function() as run_all_sessions.

Return type

str
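Because the module definition is returned as a string, one way to use it is to execute it into a fresh module object and call run_all_sessions. The sketch below uses a hand-written stand-in for the generated text (real LineaPy output will look different), and the helper load_module_from_source, the module name generated_pipeline, and the artifact name my_artifact are all hypothetical.

```python
import types

# Stand-in for the string returned by get_module_definition(); this is an
# assumption for illustration, not actual LineaPy output.
module_source = '''
def run_all_sessions(a="123"):
    # Real generated code would recompute the selected artifacts here.
    return {"my_artifact": a.upper()}
'''


def load_module_from_source(source: str, name: str = "generated_pipeline") -> types.ModuleType:
    """Hypothetical helper: execute module text into a new module object."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


module = load_module_from_source(module_source)
result = module.run_all_sessions(a="abc")
```

Writing the string to a .py file and importing it normally is an equally valid choice and is usually easier to inspect and version.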

lineapy.api.api.reload() → None

Reloads the LineaPy context.

Note

Currently this only reloads annotations, but in the future it may also cover other items such as configs.

lineapy.api.api.save(reference: object, name: str) → lineapy.api.models.linea_artifact.LineaArtifact

Publishes the object to the Linea DB.

Parameters
  • reference (Union[object, ExternalState]) – The reference could be a variable name, in which case Linea will save the value of the variable with our default serialization mechanism. Alternatively, it could be a “side effect” reference, which currently includes either lineapy.file_system or lineapy.db. Linea will save the associated process that creates the final side effects. We are in the process of adding more side effect references, including assert statements.

  • name (str) – The name is used for later retrieving the artifact and creating new versions if an artifact of the name has been created before.

Returns

The returned value offers methods to access the information stored about the artifact (value, version), as well as other automation capabilities, such as to_pipeline().

Return type

LineaArtifact

lineapy.api.api.to_pipeline(artifacts: List[str], framework: str = 'SCRIPT', pipeline_name: Optional[str] = None, dependencies: Dict[str, Set[str]] = {}, output_dir: str = '.', input_parameters: List[str] = [], reuse_pre_computed_artifacts: List[str] = [], pipeline_dag_config: Optional[lineapy.plugins.task.AirflowDagConfig] = {}) → pathlib.Path

Writes the pipeline job to a path on disk.

Parameters
  • artifacts (List[str]) – Names of artifacts to be included in the pipeline.

  • framework (str) – “AIRFLOW” or “SCRIPT”. Defaults to “SCRIPT” if not specified.

  • pipeline_name (Optional[str]) – Name of the pipeline.

  • dependencies (TaskGraphEdge) – Task dependencies in graphlib format, e.g., {"B": {"A", "C"}} means task A and C are prerequisites for task B. LineaPy is smart enough to figure out dependency relations within the same session, so there is no need to specify this type of dependency information; instead, the user is expected to provide dependency information among artifacts across different sessions.

  • output_dir (str) – Directory path to save DAG and other pipeline files.

  • input_parameters (List[str]) – Names of variables to be used as parameters in the pipeline. Currently, it only accepts variables from literal assignment such as a = '123'. For each variable to be parametrized, there should be only one literal assignment across all artifact code for the pipeline. For instance, if both a = '123' and a = 'abc' exist in the pipeline’s artifact code, we cannot make a an input parameter since its reference is ambiguous, i.e., we are not sure which literal assignment a refers to.

  • reuse_pre_computed_artifacts (List[str]) – Names of artifacts in the pipeline for which the pre-computed value is to be used (rather than recomputing the value).

  • pipeline_dag_config (Optional[AirflowDagConfig]) – A dictionary of parameters to configure the DAG file to be generated. Not applicable for the “SCRIPT” framework, as it does not generate a separate DAG file. For the “AIRFLOW” framework, Airflow-native config params such as “retries” and “schedule_interval” can be passed in.

Returns

Directory path where DAG and other pipeline files are saved.

Return type

Path
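The graphlib format used by dependencies maps each task to the set of tasks that must run before it. As a minimal sketch using the standard library's graphlib.TopologicalSorter (not LineaPy's own scheduling code), the example {"B": {"A", "C"}} from the parameter description resolves to an order in which A and C both precede B.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of prerequisite tasks.
dependencies = {"B": {"A", "C"}}

# static_order() yields tasks so that prerequisites always come first.
order = list(TopologicalSorter(dependencies).static_order())

# A and C may appear in either order relative to each other.
prerequisites_first = set(order[:2]) == {"A", "C"}
```

The relative order of A and C is not fixed, since neither depends on the other.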

lineapy.api.api_utils module

lineapy.api.api_utils.de_lineate_code(code: str, db: lineapy.db.db.RelationalLineaDB) → str

De-lineates the code by removing any LineaPy API references.

Module contents