Quick Start

Note

Make sure that you have LineaPy installed. You can find relevant instructions in the Installation section.

Once LineaPy is installed, we are ready to start using the package. We start with a simple example using the Iris dataset to demonstrate how to use LineaPy to 1) store a variable’s history, 2) get its cleaned-up code, and 3) build an executable pipeline for the variable.

import lineapy
import pandas as pd
from sklearn.linear_model import LinearRegression, ElasticNet

# Load data
df = pd.read_csv("https://raw.githubusercontent.com/LineaLabs/lineapy/main/examples/tutorials/data/iris.csv")

# Some very basic feature engineering
color_map = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}
df["variety_color"] = df["variety"].map(color_map)
df2 = df.copy()
df2["d_versicolor"] = df["variety"].apply(lambda x: 1 if x == "Versicolor" else 0)
df2["d_virginica"] = df["variety"].apply(lambda x: 1 if x == "Virginica" else 0)

# Initialize two models
model1 = LinearRegression()
model2 = ElasticNet()

# Fit both models
model1.fit(
    X=df2[["petal.width", "d_versicolor", "d_virginica"]],
    y=df2["sepal.width"],
)
model2.fit(
    X = df[["petal.width", "variety_color"]],
    y = df["sepal.width"],
)

Now, we reach the end of our development session and decide to save the ElasticNet model. We can store the model as a LineaPy artifact as follows:

# Store the model as an artifact
lineapy.save(model2, "iris_elasticnet_model")

A LineaPy artifact encapsulates both the value and code, so we can easily retrieve the model’s code, like so:

# Retrieve the model artifact
artifact = lineapy.get("iris_elasticnet_model")

# Check code for the model artifact
print(artifact.get_code())

which will print:

import pandas as pd
from sklearn.linear_model import ElasticNet

df = pd.read_csv(
    "https://raw.githubusercontent.com/LineaLabs/lineapy/main/examples/tutorials/data/iris.csv"
)
color_map = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}
df["variety_color"] = df["variety"].map(color_map)
model2 = ElasticNet()
model2.fit(
    X=df[["petal.width", "variety_color"]],
    y=df["sepal.width"],
)

Note that these are the minimal essential steps to produce the model. That is, LineaPy has automatically cleaned up the original code by removing extraneous operations that do not affect the model.

Say we are now asked to retrain the model on a regular basis to account for any updates in the source data. We need to set up a pipeline to train the model, and LineaPy make it as simple as a single line of code:

lineapy.to_pipeline(
    artifacts=[artifact.name],
    pipeline_name="iris_model_pipeline",
    output_dir="output/",
    framework="AIRFLOW",
)

which generates several files that can be used to execute the pipeline from the UI or CLI.

In sum, LineaPy automates time-consuming, manual steps in a data science workflow, helping us move our work into production more quickly.

Note

To learn more about LineaPy’s API, check out this tutorial.