This document describes how to add new library annotations.

What library annotations are

Linea requires library functions to be annotated with their “side effects”, such as whether a class method mutates the original object.
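As a plain-Python illustration of the kind of side effect an annotation has to record, compare a call that leaves its argument alone with a method that mutates the object it is called on. This uses only built-in list behavior and is not Linea's annotation syntax.

```python
# Illustration only: built-in Python behavior, not Linea's annotation API.
values = [3, 1, 2]

# sorted() builds a new list; the original object is left untouched.
ordered = sorted(values)
unchanged_after_sorted = values == [3, 1, 2]

# list.sort() reorders the list in place and returns None,
# so the original object is mutated.
returned = values.sort()
changed_after_sort = values == [1, 2, 3]
```

A method like `list.sort` is the kind of call that would need a mutation annotation, while `sorted` would not.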

Linea uses these annotations for both internal libraries, such as os and _operator, and external libraries, such as pandas and boto3.

What we currently support

We are in the process of extending coverage of external libraries. We plan to add support for widely used functionality in the following libraries:

NumPy
SciPy
pandas
scikit-learn
PyTorch
TensorFlow
XGBoost
Keras
Matplotlib
Seaborn
Bokeh
Plotly
pydot
Presto
Scrapy
BeautifulSoup

Libraries not listed above are not yet supported and are not currently on our roadmap.

How to contribute to library specs

There is a long tail of libraries that data scientists could be using, and we would love to get community support for adding new annotations.

The module lineapy.instrumentation contains all the annotation types we support, along with example library calls and YAML annotations. We recommend that you start from the top-level entry point, ModuleAnnotation, to see how the different parts work together.

With YAML, be careful with indentation levels, which can get confusing. To test the annotations you added, run the following script from the repository root: ``./tests/tools/test_validate_annotation_spec.py``. It is also easier to start by copying from the existing instrumentation.
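One indentation mistake that is easy to miss is a tab character, which YAML does not allow for indentation. The sketch below is a hypothetical helper, not part of Linea or its validation script, that flags such lines before you run the validator. The sample text and its key names are made up for illustration.

```python
from typing import List


def find_tab_indented_lines(yaml_text: str) -> List[int]:
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Leading whitespace is whatever precedes the first non-space, non-tab character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Hypothetical sample: the second line is indented with a tab.
sample = "- module: example\n\tannotations: []\n  other: 1\n"
print(find_tab_indented_lines(sample))  # prints [2]
```

This only catches tabs. Misaligned nesting made of spaces still needs the validation script to detect.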

If there is any syntax that the existing instrumentation doesn’t support, please file an issue on GitHub that names the library you are using and documents the mutations or side effects involved.