This document describes how to add new library annotations.

What library annotations are

Linea requires library functions to be annotated of their “side effects”, such as whether a class method mutates the original object.

Linea uses these annotations for both internal libraries, such as os and _operator, and external libraries, such as pandas and boto3.

What we currently support

We are in the process of augmenting external libraries. We plan to add support for widely used languages in the following libraries:

  • NumPy

  • SciPy

  • Pandas

  • SciKit-Learn

  • PyTorch

  • TensorFlow

  • XGboost

  • Keras

  • Matplotlib

  • Seaborn

  • Bokeh

  • Plotly

  • pydot

  • Presto

  • Scrapy

  • BeautifulSoup

So the rest of the libraries not mentioned are not yet supported or on our roadmap.

How to contribute to library specs

There are a long tail of libraries that data scientists could be using, and we would love to get community support for adding new annotations.

The module lineapy.instrumentation contains all the annotation types we support, along with examples library calls and yaml annotations. We recommend that you start viewing from the top-level entry point is ModuleAnnotation, to see how the different parts work together.

With yaml, you need to be careful with the indentations levels which could get confusing. You can run the following bash script (from root): `` ./tests/tools/test_validate_annotation_spec.py`` to test the annotations you added. It would also be easier to copy off from existing instrumentations.

If there are any syntax that the existing instrumentation doesn’t support, please file an issue on GitHub, with the library you are using and the mutation/side effects documented.