Microscopy Bulk Simple Annotation (ANN) Objects

The Microscopy Bulk Simple Annotation IOD is designed specifically to store large numbers of similar annotations and measurements from microscopy images. Annotations of microscopy images typically refer to very large numbers of cells or cellular structures. Storing these in a Structured Report Document, with its highly nested structure, would be very inefficient in storage space and unnecessarily complex and slow to parse. Microscopy Bulk Simple Annotation objects (“bulk annotations”) solve this problem by allowing you to store large numbers of similar annotations or measurements in efficient arrays without duplicating the descriptive metadata.

Each bulk annotation object contains one or more Annotation Groups, each of which contains a set of graphical annotations, and optionally one or more numerical measurements relating to those graphical annotations.

Constructing Annotation Groups

An Annotation Group is a set of multiple similar annotations from a microscopy image. For example, a single annotation group may contain all annotations of cell nuclei, lymphocytes, or regions of necrosis in the image. In highdicom, an annotation group is represented by a highdicom.ann.AnnotationGroup.

Each annotation group contains some required metadata that describes the contents of the group, as well as some further optional metadata that may contain further details about the group or the derivation of the annotations it contains. The required metadata elements include:

  • A number (int), a one-based integer identifying the group within the bulk annotation object.

  • A label (str) giving a human-readable label for the group.

  • A uid (str or highdicom.UID) uniquely identifying the group. Usually, you will want to generate a new UID for this.

  • An annotated_property_category and an annotated_property_type (highdicom.sr.CodedConcept), coded values (see Coding) describing the category and the specific structure that has been annotated.

  • A graphic_type (highdicom.ann.GraphicTypeValues) indicating the “form” of the annotations. Permissible values are "ELLIPSE", "POINT", "POLYGON", "RECTANGLE", and "POLYLINE".

  • The algorithm_type (highdicom.ann.AnnotationGroupGenerationTypeValues), the type of the algorithm used to generate the annotations ("MANUAL", "SEMIAUTOMATIC", or "AUTOMATIC").

Further optional metadata may also be provided; see the API documentation for more information.

The actual annotation data is passed to the group as a list of numpy.ndarray objects, each of shape (N x D). N is the number of coordinates required for each individual annotation and is determined by the graphic type (see highdicom.ann.GraphicTypeValues). D is either 2, meaning that the coordinates are expressed as a (Column, Row) pair in image coordinates, or 3, meaning that the coordinates are expressed as an (X, Y, Z) triple in 3D frame of reference coordinates.
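As a rough illustration of the (N x D) convention, the sketch below builds one 2D annotation for each graphic type using plain NumPy. The coordinate values are made up, and the point counts and orderings in the comments are assumptions based on the DICOM definitions of the graphic types; check them against the API documentation before relying on them.

```python
import numpy as np

# Hypothetical 2D (column, row) coordinates, one annotation per graphic type.
# Every array has shape (N, 2), where N depends on the graphic type.
examples = {
    # POINT: a single coordinate
    'POINT': np.array([[34.6, 18.4]]),
    # POLYLINE: two or more vertices of an open path
    'POLYLINE': np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 12.0]]),
    # POLYGON: three or more vertices of a closed shape
    'POLYGON': np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]]),
    # RECTANGLE (assumed order): top left, top right, bottom right, bottom left
    'RECTANGLE': np.array(
        [[10.0, 10.0], [30.0, 10.0], [30.0, 20.0], [10.0, 20.0]]
    ),
    # ELLIPSE (assumed order): major axis end points, then minor axis end points
    'ELLIPSE': np.array(
        [[10.0, 15.0], [30.0, 15.0], [20.0, 10.0], [20.0, 20.0]]
    ),
}

# Report N (points per annotation) and D (dimensions) for each graphic type
for graphic_type, coords in examples.items():
    n_points, n_dims = coords.shape
    print(f'{graphic_type}: N={n_points}, D={n_dims}')
```

All annotations within one group share the same graphic type, but for POLYLINE and POLYGON the value of N may differ from one annotation to the next, which is why the data is passed as a list of arrays rather than a single array.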

When considering which type of coordinate to use, bear in mind that 2D image coordinates refer to only one image in an image pyramid, whereas 3D frame of reference coordinates are more easily used with any image in the pyramid. Also note that although you can include multiple annotation groups in a single bulk annotation object, they must all use the same coordinate type.
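To make the limitation of 2D coordinates concrete, here is a minimal sketch of what is needed to reuse them on a different pyramid level. It assumes that both levels share the same top left corner, that image coordinates are measured from that corner, and that the pixel spacing values shown (which are hypothetical) are known for both images. Under those assumptions the coordinates only need to be rescaled by the ratio of the pixel spacings.

```python
import numpy as np

# Hypothetical centroids in (column, row) coordinates of the base level
base_coords = np.array([[1024.0, 2048.0], [512.0, 256.0]])

# Assumed pixel spacings in mm of the base level and of a downsampled level
base_spacing = 0.00025
level_spacing = 0.001

# Rescale so that the coordinates index into the downsampled level
scale = base_spacing / level_spacing
level_coords = base_coords * scale

print(level_coords)
```

With 3D frame of reference coordinates this bookkeeping is unnecessary, because the coordinates are expressed in physical units that do not depend on any particular image.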

Here is a simple example of constructing an annotation group:

from pydicom.sr.codedict import codes
from pydicom.sr.coding import Code
import highdicom as hd
import numpy as np

# Graphic data containing two nuclei, each represented by a single point
# expressed in 2D image coordinates
graphic_data = [
    np.array([[34.6, 18.4]]),
    np.array([[28.7, 34.9]]),
]

# Nuclei annotations produced by a manual algorithm
nuclei_group = hd.ann.AnnotationGroup(
    number=1,
    uid=hd.UID(),
    label='nuclei',
    annotated_property_category=codes.SCT.AnatomicalStructure,
    annotated_property_type=Code('84640000', 'SCT', 'Nucleus'),
    algorithm_type=hd.ann.AnnotationGroupGenerationTypeValues.MANUAL,
    graphic_type=hd.ann.GraphicTypeValues.POINT,
    graphic_data=graphic_data,
)

Note that including only two nuclei would be very unusual in practice: annotations often number in the thousands or even millions within a large whole slide image.
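At that scale the coordinates usually come from an algorithm as a single array rather than being typed out by hand. The following sketch, which uses randomly generated centroids as a stand-in for real detector output, shows one way to split a (K x 2) array into the list of (1 x 2) arrays expected for POINT annotations.

```python
import numpy as np

# Hypothetical detector output: one (column, row) centroid per nucleus
rng = np.random.default_rng(seed=0)
centroids = rng.uniform(low=0.0, high=50000.0, size=(100_000, 2))

# Split the (K, 2) array into K arrays of shape (1, 2), one per annotation
graphic_data = list(centroids[:, np.newaxis, :])

print(len(graphic_data), graphic_data[0].shape)
```

The resulting list can then be passed as graphic_data in the same way as the two hand-written arrays in the example above.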

Including Measurements

In addition to the coordinates of the annotations themselves, it is also possible to attach one or more continuous-valued numeric measurements corresponding to those annotations. The measurements are passed as a highdicom.ann.Measurements object, which contains the name of the measurement (as a coded value), the unit of the measurement (also as a coded value) and an array of the measurements themselves (as a numpy.ndarray).

The length of the measurement array for any measurements attached to an annotation group must match exactly the number of annotations in the group. Value i in the array therefore represents the measurement of annotation i.

Here is the above example with an area measurement included:

from pydicom.sr.codedict import codes
from pydicom.sr.coding import Code
import highdicom as hd
import numpy as np

# Graphic data containing two nuclei, each represented by a single point
# expressed in 2D image coordinates
graphic_data = [
    np.array([[34.6, 18.4]]),
    np.array([[28.7, 34.9]]),
]

# Measurement object representing the areas of each of the two nuclei
area_measurement = hd.ann.Measurements(
    name=codes.SCT.Area,
    unit=codes.UCUM.SquareMicrometer,
    values=np.array([20.4, 43.8]),
)

# Nuclei annotations produced by a manual algorithm
nuclei_group = hd.ann.AnnotationGroup(
    number=1,
    uid=hd.UID(),
    label='nuclei',
    annotated_property_category=codes.SCT.AnatomicalStructure,
    annotated_property_type=Code('84640000', 'SCT', 'Nucleus'),
    algorithm_type=hd.ann.AnnotationGroupGenerationTypeValues.MANUAL,
    graphic_type=hd.ann.GraphicTypeValues.POINT,
    graphic_data=graphic_data,
    measurements=[area_measurement],
)
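The alignment rule described above (value i belongs to annotation i) is easiest to satisfy when the measurement values are computed directly from the list of coordinate arrays. The sketch below does this for polygon annotations using the shoelace formula. The polygons, the helper function polygon_area and the pixel spacing are all hypothetical and are not part of highdicom.

```python
import numpy as np


def polygon_area(vertices: np.ndarray) -> float:
    """Area of a simple polygon given as an (N, 2) array (shoelace formula)."""
    x, y = vertices[:, 0], vertices[:, 1]
    return float(
        0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    )


# Hypothetical polygon annotations in 2D image coordinates
polygons = [
    np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]]),  # 4 x 3 box
    np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 2.0]]),  # right triangle
]

# Assumed pixel spacing in micrometers, used to convert pixel areas
pixel_spacing_um = 0.25

# One value per annotation, in the same order as the annotations
areas_px = np.array([polygon_area(p) for p in polygons])
areas_um2 = areas_px * pixel_spacing_um ** 2

# The measurement array must have exactly one entry per annotation
assert len(areas_um2) == len(polygons)
```

An array built this way could be passed as the values of a highdicom.ann.Measurements object alongside the corresponding polygons as graphic_data.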

Constructing MicroscopyBulkSimpleAnnotations Objects

When you have constructed the annotation groups, you can include them in a bulk annotation object, along with some further metadata, using the highdicom.ann.MicroscopyBulkSimpleAnnotations constructor. You also need to pass the image from which the annotations were derived so that highdicom can copy all the patient, study and slide-level metadata:

from pydicom import dcmread
import highdicom as hd

# Load a slide microscopy image from the highdicom test data (if you have
# cloned the highdicom git repo)
sm_image = dcmread('data/test_files/sm_image.dcm')

bulk_annotations = hd.ann.MicroscopyBulkSimpleAnnotations(
    source_images=[sm_image],
    annotation_coordinate_type=hd.ann.AnnotationCoordinateTypeValues.SCOORD,
    annotation_groups=[nuclei_group],
    series_instance_uid=hd.UID(),
    series_number=10,
    sop_instance_uid=hd.UID(),
    instance_number=1,
    manufacturer='MGH Pathology',
    manufacturer_model_name='MGH Pathology Manual Annotations',
    software_versions='0.0.1',
    device_serial_number='1234',
    content_description='Nuclei Annotations',
)

bulk_annotations.save_as('nuclei_annotations.dcm')

The result is a complete DICOM object that can be written out as a DICOM file, transmitted over a network, etc.

Reading Existing Bulk Annotation Objects

You can read an existing bulk annotation object from file using the highdicom.ann.annread() function:

from pydicom import dcmread
import highdicom as hd

ann = hd.ann.annread('data/test_files/sm_annotations.dcm')

assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)

Alternatively, you can convert an existing pydicom.Dataset representing a bulk annotation object to the highdicom object like this:

from pydicom import dcmread
import highdicom as hd

ann_dcm = dcmread('data/test_files/sm_annotations.dcm')

ann = hd.ann.MicroscopyBulkSimpleAnnotations.from_dataset(ann_dcm)

assert isinstance(ann, hd.ann.MicroscopyBulkSimpleAnnotations)

Note that these examples (and the following examples) use an example file that you can access from the test data in the highdicom repository. It was created using exactly the code in the construction example above.

Accessing Annotation Groups

Usually the next step when working with bulk annotation objects is to find the relevant annotation groups. You have two ways to do this.

If you know either the number or the UID of the group, you can access the group directly (since either of these should uniquely identify a group). The highdicom.ann.MicroscopyBulkSimpleAnnotations.get_annotation_group() method is used for this purpose:

# Access a group by its number
group = ann.get_annotation_group(number=1)
assert isinstance(group, hd.ann.AnnotationGroup)

# Access a group by its UID
group = ann.get_annotation_group(
    uid='1.2.826.0.1.3680043.10.511.3.40670836327971302375623613533993686'
)
assert isinstance(group, hd.ann.AnnotationGroup)

Alternatively, you can search for groups that match certain filters such as the annotated property type or category, label, or graphic type. The highdicom.ann.MicroscopyBulkSimpleAnnotations.get_annotation_groups() method (note groups instead of group) is used for this. It returns a list of matching groups, since the filters may match multiple groups.

from pydicom.sr.coding import Code

# Search for groups by annotated property type
groups = ann.get_annotation_groups(
    annotated_property_type=Code('84640000', 'SCT', 'Nucleus'),
)
assert len(groups) == 1 and isinstance(groups[0], hd.ann.AnnotationGroup)

# If there are no matches, an empty list is returned
groups = ann.get_annotation_groups(
    annotated_property_type=Code('53982002', "SCT", "Cell membrane"),
)
assert len(groups) == 0

# Search for groups by label
groups = ann.get_annotation_groups(label='nuclei')
assert len(groups) == 1 and isinstance(groups[0], hd.ann.AnnotationGroup)

# Search for groups by label and graphic type together (results must match
# *all* provided filters)
groups = ann.get_annotation_groups(
    label='nuclei',
    graphic_type=hd.ann.GraphicTypeValues.POINT,
)
assert len(groups) == 1 and isinstance(groups[0], hd.ann.AnnotationGroup)

Extracting Information From Annotation Groups

When you have found a relevant group, you can use the Python properties on the object to conveniently access metadata and the graphic data of the annotations. For example (see highdicom.ann.AnnotationGroup for a full list):

# Access the label
assert group.label == 'nuclei'

# Access the number
assert group.number == 1

# Access the UID
assert group.uid == '1.2.826.0.1.3680043.10.511.3.40670836327971302375623613533993686'

# Access the annotated property type (returns a CodedConcept)
assert group.annotated_property_type == Code('84640000', 'SCT', 'Nucleus')

# Access the graphic type, describing the "form" of each annotation
assert group.graphic_type == hd.ann.GraphicTypeValues.POINT

You can access the entire array of annotations at once using highdicom.ann.AnnotationGroup.get_graphic_data(). You need to pass the annotation coordinate type from the parent bulk annotation object to the group so that it knows how to interpret the coordinate data. This method returns a list of 2D numpy arrays of shape (N x D), mirroring how you would have passed the data in to create the annotation with highdicom.

import numpy as np

graphic_data = group.get_graphic_data(
    coordinate_type=ann.annotation_coordinate_type,
)
assert len(graphic_data) == 2 and isinstance(graphic_data[0], np.ndarray)

Alternatively, you can access the coordinate array for a specific annotation using its (one-based) index in the annotation list:

# Get the number of annotations
assert group.number_of_annotations == 2

# Access an annotation using 1-based index
first_annotation = group.get_coordinates(
    annotation_number=1,
    coordinate_type=ann.annotation_coordinate_type,
)
# Compare with a tolerance, since coordinates may be stored in single precision
assert np.allclose(first_annotation, np.array([[34.6, 18.4]]))
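For groups of POINT annotations it is often more convenient to work with a single array than with a list of one-row arrays. The following sketch uses a hand-written stand-in for the list returned by get_graphic_data() and collapses it with NumPy; it applies only when every annotation has the same number of points.

```python
import numpy as np

# Stand-in for the list returned by get_graphic_data() on a POINT group
graphic_data = [
    np.array([[34.6, 18.4]]),
    np.array([[28.7, 34.9]]),
]

# Concatenate the K arrays of shape (1, 2) into a single (K, 2) array
points = np.concatenate(graphic_data, axis=0)

# Separate the column and row coordinates, e.g. for plotting
columns, rows = points[:, 0], points[:, 1]

print(points.shape)
```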

Extracting Measurements From Annotation Groups

You can use the highdicom.ann.AnnotationGroup.get_measurements() method to access any measurements included in the group. By default, this will return all measurements in the group, but you can also filter for measurements matching a certain name.

Measurements are returned as a tuple of (names, values, units), where names is a list of names as highdicom.sr.CodedConcept objects, units is a list of units also as highdicom.sr.CodedConcept objects, and values is a numpy.ndarray of values of shape (N x M), where N is the number of annotations and M is the number of measurements. This return format is intended to facilitate the loading of measurements into tables or dataframes for further analysis.

from pydicom.sr.codedict import codes

names, values, units = group.get_measurements()
assert names[0] == codes.SCT.Area
assert units[0] == codes.UCUM.SquareMicrometer
assert values.shape == (2, 1)
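As a minimal sketch of the tabular use case mentioned above, the following code arranges an (N x M) values array into named columns. The names, units and values here are hand-written stand-ins (including a hypothetical second measurement) so that the snippet runs without a DICOM file; with real output you would derive the column labels from the coded concepts, for example from their meaning attribute.

```python
import numpy as np

# Stand-ins for the values returned by get_measurements(); real names and
# units would be CodedConcept objects rather than plain strings
names = ['Area', 'Perimeter']
units = ['um2', 'um']
values = np.array([[20.4, 17.1], [43.8, 24.6]])  # shape (N, M)

# One column per measurement, each holding one value per annotation
table = {
    f'{name} ({unit})': values[:, j]
    for j, (name, unit) in enumerate(zip(names, units))
}

for column_name, column in table.items():
    print(column_name, column)
```

A dictionary of equal-length columns like this can be handed to most table or dataframe libraries without further reshaping.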