Welcome to PathFlowAI’s documentation!

pathflowai-preprocess

pathflowai-preprocess [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

alter_masks

Map list of values to other values in mask.

pathflowai-preprocess alter_masks [OPTIONS]

Options

-i, --mask_dir <mask_dir>

Input directory for masks. [default: ./inputs/]

-o, --output_dir <output_dir>

Output directory for new masks. [default: ./outputs/]

-fr, --from_annotations <from_annotations>

Annotations to switch from. [default: ]

-to, --to_annotations <to_annotations>

Annotations to switch to. [default: ]
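
For example, to remap mask value 3 to 1 (the values and paths here are illustrative, not from the original docs):

pathflowai-preprocess alter_masks -i ./inputs/ -o ./outputs/ -fr 3 -to 1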

collapse_annotations

Adds the areas of annotation classes to other annotation classes in the SQL DB when removing some annotation classes.

pathflowai-preprocess collapse_annotations [OPTIONS]

Options

-i, --input_patch_db <input_patch_db>

Input db. [default: patch_info_input.db]

-o, --output_patch_db <output_patch_db>

Output db. [default: patch_info_output.db]

-fr, --from_annotations <from_annotations>

Annotations to switch from. [default: ]

-to, --to_annotations <to_annotations>

Annotations to switch to. [default: ]

-ps, --patch_size <patch_size>

Patch size. [default: 224]

-rb, --remove_background_annotation <remove_background_annotation>

If selected, removes 100% background patches based on this annotation. [default: ]

-ma, --max_background_area <max_background_area>

Max background area before exclusion. [default: 0.05]
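
A sketch of a typical invocation, with hypothetical annotation names; each -fr/-to pair maps one class into another:

pathflowai-preprocess collapse_annotations -i patch_info_input.db -o patch_info_output.db -fr tumor_margin -to tumor -ps 224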

preprocess_pipeline

Preprocessing pipeline that accomplishes 3 things. 1: storage into ZARR format, 2: optional mask adjustment, 3: storage of patch-level information into SQL DB

pathflowai-preprocess preprocess_pipeline [OPTIONS]

Options

-npy, --img2npy

Image to numpy for faster read. [default: False]

-b, --basename <basename>

Basename of patches. [default: A01]

-i, --input_dir <input_dir>

Input directory for patches. [default: ./inputs/]

-a, --annotations <annotations>

Annotations in image in order. [default: ]

-pr, --preprocess

Run preprocessing pipeline. [default: False]

-pa, --patches

Add patches to SQL. [default: False]

-t, --threshold <threshold>

Threshold to remove non-purple slides. [default: 0.05]

-ps, --patch_size <patch_size>

Patch size. [default: 224]

-it, --intensity_threshold <intensity_threshold>

Intensity threshold to rate a pixel as non-white. [default: 100.0]

-g, --generate_finetune_segmentation

Generate patches for one segmentation mask class for targeted finetuning. [default: False]

-tc, --target_segmentation_class <target_segmentation_class>

Segmentation class to finetune on; patches are output to another db. [default: 0]

-tt, --target_threshold <target_threshold>

Threshold to include target for segmentation if saving one class. [default: 0.0]

-odb, --out_db <out_db>

Output patch database. [default: ./patch_info.db]

-am, --adjust_mask

Remove additional background regions from annotation mask. [default: False]

-nn, --n_neighbors <n_neighbors>

If adjusting mask, number of neighbors connectivity to remove. [default: 5]

-bp, --basic_preprocess

Basic preprocessing pipeline; annotation areas are not saved. Used for benchmarking the tool against comparable pipelines. [default: False]
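
As an illustration, the following example runs both the ZARR-conversion and SQL-patch steps for a slide with basename A01 (it assumes the slide and its annotation file sit in ./inputs/; file layout is an assumption, not stated above):

pathflowai-preprocess preprocess_pipeline -b A01 -i ./inputs/ -pr -pa -ps 224 -t 0.05 -odb ./patch_info.db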

remove_basename_from_db

Removes basename/ID from SQL DB.

pathflowai-preprocess remove_basename_from_db [OPTIONS]

Options

-i, --input_patch_db <input_patch_db>

Input db. [default: patch_info_input.db]

-o, --output_patch_db <output_patch_db>

Output db. [default: patch_info_output.db]

-b, --basename <basename>

Basename. [default: A01]

-ps, --patch_size <patch_size>

Patch size. [default: 224]

pathflowai-visualize

pathflowai-visualize [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

extract_patch

Extract image of patch of any size/location and output to image file

pathflowai-visualize extract_patch [OPTIONS]

Options

-i, --input_dir <input_dir>

Input directory for patches. [default: ./inputs/]

-b, --basename <basename>

Basename of patches. [default: A01]

-p, --patch_info_file <patch_info_file>

Database containing all patches. [default: patch_info.db]

-ps, --patch_size <patch_size>

Patch size. [default: 224]

-x, --x <x>

X Coordinate of patch. [default: 0]

-y, --y <y>

Y coordinate of patch. [default: 0]

-o, --outputfname <outputfname>

Output extracted image. [default: ./output_image.png]

-s, --segmentation

Plot segmentations. [default: False]

-sc, --n_segmentation_classes <n_segmentation_classes>

Number segmentation classes [default: 4]

-c, --custom_segmentation <custom_segmentation>

Add custom segmentation map from prediction, in NPY format. [default: ]
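
For instance, to extract a single 224-pixel patch at hypothetical coordinates (512, 512):

pathflowai-visualize extract_patch -i ./inputs/ -b A01 -p patch_info.db -ps 224 -x 512 -y 512 -o ./output_image.png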

overlay_new_annotations

Overlay custom annotation polygons on top of the WSI. Annotations are supplied in the format [Point: x, y, Point: x, y … ], one line per polygon.

pathflowai-visualize overlay_new_annotations [OPTIONS]

Options

-i, --img_file <img_file>

Input image. [default: image.txt]

-a, --annotation_txt <annotation_txt>

Text file of annotation polygons, one per line. [default: annotation.txt]

-ocf, --original_compression_factor <original_compression_factor>

How much the original image was compressed. [default: 1.0]

-cf, --compression_factor <compression_factor>

How much to compress the image. [default: 3.0]

-o, --outputfilename <outputfilename>

Output extracted image. [default: ./output_image.png]
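
As a sketch of the expected input, one hypothetical annotation.txt line describing a single triangular polygon in the stated format (coordinates are illustrative):

[Point: 100, 200, Point: 150, 250, Point: 120, 300]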

plot_embeddings

Perform UMAP embeddings of patches and plot using plotly.

pathflowai-visualize plot_embeddings [OPTIONS]

Options

-i, --embeddings_file <embeddings_file>

Embeddings. [default: predictions/embeddings.pkl]

-o, --plotly_output_file <plotly_output_file>

Plotly output file. [default: predictions/embeddings.html]

-a, --annotations <annotations>

Multiple annotations to color image. [default: ]

-rb, --remove_background_annotation <remove_background_annotation>

If selected, removes 100% background patches based on this annotation. [default: ]

-ma, --max_background_area <max_background_area>

Max background area before exclusion. [default: 0.05]

-b, --basename <basename>

Basename of patches. [default: ]

-nn, --n_neighbors <n_neighbors>

Number nearest neighbors. [default: 8]

plot_image

Plots the whole slide image supplied.

pathflowai-visualize plot_image [OPTIONS]

Options

-i, --image_file <image_file>

Input image file. [default: ./inputs/a.svs]

-cf, --compression_factor <compression_factor>

How much to compress the image. [default: 3.0]

-o, --outputfname <outputfname>

Output extracted image. [default: ./output_image.png]

plot_image_umap_embeddings

Plots a UMAP embedding with each point as its corresponding patch image.

pathflowai-visualize plot_image_umap_embeddings [OPTIONS]

Options

-i, --input_dir <input_dir>

Input directory for patches. [default: ./inputs/]

-e, --embeddings_file <embeddings_file>

Embeddings. [default: predictions/embeddings.pkl]

-b, --basename <basename>

Basename of patches. [default: ]

-o, --outputfilename <outputfilename>

Embedding visualization. [default: predictions/shap_plots.png]

-mpl, --mpl_scatter

Plot segmentations. [default: False]

-rb, --remove_background_annotation <remove_background_annotation>

If selected, removes 100% background patches based on this annotation. [default: ]

-ma, --max_background_area <max_background_area>

Max background area before exclusion. [default: 0.05]

-z, --zoom <zoom>

Size of images. [default: 0.05]

-nn, --n_neighbors <n_neighbors>

Number nearest neighbors. [default: 8]

-sc, --sort_col <sort_col>

Sort samples on this column. [default: ]

-sm, --sort_mode <sort_mode>

Sort ascending or descending. [default: asc]

Options

asc|desc

plot_predictions

Overlays classification, regression and segmentation patch level predictions on top of whole slide image.

pathflowai-visualize plot_predictions [OPTIONS]

Options

-i, --input_dir <input_dir>

Input directory for patches. [default: ./inputs/]

-b, --basename <basename>

Basename of patches. [default: A01]

-p, --patch_info_file <patch_info_file>

Database containing all patches. [default: patch_info.db]

-ps, --patch_size <patch_size>

Patch size. [default: 224]

-o, --outputfname <outputfname>

Output extracted image. [default: ./output_image.png]

-an, --annotations

Plot annotations instead of predictions. [default: False]

-cf, --compression_factor <compression_factor>

How much to compress the image. [default: 3.0]

-al, --alpha <alpha>

How much to give annotations/predictions versus original image. [default: 0.8]

-s, --segmentation

Plot segmentations. [default: False]

-sc, --n_segmentation_classes <n_segmentation_classes>

Number segmentation classes [default: 4]

-c, --custom_segmentation <custom_segmentation>

Add custom segmentation map from prediction, npy format. [default: ]

-ac, --annotation_col <annotation_col>

Column of annotations [default: annotation]

-sf, --scaling_factor <scaling_factor>

Multiply all prediction scores by this amount. [default: 1.0]

-tif, --tif_file

Write to tiff file. [default: False]

shapley_plot

Run the SHAPley attribution method on patches after a classification task to see which regions the model based its predictions on.

pathflowai-visualize shapley_plot [OPTIONS]

Options

-m, --model_pkl <model_pkl>

Pickled model file. [default: ]

-bs, --batch_size <batch_size>

Batch size. [default: 32]

-o, --outputfilename <outputfilename>

SHAPley visualization. [default: predictions/shap_plots.png]

-mth, --method <method>

Method of explaining. [default: deep]

Options

deep|gradient

-l, --local_smoothing <local_smoothing>

Local smoothing of SHAP scores. [default: 0.0]

-ns, --n_samples <n_samples>

Number shapley samples for shapley regression (gradient explainer). [default: 32]

-p, --pred_out <pred_out>

If not none, output prediction as shap label. [default: none]

Options

none|sigmoid|softmax

pathflowai-monitor

pathflowai-monitor [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

monitor_usage

Monitor usage over a time interval.

pathflowai-monitor monitor_usage [OPTIONS]

Options

-csv, --records_output_csv <records_output_csv>

Where to store records. [default: records.csv]

-tt, --total_time <total_time>

Total time to monitor for in minutes. [default: 1.0]

-dt, --delay_time <delay_time>

Time between samples, in seconds. [default: 1.0]
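
For example, to sample usage every 2 seconds for 10 minutes and write the records to records.csv:

pathflowai-monitor monitor_usage -csv records.csv -tt 10 -dt 2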

datasets.py

Houses the DynamicImageDataset class, as well as functions that help with image color channel normalization, transformers, etc.

class pathflowai.datasets.DynamicImageDataset(dataset_df, set, patch_info_file, transformers, input_dir, target_names, pos_annotation_class, other_annotations=[], segmentation=False, patch_size=224, fix_names=True, target_segmentation_class=-1, target_threshold=0.0, oversampling_factor=1.0, n_segmentation_classes=4, gdl=False, mt_bce=False, classify_annotations=False)[source]

Generate image dataset that accesses images and annotations via dask.

Parameters
dataset_df:dataframe

Dataframe with WSI, which set it is in (train/test/val) and corresponding WSI labels if applicable.

set:str

Whether train, test, val or pass (normalization) set.

patch_info_file:str

SQL db with positional and annotation information on each slide.

transformers:dict

Contains transformers to apply on images.

input_dir:str

Directory where images come from.

target_names:list/str

Names of initial targets, which may be modified.

pos_annotation_class:str

If selected and predicting on WSI, this class is labeled as a positive from the WSI, while the other classes are not.

other_annotations:list

Other annotations to consider from patch info db.

segmentation:bool

Conducting segmentation task?

patch_size:int

Patch size.

fix_names:bool

Whether to change the names of dataset_df.

target_segmentation_class:list

Samples images only from this class; can now be used for classification as well, matched with the two options below. This and the two options below can be specified multiple times.

target_threshold:list

Sampled only if above this threshold of occurrence in the patches.

oversampling_factor:list

Oversample these patches by this factor.

n_segmentation_classes:int

Number classes to segment.

gdl:bool

Using generalized dice loss?

mt_bce:bool

For multi-target prediction tasks.

classify_annotations:bool

For classifying annotations.
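
A minimal construction sketch, assuming dataset_df was prepared elsewhere (e.g. by pathflowai.utils.create_train_val_test); the mean/std values and the 'tumor' target name are hypothetical:

>>> from pathflowai.datasets import DynamicImageDataset, get_data_transforms
>>> transformers = get_data_transforms(patch_size=224, mean=[0.7, 0.6, 0.7], std=[0.15, 0.15, 0.15])
>>> train_set = DynamicImageDataset(dataset_df, 'train', 'patch_info.db', transformers, './inputs/', ['tumor'], 'tumor', patch_size=224)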

Methods

binarize_annotations(self[, binarizer, …])

Label binarize some annotations or threshold them if classifying slide annotations.

concat(self, other_dataset)

Concatenate this dataset with others.

get_class_weights(self[, i])

Weight loss function with weights inversely proportional to the class appearance.

retain_ID(self, ID)

Reduce the sample set to just images from one ID.

split_by_ID(self)

Generator similar to groupby that splits the dataset by ID, yielding (ID, data) pairs using retain_ID.

subsample(self, p)

Sample subset of dataset.

binarize_annotations(self, binarizer=None, num_targets=1, binary_threshold=0.0)[source]

Label binarize some annotations or threshold them if classifying slide annotations.

Parameters
binarizer:LabelBinarizer

Binarizes the labels of one or more columns.

num_targets:int

Number of desired targets to predict on.

binary_threshold:float

Minimum amount of annotation in a patch for it to be labeled positive.

Returns
binarizer
concat(self, other_dataset)[source]

Concatenate this dataset with others. Updates its own internal attributes.

Parameters
other_dataset:DynamicImageDataset

Other image dataset.

get_class_weights(self, i=0)[source]

Weight loss function with weights inversely proportional to the class appearance.

Parameters
i:int

If multi-target, class used for weighting.

Returns
self

Dataset.

retain_ID(self, ID)[source]

Reduce the sample set to just images from one ID.

Parameters
ID:str

Basename/ID to predict on.

Returns
self
split_by_ID(self)[source]

Generator similar to groupby that splits the dataset by ID, yielding (ID, data) pairs using retain_ID.

Returns
generator

ID, DynamicDataset

subsample(self, p)[source]

Sample subset of dataset.

Parameters
p:float

Fraction to subsample.

pathflowai.datasets.RandomRotate90()[source]

Transformer that randomly rotates an image by 90 degrees.

Returns
function

Transformer function for operation.

pathflowai.datasets.create_transforms(mean, std)[source]

Create transformers.

Parameters
mean:list

See get_data_transforms.

std:list

See get_data_transforms.

Returns
dict

Transformers.

pathflowai.datasets.get_data_transforms(patch_size=None, mean=[], std=[], resize=False, transform_platform='torch', elastic=True)[source]

Get data transformers for the training, test, and validation sets.

Parameters
patch_size:int

Original patch size being transformed.

mean:list of float

Mean RGB

std:list of float

Std RGB

resize:int

Which patch size to resize to.

transform_platform:str

Use pytorch or albumentation transforms.

elastic:bool

Whether to add elastic deformations from albumentations.

Returns
dict

Transformers.
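
A brief doctest-style sketch; the mean/std values are hypothetical, and the assumption that the returned dict is keyed by set name is not confirmed by this page:

>>> from pathflowai.datasets import get_data_transforms
>>> transformers = get_data_transforms(patch_size=224, mean=[0.7, 0.6, 0.7], std=[0.15, 0.15, 0.15], resize=256, transform_platform='torch', elastic=False)
>>> train_transform = transformers['train']  # assumes set-name keys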

pathflowai.datasets.get_normalizer(normalization_file, dataset_opts)[source]

Find mean and standard deviation of images in batches.

Parameters
normalization_file:str

File to store normalization information.

dataset_opts:dict

Dictionary storing information to create DynamicDataset class.

Returns
dict

Stores RGB mean, stdev.

pathflowai.datasets.segmentation_transform(img, mask, transformer)[source]

Run albumentations and return an image and its segmentation mask.

Parameters
img:array

Image as array.

mask:array

Categorical pixel by pixel.

transformer:

Transformation object.

Returns
tuple arrays

Image and mask array.

losses.py

Some additional loss functions that can be called using the pipeline, some of which are still to be implemented.

class pathflowai.losses.FocalLoss(num_class, alpha=None, gamma=2, balance_index=-1, smooth=None, size_average=True)[source]

Adapted from https://raw.githubusercontent.com/Hsuxu/Loss_ToolBox-PyTorch/master/FocalLoss/FocalLoss.py. This is an implementation of Focal Loss, with smooth label cross entropy supported, as proposed in ‘Focal Loss for Dense Object Detection’ (https://arxiv.org/abs/1708.02002):

Focal_Loss = -1 * alpha * (1 - pt)^gamma * log(pt)

Parameters
  • num_class – (int) number of classes

  • alpha – (tensor) 3D or 4D scalar factor for this criterion

  • gamma – (float, double) gamma > 0 reduces the relative loss for well-classified examples (p > 0.5), putting more focus on hard, misclassified examples

  • smooth – (float, double) smoothing value used in the label-smoothed cross entropy

  • balance_index – (int) balance class index; should be specified when alpha is a float

  • size_average – (bool, optional) by default, the losses are averaged over each loss element in the batch
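
A hedged usage sketch (not from the original docs); it assumes the conventional (batch, num_class) logit shape and integer class targets:

>>> import torch
>>> from pathflowai.losses import FocalLoss
>>> criterion = FocalLoss(num_class=4, gamma=2)
>>> logit = torch.randn(8, 4)            # hypothetical batch of 8 patches, 4 classes
>>> target = torch.randint(0, 4, (8,))   # integer class labels
>>> loss = criterion(logit, target)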

Methods

__call__(self, \*input, \*\*kwargs)

Call self as a function.

add_module(self, name, module)

Adds a child module to the current module.

apply(self, fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

buffers(self[, recurse])

Returns an iterator over module buffers.

children(self)

Returns an iterator over immediate children modules.

cpu(self)

Moves all model parameters and buffers to the CPU.

cuda(self[, device])

Moves all model parameters and buffers to the GPU.

double(self)

Casts all floating point parameters and buffers to double datatype.

eval(self)

Sets the module in evaluation mode.

extra_repr(self)

Set the extra representation of the module

float(self)

Casts all floating point parameters and buffers to float datatype.

forward(self, logit, target)

Defines the computation performed at every call.

half(self)

Casts all floating point parameters and buffers to half datatype.

load_state_dict(self, state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

modules(self)

Returns an iterator over all modules in the network.

named_buffers(self[, prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children(self)

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules(self[, memo, prefix])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters(self[, prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters(self[, recurse])

Returns an iterator over module parameters.

register_backward_hook(self, hook)

Registers a backward hook on the module.

register_buffer(self, name, tensor)

Adds a persistent buffer to the module.

register_forward_hook(self, hook)

Registers a forward hook on the module.

register_forward_pre_hook(self, hook)

Registers a forward pre-hook on the module.

register_parameter(self, name, param)

Adds a parameter to the module.

state_dict(self[, destination, prefix, …])

Returns a dictionary containing a whole state of the module.

to(self, \*args, \*\*kwargs)

Moves and/or casts the parameters and buffers.

train(self[, mode])

Sets the module in training mode.

type(self, dst_type)

Casts all parameters and buffers to dst_type.

zero_grad(self)

Sets gradients of all model parameters to zero.

share_memory

forward(self, logit, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class pathflowai.losses.GeneralizedDice(**kwargs)[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/losses.py

Methods

__call__(self, probs, target, _)

Call self as a function.

class pathflowai.losses.GeneralizedDiceLoss(weight=None, channelwise=False, eps=1e-06, add_softmax=False)[source]

https://raw.githubusercontent.com/inferno-pytorch/inferno/0561e8a95cde6bfc5e10a3609841b7b0ca5b03ca/inferno/extensions/criteria/set_similarity_measures.py Computes the scalar Generalized Dice Loss defined in https://arxiv.org/abs/1707.03237

This version works for multiple classes and expects predictions for every class (e.g. softmax output) and one-hot targets for every class.
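
A minimal sketch matching the stated expectations (per-class predictions plus one-hot targets); the shapes are illustrative:

>>> import torch
>>> import torch.nn.functional as F
>>> from pathflowai.losses import GeneralizedDiceLoss
>>> criterion = GeneralizedDiceLoss(add_softmax=True)
>>> pred = torch.randn(2, 4, 64, 64)     # (batch, classes, H, W) raw scores; add_softmax normalizes them
>>> target = F.one_hot(torch.randint(0, 4, (2, 64, 64)), 4).permute(0, 3, 1, 2).float()
>>> loss = criterion(pred, target)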

Methods

__call__(self, \*input, \*\*kwargs)

Call self as a function.

add_module(self, name, module)

Adds a child module to the current module.

apply(self, fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

buffers(self[, recurse])

Returns an iterator over module buffers.

children(self)

Returns an iterator over immediate children modules.

cpu(self)

Moves all model parameters and buffers to the CPU.

cuda(self[, device])

Moves all model parameters and buffers to the GPU.

double(self)

Casts all floating point parameters and buffers to double datatype.

eval(self)

Sets the module in evaluation mode.

extra_repr(self)

Set the extra representation of the module

float(self)

Casts all floating point parameters and buffers to float datatype.

forward(self, input, target)

input: torch.FloatTensor or torch.cuda.FloatTensor
target: torch.FloatTensor or torch.cuda.FloatTensor

half(self)

Casts all floating point parameters and buffers to half datatype.

load_state_dict(self, state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

modules(self)

Returns an iterator over all modules in the network.

named_buffers(self[, prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children(self)

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules(self[, memo, prefix])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters(self[, prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters(self[, recurse])

Returns an iterator over module parameters.

register_backward_hook(self, hook)

Registers a backward hook on the module.

register_buffer(self, name, tensor)

Adds a persistent buffer to the module.

register_forward_hook(self, hook)

Registers a forward hook on the module.

register_forward_pre_hook(self, hook)

Registers a forward pre-hook on the module.

register_parameter(self, name, param)

Adds a parameter to the module.

state_dict(self[, destination, prefix, …])

Returns a dictionary containing a whole state of the module.

to(self, \*args, \*\*kwargs)

Moves and/or casts the parameters and buffers.

train(self[, mode])

Sets the module in training mode.

type(self, dst_type)

Casts all parameters and buffers to dst_type.

zero_grad(self)

Sets gradients of all model parameters to zero.

share_memory

forward(self, input, target)[source]

input: torch.FloatTensor or torch.cuda.FloatTensor
target: torch.FloatTensor or torch.cuda.FloatTensor

Expected shape of the inputs:
  • if not channelwise: (batch_size, nb_classes, …)

  • if channelwise: (batch_size, nb_channels, nb_classes, …)

exception pathflowai.losses.ShapeError[source]
class pathflowai.losses.SurfaceLoss(**kwargs)[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/losses.py

Methods

__call__(self, probs, dist_maps, _)

Call self as a function.

pathflowai.losses.assert_(condition, message='', exception_type=<class 'AssertionError'>)[source]

https://raw.githubusercontent.com/inferno-pytorch/inferno/0561e8a95cde6bfc5e10a3609841b7b0ca5b03ca/inferno/utils/exceptions.py Like assert, but with arbitrary exception types.

pathflowai.losses.class2one_hot(seg:torch.Tensor, C:int) → torch.Tensor[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.eq(a:torch.Tensor, b) → bool[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.flatten_samples(input_)[source]

https://raw.githubusercontent.com/inferno-pytorch/inferno/0561e8a95cde6bfc5e10a3609841b7b0ca5b03ca/inferno/utils/torch_utils.py Flattens a tensor or a variable such that the channel axis is first and the sample axis is second. The shapes are transformed as follows:

(N, C, H, W) -> (C, N * H * W)
(N, C, D, H, W) -> (C, N * D * H * W)
(N, C) -> (C, N)

The input must be at least 2d.
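
A doctest-style check of the (N, C, H, W) -> (C, N * H * W) transform with N=2, C=3, H=4, W=5:

>>> import torch
>>> from pathflowai.losses import flatten_samples
>>> flatten_samples(torch.randn(2, 3, 4, 5)).shape
torch.Size([3, 40])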

pathflowai.losses.one_hot(t:torch.Tensor, axis=1) → bool[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.one_hot2dist(seg:numpy.ndarray) → numpy.ndarray[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.simplex(t:torch.Tensor, axis=1) → bool[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.sset(a:torch.Tensor, sub:Iterable) → bool[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

pathflowai.losses.uniq(a:torch.Tensor) → Set[source]

https://raw.githubusercontent.com/LIVIAETS/surface-loss/master/utils.py

sampler.py

Balanced sampling based on one of the columns of the patch information.

class pathflowai.sampler.ImbalancedDatasetSampler(dataset, indices=None, num_samples=None)[source]

Samples elements randomly from a given list of indices for an imbalanced dataset. From https://raw.githubusercontent.com/ufoym/imbalanced-dataset-sampler/master/sampler.py

Arguments:

indices (list, optional): a list of indices
num_samples (int, optional): number of samples to draw
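
A hedged sketch of plugging the sampler into a DataLoader; dataset is assumed to be a DynamicImageDataset (or any dataset the sampler can read labels from):

>>> from torch.utils.data import DataLoader
>>> from pathflowai.sampler import ImbalancedDatasetSampler
>>> loader = DataLoader(dataset, batch_size=32, sampler=ImbalancedDatasetSampler(dataset))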

schedulers.py

Modulates the learning rate during the training process.

class pathflowai.schedulers.CosineAnnealingWithRestartsLR(optimizer, T_max, eta_min=0, last_epoch=-1, T_mult=1.0, alpha_decay=1.0)[source]

Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR:

\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\frac{T_{cur}}{T_{max}}\pi))\]

When last_epoch=-1, sets initial lr as lr. It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts. This implements the cosine annealing part of SGDR, as well as the restarts and the number-of-iterations multiplier.

Args:

optimizer (Optimizer): Wrapped optimizer.
T_max (int): Maximum number of iterations.
T_mult (float): Multiply T_max by this number after each restart. Default: 1.
eta_min (float): Minimum learning rate. Default: 0.
last_epoch (int): The index of the last epoch. Default: -1.
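
A usage sketch under the documented arguments; model and the per-epoch training step are hypothetical placeholders:

>>> import torch
>>> from pathflowai.schedulers import CosineAnnealingWithRestartsLR
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingWithRestartsLR(optimizer, T_max=10, T_mult=2, eta_min=5e-8)
>>> for epoch in range(50):
...     train_one_epoch(model, optimizer)  # hypothetical training step
...     scheduler.step()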

Attributes
step_n

Methods

load_state_dict(self, state_dict)

Loads the schedulers state.

state_dict(self)

Returns the state of the scheduler as a dict.

cosine

get_lr

restart

step

class pathflowai.schedulers.Scheduler(optimizer=None, opts={'T_max': 10, 'T_mult': 2, 'eta_min': 5e-08, 'lr_scheduler_decay': 0.5, 'scheduler': 'null'})[source]

Scheduler class that modulates learning rate of torch optimizers over epochs.

Parameters
optimizer:type

torch.Optimizer object

opts:type

Options of setting the learning rate scheduler, see default.

Attributes
schedulers:type

Different types of schedulers to choose from.

scheduler_step_fn:type

How scheduler updates learning rate.

initial_lr:type

Initial set learning rate.

scheduler_choice:type

What scheduler type was chosen.

scheduler:type

Scheduler object chosen that will more directly update optimizer LR.

Methods

get_lr(self)

Return current learning rate.

step(self)

Update optimizer learning rate.

get_lr(self)[source]

Return current learning rate.

Returns
float

Current learning rate.

step(self)[source]

Update optimizer learning rate.

visualize.py

Plots SHAP outputs, UMAP embeddings, and overlays predictions on top of WSI.

class pathflowai.visualize.PlotlyPlot[source]

Creates plotly html plots.

Methods

add_plot(self, t_data_df[, G, color_col, …])

Adds plotting data to be plotted.

plot(self, output_fname[, axes_off])

Plot embedding of patches to html file.

add_plot(self, t_data_df, G=None, color_col='color', name_col='name', xyz_cols=['x', 'y', 'z'], size=2, opacity=1.0, custom_colors=[])[source]

Adds plotting data to be plotted.

Parameters
t_data_df:dataframe

3-D transformed dataframe.

G:nx.Graph

Networkx graph.

color_col:str

Column to use to color points.

name_col:str

Column to use to name points.

xyz_cols:list

3 columns that denote x,y,z coords.

size:int

Marker size.

opacity:float

Marker opacity.

custom_colors:list

Custom colors to supply.

plot(self, output_fname, axes_off=False)[source]

Plot embedding of patches to html file.

Parameters
output_fname:str

Output html file.

axes_off:bool

Remove axes.
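
Putting the two methods together, a minimal sketch; t_data_df is assumed to be a 3-D transformed dataframe with x/y/z, color, and name columns:

>>> from pathflowai.visualize import PlotlyPlot
>>> pp = PlotlyPlot()
>>> pp.add_plot(t_data_df, color_col='color', name_col='name', xyz_cols=['x', 'y', 'z'])
>>> pp.plot('embeddings.html', axes_off=True)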

class pathflowai.visualize.PredictionPlotter(dask_arr_dict, patch_info_db, compression_factor=3, alpha=0.5, patch_size=224, no_db=False, plot_annotation=False, segmentation=False, n_segmentation_classes=4, input_dir='', annotation_col='annotation', scaling_factor=1.0)[source]

Plots predictions over entire image.

Parameters
dask_arr_dict:dict

Stores all dask arrays corresponding to all of the images.

patch_info_db:str

Patch-level information, e.g. prediction.

compression_factor:float

How much to compress image by.

alpha:float

Low value assigns higher weight to prediction over original image.

patch_size:int

Patch size.

no_db:bool

Don’t use patch information.

plot_annotation:bool

Plot annotations from patch information.

segmentation:bool

Plot segmentation mask.

n_segmentation_classes:int

Number segmentation classes.

input_dir:str

Input directory.

annotation_col:str

Annotation column to plot.

scaling_factor:float

Multiplies the prediction scores to make them appear darker on the images when predicting.

Methods

add_custom_segmentation(self, basename, npy)

Replace segmentation mask with new custom segmentation.

generate_image(self, ID)

Generate the image array for the whole slide image with predictions overlaid.

output_image(self, img, filename[, tif])

Output calculated image to file.

return_patch(self, ID, x, y, patch_size)

Return one single patch instead of entire image.

add_custom_segmentation(self, basename, npy)[source]

Replace segmentation mask with new custom segmentation.

Parameters
basename:str

Patient ID.

npy:str

Numpy mask.

generate_image(self, ID)[source]

Generate the image array for the whole slide image with predictions overlaid.

Parameters
ID:str

Patient ID.

Returns
array

Resulting overlaid whole slide image.

output_image(self, img, filename, tif=False)[source]

Output calculated image to file.

Parameters
img:array

Image.

filename:str

Output file name.

tif:bool

Store in TIF format?

return_patch(self, ID, x, y, patch_size)[source]

Return one single patch instead of entire image.

Parameters
ID:str

Patient ID.

x:int

X coordinate.

y:int

Y coordinate.

patch_size:int

Patch size.

Returns
array

Image.

pathflowai.visualize.annotation2rgb(i, palette, arr)[source]

Go from annotation of patch to color.

Parameters
i:int

Annotation index.

palette:palette

Index to color mapping.

arr:array

Image array.

Returns
array

Resulting image.

pathflowai.visualize.blend(arr1, arr2, alpha=0.5)[source]

Blend 2 arrays together, mixing with alpha.

Parameters
arr1:array

Image 1.

arr2:array

Image 2.

alpha:float

Higher alpha makes image more like image 1.

Returns
array

Resulting image.

pathflowai.visualize.plot_image_(image_file, compression_factor=2.0, test_image_name='test.png')[source]

Plots entire SVS/other image.

Parameters
image_file:str

Image file.

compression_factor:float

Amount to shrink each dimension of image.

test_image_name:str

Output image file.

pathflowai.visualize.plot_shap(model, dataset_opts, transform_opts, batch_size, outputfilename, n_outputs=1, method='deep', local_smoothing=0.0, n_samples=20, pred_out=False)[source]

Plot shapley attributions overlaid on images for classification tasks.

Parameters
model:nn.Module

Pytorch model.

dataset_opts:dict

Options used to configure dataset

transform_opts:dict

Options used to configure transformers.

batch_size:int

Batch size for training.

outputfilename:str

Output filename.

n_outputs:int

Number of top outputs.

method:str

Gradient or deep explainer.

local_smoothing:float

How much to smooth shapley map.

n_samples:int

Number shapley samples to draw.

pred_out:bool

Label images with binary prediction score?

pathflowai.visualize.plot_umap_images(dask_arr_dict, embeddings_file, ID=None, cval=1.0, image_res=300.0, outputfname='output_embedding.png', mpl_scatter=True, remove_background_annotation='', max_background_area=0.01, zoom=0.05, n_neighbors=10, sort_col='', sort_mode='asc')[source]

Make UMAP embedding plot, overlaid with images.

Parameters
dask_arr_dict:dict

Stored dask arrays for each WSI.

embeddings_file:str

Embeddings pickle file stored after training the model.

ID:str

Patient ID.

cval:float

Deprecated.

image_res:float

Image resolution.

outputfname:str

Output image file.

mpl_scatter:bool

Recommended: Use matplotlib for scatter plot.

remove_background_annotation:str

Background annotation to remove; enter the annotation name to remove those patches.

max_background_area:float

Maximum background area in each tile for inclusion.

zoom:float

How much to zoom in on each patch; values less than 1 zoom out.

n_neighbors:int

Number of neighbors for UMAP embedding.

sort_col:str

Patch info column to sort on.

sort_mode:str

Sort ascending or descending.

Returns
type

Description of returned object.

Inspired by: https://gist.github.com/lukemetz/be6123c7ee3b366e333a
WIP!! Needs testing.
pathflowai.visualize.prob2rbg(prob, palette, arr)[source]

Convert probability score to rgb image.

Parameters
prob:float

Between 0 and 1 score.

palette:palette

Palette converts between prob and color.

arr:array

Original array.

Returns
array

New image colored by prediction score.

pathflowai.visualize.seg2rgb(seg, palette, n_segmentation_classes)[source]

Color each pixel by segmentation class.

Parameters
seg:array

Segmentation mask.

palette:palette

Color to RGB map.

n_segmentation_classes:int

Total number segmentation classes.

Returns
array

Returned segmentation image.

pathflowai.visualize.to_pil(arr)[source]

Numpy array to pil.

Parameters
arr:array

Numpy array.

Returns
Image

PIL Image.

utils.py

General utilities that still need to be broken up into preprocessing, machine learning input preparation, and output submodules.

pathflowai.utils.add_purple_mask(arr)[source]

Optionally add an intensity mask to the dask array.

Parameters
arr:dask.array

Image data.

Returns
array

Image data with intensity added as fourth channel.

pathflowai.utils.adjust_mask(mask_file, dask_img_array_file, out_npy, n_neighbors)[source]

Fixes segmentation masks to reduce coarse annotations over empty regions.

Parameters
mask_file:str

NPY segmentation mask.

dask_img_array_file:str

Dask image file.

out_npy:str

Output numpy file.

n_neighbors:int

Number nearest neighbors for dilation and erosion of mask from background to not background.

Returns
str

Output numpy file.

pathflowai.utils.boxes2interior(img_size, polygons)[source]

Deprecated.

pathflowai.utils.create_purple_mask(arr, img_size=None, sparse=True)[source]

Create a grayscale intensity mask. This will be changed soon to support other thresholding QC methods.

Parameters
arr:dask.array

Dask array containing image information.

img_size:int

Deprecated.

sparse:bool

Deprecated.

Returns
dask.array

Intensity, grayscale array over image.

pathflowai.utils.create_sparse_annotation_arrays(xml_file, img_size, annotations=[])[source]

Convert annotation xml to shapely objects and store in dictionary.

Parameters
xml_file:str

XML file containing annotations.

img_size:int

Deprecated.

annotations:list

Annotations to look for in xml export.

Returns
dict

Dictionary with annotation-shapely object pairs.

pathflowai.utils.create_train_val_test(train_val_test_pkl, input_info_db, patch_size)[source]

Create dataframe that splits slides into training, validation, and test sets.

Parameters
train_val_test_pkl:str

Pickle for training, validation, and test slides.

input_info_db:str

Patch information SQL database.

patch_size:int

Patch size to access.

Returns
dataframe

Train test validation splits.

pathflowai.utils.df2sql(df, sql_file, patch_size, mode='replace')[source]

Write dataframe containing patch level information to SQL db.

Parameters
df:dataframe

Dataframe containing patch information.

sql_file:str

SQL database.

patch_size:int

Size of patches.

mode:str

Replace or append.

pathflowai.utils.dir2images(image_dir)[source]

Deprecated.

pathflowai.utils.extract_patch_information(basename, input_dir='./', annotations=[], threshold=0.5, patch_size=224, generate_finetune_segmentation=False, target_class=0, intensity_threshold=100.0, target_threshold=0.0, adj_mask='', basic_preprocess=False, tries=0)[source]

Final step of the preprocessing pipeline. Breaks the image into patches; includes a patch if it is not background and meets an intensity threshold; finds the area of each annotation type in the patch along with spatial information and image ID; and dumps the data to a SQL table.

Parameters
basename:str

Patient ID.

input_dir:str

Input directory.

annotations:list

List of annotations to record, these can be different tissue types, must correspond with XML labels.

threshold:float

Value between 0 and 1 indicating the minimum fraction of the patch that must not be background for inclusion.

patch_size:int

Patch size; each patch size becomes its own table in the SQL database.

generate_finetune_segmentation:bool

Deprecated.

target_class:int

Number of segmentation classes desired, from 0th class to target_class-1 will be annotated in SQL.

intensity_threshold:float

Value between 0 and 255 representing the minimum intensity for a pixel not to be counted as background. Will be modified with new transforms.

target_threshold:float

Deprecated.

adj_mask:str

Adjusted mask, if binary opening operations were performed in a previous preprocessing step.

basic_preprocess:bool

Do not store patch level information.

tries:int

Number of retries in case there is a Dask timeout.

Returns
dataframe

Patch information.

pathflowai.utils.fix_name(basename)[source]

Fixes illegitimate basename, deprecated.

pathflowai.utils.fix_names(file_dir)[source]

Fixes basenames, deprecated.

pathflowai.utils.generate_patch_pipeline(basename, input_dir='./', annotations=[], threshold=0.5, patch_size=224, out_db='patch_info.db', generate_finetune_segmentation=False, target_class=0, intensity_threshold=100.0, target_threshold=0.0, adj_mask='', basic_preprocess=False)[source]

Runs the patch-extraction step for one slide and writes the resulting patch information to the output SQL database.

Parameters
basename:str

Patient ID.

input_dir:str

Input directory.

annotations:list

List of annotations to record, these can be different tissue types, must correspond with XML labels.

threshold:float

Value between 0 and 1 indicating the minimum fraction of the patch that must not be background for inclusion.

patch_size:int

Patch size; each patch size becomes its own table in the SQL database.

out_db:str

Output SQL database.

generate_finetune_segmentation:bool

Deprecated.

target_class:int

Number of segmentation classes desired, from 0th class to target_class-1 will be annotated in SQL.

intensity_threshold:float

Value between 0 and 255 representing the minimum intensity for a pixel not to be counted as background. Will be modified with new transforms.

target_threshold:float

Deprecated.

adj_mask:str

Adjusted mask, if binary opening operations were performed in a previous preprocessing step.

basic_preprocess:bool

Do not store patch level information.

pathflowai.utils.grab_interior_points(xml_file, img_size, annotations=[])[source]

Deprecated.

pathflowai.utils.image2coords(image_file, output_point=False)[source]

Deprecated.

pathflowai.utils.images2coord_dict(images, output_point=False)[source]

Deprecated.

pathflowai.utils.img2npy_(input_dir, basename, svs_file)[source]

Convert SVS, TIF, TIFF to NPY.

Parameters
input_dir:str

Directory where the NPY output file is written.

basename:str

Basename of output file.

svs_file:str

SVS, TIF, TIFF file input.

Returns
str

NPY output file.

pathflowai.utils.is_coords_in_box(coords, patch_size, boxes)[source]

Get area of annotation in patch.

Parameters
coords:array

X,Y coordinates of patch.

patch_size:int

Patch size.

boxes:list

Shapely objects for annotations.

Returns
float

Area of annotation type.

pathflowai.utils.is_image_in_boxes(image_coord_dict, boxes)[source]

Find if image intersects with annotations.

Parameters
image_coord_dict:dict

Dictionary of patches.

boxes:list

Shapely annotation shapes.

Returns
dict

Dictionary of whether image intersects with any of the annotations.

pathflowai.utils.is_valid_patch(xs, ys, patch_size, purple_mask, intensity_threshold, threshold=0.5)[source]

Deprecated, computes whether patch is valid.

pathflowai.utils.load_dataset(in_zarr, in_pkl)[source]

Load ZARR image and annotations pickle.

Parameters
in_zarr:str

Input image.

in_pkl:str

Input annotations.

Returns
dask.array

Image array.

dict

Annotations dictionary.

pathflowai.utils.load_image(svs_file)[source]

Load SVS, TIF, or TIFF image.

Parameters
svs_file:str

Image file (SVS, TIF, or TIFF).

Returns
type

Description of returned object.

pathflowai.utils.load_process_image(svs_file, xml_file=None, npy_mask=None, annotations=[])[source]

Load SVS-like image (including NPY), segmentation/classification annotations, generate dask array and dictionary of annotations.

Parameters
svs_file:str

Image file

xml_file:str

Annotation file.

npy_mask:array

Numpy segmentation mask.

annotations:list

List of annotations in xml.

Returns
array

Dask array of image.

dict

Annotation masks.

pathflowai.utils.load_sql_df(sql_file, patch_size)[source]

Load pandas dataframe from SQL, accessing particular patch size within SQL.

Parameters
sql_file:str

SQL db.

patch_size:int

Patch size.

Returns
dataframe

Patch level information.
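
For example, loading the 224-pixel patch table written by the preprocessing pipeline (file name as in the defaults above):

>>> from pathflowai.utils import load_sql_df
>>> patch_df = load_sql_df('patch_info.db', 224)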

pathflowai.utils.modify_patch_info(input_info_db='patch_info.db', slide_labels=pd.DataFrame(), pos_annotation_class='', patch_size=224, segmentation=False, other_annotations=[], target_segmentation_class=-1, target_threshold=0.0, classify_annotations=False)

Modify the patch information to get ready for deep learning, incorporate whole slide labels if needed.

Parameters
input_info_db:str

SQL DB file.

slide_labels:dataframe

Dataframe with whole slide labels.

pos_annotation_class:str

Tissue/annotation label to assign the whole slide image label to; if not supplied, all of a slide’s patches receive the whole slide label.

patch_size:int

Patch size.

segmentation:bool

Segmentation?

other_annotations:list

Other annotations to access from patch information.

target_segmentation_class:int

Segmentation class to threshold.

target_threshold:float

Include patch if patch has target area greater than this.

classify_annotations:bool

Classifying annotations for pretraining, or final model?

Returns
dataframe

Modified patch information.

pathflowai.utils.npy2da(npy_file)[source]

Numpy to dask array.

Parameters
npy_file:str

Input npy file.

Returns
dask.array

Converted numpy array to dask.

pathflowai.utils.parse_coord_return_boxes(xml_file, annotation_name='', return_coords=False)[source]

Get list of shapely objects for each annotation in the XML object.

Parameters
xml_file:str

Annotation file.

annotation_name:str

Name of xml annotation.

return_coords:bool

Just return list of coords over shapes.

Returns
list

List of shapely objects.

pathflowai.utils.process_svs(svs_file, xml_file, annotations=[], output_dir='./')[source]

Store the image in NPY format and the annotations in a pickle dictionary.

Parameters
svs_file:str

Image file.

xml_file:str

Annotations file.

annotations:list

List of annotations in image.

output_dir:str

Output directory.

pathflowai.utils.retain_images(image_dir, xml_file, annotation='')[source]

Deprecated.

pathflowai.utils.return_image_coord(nx=0, ny=0, xl=3333, yl=3333, xi=0, yi=0, xc=3, yc=3, dimx=224, dimy=224, output_point=False)[source]

Deprecated.

pathflowai.utils.return_image_in_boxes_dict(image_dir, xml_file, annotation='')[source]

Deprecated.

pathflowai.utils.run_preprocessing_pipeline(svs_file, xml_file=None, npy_mask=None, annotations=[], out_zarr='output_zarr.zarr', out_pkl='output.pkl')[source]

Run the preprocessing pipeline: store the image in zarr format, keep segmentation masks as NPY, and store XML annotations as a pickle.

Parameters
svs_file:str

Input image file.

xml_file:str

Input annotation file.

npy_mask:str

NPY segmentation mask.

annotations:list

List of annotations.

out_zarr:str

Output zarr for image.

out_pkl:str

Output pickle for annotations.
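
A hedged end-to-end sketch with hypothetical file names and annotation labels:

>>> from pathflowai.utils import run_preprocessing_pipeline
>>> run_preprocessing_pipeline('A01.svs', xml_file='A01.xml', annotations=['benign', 'tumor'], out_zarr='A01.zarr', out_pkl='A01_mask.pkl')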

pathflowai.utils.save_all_patch_info(basenames, input_dir='./', annotations=[], threshold=0.5, patch_size=224, output_pkl='patch_info.pkl')[source]

Deprecated.

pathflowai.utils.save_dataset(arr, masks, out_zarr, out_pkl)[source]

Saves the dask image array and the dictionary of annotations to zarr and pickle files, respectively.

Parameters
arr:array

Image.

masks:dict

Dictionary of annotation shapes.

out_zarr:str

Zarr output file for image.

out_pkl:str

Pickle output file.

pathflowai.utils.segmentation_predictions2npy(y_pred, patch_info, segmentation_map, npy_output)[source]

Convert segmentation predictions from model to numpy masks.

Parameters
y_pred:list

List of patch segmentation masks

patch_info:dataframe

Patch information from DB.

segmentation_map:array

Existing segmentation mask.

npy_output:str

Output npy file.

pathflowai.utils.svs2dask_array(svs_file, tile_size=1000, overlap=0, remove_last=True, allow_unknown_chunksizes=False)[source]

Convert SVS, TIF or TIFF to dask array.

Parameters
svs_file:str

Image file.

tile_size:int

Size of chunk to be read in.

overlap:int

Do not modify, overlap between neighboring tiles.

remove_last:bool

Remove last tile because it has a custom size.

allow_unknown_chunksizes: bool

Allow different chunk sizes; more flexible, but slower.

Returns
dask.array

Dask Array.

>>> arr = svs2dask_array(svs_file, tile_size=1000, overlap=0, remove_last=True, allow_unknown_chunksizes=False)
>>> arr2 = arr.compute()
>>> arr3 = to_pil(cv2.resize(arr2, dsize=(1440, 700), interpolation=cv2.INTER_CUBIC))
>>> arr3.save(test_image_name)
