nunchaku package

Submodules

nunchaku.nunchaku module

class nunchaku.nunchaku.Nunchaku(X, Y, err=None, yrange=None, prior=None, estimate_err='default', minlen='default', method='simpson', bases=None, quiet=False)

Bases: object

Find how many contiguous segments a dataset with two variables (e.g. a 1-D time series) should be partitioned into, and find the start and end of each segment. Each segment can be constant, linear, or a linear combination of arbitrary basis functions (e.g. polynomials).

Parameters:
  • X (list of floats or 1-D numpy array) – the x vector of data, sorted in ascending order.

  • Y (array-like) – the y vector or matrix of data, each row being one replicate of measurement.

  • err (list of floats or 1-D numpy array, optional) – the error of the input data.

  • yrange (list of length 2, optional) – the min and max of y allowed by the instrument’s readout. Only used when bases is None.

  • prior (list of length 2 or 4 or list of lists, optional) – When bases is None, the prior range of the gradient (and the intercept when length is 4). This argument will overwrite yrange. When bases is given, it is a list of lists, with each list being the prior range of each basis function’s coefficient.

  • estimate_err (bool, optional) – if True, estimate the error from the data. By default, True when Y has at least 5 replicates.

  • minlen (int, optional) – the minimal length of a valid segment (must be >= len(bases) + 1). Default value is len(bases) + 1.

  • method ({"simpson", "quad"}, default "simpson") – the numerical integration method to be used when error is neither estimated nor provided.

  • bases (list of functions, optional) – the basis functions; defaults to linear, i.e. [numpy.ones_like, lambda x: x].

  • quiet (bool, default False) – if True, do not show the progress bar.

Raises:
  • ValueError – when bases is None and prior is provided with a length other than 2 or 4.

  • ValueError – when the length of prior does not equal the length of bases when bases is given.
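The two ValueError rules above can be sketched as a small standalone check. This is only an illustration of the documented behaviour, not the library's own code; the helper name check_prior is hypothetical.

```python
def check_prior(prior, bases=None):
    """Hypothetical helper mirroring the documented ValueError rules."""
    if prior is None:
        return  # nothing to validate
    if bases is None:
        # Linear default: gradient range only (2) or gradient + intercept (4).
        if len(prior) not in (2, 4):
            raise ValueError("prior must have length 2 or 4 when bases is None")
    elif len(prior) != len(bases):
        # One [low, high] range per basis-function coefficient.
        raise ValueError("prior must contain one range per basis function")


# Three basis functions (a quadratic) need three coefficient ranges.
quadratic = [lambda x: 1.0, lambda x: x, lambda x: x * x]
check_prior([-5, 5])                                 # accepted
check_prior([[-1, 1], [-5, 5], [-2, 2]], quadratic)  # accepted
```

With custom bases, each inner list is the prior range of one coefficient, so the outer list must be as long as bases.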

Examples

>>> from nunchaku import Nunchaku, get_example_data
>>> x, y = get_example_data()
>>> # load data and set the prior of the gradient
>>> nc = Nunchaku(x, y, prior=[-5, 5])
>>> # compare models with 1, 2, 3 and 4 linear segments
>>> numseg, evidences = nc.get_number(num_range=(1, 4))
>>> # get the mean and standard deviation of the boundary points
>>> bds, bds_std = nc.get_iboundaries(numseg)
>>> # get the information of all segments
>>> info_df = nc.get_info(bds)
>>> # plot the data and the segments
>>> nc.plot(info_df)
>>> # get the underlying piecewise function (for piecewise linear functions only)
>>> y_prediction = nc.predict(info_df)

get_MLE_of_error(numseg)

Return the maximum-likelihood estimate (MLE) of the data’s error, estimated by expectation-maximisation.

Parameters:

numseg (int) – number of segments

Returns:

err – The MLE of the data’s error, assuming homogeneity of variance.

Return type:

float

Raises:

NotImplementedError – when the error is already provided or estimated.

get_iboundaries(numseg, round=True, bd_err=True)

Return the mean and standard deviation of the internal boundary indices, i.e. excluding the first (0) and last (len(x)) indices of the data.

Parameters:
  • numseg (int) – number of segments

  • round (bool, default True) – whether to round the returned mean to integer

  • bd_err (bool, default True) – whether to estimate the error of the boundary positions. Setting it to False will reduce the computational load.

Returns:

  • boundaries (list of int or float) – Indices of internal boundaries

  • boundaries_err (list of float) – Error of the indices of internal boundaries

Raises:

OverflowError – when numerical integration yields infinity.
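As a rough illustration of how internal boundaries relate to segments, the sketch below adds the two excluded end indices (0 and len(x)) back and pairs up neighbouring edges. It assumes each boundary index starts a new segment, so that segments are half-open ranges as in Python slicing; the library's own convention may differ, and the helper name is hypothetical.

```python
def segments_from_boundaries(boundaries, n_points):
    """Turn internal boundary indices into (start, stop) index pairs.

    Assumption: each boundary index starts a new segment, so segments are
    half-open ranges [start, stop), as in Python slicing.
    """
    edges = [0, *boundaries, n_points]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical boundaries for a 10-point dataset split into 3 segments.
pairs = segments_from_boundaries([3, 7], 10)
print(pairs)  # [(0, 3), (3, 7), (7, 10)]
```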

get_info(boundaries)

Return a pandas DataFrame describing the segments delimited by the given internal boundaries, as returned by get_iboundaries().

Parameters:

boundaries (list of int) – a list of indices of boundary points

Returns:

df – pandas DataFrame describing the segments delimited by the given internal boundaries.

Return type:

pandas.DataFrame

get_number(num_range)

Get the number of segments with the highest evidence.

Parameters:

num_range (int or tuple of length 2) – the range of segment numbers to compare. If an integer, the range is [1, num_range]; if a tuple, the range is [num_range[0], num_range[1]].

Returns:

  • best_numseg (int) – the number of segments with the highest evidence.

  • evi (float) – log10 of the model evidence for each candidate number of segments M, i.e. log10 P(D|M).

Raises:
  • OverflowError – when numerical integration yields infinity.

  • ValueError – when num_range is neither an int nor a tuple.
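Choosing the number of segments with the highest evidence amounts to an argmax over the log10 evidences. The sketch below also converts them into posterior probabilities, assuming a uniform prior over the candidate numbers; that assumption, the helper name best_number, and the numerical values are all made up for illustration.

```python
def best_number(candidates, log10_evidence):
    """Return the candidate with the highest log10 evidence, together with
    the posterior probability of each candidate under a uniform prior."""
    top = max(log10_evidence)
    # Subtract the maximum before exponentiating to avoid underflow to zero.
    weights = [10.0 ** (e - top) for e in log10_evidence]
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = candidates[log10_evidence.index(top)]
    return best, posterior


# Made-up log10 evidences for models with 1 to 4 segments.
best, post = best_number([1, 2, 3, 4], [-120.0, -95.5, -90.0, -92.0])
print(best)  # 3
```

Working with differences of log evidences keeps the arithmetic stable even when the evidences themselves are far too small to represent as floats.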

plot(info_df=None, show=False, start=True, end=True, figsize=(6, 5), err_width=1, s=50, color='red', alpha=0.5, hlmax={'rsquare': ('orange', 's')}, hlmin=None, **kwargs)

Plot the raw data and the start and/or end points of each segment, and highlight the linear segments of interest if the model is piecewise linear.

Parameters:
  • info_df (pandas.DataFrame, default None) – the pandas dataframe returned by get_info(); if None, only the data is shown.

  • show (bool, default False) – if True, call plt.show()

  • start (bool, default True) – if True, show the start point of each segment

  • end (bool, default True) – if True, show the end point of each segment

  • figsize (tuple, default (6, 5)) – size of figure passed to plt.subplots()

  • s (float, default 50) – size of the boundary points as passed into Matplotlib’s scatter()

  • color (str, default "red") – color of the boundary points and segments as passed into Matplotlib’s scatter() and plot()

  • alpha (float, default 0.5) – transparency of the boundary points and segments as passed into Matplotlib’s scatter() and plot()

  • hlmax (dict, default {"rsquare": ("orange", "s")}) – highlighting the linear segment with max quantity (e.g. rsquare). The key is the column name in info_df and the value is a tuple: (color, marker). This argument has no effect if the segments are not linear.

  • hlmin (dict, optional) – highlighting the linear segment with min quantity (e.g. rsquare). The key is the column name in info_df and the value is a tuple: (color, marker). This argument has no effect if the segments are not linear.

  • **kwargs (keyword arguments) – keyword arguments to be passed into Matplotlib’s scatter()

Returns:

  • fig (matplotlib.figure.Figure object) – Matplotlib Figure object

  • axes (matplotlib.axes.Axes object) – Matplotlib Axes object

predict(info_df)

Return the estimated piecewise-linear function. Note: this function only works when the model is piecewise linear.

Parameters:

info_df (pandas.DataFrame) – the pandas dataframe returned by get_info().

Returns:

y_preds – the estimated piecewise-linear function.

Return type:

numpy.ndarray
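To make the idea of a piecewise-linear prediction concrete, here is a minimal sketch that evaluates one straight line per segment. The tuple layout (start, stop, gradient, intercept) is hypothetical and is not the column layout of the DataFrame returned by get_info().

```python
def piecewise_linear(x, segments):
    """Evaluate a piecewise-linear function at the points in x.

    `segments` is a list of (start, stop, gradient, intercept) tuples, where
    start and stop form a half-open index range into x. This layout is a
    hypothetical stand-in for the per-segment information.
    """
    y = [None] * len(x)
    for start, stop, gradient, intercept in segments:
        for i in range(start, stop):
            y[i] = gradient * x[i] + intercept
    return y


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# A rising line over the first three points, then a plateau at y = 2.
segments = [(0, 3, 1.0, 0.0), (3, 6, 0.0, 2.0)]
y_pred = piecewise_linear(x, segments)
print(y_pred)  # [0.0, 1.0, 2.0, 2.0, 2.0, 2.0]
```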

nunchaku.nunchaku.get_example_data(plot=False)

Return example data, with x being cell number and y being three replicates of OD measurement.

Parameters:

plot (bool, default False) – if True, plot the example data.

Returns:

  • x (1D numpy array) – Example data of x

  • y (2D numpy array) – Example data of y

Examples

>>> from nunchaku.nunchaku import get_example_data
>>> x, y = get_example_data()

nunchaku.nunchaku.log_matmul(A, B)

Matrix multiplication in log space. See https://stackoverflow.com/questions/36467022/handling-matrix-multiplication-in-log-space-in-python
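For readers unfamiliar with the technique, the following dependency-free sketch shows one way to multiply matrices in log space using the log-sum-exp trick. It illustrates the general idea only and is not the library's implementation; the names logsumexp and log_matmul_sketch are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf  # every term is log(0)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_matmul_sketch(log_a, log_b):
    """Return log(A @ B) given log(A) and log(B) as nested lists."""
    n_inner = len(log_b)
    n_cols = len(log_b[0])
    return [
        [logsumexp([row[k] + log_b[k][j] for k in range(n_inner)])
         for j in range(n_cols)]
        for row in log_a
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
log_a = [[math.log(v) for v in row] for row in a]
log_b = [[math.log(v) for v in row] for row in b]
log_c = log_matmul_sketch(log_a, log_b)
# Exponentiating recovers the ordinary product, up to rounding.
c = [[math.exp(v) for v in row] for row in log_c]
```

Each entry of the product is a sum of products, which becomes a log-sum-exp of sums once everything is stored as logarithms. This avoids overflow and underflow when the entries span many orders of magnitude.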

Module contents