Python API

Database

Database specifications for an evaluation protocol based on the Iris Flower database from Fisher’s original work.

rr.database.load()[source]

Loads the data from its CSV format into an easy-to-use dictionary of arrays

rr.database.split_data(data, subset, splits)[source]

Returns the data for a given protocol

rr.database.get(protocol, subset, classes=['setosa', 'versicolor', 'virginica'], variables=['sepal length', 'sepal width', 'petal length', 'petal width'])[source]

Returns the data subset given a particular protocol

Parameters:
  • protocol (str) – one of the valid protocols supported by this interface

  • subset (str) – one of ‘train’ or ‘test’

  • classes (list of str) – a list of strings containing the names of the classes you want data from

  • variables (list of str) – a list of strings containing the names of the variables (features) you want data from

Returns:

data – The data for all the classes and variables nicely packed into one numpy 3D array. One depth represents the data for one class, one row is one example, one column a given feature.

Return type:

numpy.ndarray
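The layout of the returned array can be illustrated with a stand-in. This is a minimal sketch that does not call rr.database.get(); the array below is synthetic and its sizes (3 classes, 5 examples, 4 features) are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic stand-in for the returned array: 3 classes, 5 examples per
# class, 4 features. These are NOT Iris measurements.
rng = np.random.default_rng(0)
data = rng.normal(size=(3, 5, 4))

n_classes, n_examples, n_features = data.shape

# Depth (axis 0) selects one class: a 2D array of examples x features.
first_class = data[0]

# One example of the second class is a single row of that class slice.
example = data[1, 2]

# One feature across all examples of all classes.
first_feature = data[:, :, 0]
```

Because all classes share one 3D array, this layout presumes the same number of examples per class.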

Pre-processor

A simple pre-processing that applies Z-normalization to the input features

rr.preprocessor.estimate_norm(X)[source]

Estimates the mean and standard deviation from a data set

Parameters:

X (numpy.ndarray) – A 2D numpy ndarray in which the rows represent examples and the columns represent features. The normalization parameters are estimated from this data.

Returns:

  • mean (numpy.ndarray) – A 1D numpy ndarray containing the estimated mean over dimension 1 (columns) of the input data X

  • std (numpy.ndarray) – A 1D numpy ndarray containing the estimated unbiased standard deviation over dimension 1 (columns) of the input data X
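The estimation described above can be sketched with plain numpy. The function name estimate_norm_sketch is hypothetical, and the sketch assumes that one mean and one standard deviation are produced per column (feature) of X.

```python
import numpy as np

def estimate_norm_sketch(X):
    """Per-feature mean and sample standard deviation of a 2D array."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)        # one value per column (feature)
    std = X.std(axis=0, ddof=1)  # ddof=1 divides by N - 1
    return mean, std

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
mean, std = estimate_norm_sketch(X)
```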

rr.preprocessor.normalize(X, norm)[source]

Applies the given norm to the input data set

Parameters:
  • X (numpy.ndarray) – A 3D numpy ndarray in which the rows represent examples and the columns represent features of the data set you want to normalize. Every depth corresponds to the data for a particular class.

  • norm (tuple) – A tuple containing two 1D numpy ndarrays corresponding to the normalization parameters extracted with estimate_norm() above.

Returns:

X_normed – A 3D numpy ndarray with the same dimensions as the input array X, but with its values normalized according to the norm input.

Return type:

numpy.ndarray
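Z-normalization of a 3D array can be sketched as follows. The name normalize_sketch is hypothetical, and estimating the parameters from all classes stacked together is an assumption made for this example, not something the reference above specifies.

```python
import numpy as np

def normalize_sketch(X, norm):
    """Subtract the mean and divide by the standard deviation, per feature."""
    mean, std = norm
    # Broadcasting applies the 1D parameters along the last (feature) axis.
    return (np.asarray(X, dtype=float) - mean) / std

# 2 classes x 2 examples x 2 features (synthetic values).
X = np.array([[[1.0, 10.0], [3.0, 30.0]],
              [[5.0, 50.0], [7.0, 70.0]]])

# Assumption: parameters are estimated on all classes stacked together.
flat = X.reshape(-1, X.shape[-1])
norm = (flat.mean(axis=0), flat.std(axis=0, ddof=1))

X_normed = normalize_sketch(X, norm)
```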

Machine Learning Algorithm

rr.algorithm.make_labels(X)[source]

Helper function that generates a single 1D array with labels which are good targets for stock logistic regression.

Parameters:

X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.

Returns:

labels – A 1D array containing suitable labels for all rows and for all classes defined in X (depth).

Return type:

numpy.ndarray
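One way such labels could be built is sketched below. The name make_labels_sketch is hypothetical, and the label values are an assumption: examples of the first class get 0 and examples of the second class get 1.

```python
import numpy as np

def make_labels_sketch(X):
    """One label per row; the label is the index of the class (depth)."""
    # Assumption: class at depth 0 -> label 0, class at depth 1 -> label 1.
    return np.hstack([np.full(len(x), k, dtype=float) for k, x in enumerate(X)])

# Two classes with 3 and 2 examples of 4 features each (values irrelevant).
X = [np.zeros((3, 4)), np.zeros((2, 4))]
labels = make_labels_sketch(X)
```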

class rr.algorithm.Machine(theta)[source]

A class to handle all run-time aspects for Logistic Regression

Parameters:

theta (numpy.ndarray) – A set of parameters for the Logistic Regression model. This must be an iterable (or numpy array) with all parameters for the model, including the bias term, which must be on entry 0 (the first entry of the iterable).

predict(X)[source]

Predicts the class of each row of X

Parameters:

X (numpy.ndarray) – The input data matrix. This must be an array with 2 dimensions. Every row corresponds to one example of the data set, every column, one different feature.

Returns:

predictions – A 1D array with as many entries as rows in the input 2D array X, representing g(x), the class predictions for the current machine.

Return type:

numpy.ndarray
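The prediction step can be sketched as a logistic function applied to a linear score. This assumes g(x) is the standard sigmoid and that theta[0] is the bias term, as stated for the constructor; predict_sketch is a hypothetical name, not the class method itself.

```python
import numpy as np

def predict_sketch(theta, X):
    """Sigmoid of the linear score, with the bias stored at theta[0]."""
    theta = np.asarray(theta, dtype=float)
    X = np.asarray(X, dtype=float)
    z = theta[0] + X @ theta[1:]
    return 1.0 / (1.0 + np.exp(-z))

theta = [0.0, 1.0, -1.0]          # bias, then one weight per feature
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [0.0, 2.0]])
predictions = predict_sketch(theta, X)
```

Values above 0.5 would then be read as the positive class and values below 0.5 as the negative class.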

J(X, regularizer=0.0)[source]

Calculates the logistic regression cost

Parameters:
  • X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 numpy.ndarrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.

  • regularizer (float) – The regularization parameter

Returns:

cost – The averaged (regularized) cost for the whole dataset

Return type:

float
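A regularized logistic regression cost can be sketched as the mean cross-entropy plus an L2 penalty. The exact form of the penalty (scaled by regularizer / (2 * m), bias excluded) and the 0/1 label assignment are assumptions here; cost_sketch is a hypothetical free function, not the method documented above.

```python
import numpy as np

def cost_sketch(theta, X, regularizer=0.0):
    """Mean cross-entropy over both classes plus an assumed L2 penalty."""
    theta = np.asarray(theta, dtype=float)
    # Assumption: X[0] holds the examples labelled 0, X[1] those labelled 1.
    X0, X1 = (np.asarray(x, dtype=float) for x in X)
    Xflat = np.concatenate([X0, X1])
    y = np.hstack([np.zeros(len(X0)), np.ones(len(X1))])
    h = 1.0 / (1.0 + np.exp(-(theta[0] + Xflat @ theta[1:])))
    m = len(y)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = regularizer / (2 * m) * np.sum(theta[1:] ** 2)  # bias excluded
    return float(cross_entropy + penalty)

X = [np.array([[0.0], [1.0]]), np.array([[2.0], [3.0]])]
cost_zero = cost_sketch([0.0, 0.0], X)              # all outputs are 0.5
cost_noreg = cost_sketch([0.0, 1.0], X)
cost_reg = cost_sketch([0.0, 1.0], X, regularizer=1.0)
```

With all parameters at zero every output is 0.5, so the cost equals log(2) regardless of the data.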

dJ(X, regularizer=0.0)[source]

Calculates the first derivative of the logistic regression cost w.r.t. each parameter theta

Parameters:
  • X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.

  • regularizer (float) – The regularization parameter, if the solution should be regularized.

Returns:

grad – A 1D array with as many entries as columns in the input matrix X plus 1 (the bias term). It denotes the average gradient of the cost w.r.t. each machine parameter theta.

Return type:

numpy.ndarray
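The average gradient can be sketched in vectorized form. As with any sketch here, the 0/1 label assignment and the L2 penalty that excludes the bias are assumptions, and grad_sketch is a hypothetical name.

```python
import numpy as np

def grad_sketch(theta, X, regularizer=0.0):
    """Average gradient of the cross-entropy cost w.r.t. each parameter."""
    theta = np.asarray(theta, dtype=float)
    # Assumption: X[0] holds the examples labelled 0, X[1] those labelled 1.
    X0, X1 = (np.asarray(x, dtype=float) for x in X)
    Xflat = np.concatenate([X0, X1])
    y = np.hstack([np.zeros(len(X0)), np.ones(len(X1))])
    # Prepend a column of ones so that theta[0] acts as the bias.
    Xb = np.hstack([np.ones((len(Xflat), 1)), Xflat])
    h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))
    grad = Xb.T @ (h - y) / len(y)
    grad[1:] += regularizer / len(y) * theta[1:]  # bias is not penalized
    return grad

X = [np.array([[1.0], [2.0]]), np.array([[3.0], [4.0]])]
grad = grad_sketch([0.0, 0.0], X)
```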

class rr.algorithm.Trainer(regularizer=0.0)[source]

A class to handle all training aspects for Logistic Regression

Parameters:

regularizer (float) – The regularization parameter

J(theta, machine, X)[source]

Calculates the vectorized cost J.

dJ(theta, machine, X)[source]

Calculates the vectorized partial derivative of the cost J w.r.t. all parameters θ, using the training dataset.

train(X)[source]

Optimizes the machine parameters to fit the input data, using scipy.optimize.fmin_l_bfgs_b.

Parameters:

X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.

Returns:

machine – A trained machine.

Return type:

Machine

Raises:

RuntimeError – In case problems exist with the design matrix X or with convergence.

class rr.algorithm.MultiClassMachine(machines)[source]

A class to handle all run-time aspects for Multiclass Logistic Regression

Parameters:

machines (list or tuple) – An iterable over any number of machines that will be stored.

predict(X)[source]

Predicts the class of each row of X

Parameters:

X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.

Returns:

predictions – A 1D array with as many entries as rows in the input 2D array X, representing g(x), the class predictions for the current machine.

Return type:

numpy.ndarray
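One common way to combine several binary machines is to score each row with every machine and pick the highest score. Whether this class does exactly that is not stated above, so the sketch below is an assumption; it also assumes X is a 2D matrix of examples, as the return description suggests, and represents each machine by its parameter vector.

```python
import numpy as np

def score_sketch(theta, X):
    """Sigmoid score of one binary machine, with the bias at theta[0]."""
    theta = np.asarray(theta, dtype=float)
    return 1.0 / (1.0 + np.exp(-(theta[0] + X @ theta[1:])))

def multiclass_predict_sketch(thetas, X):
    """Index of the machine with the highest score for each row of X."""
    X = np.asarray(X, dtype=float)
    scores = np.column_stack([score_sketch(t, X) for t in thetas])
    return scores.argmax(axis=1)

# Three hypothetical machines over a single feature.
thetas = [[1.0, -1.0],   # favours small values
          [0.0, 0.0],    # constant score of 0.5
          [-5.0, 1.0]]   # favours large values
X = np.array([[0.0], [3.0], [8.0]])
predictions = multiclass_predict_sketch(thetas, X)
```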

class rr.algorithm.MultiClassTrainer(regularizer=0.0)[source]

A class to handle all training aspects for Multiclass Logistic Regression

Parameters:

regularizer (float) – The regularization parameter

train(X)[source]

Trains multiple logistic regression classifiers to handle the multiclass problem posed by X.

Parameters:

X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the input classes; every row corresponds to one example of the data set and every column to one feature.

Returns:

machine – A trained multiclass machine.

Return type:

MultiClassMachine
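Training several binary classifiers for a multiclass problem requires regrouping the data into two-class problems. The sketch below uses a one-vs-rest regrouping, which is an assumption: the reference above does not say which scheme is used. The name one_vs_rest_splits is hypothetical, and no training is performed here.

```python
import numpy as np

def one_vs_rest_splits(X):
    """For each class k, pair all other classes (stacked) with class k."""
    splits = []
    for k in range(len(X)):
        rest = np.concatenate(
            [np.asarray(x, dtype=float) for i, x in enumerate(X) if i != k])
        target = np.asarray(X[k], dtype=float)
        # Each pair has the shape of a two-class input: (negatives, positives).
        splits.append((rest, target))
    return splits

# 3 classes x 2 examples x 4 features (synthetic values).
X = np.arange(3 * 2 * 4, dtype=float).reshape(3, 2, 4)
splits = one_vs_rest_splits(X)
```

Each pair could then be handed to a binary trainer, yielding one machine per class.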

Analysis

rr.analysis.CER(prediction, true_labels)[source]

Calculates the classification error rate for an N-class classification problem

Parameters:

  • prediction (numpy.ndarray) – A 1D numpy.ndarray containing your prediction

  • true_labels (numpy.ndarray) – A 1D numpy.ndarray containing the ground truth labels for the input array, organized in the same order.
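A classification error rate is commonly computed as the fraction of predictions that differ from the ground truth. The sketch below assumes that definition and a result expressed as a fraction between 0 and 1; cer_sketch is a hypothetical name, and the actual function may scale its result differently.

```python
import numpy as np

def cer_sketch(prediction, true_labels):
    """Fraction of entries where the prediction differs from the truth."""
    prediction = np.asarray(prediction)
    true_labels = np.asarray(true_labels)
    return float(np.mean(prediction != true_labels))

# Two of the five predictions below are wrong.
cer = cer_sketch([0, 1, 2, 2, 1], [0, 1, 1, 2, 0])
```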

Testing