Python API
Database
Database specifications for an evaluation protocol based on the Iris flower data set from Fisher’s original work.
- rr.database.get(protocol, subset, classes=['setosa', 'versicolor', 'virginica'], variables=['sepal length', 'sepal width', 'petal length', 'petal width'])
Returns the data subset given a particular protocol
- Parameters:
protocol (str) – one of the valid protocols supported by this interface
subset (str) – one of ‘train’ or ‘test’
classes (list of str) – a list of strings containing the names of the classes you want data from
variables (list of str) – a list of strings containing the names of the variables (features) you want data from
- Returns:
data – The data for all the classes and variables, packed into one 3D numpy array. Each depth holds the data for one class, each row is one example, and each column is one feature.
- Return type:
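The layout described above can be explored with a synthetic stand-in. This is only a sketch: the array below is made up (3 classes, 5 examples, 4 features) and is not produced by `rr.database.get`; only the indexing pattern is the point.

```python
import numpy as np

# Synthetic stand-in for the returned array (an assumption, not real Iris data):
# 3 classes (depth) x 5 examples (rows) x 4 features (columns).
data = np.arange(3 * 5 * 4, dtype=float).reshape(3, 5, 4)

first_class = data[0]          # all examples of the first class, shape (5, 4)
first_example = data[0, 0]     # one example of that class, shape (4,)
third_feature = data[:, :, 2]  # one feature for every class, shape (3, 5)

# Collapsing the depth axis gives the usual 2D (examples x features) view.
stacked = data.reshape(-1, data.shape[-1])  # shape (15, 4)
```

The 2D view is the shape that a per-feature normalization estimate would typically consume.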
Pre-processor
A simple pre-processing step that applies Z-normalization to the input features
- rr.preprocessor.estimate_norm(X)
Estimates the mean and standard deviation from a data set
- Parameters:
X (numpy.ndarray) – A 2D numpy ndarray in which rows represent examples and columns represent features of the data set from which you want to estimate the normalization parameters
- Returns:
mean (numpy.ndarray) – A 1D numpy ndarray containing the estimated mean over dimension 1 (columns) of the input data X
std (numpy.ndarray) – A 1D numpy ndarray containing the estimated unbiased standard deviation over dimension 1 (columns) of the input data X
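A minimal sketch of such an estimate, assuming one mean and one unbiased standard deviation per feature (column). The function name `estimate_norm_sketch` is hypothetical and this is not the package's own implementation.

```python
import numpy as np

def estimate_norm_sketch(X):
    """Per-feature mean and unbiased standard deviation of a 2D array."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)        # one value per column (feature)
    std = X.std(axis=0, ddof=1)  # ddof=1 selects the unbiased (N-1) estimator
    return mean, std

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
mean, std = estimate_norm_sketch(X)  # mean = [2., 20.], std = [1., 10.]
```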
- rr.preprocessor.normalize(X, norm)
Applies the given norm to the input data set
- Parameters:
X (numpy.ndarray) – A 3D numpy ndarray in which rows represent examples and columns represent features of the data set you want to normalize. Every depth corresponds to the data for a particular class
norm (tuple) – A tuple containing two 1D numpy ndarrays corresponding to the normalization parameters extracted with estimate_norm() above.
- Returns:
X_normed – A 3D numpy ndarray with the same dimensions as the input array X, but with its values normalized according to the norm input.
- Return type:
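Applying a (mean, std) pair to a 3D array could look like the sketch below. It assumes the norm tuple holds per-feature values and that every depth is normalized with the same parameters; `normalize_sketch` is a hypothetical name and the data is random.

```python
import numpy as np

def normalize_sketch(X, norm):
    """Z-normalizes every class (depth) of X with the same (mean, std) pair."""
    mean, std = norm
    return np.array([(x - mean) / std for x in X])

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(3, 20, 4))  # synthetic 3D data

# Estimate the parameters on the stacked 2D view of the same data.
flat = X.reshape(-1, X.shape[-1])
norm = (flat.mean(axis=0), flat.std(axis=0, ddof=1))

X_normed = normalize_sketch(X, norm)  # same shape as X
```

In an evaluation protocol the parameters would normally be estimated on the training subset only and then reused unchanged for the test subset.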
Machine Learning Algorithm
- rr.algorithm.make_labels(X)
Helper function that generates a single 1D array with labels which are good targets for stock logistic regression.
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.
- Returns:
labels – A 1D array containing suitable labels for all rows of all classes defined in X (depth).
- Return type:
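One way to build such a label vector is sketched below, assuming that each example is labelled with the index of the class (depth) it belongs to, so a two-class input yields 0/1 targets. The name `make_labels_sketch` and that labelling convention are assumptions, not confirmed by the reference above.

```python
import numpy as np

def make_labels_sketch(X):
    """Returns one integer label per row, equal to the index of its class in X."""
    return np.hstack([np.full(len(x), k, dtype=int) for k, x in enumerate(X)])

# Two classes with 3 and 2 examples respectively, 4 features each.
X = [np.zeros((3, 4)), np.ones((2, 4))]
labels = make_labels_sketch(X)  # array([0, 0, 0, 1, 1])
```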
- class rr.algorithm.Machine(theta)
A class to handle all run-time aspects for Logistic Regression
- Parameters:
theta (numpy.ndarray) – A set of parameters for the Logistic Regression model. This must be an iterable (or numpy array) with all parameters for the model, including the bias term, which must be on entry 0 (the first entry in the iterable).
- predict(X)
Predicts the class of each row of X
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 2 dimensions. Every row corresponds to one example of the data set, every column, one different feature.
- Returns:
predictions – A 1D array with as many entries as rows in the input 2D array X, representing g(x), the class predictions for the current machine.
- Return type:
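A sketch of what g(x) typically is in logistic regression: the sigmoid of a linear combination of the features, with the bias stored in entry 0 of theta. Whether the package returns these scores or thresholded class labels is not stated here, so treat `predict_sketch` as an illustration only.

```python
import numpy as np

def predict_sketch(theta, X):
    """Sigmoid of theta[0] + X @ theta[1:], computed for every row of X."""
    theta = np.asarray(theta, dtype=float)
    Xp = np.hstack([np.ones((len(X), 1)), X])  # prepend a column of ones for the bias
    return 1.0 / (1.0 + np.exp(-(Xp @ theta)))

theta = [0.0, 1.0, -1.0]  # hand-picked parameters: bias, then one weight per feature
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [0.0, 2.0]])
g = predict_sketch(theta, X)  # g[0] is exactly 0.5; g[1] > 0.5 > g[2]
```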
- J(X, regularizer=0.0)
Calculates the logistic regression cost
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 numpy.ndarrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.
regularizer (float) – The regularization parameter
- Returns:
cost – The averaged (regularized) cost for the whole dataset
- Return type:
- dJ(X, regularizer=0.0)
Calculates the first derivative of the logistic regression cost w.r.t. each parameter theta
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.
regularizer (float) – The regularization parameter, if the solution should be regularized.
- Returns:
grad – A 1D array with as many entries as columns in the input matrix X plus 1 (the bias term). It denotes the average gradient of the cost w.r.t. each machine parameter theta.
- Return type:
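The cost and its gradient can be sketched together and cross-checked numerically. The exact regularization used by the package is not given above; the sketch assumes an L2 penalty on the non-bias parameters, scaled by 1/(2m), and 0/1 labels by class index. All function names are hypothetical.

```python
import numpy as np

def _design(X):
    """Stacks per-class 2D arrays into one matrix with a bias column, plus 0/1 labels."""
    Xs = np.vstack(list(X))
    y = np.hstack([np.full(len(x), k, dtype=float) for k, x in enumerate(X)])
    return np.hstack([np.ones((len(Xs), 1)), Xs]), y

def cost_sketch(theta, X, regularizer=0.0):
    Xp, y = _design(X)
    h = 1.0 / (1.0 + np.exp(-(Xp @ theta)))
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    # Assumed penalty: L2 on every parameter except the bias.
    return data_term + regularizer * np.sum(theta[1:] ** 2) / (2 * len(y))

def grad_sketch(theta, X, regularizer=0.0):
    Xp, y = _design(X)
    h = 1.0 / (1.0 + np.exp(-(Xp @ theta)))
    grad = Xp.T @ (h - y) / len(y)
    grad[1:] += regularizer * theta[1:] / len(y)
    return grad

rng = np.random.default_rng(0)
X = [rng.normal(-1.0, 1.0, size=(10, 2)), rng.normal(1.0, 1.0, size=(10, 2))]
theta = np.array([0.1, -0.2, 0.3])

# Central finite differences should agree with the analytic gradient.
eps = 1e-6
numeric = np.array([
    (cost_sketch(theta + eps * e, X, 0.5) - cost_sketch(theta - eps * e, X, 0.5)) / (2 * eps)
    for e in np.eye(len(theta))
])
analytic = grad_sketch(theta, X, 0.5)
```

Comparing `numeric` with `analytic` is a cheap sanity check before handing a gradient to an optimizer.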
- class rr.algorithm.Trainer(regularizer=0.0)
A class to handle all training aspects for Logistic Regression
- Parameters:
regularizer (float) – The regularization parameter
- dJ(theta, machine, X)
Calculates the vectorized partial derivative of the cost J w.r.t. all parameters theta, using the training data set.
- train(X)
Optimizes the machine parameters to fit the input data, using scipy.optimize.fmin_l_bfgs_b.
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.
- Returns:
machine – A trained machine.
- Return type:
- Raises:
RuntimeError – In case problems exist with the design matrix X or with convergence.
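A training loop around `scipy.optimize.fmin_l_bfgs_b` might look like the sketch below. The cost, the L2 penalty on non-bias parameters, the zero initialization and the convergence check via `warnflag` are all assumptions made for illustration; `train_sketch` is a hypothetical name and returns the parameter vector rather than a `Machine`.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def train_sketch(X, regularizer=0.0):
    """Fits binary logistic regression parameters with L-BFGS-B."""
    Xs = np.vstack(list(X))
    y = np.hstack([np.full(len(x), k, dtype=float) for k, x in enumerate(X)])
    Xp = np.hstack([np.ones((len(Xs), 1)), Xs])  # bias column first
    m = len(y)

    def cost(theta):
        z = Xp @ theta
        # log(1 + exp(z)) - y*z is the logistic loss in a numerically stable form
        data_term = np.mean(np.logaddexp(0.0, z) - y * z)
        return data_term + regularizer * np.sum(theta[1:] ** 2) / (2 * m)

    def grad(theta):
        h = 1.0 / (1.0 + np.exp(-(Xp @ theta)))
        g = Xp.T @ (h - y) / m
        g[1:] += regularizer * theta[1:] / m
        return g

    theta0 = np.zeros(Xp.shape[1])
    theta, final_cost, info = fmin_l_bfgs_b(cost, theta0, fprime=grad)
    if info["warnflag"] != 0:  # non-zero means the optimizer stopped abnormally
        raise RuntimeError("L-BFGS-B did not converge: %r" % (info.get("task"),))
    return theta

rng = np.random.default_rng(0)
X = [rng.normal(-1.0, 1.0, size=(50, 2)), rng.normal(1.0, 1.0, size=(50, 2))]
theta = train_sketch(X, regularizer=1.0)
```

A non-zero regularizer keeps the optimum finite even when the two classes happen to be linearly separable.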
- class rr.algorithm.MultiClassMachine(machines)
A class to handle all run-time aspects for Multiclass Logistic Regression
- predict(X)
Predicts the class of each row of X
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the two classes; every row corresponds to one example of the data set and every column to one feature.
- Returns:
predictions – A 1D array with as many entries as rows in the input 2D array X, representing g(x), the class predictions for the current machine.
- Return type:
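How the binary machines are combined is not spelled out above. One common reading, shown here purely as an assumption, is one-vs-rest: one parameter vector per class, with each example assigned to the class whose machine scores highest. The function name and the hand-picked parameters are hypothetical.

```python
import numpy as np

def multiclass_predict_sketch(thetas, X):
    """Assigns each row of X to the class whose binary score is largest."""
    thetas = np.asarray(thetas, dtype=float)   # shape (n_classes, n_features + 1)
    Xp = np.hstack([np.ones((len(X), 1)), X])  # bias column first
    scores = 1.0 / (1.0 + np.exp(-(Xp @ thetas.T)))  # shape (n_examples, n_classes)
    return scores.argmax(axis=1)

# Three hand-picked machines over a single feature: [bias, weight].
thetas = [[0.0, -1.0], [0.5, 0.0], [0.0, 1.0]]
X = np.array([[-3.0], [0.0], [3.0]])
pred = multiclass_predict_sketch(thetas, X)  # array([0, 1, 2])
```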
- class rr.algorithm.MultiClassTrainer(regularizer=0.0)
A class to handle all training aspects for Multiclass Logistic Regression
- Parameters:
regularizer (float) – The regularization parameter
- train(X)
Trains multiple logistic regression classifiers to handle the multiclass problem posed by X.
- Parameters:
X (numpy.ndarray) – The input data matrix. This must be an array with 3 dimensions or an iterable containing 2 arrays with 2 dimensions each. Each corresponds to the data for one of the input classes; every row corresponds to one example of the data set and every column to one feature.
- Returns:
machine – A trained multiclass machine.
- Return type:
Analysis
- rr.analysis.CER(prediction, true_labels)
Calculates the classification error rate for an N-class classification problem
Parameters:
- prediction (numpy.ndarray): A 1D numpy.ndarray containing your prediction
- true_labels (numpy.ndarray): A 1D numpy.ndarray containing the ground truth labels for the input array, organized in the same order.
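The classification error rate is the fraction of predictions that differ from the ground truth. A minimal sketch, with the hypothetical name `cer_sketch` and made-up labels:

```python
import numpy as np

def cer_sketch(prediction, true_labels):
    """Fraction of entries where the prediction disagrees with the true label."""
    prediction = np.asarray(prediction)
    true_labels = np.asarray(true_labels)
    return float(np.mean(prediction != true_labels))

error = cer_sketch([0, 1, 2, 2], [0, 1, 1, 2])  # one mismatch out of four: 0.25
```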