Manages training and test data for building statistical models and provides functionality for cross-validation.
#include <DataManager.h>
Public Types | |
typedef Representer< T > | RepresenterType |
typedef RepresenterType::DatasetPointerType | DatasetPointerType |
typedef RepresenterType::DatasetConstPointerType | DatasetConstPointerType |
typedef DataItem< T > | DataItemType |
typedef DataItemWithSurrogates< T > | DataItemWithSurrogatesType |
typedef std::list< const DataItemType * > | DataItemListType |
typedef CrossValidationFold< T > | CrossValidationFoldType |
typedef std::list < CrossValidationFoldType > | CrossValidationFoldListType |
Public Member Functions | |
void | Delete () |
virtual | ~DataManager () |
virtual void | AddDataset (DatasetConstPointerType dataset, const std::string &URI) |
virtual void | Save (const std::string &filename) const |
DataItemListType | GetData () const |
unsigned | GetNumberOfSamples () const |
CrossValidationFoldListType | GetCrossValidationFolds (unsigned nFolds, bool randomize=true) const |
CrossValidationFoldListType | GetLeaveOneOutCrossValidationFolds () const |
Static Public Member Functions | |
static DataManager< T > * | Create (const RepresenterType *representer) |
static DataManager< T > * | Load (Representer< T > *representer, const std::string &filename) |
Protected Member Functions | |
DataManager (const RepresenterType *representer) | |
DataManager (const DataManager< T > &orig) | |
DataManager & | operator= (const DataManager< T > &rhs) |
Protected Attributes | |
RepresenterType * | m_representer |
DataItemListType | m_DataItemList |
The DataManager class provides functionality for loading and managing the datasets used to build a statistical model. Datasets are loaded either one at a time with DataManager::AddDataset or directly from an HDF5 file with the Load function. By default, all datasets are marked as training data. It is, however, often useful to leave a few datasets out in order to validate the model. For this purpose, the DataManager class implements basic cross-validation functionality.
Note that although datasets are passed in, the Representer class automatically converts them into samples (Representer::DatasetToSample). For efficiency, the data is stored internally as a large matrix, using the internal SampleVector representation (Representer::DatasetToSample). Furthermore, Statismo emphasizes traceability and ties information, such as the original filename, to each dataset. This means that accessing the data stored in the DataManager yields a DataItem structure.
virtual statismo::DataManager< T >::~DataManager ()
Destructor.
virtual void statismo::DataManager< T >::AddDataset (DatasetConstPointerType dataset, const std::string &URI)
Adds a dataset to the data manager.
dataset | The dataset to be added. |
URI | A string containing the URI of the given dataset. It is only added to the metadata as information. |
While it is not strictly necessary, and sometimes not even possible, to specify a URI for the given dataset, adding a description is strongly encouraged. The string is added to the metadata and stored with the model. Having this information stored with the model may prove valuable at a later point in time.
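The idea of keeping a URI next to each added dataset can be sketched as follows. This is only an illustration under simplifying assumptions: `SketchDataItem` and `SketchDataStore` are hypothetical names that are not part of Statismo, samples are plain `std::vector<double>` values rather than representer-specific types, and nothing is written to disk.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a data item: a sample vector plus the URI it came from.
struct SketchDataItem {
    std::string uri;             // origin of the dataset, kept as metadata only
    std::vector<double> sample;  // flattened sample representation
};

// Hypothetical stand-in for a data manager that keeps items in insertion order.
class SketchDataStore {
public:
    void Add(const std::vector<double>& sample, const std::string& uri) {
        // All samples need the same dimensionality to form one data matrix.
        assert(m_items.empty() || m_items.front().sample.size() == sample.size());
        m_items.push_back(SketchDataItem{uri, sample});
    }

    const std::list<SketchDataItem>& GetData() const { return m_items; }

    unsigned GetNumberOfSamples() const {
        return static_cast<unsigned>(m_items.size());
    }

private:
    std::list<SketchDataItem> m_items;
};
```

Because each stored item carries its URI, a sample that behaves oddly later on can be traced back to the file it was read from.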
inline static DataManager< T > * statismo::DataManager< T >::Create (const RepresenterType *representer)
Factory method that creates a new instance of the DataManager class.
inline void statismo::DataManager< T >::Delete ()
Destroys the object. The same effect can be achieved by deleting the object in the usual way with the C++ delete keyword.
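The combination of a static factory method, a protected constructor, and a Delete member can be sketched as below. `SketchManaged`, `Value`, and `LiveCount` are hypothetical names used only for this illustration, and the instance counter exists solely to make the object lifetime observable; this is not the Statismo implementation.

```cpp
#include <cassert>

// Hypothetical class illustrating the Create()/Delete() pattern.
class SketchManaged {
public:
    // Factory method: the only public way to obtain an instance.
    static SketchManaged* Create(int value) { return new SketchManaged(value); }

    // Destroys the object; has the same effect as calling delete on the pointer.
    void Delete() { delete this; }

    virtual ~SketchManaged() {
        assert(s_liveCount > 0);
        --s_liveCount;
    }

    int Value() const { return m_value; }
    static int LiveCount() { return s_liveCount; }

protected:
    // Construction is restricted to the factory method and to subclasses.
    explicit SketchManaged(int value) : m_value(value) { ++s_liveCount; }

    // Copying and assignment are disabled.
    SketchManaged(const SketchManaged&) = delete;
    SketchManaged& operator=(const SketchManaged&) = delete;

private:
    int m_value;
    static int s_liveCount;  // number of live instances, for demonstration only
};

int SketchManaged::s_liveCount = 0;
```

Since the constructor is not public, every instance is heap-allocated through Create, which is what makes `delete this` inside Delete safe in this sketch.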
DataManager< T >::CrossValidationFoldListType statismo::DataManager< T >::GetCrossValidationFolds (unsigned nFolds, bool randomize = true) const
Assigns each data item to one of nFolds folds to be used for cross-validation. This method has to be called before cross-validation can be started.
nFolds | The number of folds used in the cross-validation. |
randomize | If true, the data is randomly assigned to the nFolds folds; otherwise the order in which it was added is preserved. |
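One way to picture the fold assignment is the sketch below, which works on item indices only. It assumes that folds are contiguous chunks of the (optionally shuffled) index sequence with sizes differing by at most one; the source does not say how Statismo partitions the data, so this is a stand-in technique. `SketchFold`, `MakeFolds`, and the `seed` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical fold type: indices used for training and indices held out for testing.
struct SketchFold {
    std::vector<std::size_t> training;
    std::vector<std::size_t> test;
};

std::vector<SketchFold> MakeFolds(std::size_t nItems, unsigned nFolds,
                                  bool randomize = true, unsigned seed = 0) {
    assert(nFolds >= 1 && nFolds <= nItems);

    // Item indices in the order in which the items were added.
    std::vector<std::size_t> order(nItems);
    std::iota(order.begin(), order.end(), std::size_t(0));
    if (randomize) {
        std::mt19937 rng(seed);  // fixed seed keeps the sketch reproducible
        std::shuffle(order.begin(), order.end(), rng);
    }

    // Split the sequence into contiguous chunks whose sizes differ by at most one.
    std::vector<SketchFold> folds(nFolds);
    std::size_t begin = 0;
    for (unsigned f = 0; f < nFolds; ++f) {
        std::size_t size = nItems / nFolds + (f < nItems % nFolds ? 1 : 0);
        std::size_t end = begin + size;
        for (std::size_t i = 0; i < nItems; ++i) {
            if (i >= begin && i < end) {
                folds[f].test.push_back(order[i]);
            } else {
                folds[f].training.push_back(order[i]);
            }
        }
        begin = end;
    }
    return folds;
}
```

With `randomize` set to false the test sets follow the insertion order, which mirrors the behavior described for the randomize parameter. Every item ends up in the test set of exactly one fold.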
DataManager< T >::DataItemListType statismo::DataManager< T >::GetData () const
Returns a list of all the sample data objects managed by the data manager.
DataManager< T >::CrossValidationFoldListType statismo::DataManager< T >::GetLeaveOneOutCrossValidationFolds () const
Generates leave-one-out cross-validation folds.
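Leave-one-out cross-validation is the special case in which every fold holds out exactly one item and trains on all the others. A minimal index-based sketch, with the hypothetical names `SketchLooFold` and `MakeLeaveOneOutFolds` (not Statismo types), could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical fold type: (training indices, index of the held-out item).
typedef std::pair<std::vector<std::size_t>, std::size_t> SketchLooFold;

std::vector<SketchLooFold> MakeLeaveOneOutFolds(std::size_t nItems) {
    assert(nItems >= 2);  // at least one item must remain for training

    std::vector<SketchLooFold> folds;
    folds.reserve(nItems);
    for (std::size_t heldOut = 0; heldOut < nItems; ++heldOut) {
        // Training set: every index except the held-out one.
        std::vector<std::size_t> training;
        training.reserve(nItems - 1);
        for (std::size_t i = 0; i < nItems; ++i) {
            if (i != heldOut) {
                training.push_back(i);
            }
        }
        folds.push_back(SketchLooFold(training, heldOut));
    }
    return folds;
}
```

For n items this produces n folds, so a model is built n times; that cost is worth keeping in mind for large data sets.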
inline unsigned statismo::DataManager< T >::GetNumberOfSamples () const
Returns the number of samples managed by the data manager.
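static DataManager< T > * statismo::DataManager< T >::Load (Representer< T > *representer, const std::string &filename)
Creates a new DataManager from the data stored in the given HDF5 file.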
virtual void statismo::DataManager< T >::Save (const std::string &filename) const
Saves the data matrix and all URIs into an HDF5 file.
filename |