DataSet

From Planimate Knowledge Base
Revision as of 18:33, 13 January 2008 by Rick (talk | contribs)

Planimate® 5.07 introduced a new framework to save and load model data, called "DataSets 2". For information about the original DataSet/Scenario mechanism in Planimate (which has options for it in the menu bar) see DataSet (Scenario).

Key Features of DataSet 2

  • The system manages saving and reloading of specified label lists, sub label lists, attributes and tables to a specified filename. There is no limit to the number of dataset configurations as they are defined by a model table.
  • The dataset file is compressed and encrypted using a modeller specified key. With the choice of a good key, the file will be extremely difficult to crack.
  • Dataset loading and saving is completely controlled by modeller routine code, using a table to define the data objects to save and reload. All errors are reported as return codes, enabling the modeller to retain control.
  • Data can be loaded into objects determined by the modeller at runtime, not only the exact locations from which it was saved.
  • Numerical data is saved and reloaded with bit perfect precision.
  • Calendar dates are automatically translated to the run date offset of the model; run restarts and changing the run start date are no longer required.
  • Designed to support huge tables (up to 500 million rows).

Key Features for Label Lists

  • Replacement or merging of label lists is supported
  • Label aliases are also retained when label lists are replaced on load
  • Labels are properly merged/loaded even if the label list for data was not included in the dataset. Only required labels are merged.
  • If a label has become an alias to another label, the loaded data will be updated.
  • If a data set attribute / column requires a label list that doesn't exist, a label list will get created automatically when the dataset is loaded.
  • Labels which are not directly under user control (such as object or paint labels) are supported and an attempt will be made to match them by name upon load.


Key Features for Tables and Attributes

  • Tables and attributes can be selectively loaded from a data set file, for example just a specific attribute or table can be read. This facilitates taking multiple passes at a file with a modeller defined "version" or "type" to load it correctly.
  • If column names are used, a loaded table will match up the columns correctly when they have been moved or rearranged or other columns added.
  • In cases where columns were not named, there are rules on how the data will be loaded.
  • Using names enables loading only specified columns, ignoring other columns in the same table.
  • A table can be reconstructed entirely from a dataset, restoring all rows and columns (with units, labels and clear value set).
  • Free text is supported for table columns formatted for it.
  • Loaded attributes and columns are automatically re-formatted to the format they had when the dataset was saved. They also get their title and tuple name set.
  • Attribute and column Clear Values are also loaded.

Using DataSets

Datasets Version 2 does not contain a user interface, as it is designed to be accessible to the model itself at runtime. Two new Routine Operations, LoadDataset2 and SaveDataset2, are used. Both require the following parameters:

  • a Table reference (Dataset Definition Table or DDT)
  • a string reference/label which will be used as a filename/filepath
  • a label which will be used as an encryption passphrase.

The data to include in a dataset is specified in the DDT using labels from the _Data Objects label list. This enables data objects to be named and referred to within the model. The process of creating a data object label depends on the type of data:

Label List: Click the "File" button in the editor and select "Add To Data Object List".
Sub Label List: Select the "Add To Data Object List" option in the sub label list edit popup menu.
Portal Attribute: Click the "Attribute" button at the bottom of the attribute edit dialog (the DataSet button is for Version 1 datasets).
Table: Select the "Add To Data Object List" option from the Advanced option in the menu bar.

Once the data objects have been assigned labels, the next step is to create the table which defines the dataset’s members, the Dataset Definition Table. It currently requires 3 columns with the following labels. They can be in any order and other columns will be ignored.

_DataSetResourceIndex: This column contains numbers which identify a data resource in the dataset file. You should initially set this value to 0 and it will be automatically allocated an index when the dataset is saved for the first time. It must be unique for every row in the table. Take care when changing the value, as it will affect any previously saved datasets.
_DataSetComponent: This column should refer to a data object and should be formatted using the "_Data Objects" label list. It identifies which object in the model will be saved or have data loaded into it. A modeller can change this label to cause data to load into a different location in the model than the one from which it was saved.
_Result: This column should be formatted to the "_dataset errors" label list. It is set by the dataset loader to indicate any errors specific to each row.


A key role of this table is to map between data objects (in the model) and dataset file resources (in the saved file). This mapping is what gives much flexibility as data can be directed into arbitrary objects in the model, as long as they have data object labels.

The resource index column will be allocated values the first time a dataset is saved and these values will not change once they are non zero. Newly added rows to the dataset definition table should be given a resource index value of 0 and then the data set saved; they will then get allocated a value.

As a developer, it is your responsibility to keep a record of the dataset definition tables used to save different versions of datasets as your applications are released and then further developed. Without the resource index/data object associations, the dataset file will not be of any use.

This is not onerous; after changes to a model's dataset structure, save a dataset and keep a copy of the resulting DDT. You’ll need it for loading that dataset file.

Planimate® does not manage versioning of datasets. Its design means that unexpected data in a dataset is ignored and missing data will be tagged as an "Invalid Resource" for that DDT row, without stopping the loading of other data. It is strongly recommended to include a "Version" portal attribute right from the start; it will be useful in future versions if you need to process data differently depending on the version of the dataset.
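The mapping role of the DDT and the per-row error reporting described above can be illustrated with a small Python sketch. This is only a model of the behaviour as described, not Planimate® routine code: the `load_dataset` function, the dictionary standing in for the dataset file, the data object names and the "OK" result label are all assumptions made for illustration. Only "Invalid Resource" and the three column names come from the text.

```python
def load_dataset(file_resources, ddt_rows, model):
    """Copy each DDT row's resource into the model object named by that row."""
    for row in ddt_rows:
        index = row["_DataSetResourceIndex"]
        if index in file_resources:
            model[row["_DataSetComponent"]] = file_resources[index]
            row["_Result"] = "OK"  # hypothetical label meaning "no error"
        else:
            # Missing data is tagged on its own row; other rows still load.
            row["_Result"] = "Invalid Resource"
    return model


# Stand-in for a saved file: resources keyed by resource index (made-up values).
# Resource 9 has no DDT row, so it is simply ignored.
file_resources = {1: 2, 2: [[10, 20], [30, 40]], 9: "no longer used"}

ddt = [
    {"_DataSetResourceIndex": 1, "_DataSetComponent": "Version", "_Result": None},
    {"_DataSetResourceIndex": 2, "_DataSetComponent": "Demand Table", "_Result": None},
    {"_DataSetResourceIndex": 5, "_DataSetComponent": "Cost Table", "_Result": None},
]

model = load_dataset(file_resources, ddt, {})
for row in ddt:
    print(row["_DataSetComponent"], "->", row["_Result"])
```

Changing the `_DataSetComponent` value of a row before loading would redirect that resource into a different model object, which is the flexibility the DDT is meant to provide. A model could also load only the "Version" row first and then choose which DDT to use for a second pass.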

Do not expect Planimate® to allocate resource indices in the order of the rows in the table. Also, do not attempt to "re-use" resource indices for different purposes, as this may create conflicts when loading an older dataset file. The automatic resource index allocator will always allocate beyond the highest currently used index in the table, to avoid such conflicts.
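The allocation behaviour can be sketched in Python as follows. This is a minimal illustration of "allocate beyond the highest currently used index", not the actual allocator: the function name and the data object names are hypothetical, and the sketch happens to hand out new indices in row order, which the real allocator does not promise.

```python
def allocate_resource_indices(ddt_rows):
    """Give every row whose index is 0 a new index above the highest one in use."""
    highest = max((row["_DataSetResourceIndex"] for row in ddt_rows), default=0)
    for row in ddt_rows:
        if row["_DataSetResourceIndex"] == 0:
            highest += 1
            row["_DataSetResourceIndex"] = highest
    return ddt_rows


ddt = [
    {"_DataSetResourceIndex": 3, "_DataSetComponent": "Demand Table"},
    {"_DataSetResourceIndex": 0, "_DataSetComponent": "Version"},      # new row
    {"_DataSetResourceIndex": 1, "_DataSetComponent": "Site Labels"},
    {"_DataSetResourceIndex": 0, "_DataSetComponent": "Cost Table"},   # new row
]
allocate_resource_indices(ddt)
indices = [row["_DataSetResourceIndex"] for row in ddt]
```

Existing non-zero indices are left untouched and every new index is larger than any index already in the table, so a resource saved under an older DDT is never confused with a newly added one.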

Label List Handling

As attributes and table columns can be formatted for a label list, things can get complex when loading data which depends on label lists.

A dataset contains the contents of all label lists that are referenced by attributes and columns in the data within, even if the lists have not been assigned Data Object labels. This enables the data to be merged into a model where the labels have changed their values.

If a label list is explicitly included in a DDT during a save and it is also included in the DDT used to load the dataset, then the model’s contents will be replaced with whatever contents were in the label list when the dataset was saved. No updating/reindexing of existing tables in the model will be made in this case. Hence, do not have any other data depending on a label list if that list is saved in a dataset with the intention of replacing it on load (apart from the data in the dataset itself).

In all other cases a merge will occur when the dataset is loaded, for example if a label list is included in the DDT during save but not during load. Data in the dataset will be updated to reflect any changes in label indices. Any missing labels will be automatically added to label lists. This is important to consider if you ever change or remove a label and then load an old version dataset. Making the old label an alias of another label is useful.

For example if a typo in a label is corrected in a new version of an application, make the typo an alias of the corrected label, so old datasets which contain the typo will correctly load.
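The merge behaviour can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not Planimate®'s implementation: label lists are represented as plain dictionaries, the `merge_labels` function and the city names are made up, and new labels are simply given the next free index.

```python
def merge_labels(saved_labels, column, model_labels, aliases):
    """Remap saved label indices in a column to the model's current indices.

    saved_labels: {saved index: label name} as stored with the dataset
    model_labels: {label name: current index} in the model (updated in place)
    aliases:      {old label name: label name it is now an alias of}
    """
    remap = {}
    for saved_index in sorted(set(column)):         # only labels the data needs
        name = saved_labels[saved_index]
        name = aliases.get(name, name)              # follow an alias if present
        if name not in model_labels:                # missing label: add it
            model_labels[name] = max(model_labels.values(), default=0) + 1
        remap[saved_index] = model_labels[name]
    return [remap[value] for value in column]


saved_labels = {1: "Sydney", 2: "Melborne", 3: "Perth", 4: "Darwin"}
saved_column = [2, 1, 3, 2]                         # label indices at save time

model_labels = {"Melbourne": 1, "Sydney": 2}        # typo fixed, order changed
aliases = {"Melborne": "Melbourne"}                 # old typo kept as an alias

loaded_column = merge_labels(saved_labels, saved_column, model_labels, aliases)
print(loaded_column)   # [1, 2, 3, 1]
print(model_labels)    # {'Melbourne': 1, 'Sydney': 2, 'Perth': 3}
```

Without the alias entry, the misspelt "Melborne" would be added to the model as a new label instead of resolving to "Melbourne". "Darwin" is not added because no value in the loaded data refers to it.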

In the case where data uses a label list which is not under user control (eg: _Model Objects) Planimate® will save enough information to attempt to locate the label upon load.

If a dataset uses a label list that no longer exists in a model when it is loaded, a new label list will be created. Anticipate this if you plan to delete a label list in a model.

Label mappings will not be handled properly if a data object (eg: a sub label list) has a different parent label list when the dataset is loaded than it had when the dataset was saved. The exception is when that parent label list was explicitly listed in the DDT during save and the new parent list is set as the Data Object for that list in the DDT during loading of the dataset.

Table Handling

Tables are saved with enough detail to reconstruct them (data wise) upon load or otherwise extract/merge data into new column structures.

The following breaks down the loading rules for tables.

  1. CASE 1: No destination table
    If a table in a dataset has no data object row in the DDT, then it gets skipped.
  2. CASE 2: Totally empty target table
    If a target table is found AND has no columns, then the table will be reconstructed with the rows, columns, column titles, labels and column formats as saved.
    Note that datasets do not save alignment, colouring, width, or font for columns or individual cell formats; these are UI aspects, not data. A developer can create UI loading and saving mechanisms on top of this framework.
  3. CASE 3: Destination table already contains columns
    If the destination table contains columns, then the following rules are used
    1. CASE 3.1: No named columns in the dataset's table
      If there are no named columns in the dataset, then the columns will load into the target table beginning at column 1 in the order they appear. This is regardless of format or whether the columns in the target table have been subsequently named. The target table’s columns will have their format overridden with the dataset column's format but any column names will be retained.
    2. CASE 3.2: At least one named column in the dataset table
      If there is at least one named column in the dataset, then a first pass is made. Named columns in the dataset are matched and loaded into corresponding columns in the target table (with the same name). If there isn't a column with the same name, that column is skipped. This allows for rearrangement and discarding of named columns in the dataset. Any unnamed columns in the dataset are then handled as follows:
      1. CASE 3.2.1: Named column position match
        If the positions of all the named columns in the dataset were the same as when the dataset was saved, then the remaining unnamed columns in the dataset are loaded into columns matched by position as well - even if the corresponding target table columns now have names associated with them. If the target table does not have enough columns, then the outstanding columns in the dataset are not loaded.
      2. CASE 3.2.2: Named column positions did not match
        If any of the named columns in the dataset did not match at the same position as when the dataset was saved (ie: the columns have been rearranged) then the unnamed columns are loaded into consecutively available unnamed columns only. In this case an unnamed column does not load into a named column.

The following examples help to clarify the rules.

Notation: A,B,C ... are named columns and U1,U2,U3 ... are unnamed columns in the dataset file's table (when it was saved). V1,V2,V3 ... are unnamed columns in the target table


EXAMPLE 1: Appended some new columns (E and V7) to the target table

Loaded dataset:

A B U1 U2 C D U3 U4 U5 U6

Target table:

A B V1 V2 C D V3 V4 V5 V6 E V7

This results in all the columns from the dataset being loaded. New columns E and V7 are set to their clear value for every row read.

EXAMPLE 2: Rearrangement of named columns

Loaded dataset:

A B U1 U2 C D U3 U4 U5 U6

Target table:

D C E A B

Columns A,B,C and D in the data set are loaded and rearranged, new column E is set to its clear value.

EXAMPLE 3: Newly labelled columns (F,G,H) in otherwise matching target table

(rule 3.2.1 above)

Loaded dataset:

A B U1 U2 C D U3 U4 U5 U6

Target table:

A B F G C D V1 H

Columns A,B,C and D in the data set all match and are in the same positions as in the target table. This then allows U1 to load to F, U2 to load to G, U3 to load to V1 and U4 to load to H. U5 and U6 are not loaded because the target table does not have enough columns.

EXAMPLE 4: Rearrangement of named and unnamed columns

(rule 3.2.2 above)

Loaded dataset:

A B U1 U2 C D U3 U4 U5 U6

Target table:

A B C V1 V2 V3 V4 V5 V6 V7 V8

Columns A, B and C match and are read. Since not all of the named columns matched position (D is missing, C has moved), the unnamed columns are transferred only to other unnamed columns. So U1..U6 load into V1..V6. V7 and V8 are set to their clear value.


EXAMPLE 5: Rearrangement of named and unnamed columns (2)

(rule 3.2.2 above)

Loaded dataset:

A B U1 U2 C D U3 U4 U5 U6

Target table:

V1 V2 V3 E C B A V4 V5

Columns A, B and C match and are read (D is missing from the target table). Since not all of the named columns matched position, the unnamed columns are transferred only to other unnamed columns. U1..U5 load to V1..V5 respectively. U6 is not loaded because no unnamed target column remains. E is set to its clear value.
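The loading rules and the five examples above can be cross-checked with a short Python sketch. It is one reading of the rules as written, not Planimate®'s actual loader: the `plan_column_load` function is hypothetical, columns are plain strings, and the sketch borrows the notation of the examples, so any column written as U or V followed by a number is treated as unnamed. "Position match" is taken to mean that every named dataset column sits at the same position in the target table.

```python
import re


def is_named(col):
    """Sketch convention only: U1, U2, ... and V1, V2, ... are unnamed columns."""
    return re.fullmatch(r"[UV]\d+", col) is None


def plan_column_load(dataset_cols, target_cols):
    """Return {dataset column: target column} for the columns that get loaded."""
    if target_cols is None:                 # CASE 1: no destination table
        return {}
    if not target_cols:                     # CASE 2: table is reconstructed
        return {col: col for col in dataset_cols}
    named = [c for c in dataset_cols if is_named(c)]
    unnamed = [c for c in dataset_cols if not is_named(c)]
    if not named:                           # CASE 3.1: load by position
        return dict(zip(dataset_cols, target_cols))
    # CASE 3.2, first pass: match named columns by name, skip missing ones.
    plan = {c: c for c in named if c in target_cols}
    positions_match = all(
        i < len(target_cols) and target_cols[i] == c
        for i, c in enumerate(dataset_cols)
        if is_named(c)
    )
    if positions_match:                     # CASE 3.2.1: unnamed by position
        for i, c in enumerate(dataset_cols):
            if not is_named(c) and i < len(target_cols):
                plan[c] = target_cols[i]
    else:                                   # CASE 3.2.2: unnamed to unnamed only
        free = [t for t in target_cols if not is_named(t)]
        plan.update(zip(unnamed, free))
    return plan


dataset = "A B U1 U2 C D U3 U4 U5 U6".split()

example3 = plan_column_load(dataset, "A B F G C D V1 H".split())
example5 = plan_column_load(dataset, "V1 V2 V3 E C B A V4 V5".split())
print(example3)
print(example5)
```

In this sketch a dataset column that is absent from the returned mapping is not loaded, and a target column that never appears as a value is left at its clear value.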

It is recognised that the rules are a little complex. The system attempts to allow for a number of different styles of table use. Here are some summary tips:

  1. If you label all data set table columns, then the loader will load them into matching named columns. Missing columns are ignored and new columns in the target table are filled with their clear value.
  2. If you are reading a table of uncertain columns and you want to read it all, then remove all columns in the target table before loading the table. It will then get sized and allocated as per the table in the dataset.
  3. Tables saved with all unnamed columns always load into the first columns of the target table, even if the columns have been subsequently named.
  4. Tables saved with a mixture of named and unnamed columns will use name matching for the named columns but special attention needs to be given to where the unnamed columns will go.