Executing

Now that the model is trained, there is no need to retrain it each time we want to classify data or run a regression. Because the trained model was saved to the document library, we can retrieve it and run it against a dataset using an executor model.

The model takes the following parameters:

training_model_name: the name of the model created in the training section of the tutorial
target: the column to write the prediction to. Leave blank to use the default, result.
data: tabular data that contains every column used as a feature in the trained model, with as many rows as should be processed.
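
How target and data behave can be sketched in plain Python. This is a minimal illustration and not the executor's actual code: resolve_target and missing_features are hypothetical helper names, and the sketch assumes only what is stated above, namely that a blank target falls back to result and that data must contain every feature column the trained model used.

```python
def resolve_target(target):
    """Return the output column name, falling back to 'result' when blank."""
    if target is None or not str(target).strip():
        return "result"
    return str(target).strip()


def missing_features(columns, feature_names):
    """List the trained model's feature names that are absent from the data columns."""
    present = set(columns)
    return [name for name in feature_names if name not in present]
```

An empty list from missing_features means the data covers every feature; anything else names the columns that still need to be supplied.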

Examples are provided below for retrieving the trained model data from either the Akumen Document Manager or a third-party cloud storage service.

Akumen Document Manager

To create an ML execution model, you can do the following:

Go to the App Manager, select Create Application -> Python Model, and name it ML Executor - Breast Cancer.
Click the Git Clone button on the toolbar, and enter the Git URL: https://gitlab.com/optika-solutions/apps/auto-sklearn-executor-document.git. You can leave the username and password blank, and leave the branch as master. Click OK.
Go to the research grid and enter the following:
1. training_model_name: ML Trainer - Breast Cancer
2. target: leave blank.
3. data: see below.

As a sample, you can use the following for data (save the contents as a CSV file and upload it to the data spreadsheet):

radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
17.99,10.38,122.8,1001,.1184,.2776,.3001,.1471,.2419,.07871,1.095,.9053,8.589,153.4,.006399,.04904,.05373,.01587,.03003,.006193,25.38,17.33,184.6,2019,.1622,.6656,.7119,.2654,.4601,.1189
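
Before uploading, it can help to confirm that the CSV parses cleanly and that every row has one value per header column. The sketch below uses only Python's standard csv module. check_csv is a hypothetical helper, and the two-line sample is just the first three columns of the data above, shortened for readability.

```python
import csv
import io


def check_csv(text):
    """Parse CSV text and return (header, rows).

    Raises ValueError if a row does not have one value per header column.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} values, got {len(row)}"
            )
        # Every feature in this dataset is numeric, so convert to float.
        rows.append([float(value) for value in row])
    return header, rows


# First three columns of the sample data, for illustration only.
sample = "radius_mean,texture_mean,perimeter_mean\n17.99,10.38,122.8\n"
header, rows = check_csv(sample)
print(header, rows)
```

To check the full sample, read the saved file's contents and pass them to check_csv in place of the shortened string.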

Execute the scenario. Once completed, go to the data tab and find the result column.

Third Party Cloud Storage

Alternatively, if the ML training model saved its output to an Amazon S3 bucket, the steps below can be used to retrieve the trained model data.

The model takes the following parameters:

training_model_name: the name of the model created in the training section of the tutorial
joblib_location: see below.
target: the column to write the prediction to. Leave blank to use the default, result.
data: tabular data that contains every column used as a feature in the trained model, with as many rows as should be processed.

joblib_location JSON:

{
  "provider": "s3",
  "bucket": "model-bucket",
  "key": "xxx",
  "secret": "xxx",
  "region": "ap-southeast-2"
}
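
A malformed joblib_location can be caught before running the scenario by parsing it and checking for the fields shown above. This is a sketch only: parse_joblib_location is a hypothetical helper, and treating all five fields as required is an assumption based on the example, not a documented rule.

```python
import json

# Assumed from the example above; the executor may accept other fields.
REQUIRED_KEYS = ("provider", "bucket", "key", "secret", "region")


def parse_joblib_location(text):
    """Parse the joblib_location JSON and check that the expected fields are set."""
    location = json.loads(text)
    if not isinstance(location, dict):
        raise ValueError("joblib_location must be a JSON object")
    missing = [name for name in REQUIRED_KEYS if not location.get(name)]
    if missing:
        raise ValueError("joblib_location is missing: " + ", ".join(missing))
    return location


example = """
{
  "provider": "s3",
  "bucket": "model-bucket",
  "key": "xxx",
  "secret": "xxx",
  "region": "ap-southeast-2"
}
"""
location = parse_joblib_location(example)
print(location["provider"], location["bucket"], location["region"])
```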

To create an ML execution model, you can do the following:

Go to the App Manager, select Create Application -> Python Model, and name it ML Executor - Breast Cancer.
Click the Git Clone button on the toolbar, and enter the Git URL: https://gitlab.com/optika-solutions/apps/auto-sklearn-executor.git. You can leave the username and password blank, and leave the branch as master. Click OK.
Go to the research grid and enter the following:
1. training_model_name: ML Trainer - Breast Cancer
2. joblib_location: see above.
3. target: leave blank.
4. data: see below.

As a sample, you can use the following for data (save the contents as a CSV file and upload it to the data spreadsheet):

radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
17.99,10.38,122.8,1001,.1184,.2776,.3001,.1471,.2419,.07871,1.095,.9053,8.589,153.4,.006399,.04904,.05373,.01587,.03003,.006193,25.38,17.33,184.6,2019,.1622,.6656,.7119,.2654,.4601,.1189

Execute the scenario. Once completed, go to the data tab and find the result column.