XFlow and models

XFlow

XFlow is a MaxCompute algorithm package. You can use PyODPS to execute XFlow tasks. For the following PAI command:

PAI -name AlgoName -project algo_public -Dparam1=param_value1 -Dparam2=param_value2 ...

You can call run_xflow to execute it asynchronously:

>>> # call asynchronously
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})

Or call execute_xflow to execute it synchronously:

>>> # call synchronously
>>> inst = o.execute_xflow('AlgoName', 'algo_public',
                           parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})

When building the parameters dict, drop the quotation marks that surround argument values in the PAI command, as well as the semicolon at the end of the command.
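
For example, a hypothetical command like

PAI -name AlgoName -project algo_public -DinputTableName="input_table" -Depsilon=0.01;

would translate into the following call (the algorithm and parameter names here are placeholders):

>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'inputTableName': 'input_table', 'epsilon': '0.01'})

Note that the quotation marks around input_table and the trailing semicolon are dropped.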

Both methods return an Instance object. An XFlow instance contains several sub-instances. You can obtain the LogView address of each sub-instance by using the following method:

>>> for sub_inst_name, sub_inst in o.get_xflow_sub_instances(inst).items():
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))

Note that get_xflow_sub_instances returns the sub-instances of an Instance object at the moment of the call, and the result may change over time, so periodic queries may be required. To simplify this, you can use iter_xflow_sub_instances, which returns a generator of sub-instances and blocks the current thread until a new sub-instance starts or the main instance terminates. Also note that iter_xflow_sub_instances does not check whether the instance succeeds by default. It is recommended to check this manually to avoid potential errors, or to pass check=True so that iter_xflow_sub_instances performs the check automatically on exit.
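
For reference, polling manually with get_xflow_sub_instances might look like the following sketch (the 10-second interval is an arbitrary choice); the iter_xflow_sub_instances examples below are usually simpler:

>>> import time
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})
>>> # poll until the main instance terminates, printing LogViews of known sub-instances
>>> while not inst.is_terminated():
>>>     for sub_inst_name, sub_inst in o.get_xflow_sub_instances(inst).items():
>>>         print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))
>>>     time.sleep(10)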

>>> # asynchronous call is recommended here
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})
>>> # if no break in loop, will run till instance exits
>>> for sub_inst_name, sub_inst in o.iter_xflow_sub_instances(inst):
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))
>>> # check if instance succeeds in case of uncaught errors
>>> inst.wait_for_success()

Or

>>> # asynchronous call is recommended here
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})
>>> # add check=True to check whether the instance succeeds at exit;
>>> # the check is skipped if you break out of the loop
>>> for sub_inst_name, sub_inst in o.iter_xflow_sub_instances(inst, check=True):
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))

You can specify runtime settings through the hints argument when calling run_xflow or execute_xflow, just as you do when executing SQL statements:

>>> parameters = {'param1': 'param_value1', 'param2': 'param_value2', ...}
>>> o.execute_xflow('AlgoName', 'algo_public', parameters=parameters, hints={'odps.xxx.yyy': 10})

For instance, if you want to run your job on hosts with specific hardware, you can add the corresponding configuration to hints:

>>> hints={"settings": json.dumps({"odps.algo.hybrid.deploy.info": "xxxxx"})}

You can use options.ml.xflow_settings to configure the global settings:

>>> from odps import options
>>> options.ml.xflow_settings = {'odps.xxx.yyy': 10}
>>> parameters = {'param1': 'param_value1', 'param2': 'param_value2', ...}
>>> o.execute_xflow('AlgoName', 'algo_public', parameters=parameters)

Details about PAI commands can be found in the chapters about the individual components linked on this page.

Offline models

Offline models are the outputs of XFlow classification or regression algorithms. You can create an offline model by calling run_xflow directly. For example:

>>> inst = o.run_xflow('LogisticRegression', 'algo_public', dict(modelName='logistic_regression_model_name',
>>>                    regularizedLevel='1', maxIter='100', regularizedType='l1', epsilon='0.000001', labelColName='y',
>>>                    featureColNames='pdays,emp_var_rate', goodValue='1', inputTableName='bank_data'))
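
As run_xflow is asynchronous, wait for the training instance to finish before using the model:

>>> inst.wait_for_success()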

After models are created, you can list the models under the current project as follows:

>>> models = o.list_offline_models(prefix='prefix')
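
The result is iterable. For example, to print the names of the listed models:

>>> for model in models:
>>>     print(model.name)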

You can also retrieve a model by its name and read its PMML (if supported):

>>> model = o.get_offline_model('logistic_regression_model_name')
>>> pmml = model.get_model()
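
Assuming the PMML is returned as a string, you can save it to a local file with plain Python:

>>> with open('logistic_regression_model.pmml', 'w') as f:
>>>     f.write(pmml)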

You can copy a model using the following statement:

>>> model = o.get_offline_model('logistic_regression_model_name')
>>> # copy to current project
>>> new_model = model.copy('logistic_regression_model_name_new')
>>> # copy to another project
>>> new_model2 = model.copy('logistic_regression_model_name_new2', project='new_project')

You can delete a model using the following statement:

>>> o.delete_offline_model('logistic_regression_model_name')
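
If the model may not exist, you can guard the deletion first. A sketch assuming the usual PyODPS exist_* pattern also applies to offline models:

>>> if o.exist_offline_model('logistic_regression_model_name'):
>>>     o.delete_offline_model('logistic_regression_model_name')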