XFlow and models


XFlow is a MaxCompute algorithm package. You can use PyODPS to execute XFlow tasks. For the following PAI command:

PAI -name AlgoName -project algo_public -Dparam1=param_value1 -Dparam2=param_value2 ...

You can call run_xflow to execute it asynchronously:

>>> # call asynchronously
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})

Or call execute_xflow to execute it synchronously:

>>> # call synchronously
>>> inst = o.execute_xflow('AlgoName', 'algo_public',
                           parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})

If argument values in the PAI command are wrapped in quotation marks, omit the quotation marks when building parameters. Also omit the semicolon at the end of the command.
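As an illustration of this conversion rule, the hypothetical helper below (not part of PyODPS) turns a PAI command string into the name, project, and parameters dict expected by run_xflow, stripping the quotation marks and the trailing semicolon along the way:

```python
import shlex

def parse_pai_command(command):
    """Convert a PAI command string into arguments for run_xflow.

    The trailing semicolon is dropped, and shlex.split removes the
    quotation marks around argument values, matching the form that
    run_xflow / execute_xflow expect.
    """
    tokens = shlex.split(command.strip().rstrip(';'))
    name = project = None
    parameters = {}
    it = iter(tokens[1:])  # skip the leading "PAI" keyword
    for token in it:
        if token == '-name':
            name = next(it)
        elif token == '-project':
            project = next(it)
        elif token.startswith('-D'):
            key, _, value = token[2:].partition('=')
            parameters[key] = value
    return name, project, parameters

name, project, params = parse_pai_command(
    'PAI -name LogisticRegression -project algo_public '
    '-DmodelName="my_model" -DmaxIter=100;')
# name == 'LogisticRegression', project == 'algo_public'
# params == {'modelName': 'my_model', 'maxIter': '100'}
```

The result can then be passed on as `o.run_xflow(name, project, parameters=params)`.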

Both methods return an Instance object. An XFlow instance contains several sub-instances. You can obtain the LogView of each Instance by using the following method:

>>> for sub_inst_name, sub_inst in o.get_xflow_sub_instances(inst).items():
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))

Note that get_xflow_sub_instances returns the sub-instances of an Instance object at the moment of the call, and the result may change over time, so periodic queries may be required. To simplify this, you can use iter_xflow_sub_instances, which returns a generator of sub-instances. The generator blocks the current thread until a new sub-instance starts or the main instance terminates.

>>> # asynchronous call is recommended here
>>> inst = o.run_xflow('AlgoName', 'algo_public',
                       parameters={'param1': 'param_value1', 'param2': 'param_value2', ...})
>>> for sub_inst_name, sub_inst in o.iter_xflow_sub_instances(inst):  # will wait here
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))

You can specify runtime parameters when calling run_xflow or execute_xflow. This process is similar to executing SQL statements:

>>> parameters = {'param1': 'param_value1', 'param2': 'param_value2', ...}
>>> o.execute_xflow('AlgoName', 'algo_public', parameters=parameters, hints={'odps.xxx.yyy': 10})

If the task needs to run on machines with a specific GPU card type, add the following configuration to hints:

>>> import json
>>> hints = {"settings": json.dumps({"odps.algo.hybrid.deploy.info": "xxxxx"})}

You can use options.ml.xflow_settings to configure the global settings:

>>> from odps import options
>>> options.ml.xflow_settings = {'odps.xxx.yyy': 10}
>>> parameters = {'param1': 'param_value1', 'param2': 'param_value2', ...}
>>> o.execute_xflow('AlgoName', 'algo_public', parameters=parameters)

Details about PAI commands can be found in the PAI documentation.

Offline models

Offline models are the output of XFlow classification or regression algorithms. You can create an offline model directly by calling run_xflow. For example:

>>> o.run_xflow('LogisticRegression', 'algo_public', dict(modelName='logistic_regression_model_name',
>>>                regularizedLevel='1', maxIter='100', regularizedType='l1', epsilon='0.000001', labelColName='y',
>>>                featureColNames='pdays,emp_var_rate', goodValue='1', inputTableName='bank_data'))

After creating the models, you can list the models under the current project as follows:

>>> models = o.list_offline_models(prefix='prefix')

You can also retrieve the models and read their PMML (if supported) by the model names:

>>> model = o.get_offline_model('logistic_regression_model_name')
>>> pmml = model.get_model()
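A PMML document is plain XML, so the returned string can be inspected with the standard library. The snippet below is an illustrative sketch that reads attributes from a minimal hand-written PMML fragment; the fragment is made up for demonstration, and the PMML of a real model will be larger:

```python
import xml.etree.ElementTree as ET

# A hand-written PMML fragment used only for illustration;
# model.get_model() returns a string of this general shape.
pmml = """
<PMML version="4.2">
  <RegressionModel modelName="logistic_regression_model_name"
                   functionName="classification">
    <RegressionTable intercept="-1.5"/>
  </RegressionModel>
</PMML>
"""

root = ET.fromstring(pmml)
model_node = root.find('RegressionModel')
print(model_node.get('modelName'))     # -> logistic_regression_model_name
print(model_node.get('functionName'))  # -> classification
```

Real PMML documents usually declare an XML namespace, in which case the tag lookup needs the namespace prefix (e.g. `root.find('{http://www.dmg.org/PMML-4_2}RegressionModel')`).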

You can copy a model using the following statement:

>>> model = o.get_offline_model('logistic_regression_model_name')
>>> # copy to current project
>>> new_model = model.copy('logistic_regression_model_name_new')
>>> # copy to another project
>>> new_model2 = model.copy('logistic_regression_model_name_new2', project='new_project')

You can delete a model using the following statement:

>>> o.delete_offline_model('logistic_regression_model_name')