Instance
Tasks such as SQLTask are the basic computing units in MaxCompute. When executed, a Task is instantiated as a MaxCompute instance.
Basic operations
You can call list_instances() to retrieve all instances in the project. exist_instance() checks whether a given instance exists, and get_instance() retrieves an instance by its ID.
>>> for instance in o.list_instances():
>>>     print(instance.id)
>>> if o.exist_instance('<my_instance_id>'):
>>>     print("Instance <my_instance_id> exists!")
You can call stop_instance() on an odps object to stop an instance, or call the stop() method on an Instance object.
>>> # Method 1: use stop_instance to stop an instance
>>> o.stop_instance('<my_instance_id>')
>>> # Method 2: use stop method of instance object to stop an instance
>>> instance = o.get_instance('<my_instance_id>')
>>> instance.stop()
Retrieve LogView address
For a SQL task, you can call the get_logview_address() method to retrieve the LogView address.
>>> # from an existing instance object
>>> instance = o.run_sql('desc pyodps_iris')
>>> print(instance.get_logview_address())
>>> # from an instance id
>>> instance = o.get_instance('2016042605520945g9k5pvyi2')
>>> print(instance.get_logview_address())
For an XFlow task, you need to enumerate its subtasks and retrieve their LogView as follows. More details can be seen at XFlow and models.
>>> instance = o.run_xflow('AppendID', 'algo_public',
...                        {'inputTableName': 'input_table', 'outputTableName': 'output_table'})
>>> for sub_inst_name, sub_inst in o.get_xflow_sub_instances(instance).items():
>>>     print('%s: %s' % (sub_inst_name, sub_inst.get_logview_address()))
Instance status
The status of an instance is Running, Suspended, or Terminated, and is available through the status attribute. The is_terminated() method returns whether the instance has finished executing. The is_successful() method returns whether the instance finished successfully; it returns False if the instance is still running or if execution failed.
>>> instance = o.get_instance('2016042605520945g9k5pvyi2')
>>> instance.status
<Status.TERMINATED: 'Terminated'>
>>> from odps.models import Instance
>>> instance.status == Instance.Status.TERMINATED
True
>>> instance.status.value
'Terminated'
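Because is_successful() returns False both while an instance is still running and after it has failed, the two methods are usually read together. The following is a minimal sketch of that combination; describe_state is a hypothetical helper name, not part of PyODPS, and it only assumes an object that exposes is_terminated() and is_successful() as described above.

```python
def describe_state(instance):
    """Classify an instance as 'unfinished', 'succeeded' or 'failed'.

    `instance` is any object exposing is_terminated() and is_successful(),
    for example the object returned by o.get_instance().
    describe_state itself is a hypothetical helper, not a PyODPS API.
    """
    if not instance.is_terminated():
        # Running or Suspended: is_successful() would also be False here,
        # so check termination first to avoid reporting a failure too early.
        return "unfinished"
    return "succeeded" if instance.is_successful() else "failed"
```

Checking is_terminated() first is what separates "not done yet" from "done and failed".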
Calling wait_for_completion() blocks until the instance finishes executing. wait_for_success() also blocks, but raises the relevant exception if the task ultimately fails.
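If a caller needs to give up after a bounded amount of time instead of blocking until the instance finishes, one option is to poll is_terminated() against a deadline. The sketch below illustrates that idea; wait_with_deadline is a hypothetical helper, not a PyODPS method, and the default timeout and interval values are arbitrary assumptions.

```python
import time


def wait_with_deadline(instance, timeout=60.0, interval=1.0):
    """Poll an instance until it terminates or `timeout` seconds elapse.

    Hypothetical helper, not part of PyODPS. `instance` only needs an
    is_terminated() method. Returns True if the instance terminated
    before the deadline and False otherwise.
    """
    deadline = time.monotonic() + timeout
    while not instance.is_terminated():
        if time.monotonic() >= deadline:
            return False  # still running when the deadline passed
        time.sleep(interval)
    return True
```

A False return value only means the deadline passed; the instance keeps running on the server unless it is stopped explicitly.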
Subtask operations
When an instance is running, it may contain one or several subtasks, which are called Tasks. Note that these Tasks are different from the computing units in MaxCompute.
You can call get_task_names() to retrieve all Tasks. This method returns the Task names as a list.
>>> instance.get_task_names()
['SQLDropTableTask']
Once you have a Task name, you can call get_task_result() to retrieve the execution result of that Task. get_task_results() returns the execution result of every Task as a dictionary.
>>> instance = o.execute_sql('select * from pyodps_iris limit 1')
>>> instance.get_task_names()
['AnonymousSQLTask']
>>> instance.get_task_result('AnonymousSQLTask')
'"sepallength","sepalwidth","petallength","petalwidth","name"\n5.1,3.5,1.4,0.2,"Iris-setosa"\n'
>>> instance.get_task_results()
OrderedDict([('AnonymousSQLTask',
'"sepallength","sepalwidth","petallength","petalwidth","name"\n5.1,3.5,1.4,0.2,"Iris-setosa"\n')])
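The result shown above for a SQL Task is a single CSV-formatted string. As a rough sketch, it can be split into rows with the standard csv module; parse_csv_result is a hypothetical helper name, and it assumes the result keeps the header-plus-rows CSV layout shown in the example output. All values come back as strings.

```python
import csv
import io


def parse_csv_result(result):
    """Parse CSV text (header line plus data lines) into a list of dicts.

    Hypothetical helper, not part of PyODPS; it assumes `result` has the
    same layout as the get_task_result() output shown in the example.
    """
    reader = csv.DictReader(io.StringIO(result))
    return [dict(row) for row in reader]


# The string below is copied from the example output above.
raw = ('"sepallength","sepalwidth","petallength","petalwidth","name"\n'
       '5.1,3.5,1.4,0.2,"Iris-setosa"\n')
rows = parse_csv_result(raw)
print(rows[0]["name"])  # Iris-setosa
```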
You can use get_task_progress() to retrieve the running progress of a Task.
>>> import time
>>> while not instance.is_terminated():
>>>     for task_name in instance.get_task_names():
>>>         print(instance.id, instance.get_task_progress(task_name).get_stage_progress_formatted_string())
>>>     time.sleep(10)
20160519101349613gzbzufck2 2016-05-19 18:14:03 M1_Stg1_job0:0/1/1[100%]