MaxCompute entry

class odps.ODPS(access_id=None, secret_access_key=None, project=None, endpoint=None, schema=None, app_account=None, logview_host=None, tunnel_endpoint=None, region_name=None, quota_name=None, namespace=None, **kw)[source]

Main entry point to ODPS.

Convenient operations on ODPS objects are provided. Please refer to the ODPS docs for more details.

Generally, basic operations such as list, get, exist, create and delete are provided for each ODPS object. Take Table as an example.

To create an ODPS instance, access_id and secret_access_key are required and must be correct; otherwise a SignatureNotMatch error is raised. If tunnel_endpoint is not set, the tunnel service URL is resolved automatically.

Parameters:
  • access_id -- Aliyun Access ID

  • secret_access_key -- Aliyun Access Key

  • project -- default project name

  • endpoint -- Rest service URL

  • tunnel_endpoint -- Tunnel service URL

  • logview_host -- Logview host URL

  • app_account -- Application account, instance of odps.accounts.AppAccount used for dual authentication

Example:

>>> from odps import ODPS
>>> odps = ODPS('**your access id**', '**your access key**', 'default_project')
>>>
>>> for table in odps.list_tables():
...     pass  # handle each table
>>>
>>> table = odps.get_table('dual')
>>>
>>> odps.exist_table('dual') is True
>>>
>>> odps.create_table('test_table', schema)
>>>
>>> odps.delete_table('test_table')

as_account(access_id=None, secret_access_key=None, account=None, app_account=None, namespace=None)[source]

Create a new ODPS entry object with new account information.

Parameters:
  • access_id -- Aliyun Access ID of the new account

  • secret_access_key -- Aliyun Access Key of the new account

  • account -- new account object, used if access_id and secret_access_key are not supplied

  • app_account -- Application account, instance of odps.accounts.AppAccount used for dual authentication

  • namespace -- namespace of the new account to be created

Returns:

the new ODPS entry object

copy_offline_model(name, new_name, project=None, new_project=None, async_=False)[source]

Copy current model into a new location.

Parameters:
  • new_name -- name of the new model

  • new_project -- new project name. If absent, the original project name will be used

  • async_ -- if True, return the copy instance; otherwise return the newly copied model

create_external_volume(name, project=None, schema=None, location=None, rolearn=None, auto_create_dir=False, accelerate=False, **kwargs)[source]

Create a file system volume based on external storage (for instance, OSS) in a project.

Parameters:
  • name (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • location (str) -- location of OSS dir, should be oss://endpoint/bucket/path

  • rolearn (str) -- role arn of the account hosting the OSS bucket

  • auto_create_dir (bool) -- if True, will create directory automatically

  • accelerate (bool) -- if True, will accelerate transfer of large volumes

Returns:

volume

Return type:

odps.models.FSVolume

See also

odps.models.FSVolume

create_fs_volume(name, project=None, schema=None, **kwargs)[source]

Create a new-style file system volume in a project.

Parameters:
  • name (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

volume

Return type:

odps.models.FSVolume

See also

odps.models.FSVolume

create_function(name, project=None, schema=None, **kwargs)[source]

Create a function with the given name.

Parameters:
  • name -- function name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • class_type (str) -- main class

  • resources (list) -- the resources that function needs to use

Returns:

the created function

Return type:

odps.models.Function
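
Registering a Python UDF usually touches both a resource and a function. The sketch below is one possible way to combine create_resource and create_function, assuming `o` is an odps.ODPS entry object. The helper name register_python_udf and its arguments are hypothetical and not part of PyODPS.

```python
import io


def register_python_udf(o, func_name, resource_name, source, class_type):
    """Upload `source` as a py resource and (re)create a function using it.

    `o` is assumed to be an odps.ODPS entry object; this helper is a
    hypothetical convenience wrapper, not a PyODPS API.
    """
    # Remove stale objects first so the create calls do not conflict.
    if o.exist_function(func_name):
        o.delete_function(func_name)
    if o.exist_resource(resource_name):
        o.delete_resource(resource_name)

    # File-type resources take their content from a file-like object.
    res = o.create_resource(resource_name, 'py', fileobj=io.StringIO(source))
    return o.create_function(func_name, class_type=class_type, resources=[res])
```

Deleting the function before the resource is a design choice of this sketch, made so that the function never points at a resource that has already been removed.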

Example:

>>> res = odps.get_resource('test_func.py')
>>> func = odps.create_function('test_func', class_type='test_func.Test', resources=[res])

create_parted_volume(name, project=None, schema=None, **kwargs)[source]

Create an old-style partitioned volume in a project.

Parameters:
  • name (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

volume

Return type:

odps.models.PartedVolume

See also

odps.models.PartedVolume

create_resource(name, type=None, project=None, schema=None, **kwargs)[source]

Create a resource with the given name and type.

Currently, the resource type can be file, jar, py, archive or table.

The file, jar, py and archive types are all file resources. To create a file resource, you have to provide an additional parameter, which is a file-like object.

For a table resource, the table name, the project name and optionally a partition should be provided.

Parameters:
  • name -- resource name

  • type -- resource type, now support file, jar, py, archive, table

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • kwargs -- optional arguments; see the example below.

Returns:

a resource whose class depends on the type, for example odps.models.FileResource for file

Return type:

subclasses of odps.models.Resource

Example:

>>> from io import StringIO
>>> from odps.models.resource import *
>>>
>>> res = odps.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file'))
>>> isinstance(res, FileResource)
True
>>>
>>> res = odps.create_resource('test_py_resource.py', 'py', fileobj=StringIO('import this'))
>>> isinstance(res, PyResource)
True
>>>
>>> res = odps.create_resource('test_table_resource', 'table', table_name='test_table', partition='pt=test')
>>> isinstance(res, TableResource)
True

create_role(name, project=None)[source]

Create a role in a project.

Parameters:
  • name -- name of the role to create

  • project -- project name, if not provided, will be the default project

Returns:

role object created

create_schema(name, project=None, async_=False)[source]

Create a schema with the given name.

Parameters:
  • name -- schema name

  • project -- project name, if not provided, will be the default project

  • async_ -- if True, will run asynchronously

Returns:

the running instance if async_ is True, otherwise the Schema object.

create_table(name, table_schema=None, project=None, schema=None, comment=None, if_not_exists=False, lifecycle=None, shard_num=None, hub_lifecycle=None, hints=None, transactional=False, primary_key=None, storage_tier=None, table_properties=None, async_=False, **kw)[source]

Create a table with the given schema and other optional parameters.

Parameters:
  • name -- table name

  • table_schema -- table schema. Can be an instance of odps.models.TableSchema or a string like 'col1 string, col2 bigint'

  • project -- project name, if not provided, will be the default project

  • comment -- table comment

  • schema (str) -- schema name, if not provided, will be the default schema

  • if_not_exists (bool) -- will not create if this table already exists, default False

  • lifecycle (int) -- table's lifecycle. If absent, options.lifecycle will be used.

  • shard_num (int) -- table's shard num

  • hub_lifecycle (int) -- hub lifecycle

  • hints (dict) -- hints for the task

  • transactional (bool) -- make table transactional

  • primary_key (list) -- primary key of the table, only for transactional tables

  • storage_tier (str) -- storage tier of the table

  • table_properties (dict) -- properties for table creation

  • async_ (bool) -- if True, will run asynchronously

Returns:

the created Table if async_ is False, otherwise the running instance

Return type:

odps.models.Table or odps.models.Instance
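
A common pattern is to create a table only when it is missing and then work with the table object. The following is a minimal sketch of that pattern, assuming `o` is an odps.ODPS entry object and using only the parameters documented above. The helper name ensure_table is hypothetical.

```python
def ensure_table(o, name, columns, lifecycle=None, comment=None):
    """Create table `name` if it does not exist yet and return it.

    `o` is assumed to be an odps.ODPS entry object. `columns` may be an
    odps.models.TableSchema or a string like 'col1 string, col2 bigint'.
    """
    o.create_table(
        name,
        columns,
        if_not_exists=True,   # leave an existing table untouched
        lifecycle=lifecycle,  # None falls back to options.lifecycle
        comment=comment,
    )
    return o.get_table(name)
```

Relying on if_not_exists instead of a separate exist_table call keeps the helper to a single create request.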

create_user(name, project=None)[source]

Add a user to the project.

Parameters:
  • name -- user name

  • project -- project name, if not provided, will be the default project

Returns:

user created

create_volume_directory(volume, path=None, project=None, schema=None)[source]

Create a directory under a file system volume.

Parameters:
  • volume (str) -- name of the volume.

  • path (str) -- path of the directory to be created.

  • project (str) -- project name, if not provided, will be the default project.

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

directory object.
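
Volume creation and directory creation are often chained. The sketch below shows one way to do that with exist_volume, create_fs_volume and create_volume_directory, assuming `o` is an odps.ODPS entry object. The helper name ensure_volume_directory is hypothetical.

```python
def ensure_volume_directory(o, volume, path):
    """Make sure a file system volume exists, then create `path` in it.

    `o` is assumed to be an odps.ODPS entry object; this helper is only
    an illustration and not part of PyODPS.
    """
    if not o.exist_volume(volume):
        # Create a file system volume in the default project and schema.
        o.create_fs_volume(volume)
    return o.create_volume_directory(volume, path)
```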

delete_function(name, project=None, schema=None)[source]

Delete the function with the given name.

Parameters:
  • name -- function name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

None

delete_materialized_view(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]

Delete the materialized view with the given name.

Parameters:
  • name -- materialized view name

  • project -- project name, if not provided, will be the default project

  • if_exists (bool) -- will not raise errors when the materialized view does not exist, default False

  • schema (str) -- schema name, if not provided, will be the default schema

  • hints (dict) -- hints for the task

  • async_ (bool) -- if True, will run asynchronously

Returns:

None if async_ is False, otherwise the running instance

delete_offline_model(name, project=None, if_exists=False)[source]

Delete the offline model with the given name.

Parameters:
  • name -- offline model's name

  • if_exists -- will not raise errors when the offline model does not exist, default False

  • project -- project name, if not provided, will be the default project

Returns:

None

delete_resource(name, project=None, schema=None)[source]

Delete the resource with the given name.

Parameters:
  • name -- resource name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

None

delete_role(name, project=None)[source]

Delete a role in a project.

Parameters:
  • name -- name of the role to delete

  • project -- project name, if not provided, will be the default project

delete_schema(name, project=None, async_=False)[source]

Delete the schema with the given name.

Parameters:
  • name -- schema name

  • project -- project name, if not provided, will be the default project

  • async_ (bool) -- if True, will run asynchronously

delete_table(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]

Delete the table with the given name.

Parameters:
  • name -- table name

  • project -- project name, if not provided, will be the default project

  • if_exists (bool) -- will not raise errors when the table does not exist, default False

  • schema (str) -- schema name, if not provided, will be the default schema

  • hints (dict) -- hints for the task

  • async_ (bool) -- if True, will run asynchronously

Returns:

None if async_ is False, otherwise the running instance
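
Cleaning up temporary tables is a typical use of delete_table together with list_tables. The sketch below assumes `o` is an odps.ODPS entry object and that listed tables expose a `name` attribute. The helper name drop_tables_with_prefix and its dry_run flag are hypothetical.

```python
def drop_tables_with_prefix(o, prefix, dry_run=True):
    """Return the names of tables starting with `prefix`, optionally dropping them.

    `o` is assumed to be an odps.ODPS entry object. With dry_run=True
    (the default in this sketch) nothing is deleted.
    """
    # Materialize the names first so deletion does not interleave with listing.
    names = [table.name for table in o.list_tables(prefix=prefix)]
    if not dry_run:
        for name in names:
            # if_exists=True avoids an error if a table vanished in the meantime.
            o.delete_table(name, if_exists=True)
    return names
```

Defaulting to a dry run is a deliberate safety choice here, since table deletion cannot be undone from this API.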

delete_user(name, project=None)[source]

Delete a user from the project.

Parameters:
  • name -- user name

  • project -- project name, if not provided, will be the default project

delete_view(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]

Delete the view with the given name.

Parameters:
  • name -- view name

  • project -- project name, if not provided, will be the default project

  • if_exists (bool) -- will not raise errors when the view does not exist, default False

  • schema (str) -- schema name, if not provided, will be the default schema

  • hints (dict) -- hints for the task

  • async_ (bool) -- if True, will run asynchronously

Returns:

None if async_ is False, otherwise the running instance

delete_volume(name, project=None, schema=None, auto_remove_dir=False, recursive=False)[source]

Delete the volume with the given name.

Parameters:
  • name -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • auto_remove_dir (bool) -- if True, directory created by external volume will be deleted

  • recursive (bool) -- if True, directory deletion should be recursive

Returns:

None

delete_volume_file(volume, path=None, recursive=False, project=None, schema=None)[source]

Delete a file / directory object under a file system volume.

Parameters:
  • volume (str) -- name of the volume.

  • path (str) -- path of the file or directory to be deleted.

  • recursive (bool) -- if True, recursively delete files

  • project (str) -- project name, if not provided, will be the default project.

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

directory object.

delete_volume_partition(volume, partition=None, project=None, schema=None)[source]

Delete the partition with the given name in a volume.

Parameters:
  • volume (str) -- volume name

  • partition (str) -- partition name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

delete_xflow(name, project=None)[source]

Delete the xflow with the given name.

Parameters:
  • name -- xflow name

  • project -- project name, if not provided, will be the default project

Returns:

None

execute_archive_table(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)

Execute a task to archive tables and wait for termination.

Parameters:
  • table -- name of the table to archive

  • partition -- partition to archive

  • project -- project name, if not provided, will be the default project

  • hints -- settings for table archive task.

  • priority -- instance priority, 9 as default

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

execute_freeze_command(table, partition=None, command=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)

Execute a task to freeze or restore tables and wait for termination.

Parameters:
  • table -- name of the table to freeze or restore

  • partition -- partition to freeze or restore

  • command -- freeze command to execute, can be freeze or restore

  • project -- project name, if not provided, will be the default project

  • hints -- settings for the freeze task.

  • priority -- instance priority, 9 as default

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

execute_merge_files(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, compact_type=None, force_mode=None, recent_hours=None, quota_name=None, unique_identifier_id=None, create_callback=None, **kwargs)

Execute a task to merge multiple files in tables and wait for termination.

Parameters:
  • table -- name of the table to optimize

  • partition -- partition to optimize

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • hints -- settings for merge task.

  • priority -- instance priority, 9 as default

  • running_cluster -- cluster to run this instance

  • compact_type -- compact option for transactional table, can be major or minor.

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

execute_security_query(query, project=None, schema=None, token=None, hints=None, output_json=True)[source]

Execute a security query to grant / revoke / query privileges and return the result string or JSON value.

Parameters:
  • query (str) -- query text

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • output_json (bool) -- parse json for the output

Returns:

result string / json object

execute_sql(sql, project=None, priority=None, running_cluster=None, hints=None, quota_name=None, unique_identifier_id=None, **kwargs)[source]

Run the given SQL statement and block until it has executed successfully.

Parameters:
  • sql (str) -- SQL statement

  • project -- project name, if not provided, will be the default project

  • priority (int) -- instance priority, 9 as default

  • running_cluster (str) -- cluster to run this instance

  • hints (dict) -- settings for SQL, e.g. odps.mapred.map.split.size

  • quota_name (str) -- name of quota to use for SQL job

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance
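
Since the call blocks and returns an instance, reading the result takes a second step through the instance reader. The sketch below wraps both steps, assuming `o` is an odps.ODPS entry object and that the returned instance supports open_reader() as a context manager. The helper name fetch_rows and its limit argument are hypothetical.

```python
def fetch_rows(o, sql, hints=None, limit=None):
    """Run `sql` with execute_sql and collect the resulting records.

    `o` is assumed to be an odps.ODPS entry object. `limit`, if given,
    caps the number of records kept in memory by this sketch.
    """
    instance = o.execute_sql(sql, hints=hints)  # blocks until the job ends
    rows = []
    with instance.open_reader() as reader:
        for record in reader:
            if limit is not None and len(rows) >= limit:
                break
            rows.append(record)
    return rows
```

The limit is applied on the client side only; it does not reduce the work done by the SQL job itself.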

Example:

>>> instance = odps.execute_sql('select * from dual')
>>> with instance.open_reader() as reader:
...     for record in reader:  # iterate to handle result with schema
...         pass  # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')
>>> with instance.open_reader() as reader:
...     print(reader.raw)  # without schema, just get the raw result

execute_sql_cost(sql, project=None, hints=None, **kwargs)[source]

Parameters:
  • sql (str) -- SQL statement

  • project -- project name, if not provided, will be the default project

  • hints (dict) -- settings for SQL, e.g. odps.mapred.map.split.size

Returns:

cost info in dict format

Return type:

cost: dict
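
A cost estimate can be used as a guard before a query is actually submitted. The sketch below assumes `o` is an odps.ODPS entry object and that the returned cost object exposes input_size as an attribute, as in this method's example. The helper name run_if_cheap and the max_input_size threshold are hypothetical, and the threshold is in whatever unit input_size is reported in.

```python
def run_if_cheap(o, sql, max_input_size, hints=None):
    """Run `sql` only if its estimated input size is within a budget.

    `o` is assumed to be an odps.ODPS entry object; `max_input_size` is a
    caller-chosen threshold, not a PyODPS setting.
    """
    cost = o.execute_sql_cost(sql, hints=hints)
    if cost.input_size > max_input_size:
        raise ValueError(
            'estimated input size %s exceeds budget %s'
            % (cost.input_size, max_input_size)
        )
    return o.execute_sql(sql, hints=hints)
```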

Example:

>>> sql_cost = odps.execute_sql_cost('select * from dual')
>>> sql_cost.udf_num
0
>>> sql_cost.complexity
1.0
>>> sql_cost.input_size
100

execute_sql_interactive(sql, hints=None, fallback=True, wait_fallback=True, offline_quota_name=None, use_mcqa_v2=False, **kwargs)

Run SQL query in interactive mode (a.k.a. MaxCompute Query Acceleration). If the query is not supported or fails and fallback is True, it will fall back to offline mode automatically.

Parameters:
  • sql -- the sql query.

  • hints -- settings for sql query.

  • fallback -- fallback query to non-interactive mode, True by default. Both boolean type and policy names separated by commas are acceptable.

  • wait_fallback (bool) -- wait fallback instance to finish, True by default.

Returns:

instance.

execute_xflow(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]

Run the xflow with the given name, xflow project and parameters, and block until it has executed successfully.

Parameters:
  • xflow_name (str) -- XFlow name

  • xflow_project (str) -- the project XFlow deploys

  • parameters (dict) -- parameters

  • project -- project name, if not provided, will be the default project

  • hints (dict) -- execution hints

  • priority (int) -- instance priority, 9 as default

Returns:

instance

Return type:

odps.models.Instance

exist_function(name, project=None, schema=None)[source]

Check whether the function with the given name exists.

Parameters:
  • name (str) -- function name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

True if the function exists, otherwise False

Return type:

bool

exist_instance(id_, project=None)[source]

Check whether the instance with the given id exists.

Parameters:
  • id_ -- instance id

  • project -- project name, if not provided, will be the default project

Returns:

True if it exists, otherwise False

Return type:

bool

exist_offline_model(name, project=None)[source]

Check whether the offline model with the given name exists.

Parameters:
  • name -- offline model's name

  • project -- project name, if not provided, will be the default project

Returns:

True if the offline model exists, otherwise False

Return type:

bool

exist_project(name)[source]

Check whether the project with the given name exists.

Parameters:

name -- project name

Returns:

True if it exists, otherwise False

Return type:

bool

exist_quota(name)[source]

Check whether the quota with the given name exists.

Parameters:

name -- quota name

Returns:

True if it exists, otherwise False

Return type:

bool

exist_resource(name, project=None, schema=None)[source]

Check whether the resource with the given name exists.

Parameters:
  • name -- resource name

  • schema (str) -- schema name, if not provided, will be the default schema

  • project -- project name, if not provided, will be the default project

Returns:

True if it exists, otherwise False

Return type:

bool

exist_role(name, project=None)[source]

Check if a role exists in a project.

Parameters:
  • name -- name of the role

  • project -- project name, if not provided, will be the default project

exist_schema(name, project=None)[source]

Check whether the schema with the given name exists.

Parameters:
  • name -- schema name

  • project -- project name, if not provided, will be the default project

Returns:

True if it exists, otherwise False

Return type:

bool

exist_table(name, project=None, schema=None)[source]

Check whether the table with the given name exists.

Parameters:
  • name -- table name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

True if the table exists, otherwise False

Return type:

bool

exist_user(name, project=None)[source]

Check if a user exists in the project.

Parameters:
  • name -- user name

  • project -- project name, if not provided, will be the default project

exist_volume(name, schema=None, project=None)[source]

Check whether the volume with the given name exists.

Parameters:
  • name (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

True if it exists, otherwise False

Return type:

bool

exist_volume_partition(volume, partition=None, project=None, schema=None)[source]

Check whether the partition with the given name exists in a volume.

Parameters:
  • volume (str) -- volume name

  • partition (str) -- partition name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

exist_xflow(name, project=None)[source]

Check whether the xflow with the given name exists.

Parameters:
  • name -- xflow name

  • project -- project name, if not provided, will be the default project

Returns:

True if it exists, otherwise False

Return type:

bool

get_function(name, project=None, schema=None)[source]

Get the function with the given name.

Parameters:
  • name -- function name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

the matching function

Raises:

odps.errors.NoSuchObject if it does not exist

get_instance(id_, project=None, quota_name=None)[source]

Get the instance with the given instance id.

Parameters:
  • id_ -- instance id

  • project -- project name, if not provided, will be the default project

Returns:

the matching instance

Return type:

odps.models.Instance

Raises:

odps.errors.NoSuchObject if it does not exist

get_logview_address(instance_id, hours=None, project=None, use_legacy=None)[source]

Get the logview address for the given instance id and hours.

Parameters:
  • instance_id -- instance id

  • hours

  • project -- project name, if not provided, will be the default project

Returns:

logview address

Return type:

str
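
The logview address is usually wanted together with the instance it belongs to. The sketch below pairs execute_sql with get_logview_address, assuming `o` is an odps.ODPS entry object and that the returned instance exposes its id as `id`. The helper name run_sql_with_logview and the 24-hour default are hypothetical.

```python
def run_sql_with_logview(o, sql, hours=24):
    """Run `sql` and return the instance together with its logview address.

    `o` is assumed to be an odps.ODPS entry object; the default of 24
    hours is an arbitrary choice made for this sketch.
    """
    instance = o.execute_sql(sql)
    address = o.get_logview_address(instance.id, hours=hours)
    return instance, address
```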

get_logview_host()[source]

Get logview host address.

Returns:

logview host address

get_offline_model(name, project=None)[source]

Get the offline model with the given name.

Parameters:
  • name -- offline model name

  • project -- project name, if not provided, will be the default project

Returns:

offline model

Return type:

odps.models.ml.OfflineModel

Raises:

odps.errors.NoSuchObject if it does not exist

get_project(name=None, default_schema=None)[source]

Get the project with the given name.

Parameters:
  • name (str) -- project name, if not provided, will be the default project

  • default_schema (str) -- default schema name, if not provided, will be the schema specified in ODPS object

Returns:

the matching project

Return type:

odps.models.Project

Raises:

odps.errors.NoSuchObject if it does not exist

get_project_policy(project=None)[source]

Get the policy of a project.

Parameters:

project -- project name, if not provided, will be the default project

Returns:

JSON object

get_quota(name=None, tenant_id=None)[source]

Get a quota by name.

Parameters:

name (str) -- quota name, if not provided, will be the name in the ODPS entry

get_resource(name, project=None, schema=None)[source]

Get the resource with the given name.

Parameters:
  • name -- resource name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

the matching resource

Return type:

odps.models.Resource

Raises:

odps.errors.NoSuchObject if it does not exist

get_role_policy(name, project=None)[source]

Get the policy object of a role.

Parameters:
  • name -- name of the role

  • project -- project name, if not provided, will be the default project

Returns:

JSON object

get_schema(name=None, project=None)[source]

Get the schema with the given name.

Parameters:
  • name -- schema name, if not provided, will be the default schema

  • project -- project name, if not provided, will be the default project

Returns:

the Schema object

get_security_option(option_name, project=None)[source]

Get one security option of a project.

Parameters:
  • option_name -- name of the security option. Please refer to ODPS options for more details.

  • project -- project name, if not provided, will be the default project

Returns:

option value

get_security_options(project=None)[source]

Get all security options of a project.

Parameters:

project -- project name, if not provided, will be the default project

Returns:

SecurityConfiguration object

get_table(name, project=None, schema=None)[source]

Get the table with the given name.

Parameters:
  • name -- table name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

the matching table

Return type:

odps.models.Table

Raises:

odps.errors.NoSuchObject if it does not exist
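
When a missing table is an expected case rather than an error, the existence check can be done up front. The sketch below combines exist_table and get_table, assuming `o` is an odps.ODPS entry object. The helper name get_table_if_exists is hypothetical.

```python
def get_table_if_exists(o, name, project=None, schema=None):
    """Return the table object, or None if the table does not exist.

    `o` is assumed to be an odps.ODPS entry object; this helper is an
    illustration and not part of PyODPS.
    """
    if not o.exist_table(name, project=project, schema=schema):
        return None
    return o.get_table(name, project=project, schema=schema)
```

Note that the check and the lookup are two separate requests, so a table dropped in between can still lead to an error later.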

get_volume(name, project=None, schema=None)[source]

Get the volume with the given name.

Parameters:
  • name (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

volume object. The return type depends on the type of the volume.

Return type:

odps.models.Volume

get_volume_file(volume, path=None, project=None, schema=None)[source]

Get a file under a partition of a parted volume, or a file / directory object under a file system volume.

Parameters:
  • volume (str) -- name of the volume.

  • path (str) -- path of the file or directory to get.

  • project (str) -- project name, if not provided, will be the default project.

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

file or directory object.

get_volume_partition(volume, partition=None, project=None, schema=None)[source]

Get the partition with the given name in a parted volume.

Parameters:
  • volume (str) -- volume name

  • partition (str) -- partition name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

partition

Return type:

odps.models.VolumePartition

get_xflow(name, project=None)[source]

Get the xflow with the given name.

Parameters:
  • name -- xflow name

  • project -- project name, if not provided, will be the default project

Returns:

xflow

Return type:

odps.models.XFlow

Raises:

odps.errors.NoSuchObject if it does not exist

See also

odps.models.XFlow

get_xflow_results(instance, project=None)[source]

Get the results of an xflow instance.

Parameters:
  • instance (odps.models.Instance) -- instance of xflow

  • project -- project name, if not provided, will be the default project

Returns:

xflow result

Return type:

dict

get_xflow_sub_instances(instance, project=None)[source]

Get the sub-instances of an xflow instance.

Parameters:
  • instance (odps.models.Instance) -- instance of xflow

  • project -- project name, if not provided, will be the default project

Returns:

sub instances dictionary

iter_xflow_sub_instances(instance, interval=1, project=None, check=False)[source]

Iterate over the sub-instances of an xflow instance, waiting until the instance finishes.

Parameters:
  • instance (odps.models.Instance) -- instance of xflow

  • interval -- time interval to check

  • project -- project name, if not provided, will be the default project

  • check (bool) -- check if the instance is successful

Returns:

generator of sub-instances

list_functions(project=None, prefix=None, owner=None, schema=None)[source]

List all functions of a project.

Parameters:
  • project (str) -- project name, if not provided, will be the default project

  • prefix (str) -- the listed functions start with this prefix

  • owner (str) -- Aliyun account, the owner which listed functions belong to

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

functions

Return type:

generator

list_instance_queueing_infos(project=None, status=None, only_owner=None, quota_index=None)[source]

List instance queueing information.

Parameters:
  • project -- project name, if not provided, will be the default project

  • status -- including 'Running', 'Suspended', 'Terminated'

  • only_owner (bool) -- True will filter the instances created by current user

  • quota_index (str)

Returns:

instance queueing infos

Return type:

list

list_instances(project=None, start_time=None, end_time=None, status=None, only_owner=None, quota_index=None, **kw)[source]

List instances of a project, optionally filtered by start time, end time, status and whether they belong to the current user.

Parameters:
  • project -- project name, if not provided, will be the default project

  • start_time (datetime, int or float) -- the start time of filtered instances

  • end_time (datetime, int or float) -- the end time of filtered instances

  • status -- including 'Running', 'Suspended', 'Terminated'

  • only_owner (bool) -- True will filter the instances created by current user

  • quota_index (str)

Returns:

instances

Return type:

list
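
The filters can be combined to find jobs of the current user that are still running. The sketch below assumes `o` is an odps.ODPS entry object and that each listed instance exposes its id as `id`. The helper name recent_running_instance_ids and its `now` argument are hypothetical.

```python
from datetime import datetime, timedelta


def recent_running_instance_ids(o, hours=1, now=None):
    """Return ids of the caller's running instances from the last `hours`.

    `o` is assumed to be an odps.ODPS entry object. `now` exists only to
    make this sketch easy to test; it defaults to the current time.
    """
    now = now or datetime.now()
    instances = o.list_instances(
        start_time=now - timedelta(hours=hours),
        end_time=now,
        status='Running',
        only_owner=True,  # keep only instances created by the current user
    )
    return [instance.id for instance in instances]
```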

list_offline_models(project=None, prefix=None, owner=None)[source]

List offline models of a project, optionally filtered by prefix and owner.

Parameters:
  • project -- project name, if not provided, will be the default project

  • prefix -- prefix of offline model's name

  • owner -- Aliyun account

Returns:

offline models

Return type:

list

list_projects(owner=None, user=None, group=None, prefix=None, max_items=None, region_id=None, tenant_id=None)[source]

List projects.

Parameters:
  • owner -- Aliyun account, the owner which listed projects belong to

  • user -- name of the user who has access to listed projects

  • group -- name of the group listed projects belong to

  • prefix -- prefix of names of listed projects

  • max_items -- the maximal size of result set

Returns:

projects in this endpoint.

Return type:

generator

list_quotas(region_id=None)[source]

List quotas by region id.

Parameters:

region_id (str) -- Region ID

Returns:

quotas

list_resources(project=None, prefix=None, owner=None, schema=None)[source]

List all resources of a project.

Parameters:
  • project -- project name, if not provided, will be the default project

  • prefix (str) -- the listed resources start with this prefix

  • owner (str) -- Aliyun account, the owner which listed resources belong to

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

resources

Return type:

generator

list_role_users(name, project=None)[source]

List users who have the specified role.

Parameters:
  • name -- name of the role

  • project -- project name, if not provided, will be the default project

Returns:

collection of User objects

list_roles(project=None)[source]

List all roles in a project.

Parameters:

project -- project name, if not provided, will be the default project

Returns:

collection of role objects

list_schemas(project=None, prefix=None, owner=None)[source]

List all schemas of a project.

Parameters:
  • project -- project name, if not provided, will be the default project

  • prefix (str) -- the listed schemas start with this prefix

  • owner (str) -- Aliyun account, the owner which listed schemas belong to

Returns:

schemas

list_tables(project=None, prefix=None, owner=None, schema=None, type=None, extended=False)[source]

List all tables of a project. If prefix is provided, all listed tables start with this prefix. If owner is provided, all listed tables belong to that owner.

Parameters:
  • project (str) -- project name, if not provided, will be the default project

  • prefix (str) -- the listed tables start with this prefix

  • owner (str) -- Aliyun account, the owner which listed tables belong to

  • schema (str) -- schema name, if not provided, will be the default schema

  • type (str) -- type of the table

  • extended (bool) -- if True, load extended information for table

返回:

tables in this project, filtered by the optional prefix and owner.

返回类型:

generator
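
Because list_tables returns a generator, results can be consumed lazily and iteration can stop early. The helper below is a minimal sketch of that pattern, not part of the library: collect_table_names and its limit argument are hypothetical, entry is assumed to be an ODPS entry object, and each listed table is assumed to expose a name attribute.

```python
def collect_table_names(entry, prefix=None, project=None, limit=None):
    """Return the names of tables yielded by ``entry.list_tables``.

    ``entry`` is assumed to be an ODPS entry object. ``limit`` is a
    hypothetical helper argument, not a parameter of ``list_tables``.
    """
    names = []
    for table in entry.list_tables(project=project, prefix=prefix):
        # Assumption: every listed table object has a ``name`` attribute.
        names.append(table.name)
        if limit is not None and len(names) >= limit:
            break  # stop early; the remaining tables are never fetched
    return names
```

A call such as collect_table_names(odps, prefix='log_', limit=10) would then return at most ten table names starting with log_.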

list_tables_model(prefix='', project=None)

List all TablesModel in the given project.

Parameters:
  • prefix -- model prefix

  • project (str) -- project name, if you want to look up in another project

Return type:

list[str]

list_user_roles(name, project=None)[source]

List roles of the specified user.

Parameters:
  • name -- user name

  • project -- project name, if not provided, will be the default project

Returns:

collection of Role objects

list_users(project=None)[source]

List users in the project.

Parameters:

project -- project name, if not provided, will be the default project

Returns:

collection of User objects

list_volume_files(volume, partition=None, project=None, schema=None)[source]

List files in a volume. In partitioned volumes, the function returns files under the specified partition. In file system volumes, the function returns files under the specified path.

Parameters:
  • volume (str) -- volume name

  • partition (str) -- partition name for partitioned volumes, and path for file system volumes.

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

files

Return type:

list

Example:

>>> # List files under a partition in a partitioned volume. Two calls are equivalent.
>>> odps.list_volume_files('parted_volume', 'partition_name')
>>> odps.list_volume_files('/parted_volume/partition_name')
>>> # List files under a path in a file system volume. Two calls are equivalent.
>>> odps.list_volume_files('fs_volume', 'dir1/dir2')
>>> odps.list_volume_files('/fs_volume/dir1/dir2')

list_volume_partitions(volume, project=None, schema=None)[source]

List partitions of a volume.

Parameters:
  • volume (str) -- volume name

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

partitions

Return type:

list

list_volumes(project=None, schema=None, owner=None)[source]

List volumes of a project.

Parameters:
  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • owner (str) -- Aliyun account

Returns:

volumes

Return type:

list

list_xflows(project=None, owner=None)[source]

List xflows of a project, optionally filtered by the xflow owner.

Parameters:
  • project (str) -- project name, if not provided, will be the default project

  • owner (str) -- Aliyun account

Returns:

xflows

Return type:

list

move_volume_file(old_path, new_path, replication=None, project=None, schema=None)[source]

Move a file / directory object under a file system volume to another location in the same volume.

Parameters:
  • old_path (str) -- old path of the volume file.

  • new_path (str) -- target path of the moved file.

  • replication (int) -- file replication.

  • project (str) -- project name, if not provided, will be the default project.

  • schema (str) -- schema name, if not provided, will be the default schema

Returns:

directory object.

open_resource(name, project=None, mode='r+', encoding='utf-8', schema=None, type='file', stream=False, comment=None, temp=False)[source]

Open a file resource as a file-like object. This is an elegant and Pythonic way to handle file resources.

The argument mode is the open mode for this file resource. The resource is opened in binary mode if the mode contains 'b'. For instance, 'rb' opens the resource in read binary mode, while 'r+b' opens it in read+write binary mode. This is most important when the file is actually binary, such as a tar or jpeg file, so take care to open such files in the correct mode.

The text modes are 'r', 'w', 'a', 'r+', 'w+' and 'a+', just like those of the built-in Python open function.

  • r means read only

  • w means write only, the file will be truncated when opening

  • a means append only

  • r+ means read+write without constraint

  • w+ truncates the file first, then opens it for read+write

  • a+ can read+write, however the written content can only be appended to the end

Parameters:
  • name (odps.models.FileResource or str) -- file resource or file resource name

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • mode (str) -- the mode of opening file, described as above

  • encoding (str) -- utf-8 as default

  • type (str) -- resource type, can be "file", "archive", "jar" or "py"

  • stream (bool) -- if True, use stream to upload, False by default

  • comment (str) -- comment of the resource

Returns:

file-like object

Example:

>>> with odps.open_resource('test_resource', mode='r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with odps.open_resource('test_resource', mode='wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with odps.open_resource('test_resource') as fp: # default as read-write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
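
Since the mode strings are described as matching those of the built-in open, their behaviour can be checked locally before touching a real resource. The following is a small sketch that uses only the built-in open on a temporary local file. It does not call open_resource, and demo_open_modes is a hypothetical name used for illustration.

```python
import os
import tempfile


def demo_open_modes():
    """Show how 'r+', 'a+' and 'w+' behave with the built-in open()."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.txt")
        with open(path, "w") as fp:
            fp.write("hello world")

        # r+ : read and write in place; nothing is truncated on open.
        with open(path, "r+") as fp:
            fp.write("HELLO")
            fp.seek(0)
            results["r+"] = fp.read()

        # a+ : readable, but written content always lands at the end.
        with open(path, "a+") as fp:
            fp.write("!")
            fp.seek(0)
            results["a+"] = fp.read()

        # w+ : the file is truncated on open, then opened for read+write.
        with open(path, "w+") as fp:
            results["w+ on open"] = fp.read()
            fp.write("new")
            fp.seek(0)
            results["w+"] = fp.read()
    return results
```

Whether a file resource matches the built-in behaviour in every corner case is not stated above, so treat this only as a guide to the intended semantics.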

open_volume_reader(volume, partition=None, file_name=None, project=None, schema=None, start=None, length=None, **kwargs)[source]

Open a volume file for reading. A file-like object is returned, which can be used to read contents from volume files.

Parameters:
  • volume (str) -- name of the volume

  • partition (str) -- name of the partition

  • file_name (str) -- name of the file

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • start -- start position

  • length -- length limit

  • compress_option (CompressOption) -- the compression algorithm, level and strategy

Example:

>>> with odps.open_volume_reader('parted_volume', 'partition', 'file') as reader:
>>>     for line in reader:
>>>         print(line)

open_volume_writer(volume, partition=None, project=None, schema=None, **kwargs)[source]

Write data into a volume. This function behaves differently for different types of volumes.

For partitioned volumes, all files under a partition should be uploaded in one submission. The method returns a writer object: call its open method to open a file inside the volume and write to it, or call its write method to write to a specific file.

For file system volumes, the method returns a file-like object.

Parameters:
  • volume (str) -- name of the volume

  • partition (str) -- partition name for partitioned volumes, and path for file system volumes.

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • compress_option (odps.tunnel.CompressOption) -- the compression algorithm, level and strategy

Example:

>>> # Writing to partitioned volumes
>>> with odps.open_volume_writer('parted_volume', 'partition') as writer:
>>>     # both write methods are acceptable
>>>     writer.open('file1').write('some content')
>>>     writer.write('file2', 'some content')
>>> # Writing to file system volumes
>>> with odps.open_volume_writer('/fs_volume/dir1/file_name') as writer:
>>>     writer.write('some content')

read_table(name, limit=None, start=0, step=None, project=None, schema=None, partition=None, **kw)

Read records from a table.

Parameters:
  • name (odps.models.table.Table or str) -- table or table name

  • limit -- maximum number of records to read; if None, all records of the table are read

  • start -- index of the record at which reading starts

  • step -- step size, 1 by default

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • partition -- the partition of this table to read

  • columns (list) -- names of the columns to read, a subset of the table's columns

  • compress (bool) -- if True, the data will be compressed during downloading

  • compress_option (odps.tunnel.CompressOption) -- the compression algorithm, level and strategy

  • endpoint -- tunnel service URL

  • reopen -- by default (False), reading the table reuses the download session opened last time; if set to True, a new download session is opened

Returns:

records

Return type:

generator

Example:

>>> for record in odps.read_table('test_table', 100):
>>>     print(record)  # handle each of the first 100 records
>>> for record in odps.read_table('test_table', partition='pt=test', start=100, limit=100):
>>>     print(record)  # read the `pt=test` partition: skip 100 records, then read 100
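
The start and limit arguments can be combined to walk through a table page by page. The generator below is a rough sketch of that idea: iter_pages is a hypothetical helper, entry is assumed to be an ODPS entry object, and how efficiently repeated read_table calls reuse the download session is not covered here.

```python
def iter_pages(entry, table_name, page_size, partition=None):
    """Yield lists of at most ``page_size`` records from ``read_table``.

    Hypothetical helper: it only relies on the documented ``limit``,
    ``start`` and ``partition`` arguments of ``read_table``.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while True:
        page = list(
            entry.read_table(
                table_name, limit=page_size, start=start, partition=partition
            )
        )
        if not page:
            break  # nothing left to read
        yield page
        if len(page) < page_size:
            break  # a short page means the end of the data was reached
        start += page_size
```

Each yielded page is a plain list, so callers can process one page fully before the next read_table call is issued.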

run_archive_table(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)

Start running a task to archive tables.

Parameters:
  • table -- name of the table to archive

  • partition -- partition to archive

  • project -- project name, if not provided, will be the default project

  • hints -- settings for table archive task.

  • priority -- instance priority, 9 as default

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

run_freeze_command(table, partition=None, command=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)

Start running a task to freeze or restore tables.

Parameters:
  • table -- name of the table to freeze or restore

  • partition -- partition to freeze or restore

  • command -- freeze command to execute, can be freeze or restore

  • project -- project name, if not provided, will be the default project

  • hints -- settings for the freeze task.

  • priority -- instance priority, 9 as default

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

run_merge_files(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, compact_type=None, force_mode=None, recent_hours=None, quota_name=None, unique_identifier_id=None, create_callback=None, **kwargs)

Start running a task to merge multiple files in tables.

Parameters:
  • table -- name of the table to optimize

  • partition -- partition to optimize

  • project -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • hints -- settings for merge task.

  • priority -- instance priority, 9 as default

  • running_cluster -- cluster to run this instance

  • compact_type -- compact option for transactional table, can be major or minor.

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance

run_security_query(query, project=None, schema=None, token=None, hints=None, output_json=True)[source]

Run a security query to grant / revoke / query privileges. If the query is an install package or uninstall package statement, a waitable AuthQueryInstance object is returned; otherwise the result string or JSON value is returned.

Parameters:
  • query (str) -- query text

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • output_json (bool) -- parse json for the output

Returns:

result string / json object

run_sql(sql, project=None, priority=None, running_cluster=None, hints=None, aliases=None, default_schema=None, quota_name=None, unique_identifier_id=None, **kwargs)[source]

Run a given SQL statement asynchronously.

Parameters:
  • sql (str) -- SQL statement

  • project (str) -- project name, if not provided, will be the default project

  • priority (int) -- instance priority, 9 as default

  • running_cluster (str) -- cluster to run this instance

  • hints (dict) -- settings for SQL, e.g. odps.mapred.map.split.size

  • aliases (dict)

  • quota_name (str) -- name of quota to use for SQL job

  • unique_identifier_id (str) -- unique instance ID

Returns:

instance

Return type:

odps.models.Instance
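
Because run_sql returns without waiting for the statement to finish, several statements can be submitted first and awaited afterwards. The sketch below illustrates this; run_statements is a hypothetical helper, entry is assumed to be an ODPS entry object, and wait_for_success is assumed to be available on the returned odps.models.Instance objects (it is not documented in this section).

```python
def run_statements(entry, statements, hints=None):
    """Submit every SQL statement, then wait for all of them.

    Hypothetical helper built on the documented ``run_sql`` call.
    """
    # Submit everything first so the statements can run concurrently.
    instances = [entry.run_sql(sql, hints=hints) for sql in statements]
    for instance in instances:
        # Assumption: Instance.wait_for_success() blocks until the
        # instance finishes and raises if it failed.
        instance.wait_for_success()
    return instances
```

The hints dictionary is passed through unchanged, for example {'odps.mapred.map.split.size': ...} as mentioned in the parameter list above.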

run_sql_interactive(sql, hints=None, use_mcqa_v2=False, **kwargs)

Run a SQL query in interactive mode (a.k.a. MaxCompute QueryAcceleration). The query does not fall back to offline mode automatically if it is not supported or fails.

Parameters:
  • sql -- the sql query.

  • hints -- settings for sql query.

Returns:

instance.

run_xflow(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]

Run an xflow asynchronously, given its name, xflow project and parameters.

Parameters:
  • xflow_name (str) -- XFlow name

  • xflow_project (str) -- the project XFlow deploys

  • parameters (dict) -- parameters

  • project -- project name, if not provided, will be the default project

  • hints (dict) -- execution hints

  • priority (int) -- instance priority, 9 as default

Returns:

instance

Return type:

odps.models.Instance

property schema

Get or set default schema name of the ODPS object

set_project_policy(policy, project=None)[source]

Set policy of a project

Parameters:
  • policy -- name of policy.

  • project -- project name, if not provided, will be the default project

Returns:

JSON object

set_role_policy(name, policy, project=None)[source]

Set policy of a role

Parameters:
  • name -- name of the role

  • policy -- policy string or JSON object

  • project -- project name, if not provided, will be the default project

set_security_option(option_name, value, project=None)[source]

Set a security option of a project

Parameters:
  • option_name -- name of the security option. Please refer to ODPS options for more details.

  • value -- value of security option to be set.

  • project -- project name, if not provided, will be the default project.

stop_instance(id_, project=None)[source]

Stop the running instance with the given instance ID.

Parameters:
  • id_ -- instance ID

  • project -- project name, if not provided, will be the default project

Returns:

None

stop_job(id_, project=None)

Stop the running instance with the given instance ID.

Parameters:
  • id_ -- instance ID

  • project -- project name, if not provided, will be the default project

Returns:

None

property tunnel_endpoint

Get or set tunnel endpoint of the ODPS object

write_sql_result_to_table(table_name, sql, partition=None, partition_cols=None, create_table=False, create_partition=False, append_missing_cols=False, overwrite=False, project=None, schema=None, lifecycle=None, type_mapping=None, table_schema_callback=None, table_kwargs=None, hints=None, running_cluster=None, unique_identifier_id=None, **kwargs)

Write SQL query results into a specified table and partition. If the target table does not exist, you may specify the argument create_table=True. Columns are inserted into the target table aligned by column names. Note that column order in the target table will NOT be changed.

Parameters:
  • table_name (str) -- The target table name

  • sql (str) -- The SQL query to execute

  • partition (str) -- Target partition in the format "part=value" or "part1=value1,part2=value2"

  • partition_cols (list) -- List of dynamic partition fields. If not provided, all partition fields of the target table are used.

  • create_table (bool) -- Whether to create the target table if it does not exist. False by default.

  • create_partition (bool) -- Whether to create partitions if they do not exist. False by default.

  • append_missing_cols (bool) -- Whether to append missing columns to the target table. False by default.

  • overwrite (bool) -- Whether to overwrite existing data. False by default.

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • lifecycle (int) -- specify table lifecycle when creating tables

  • type_mapping (dict) -- specify type mapping for columns when creating tables, can be dicts like {"column": "bigint"}. If column does not exist in data, it will be added as an empty column.

  • table_schema_callback -- a function to accept table schema resolved from data and return a new schema for table to create. Only works when target table does not exist and create_table is True.

  • table_kwargs (dict) -- specify other kwargs for create_table()

  • hints (dict) -- specify hints for SQL statements, will be passed through to execute_sql method

  • running_cluster (dict) -- specify running cluster for SQL statements, will be passed through to execute_sql method
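
The partition argument expects a string such as "part1=value1,part2=value2". As an illustration only, the sketch below builds that string from a dict and forwards it together with the creation flags described above. format_partition and write_query_snapshot are hypothetical helpers, and entry is assumed to be an ODPS entry object.

```python
def format_partition(spec):
    """Turn {'part1': 'value1', 'part2': 'value2'} into
    'part1=value1,part2=value2', keeping the dict's key order."""
    return ",".join("{}={}".format(key, value) for key, value in spec.items())


def write_query_snapshot(entry, table_name, sql, partition_spec, lifecycle=None):
    """Overwrite one partition of ``table_name`` with the result of ``sql``.

    Hypothetical helper; it only uses arguments documented for
    ``write_sql_result_to_table``.
    """
    partition = format_partition(partition_spec)
    entry.write_sql_result_to_table(
        table_name,
        sql,
        partition=partition,
        create_table=True,      # create the table if it does not exist
        create_partition=True,  # create the partition if it does not exist
        overwrite=True,         # replace existing data in the partition
        lifecycle=lifecycle,
    )
    return partition
```

For example, write_query_snapshot(odps, 'daily_stats', 'select ...', {'ds': '20240101'}) would target the partition ds=20240101; the table name and query here are placeholders.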

write_table(name, *block_data, **kw)

Write records or pandas DataFrame into given table.

Parameters:
  • name (models.table.Table or str) -- table or table name

  • block_data -- records / DataFrame, or block ids and records / DataFrame. If given records or DataFrame only, the block id will be 0 as default.

  • project (str) -- project name, if not provided, will be the default project

  • schema (str) -- schema name, if not provided, will be the default schema

  • partition -- the partition of this table to write into

  • partition_cols (list) -- columns representing dynamic partitions

  • append_missing_cols (bool) -- Whether to append missing columns to the target table. False by default.

  • overwrite (bool) -- if True, will overwrite existing data

  • create_table (bool) -- if true, the table will be created if not exist

  • table_kwargs (dict) -- specify other kwargs for create_table()

  • type_mapping (dict) -- specify type mapping for columns when creating tables, can be dicts like {"column": "bigint"}. If column does not exist in data, it will be added as an empty column.

  • table_schema_callback -- a function to accept table schema resolved from data and return a new schema for table to create. Only works when target table does not exist and create_table is True.

  • lifecycle (int) -- specify table lifecycle when creating tables

  • create_partition (bool) -- if true, the partition will be created if not exist

  • compress_option (odps.tunnel.CompressOption) -- the compression algorithm, level and strategy

  • endpoint (str) -- tunnel service URL

  • reopen (bool) -- writing the table will reuse the session which opened last time, if set to True will open a new upload session, default as False

Returns:

None

Example:

Write records into a specified table.

>>> odps.write_table('test_table', data)

Write records into multiple blocks.

>>> odps.write_table('test_table', 0, records1, 1, records2)

Write into a given partition.

>>> odps.write_table('test_table', data, partition='pt=test')

Write a pandas DataFrame. Create the table if it does not exist.

>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True],
>>>     [222, 'bbb', False],
>>>     [333, 'ccc', True],
>>>     [444, '中文', False]
>>> ], columns=['num_col', 'str_col', 'bool_col'])
>>> odps.write_table('test_table', df, partition='pt=test', create_table=True, create_partition=True)

Passing more arguments when creating table.

>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True],
>>>     [222, 'bbb', False],
>>>     [333, 'ccc', True],
>>>     [444, '中文', False]
>>> ], columns=['num_col', 'str_col', 'bool_col'])
>>> # this dict will be passed to `create_table` as kwargs.
>>> table_kwargs = {"transactional": True, "primary_key": "num_col"}
>>> odps.write_table('test_table', df, partition='pt=test', create_table=True, create_partition=True,
>>>                  table_kwargs=table_kwargs)

Write with dynamic partitioning.

>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True, 'p1'],
>>>     [222, 'bbb', False, 'p1'],
>>>     [333, 'ccc', True, 'p2'],
>>>     [444, '中文', False, 'p2']
>>> ], columns=['num_col', 'str_col', 'bool_col', 'pt'])
>>> odps.write_table('test_part_table', df, partition_cols=['pt'], create_partition=True)

Note:

write_table treats object-typed columns of pandas data as strings, because it is often hard to determine their actual types when creating a new table for your data. To make sure the column types meet your needs, you can pass the type_mapping argument to specify them, for instance, type_mapping={"col1": "array<struct<id:string>>"}.
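
Before creating a table from a DataFrame, it can help to check which object-typed columns would fall back to strings. The helper below is a minimal sketch under stated assumptions: unmapped_object_columns is a hypothetical name, and dtypes is assumed to be a mapping from column name to dtype, such as the one obtained with dict(df.dtypes) in pandas.

```python
def unmapped_object_columns(dtypes, type_mapping=None):
    """Return columns whose dtype is 'object' and which have no entry
    in ``type_mapping``; these would be written as strings.

    ``dtypes`` maps column names to dtypes (or dtype names).
    """
    type_mapping = type_mapping or {}
    return [
        column
        for column, dtype in dtypes.items()
        if str(dtype) == "object" and column not in type_mapping
    ]
```

With dtypes of {'num_col': 'int64', 'str_col': 'object', 'tags': 'object'} and type_mapping={'tags': 'array<string>'}, only str_col is reported, which signals that every other object column has an explicit type.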