MaxCompute entry
- class odps.ODPS(access_id=None, secret_access_key=None, project=None, endpoint=None, schema=None, app_account=None, logview_host=None, tunnel_endpoint=None, region_name=None, quota_name=None, namespace=None, catalog_endpoint=None, **kw)[source]
Main entrance to ODPS.
Convenient operations on ODPS objects are provided.
Generally, basic operations such as list, get, exist, create, and delete are provided for each ODPS object. Take Table as an example.
To create an ODPS instance, access_id and secret_access_key are required and must be correct; otherwise a SignatureNotMatch error will be raised. If tunnel_endpoint is not set, the tunnel API will route the service URL automatically.
- Parameters:
access_id – Cloud Access ID
secret_access_key – Cloud Access Key
project – default project name
endpoint – Rest service URL
tunnel_endpoint – Tunnel service URL
logview_host – Logview host URL
app_account – Application account, instance of odps.accounts.AppAccount used for dual authentication
- Example:
>>> odps = ODPS('**your access id**', '**your access key**', 'default_project')
>>>
>>> for table in odps.list_tables():
...     # handle each table
...     pass
>>>
>>> table = odps.get_table('dual')
>>>
>>> odps.exist_table('dual')
True
>>>
>>> odps.create_table('test_table', 'col1 string, col2 bigint')
>>>
>>> odps.delete_table('test_table')
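Hard-coding keys as in the example above is convenient for a quick test, but keys are often kept outside the code. The following is a minimal sketch that reads the constructor arguments from environment variables; the variable names (ODPS_ACCESS_ID and so on) and the helper odps_kwargs_from_env are hypothetical and not part of PyODPS.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names, mapped to ODPS constructor parameters.
ENV_KEYS = {
    "access_id": "ODPS_ACCESS_ID",
    "secret_access_key": "ODPS_SECRET_ACCESS_KEY",
    "project": "ODPS_PROJECT",
    "endpoint": "ODPS_ENDPOINT",
}


def odps_kwargs_from_env(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect ODPS constructor keyword arguments from environment variables.

    Raises KeyError for a missing variable, so a misconfigured environment
    fails before any request is sent.
    """
    env = os.environ if environ is None else environ
    kwargs: Dict[str, str] = {}
    for param, var in ENV_KEYS.items():
        if var not in env:
            raise KeyError(f"missing environment variable {var}")
        kwargs[param] = env[var]
    return kwargs
```

Under these assumptions, the entry object would then be created with odps = ODPS(**odps_kwargs_from_env()).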
- as_account(access_id=None, secret_access_key=None, account=None, app_account=None, namespace=None)[source]
Creates a new ODPS entry object with new account information.
- Parameters:
access_id – Cloud Access ID of the new account
secret_access_key – Cloud Access Key of the new account
account – new account object, if access_id and secret_access_key not supplied
app_account – Application account, instance of odps.accounts.AppAccount used for dual authentication
namespace – namespace of the new account to be created
- Returns:
new ODPS entry object
- copy_offline_model(name, new_name, project=None, new_project=None, async_=False)[source]
Copy the current model to a new location.
- Parameters:
name – name of the model to copy
new_name – name of the new model
project – project name, if not provided, will be the default project
new_project – new project name; if absent, the original project name will be used
async_ – if True, return the copy instance; otherwise return the newly copied model
- create_external_volume(name, project=None, schema=None, location=None, rolearn=None, auto_create_dir=False, accelerate=False, **kwargs)[source]
Create a file system volume based on external storage (for instance, OSS) in a project.
- Parameters:
name (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
location (str) – location of OSS dir, should be oss://endpoint/bucket/path
rolearn (str) – role ARN of the account hosting the OSS bucket
auto_create_dir (bool) – if True, the directory will be created automatically
accelerate (bool) – if True, transfer of large volumes will be accelerated
- Returns:
volume
- Return type:
odps.models.FSVolume
See also
odps.models.FSVolume
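Because location must follow the oss://endpoint/bucket/path form, it can help to assemble it from its parts rather than by hand. A minimal sketch follows; build_oss_location is a hypothetical helper, and stripping an http:// or https:// prefix from the endpoint is an assumption about how endpoints tend to be copied, not documented behavior.

```python
def build_oss_location(endpoint: str, bucket: str, path: str = "") -> str:
    """Assemble a location string of the form oss://endpoint/bucket/path."""
    # Assumption: drop an http(s):// scheme if the endpoint was pasted with one.
    for scheme in ("https://", "http://"):
        if endpoint.startswith(scheme):
            endpoint = endpoint[len(scheme):]
            break
    # Trim stray slashes so the parts join with exactly one "/" between them.
    parts = [endpoint.strip("/"), bucket.strip("/")]
    if path.strip("/"):
        parts.append(path.strip("/"))
    return "oss://" + "/".join(parts)
```

The result would be passed as the location argument, e.g. odps.create_external_volume('ext_volume', location=build_oss_location(endpoint, 'my-bucket', 'data/ext'), rolearn='<your role arn>'), where the bucket, path, and role ARN are placeholders.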
- create_fs_volume(name, project=None, schema=None, **kwargs)[source]
Create a new-fashioned file system volume in a project.
- Parameters:
name (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
volume
- Return type:
odps.models.FSVolume
See also
odps.models.FSVolume
- create_function(name, project=None, schema=None, **kwargs)[source]
Create a function by given name.
- Parameters:
name – function name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
class_type (str) – main class
resources (list) – the resources that function needs to use
- Returns:
the created function
- Return type:
odps.models.Function
- Example:
>>> res = odps.get_resource('test_func.py')
>>> func = odps.create_function('test_func', class_type='test_func.Test', resources=[res])
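In the create_function example, the resource test_func.py is paired with class_type 'test_func.Test', that is, the module name followed by the class name. Assuming that convention holds for Python UDF resources generally, the class_type string can be derived from the resource name as sketched below; python_udf_class_type is a hypothetical helper.

```python
def python_udf_class_type(resource_name: str, class_name: str) -> str:
    """Derive a class_type such as 'test_func.Test' from 'test_func.py' and 'Test'."""
    if not resource_name.endswith(".py"):
        raise ValueError(f"expected a .py resource, got {resource_name!r}")
    # The module name is the resource file name without its .py suffix.
    module_name = resource_name[: -len(".py")]
    if not module_name or not class_name:
        raise ValueError("module and class names must be non-empty")
    return f"{module_name}.{class_name}"
```

With it, the example call could read odps.create_function('test_func', class_type=python_udf_class_type('test_func.py', 'Test'), resources=[res]).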
- create_parted_volume(name, project=None, schema=None, **kwargs)[source]
Create an old-fashioned partitioned volume in a project.
- Parameters:
name (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
volume
- Return type:
odps.models.PartedVolume
See also
odps.models.PartedVolume
- create_resource(name, type=None, project=None, schema=None, **kwargs)[source]
Create a resource by given name and given type.
Currently, the resource type can be file, jar, py, archive, or table.
The file, jar, py, and archive types are file resources. To initialize a file resource, you have to provide an additional parameter, fileobj, which is a file-like object. For a table resource, the table name, project name, and partition should be provided; the partition is optional.
- Parameters:
name – resource name
type – resource type, currently file, jar, py, archive, or table
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
kwargs – optional arguments; see the example below.
- Returns:
resource depending on the type; for example, the file type returns odps.models.FileResource, and so on
- Return type:
subclasses of odps.models.Resource
- Example:
>>> from io import StringIO
>>> from odps.models.resource import *
>>>
>>> res = odps.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file'))
>>> isinstance(res, FileResource)
True
>>>
>>> res = odps.create_resource('test_py_resource.py', 'py', fileobj=StringIO('import this'))
>>> isinstance(res, PyResource)
True
>>>
>>> res = odps.create_resource('test_table_resource', 'table', table_name='test_table', partition='pt=test')
>>> isinstance(res, TableResource)
True
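When uploading many local files, the type argument can be chosen from the file name. The sketch below assumes a simple suffix-based mapping onto the documented types; the mapping itself and the helper guess_resource_type are illustrative assumptions, and table resources (which take table_name rather than fileobj) are deliberately not covered.

```python
# Illustrative suffix-to-type mapping; anything not listed is treated as a
# plain file resource.
_SUFFIX_TO_TYPE = (
    (".tar.gz", "archive"),
    (".tgz", "archive"),
    (".tar", "archive"),
    (".zip", "archive"),
    (".jar", "jar"),
    (".py", "py"),
)


def guess_resource_type(file_name: str) -> str:
    """Return 'py', 'jar', 'archive' or, for anything else, 'file'."""
    lowered = file_name.lower()
    for suffix, resource_type in _SUFFIX_TO_TYPE:
        if lowered.endswith(suffix):
            return resource_type
    return "file"
```

For a local path, a call might then look like odps.create_resource(os.path.basename(path), guess_resource_type(path), fileobj=open(path, 'rb')).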
- create_role(name, project=None)[source]
Create a role in a project
- Parameters:
name – name of the role to create
project – project name, if not provided, will be the default project
- Returns:
role object created
- create_schema(name, project=None, async_=False)[source]
Create a schema with given name
- Parameters:
name – schema name
project – project name, if not provided, will be the default project
async_ (bool) – if True, will run asynchronously
- Returns:
if async_ is True, return instance, otherwise return Schema object.
- create_table(name, table_schema=None, project=None, schema=None, comment=None, if_not_exists=False, lifecycle=None, shard_num=None, hub_lifecycle=None, hints=None, transactional=False, primary_key=None, storage_tier=None, table_properties=None, async_=False, **kw)[source]
Create a table by given schema and other optional parameters.
- Parameters:
name – table name
table_schema – table schema; can be an instance of odps.models.TableSchema or a string like ‘col1 string, col2 bigint’
project – project name, if not provided, will be the default project
comment – table comment
schema (str) – schema name, if not provided, will be the default schema
if_not_exists (bool) – will not create if this table already exists, default False
lifecycle (int) – table’s lifecycle. If absent, options.lifecycle will be used.
shard_num (int) – table’s shard num
hub_lifecycle (int) – hub lifecycle
hints (dict) – hints for the task
transactional (bool) – make table transactional
primary_key (list) – primary key of the table, only for transactional tables
storage_tier (str) – storage tier of the table
table_properties (dict) – properties for table creation
async_ (bool) – if True, will run asynchronously
- Returns:
the created Table if async_ is False, otherwise an ODPS instance
- Return type:
odps.models.Table or odps.models.Instance
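Since table_schema may be given as a string like ‘col1 string, col2 bigint’, column definitions kept as Python data can be joined into that form. A minimal sketch, with columns_to_schema_str as a hypothetical helper; it only formats the string and does not validate type names against MaxCompute.

```python
from typing import Iterable, Tuple


def columns_to_schema_str(columns: Iterable[Tuple[str, str]]) -> str:
    """Join (name, type) pairs into a string like 'col1 string, col2 bigint'."""
    parts = []
    for name, col_type in columns:
        # Only emptiness is checked; type names are passed through unvalidated.
        if not name.strip() or not col_type.strip():
            raise ValueError("column name and type must be non-empty")
        parts.append(f"{name.strip()} {col_type.strip()}")
    if not parts:
        raise ValueError("at least one column is required")
    return ", ".join(parts)
```

For example, odps.create_table('test_table', columns_to_schema_str([('col1', 'string'), ('col2', 'bigint')]), if_not_exists=True, lifecycle=7) would create the table only if it is absent.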
- create_user(name, project=None)[source]
Add a user to the project
- Parameters:
name – user name
project – project name, if not provided, will be the default project
- Returns:
user created
- create_volume_directory(volume, path=None, project=None, schema=None)[source]
Create a directory under a file system volume.
- Parameters:
volume (str) – name of the volume.
path (str) – path of the directory to be created.
project (str) – project name, if not provided, will be the default project.
schema (str) – schema name, if not provided, will be the default schema
- Returns:
directory object.
- delete_function(name, project=None, schema=None)[source]
Delete a function by given name.
- Parameters:
name – function name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
None
- delete_materialized_view(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]
Delete the materialized view with given name
- Parameters:
name – materialized view name
project – project name, if not provided, will be the default project
if_exists (bool) – will not raise errors when the materialized view does not exist, default False
schema (str) – schema name, if not provided, will be the default schema
hints (dict) – hints for the task
async_ (bool) – if True, will run asynchronously
- Returns:
None if async_ is False, otherwise an ODPS instance
- delete_offline_model(name, project=None, if_exists=False)[source]
Delete the offline model by given name.
- Parameters:
name – offline model’s name
if_exists – will not raise errors when the offline model does not exist, default False
project – project name, if not provided, will be the default project
- Returns:
None
- delete_resource(name, project=None, schema=None)[source]
Delete resource by given name.
- Parameters:
name – resource name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
None
- delete_role(name, project=None)[source]
Delete a role in a project
- Parameters:
name – name of the role to delete
project – project name, if not provided, will be the default project
- delete_schema(name, project=None, async_=False)[source]
Delete the schema with given name
- Parameters:
name – schema name
project – project name, if not provided, will be the default project
async_ (bool) – if True, will run asynchronously
- delete_table(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]
Delete the table with given name
- Parameters:
name – table name
project – project name, if not provided, will be the default project
if_exists (bool) – will not raise errors when the table does not exist, default False
schema (str) – schema name, if not provided, will be the default schema
hints (dict) – hints for the task
async_ (bool) – if True, will run asynchronously
- Returns:
None if async_ is False, otherwise an ODPS instance
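delete_table with if_exists=True and create_table can be combined into a drop-and-recreate step, which is handy for scratch tables in repeated test runs. The sketch below is only an illustration; recreate_table is hypothetical, and since the two calls are separate requests, the operation is not atomic.

```python
def recreate_table(o, name, table_schema, lifecycle=None):
    """Drop table `name` if it exists, then create it again from `table_schema`.

    `o` is an ODPS entry object; only delete_table and create_table are used.
    """
    # if_exists=True avoids an error when the table is not there yet.
    o.delete_table(name, if_exists=True)
    return o.create_table(name, table_schema, lifecycle=lifecycle)
```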
- delete_user(name, project=None)[source]
Delete a user from the project
- Parameters:
name – user name
project – project name, if not provided, will be the default project
- delete_view(name, project=None, if_exists=False, schema=None, hints=None, async_=False)[source]
Delete the view with given name
- Parameters:
name – view name
project – project name, if not provided, will be the default project
if_exists (bool) – will not raise errors when the view does not exist, default False
schema (str) – schema name, if not provided, will be the default schema
hints (dict) – hints for the task
async_ (bool) – if True, will run asynchronously
- Returns:
None if async_ is False, otherwise an ODPS instance
- delete_volume(name, project=None, schema=None, auto_remove_dir=False, recursive=False)[source]
Delete volume by given name.
- Parameters:
name – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
auto_remove_dir (bool) – if True, the directory created by the external volume will be deleted
recursive (bool) – if True, directories are deleted recursively
- Returns:
None
- delete_volume_file(volume, path=None, recursive=False, project=None, schema=None)[source]
Delete a file / directory object under a file system volume.
- Parameters:
volume (str) – name of the volume.
path (str) – path of the file or directory to delete.
recursive (bool) – if True, recursively delete files
project (str) – project name, if not provided, will be the default project.
schema (str) – schema name, if not provided, will be the default schema
- Returns:
directory object.
- delete_volume_partition(volume, partition=None, project=None, schema=None)[source]
Delete partition in a volume by given name
- Parameters:
volume (str) – volume name
partition (str) – partition name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- delete_xflow(name, project=None)[source]
Delete xflow by given name.
- Parameters:
name – xflow name
project – project name, if not provided, will be the default project
- Returns:
None
- execute_archive_table(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)
Execute a task to archive tables and wait for termination.
- Parameters:
table – name of the table to archive
partition – partition to archive
project – project name, if not provided, will be the default project
hints – settings for table archive task.
priority – instance priority, 9 as default
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
odps.models.Instance
- execute_freeze_command(table, partition=None, command=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)
Execute a task to freeze or restore tables and wait for termination.
- Parameters:
table – name of the table to freeze or restore
partition – partition to freeze or restore
command – freeze command to execute, can be freeze or restore
project – project name, if not provided, will be the default project
hints – settings for the freeze task.
priority – instance priority, 9 as default
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
odps.models.Instance
- execute_merge_files(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, compact_type=None, force_mode=None, recent_hours=None, quota_name=None, unique_identifier_id=None, create_callback=None, **kwargs)
Execute a task to merge multiple files in tables and wait for termination.
- Parameters:
table – name of the table to optimize
partition – partition to optimize
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
hints – settings for merge task.
priority – instance priority, 9 as default
running_cluster – cluster to run this instance
compact_type – compact option for transactional table, can be major or minor.
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
odps.models.Instance
- execute_security_query(query, project=None, schema=None, token=None, hints=None, output_json=True)[source]
Execute a security query to grant / revoke / query privileges and return the result string or JSON value.
- Parameters:
query (str) – query text
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
output_json (bool) – parse json for the output
- Returns:
result string / json object
- execute_sql(sql, project=None, priority=None, running_cluster=None, hints=None, quota_name=None, unique_identifier_id=None, **kwargs)[source]
Run a given SQL statement and block until it has executed successfully.
- Parameters:
sql (str) – SQL statement
project – project name, if not provided, will be the default project
priority (int) – instance priority, 9 as default
running_cluster (str) – cluster to run this instance
hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
quota_name (str) – name of quota to use for SQL job
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
odps.models.Instance
- Example:
>>> instance = odps.execute_sql('select * from dual')
>>> with instance.open_reader() as reader:
...     for record in reader:  # iterate to handle result with schema
...         pass  # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')
>>> with instance.open_reader() as reader:
...     print(reader.raw)  # without schema, just get the raw result
- execute_sql_cost(sql, project=None, hints=None, **kwargs)[source]
- Parameters:
sql (str) – SQL statement
project – project name, if not provided, will be the default project
hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
- Returns:
cost info in dict format
- Return type:
cost: dict
- Example:
>>> sql_cost = odps.execute_sql_cost('select * from dual')
>>> sql_cost.udf_num
0
>>> sql_cost.complexity
1.0
>>> sql_cost.input_size
100
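execute_sql_cost can serve as a guard before running a potentially expensive statement. The sketch below follows the attribute access shown in the example (sql_cost.input_size), even though the return type is listed as dict; execute_sql_with_cost_limit and QueryTooExpensive are hypothetical names, and because this reference does not state the unit of input_size, the limit must be given in whatever unit the estimate reports.

```python
class QueryTooExpensive(Exception):
    """Raised when the estimated input size exceeds the caller's limit."""


def execute_sql_with_cost_limit(o, sql, max_input_size, **kwargs):
    """Run `sql` with o.execute_sql only if its estimated input size is acceptable.

    `o` is an ODPS entry object. Extra keyword arguments (hints, priority, ...)
    are forwarded to execute_sql unchanged.
    """
    cost = o.execute_sql_cost(sql)
    # Attribute access mirrors the execute_sql_cost example in this reference.
    if cost.input_size > max_input_size:
        raise QueryTooExpensive(
            f"estimated input size {cost.input_size} exceeds {max_input_size}"
        )
    return o.execute_sql(sql, **kwargs)
```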
- execute_sql_interactive(sql, hints=None, fallback=True, wait_fallback=True, offline_quota_name=None, use_mcqa_v2=False, **kwargs)
Run an SQL query in interactive mode (a.k.a. MaxCompute Query Acceleration). If the query is not supported or fails and fallback is True, it will fall back to offline mode automatically.
- Parameters:
sql – the sql query.
hints – settings for sql query.
fallback – whether to fall back to non-interactive mode, True by default. Either a boolean or a comma-separated string of policy names is accepted.
wait_fallback (bool) – wait for the fallback instance to finish, True by default.
- Returns:
instance.
- execute_xflow(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]
Run an XFlow by the given name, XFlow project, and parameters, and block until the XFlow has executed successfully.
- Parameters:
xflow_name (str) – XFlow name
xflow_project (str) – the project where the XFlow is deployed
parameters (dict) – parameters
project – project name, if not provided, will be the default project
hints (dict) – execution hints
priority (int) – instance priority, 9 as default
- Returns:
instance
- Return type:
odps.models.Instance
- exist_function(name, project=None, schema=None)[source]
Check whether the function with the given name exists.
- Parameters:
name (str) – function name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
True if the function exists or False
- Return type:
bool
- exist_instance(id_, project=None)[source]
Check whether the instance with the given ID exists.
- Parameters:
id_ – instance ID
project – project name, if not provided, will be the default project
- Returns:
True if exists or False
- Return type:
bool
- exist_model(name, project=None, schema=None)[source]
Check whether the model with the given name exists.
- Parameters:
name – model’s name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
True if model exists else False
- Return type:
bool
- exist_offline_model(name, project=None)[source]
Check whether the offline model with the given name exists.
- Parameters:
name – offline model’s name
project – project name, if not provided, will be the default project
- Returns:
True if offline model exists else False
- Return type:
bool
- exist_project(name)[source]
Check whether the project with the given name exists.
- Parameters:
name – project name
- Returns:
True if exists or False
- Return type:
bool
- exist_quota(nickname=None, tenant_id=None, region_id=None)[source]
Check whether the quota with the given nickname exists.
- Parameters:
nickname – quota name
- Returns:
True if exists or False
- Return type:
bool
- exist_resource(name, project=None, schema=None)[source]
Check whether the resource with the given name exists.
- Parameters:
name – resource name
schema (str) – schema name, if not provided, will be the default schema
project – project name, if not provided, will be the default project
- Returns:
True if exists or False
- Return type:
bool
- exist_role(name, project=None)[source]
Check if a role exists in a project
- Parameters:
name – name of the role
project – project name, if not provided, will be the default project
- exist_schema(name, project=None)[source]
Check whether the schema with the given name exists.
- Parameters:
name – schema name
project – project name, if not provided, will be the default project
- Returns:
True if exists or False
- Return type:
bool
- exist_table(name, project=None, schema=None)[source]
Check whether the table with the given name exists.
- Parameters:
name – table name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
True if table exists or False
- Return type:
bool
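exist_table is convenient for checking preconditions before a job runs, for example confirming that all input tables are present. A minimal sketch; missing_tables is a hypothetical helper that only calls exist_table with the documented name and project arguments.

```python
from typing import Iterable, List, Optional


def missing_tables(o, names: Iterable[str], project: Optional[str] = None) -> List[str]:
    """Return, in input order, the names for which o.exist_table is False."""
    return [name for name in names if not o.exist_table(name, project=project)]
```

A job could then stop early with a clear message when missing_tables(odps, ['src_a', 'src_b']) is non-empty; the table names here are placeholders.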
- exist_user(name, project=None)[source]
Check if a user exists in the project
- Parameters:
name – user name
project – project name, if not provided, will be the default project
- exist_volume(name, schema=None, project=None)[source]
Check whether the volume with the given name exists.
- Parameters:
name (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
True if exists or False
- Return type:
bool
- exist_volume_partition(volume, partition=None, project=None, schema=None)[source]
Check whether the given partition exists in the volume.
- Parameters:
volume (str) – volume name
partition (str) – partition name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- exist_xflow(name, project=None)[source]
Check whether the XFlow with the given name exists.
- Parameters:
name – xflow name
project – project name, if not provided, will be the default project
- Returns:
True if exists or False
- Return type:
bool
- get_function(name, project=None, schema=None)[source]
Get the function by given name
- Parameters:
name – function name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
the right function
Note
if the function does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_instance(id_, project=None, quota_name=None)[source]
Get instance by given instance id.
- Parameters:
id_ – instance ID
project – project name, if not provided, will be the default project
- Returns:
the right instance
- Return type:
odps.models.Instance
Note
if the instance does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_logview_address(instance_id, hours=None, project=None, use_legacy=None)[source]
Get logview address by given instance id and hours.
- Parameters:
instance_id – instance id
hours – validity period of the logview address, in hours
project – project name, if not provided, will be the default project
- Returns:
logview address
- Return type:
str
- get_model(name, project=None, schema=None)[source]
Get model by given name
- Parameters:
name – model name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
model
- Return type:
odps.models.ml.Model
Note
if the model does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_offline_model(name, project=None)[source]
Get offline model by given name
- Parameters:
name – offline model name
project – project name, if not provided, will be the default project
- Returns:
offline model
- Return type:
odps.models.ml.OfflineModel
Note
if the model does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_project(name=None, default_schema=None)[source]
Get project by given name.
- Parameters:
name (str) – project name, if not provided, will be the default project
default_schema (str) – default schema name, if not provided, will be the schema specified in ODPS object
- Returns:
the right project
- Return type:
odps.models.Project
Note
if the project does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_project_policy(project=None)[source]
Get policy of a project
- Parameters:
project – project name, if not provided, will be the default project
- Returns:
JSON object
- get_quota(name=None, tenant_id=None, region_id=None)[source]
Get quota by name
- Parameters:
name (str) – quota name, if not provided, will be the quota name specified in the ODPS entry
- get_resource(name, project=None, schema=None)[source]
Get a resource by given name
- Parameters:
name – resource name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
the right resource
- Return type:
odps.models.Resource
Note
if the resource does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
- get_role_policy(name, project=None)[source]
Get policy object of a role
- Parameters:
name – name of the role
project – project name, if not provided, will be the default project
- Returns:
JSON object
- get_schema(name=None, project=None)[source]
Get the schema by given name.
- Parameters:
name – schema name, if not provided, will be the default schema
project – project name, if not provided, will be the default project
- Returns:
the Schema object
- get_security_option(option_name, project=None)[source]
Get one security option of a project
- Parameters:
option_name – name of the security option. Please refer to ODPS options for more details.
project – project name, if not provided, will be the default project
- Returns:
option value
- get_security_options(project=None)[source]
Get all security options of a project
- Parameters:
project – project name, if not provided, will be the default project
- Returns:
SecurityConfiguration object
- get_table(name, project=None, schema=None)[source]
Get table by given name.
- Parameters:
name – table name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
the right table
- Return type:
odps.models.Table
Note
if the table does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
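As noted above, get_table does not fail for a missing table until the object is reloaded or a field is accessed. If a caller prefers an explicit None for absent tables, one option is to check exist_table first, as in this sketch. get_table_or_none is hypothetical, and the table could still be dropped between the two calls.

```python
def get_table_or_none(o, name, project=None, schema=None):
    """Return o.get_table(...) if the table exists, otherwise None.

    Existence is checked explicitly because get_table itself is lazy and does
    not fail for a missing table until the object is loaded.
    """
    if not o.exist_table(name, project=project, schema=schema):
        return None
    return o.get_table(name, project=project, schema=schema)
```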
- get_volume(name, project=None, schema=None)[source]
Get volume by given name.
- Parameters:
name (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
volume object. Return type depends on the type of the volume.
- Return type:
odps.models.Volume
- get_volume_file(volume, path=None, project=None, schema=None)[source]
Get a file under a partition of a parted volume, or a file / directory object under a file system volume.
- Parameters:
volume (str) – name of the volume.
path (str) – path of the file or directory.
project (str) – project name, if not provided, will be the default project.
schema (str) – schema name, if not provided, will be the default schema
- Returns:
directory object.
- get_volume_partition(volume, partition=None, project=None, schema=None)[source]
Get partition in a parted volume by given name.
- Parameters:
volume (str) – volume name
partition (str) – partition name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
volume partition
- Return type:
odps.models.VolumePartition
- get_xflow(name, project=None)[source]
Get xflow by given name
- Parameters:
name – xflow name
project – project name, if not provided, will be the default project
- Returns:
xflow
- Return type:
odps.models.XFlow
Note
if the xflow does not exist, no errors will be raised unless
reload method is called or some field of the object is accessed.
See also
odps.models.XFlow
- get_xflow_results(instance, project=None)[source]
Get the results of an XFlow instance.
- Parameters:
instance (odps.models.Instance) – XFlow instance
project – project name, if not provided, will be the default project
- Returns:
xflow result
- Return type:
dict
- get_xflow_sub_instances(instance, project=None)[source]
Get the sub-instances of an XFlow instance.
- Parameters:
instance (odps.models.Instance) – XFlow instance
project – project name, if not provided, will be the default project
- Returns:
sub instances dictionary
- iter_xflow_sub_instances(instance, interval=1, project=None, check=False)[source]
Iterate over the sub-instances of an XFlow instance, waiting until the instance finishes.
- Parameters:
instance (odps.models.Instance) – XFlow instance
interval – time interval between checks
project – project name, if not provided, will be the default project
check (bool) – check if the instance is successful
- Returns:
generator of sub-instances
- list_functions(project=None, prefix=None, owner=None, schema=None)[source]
List all functions of a project.
- Parameters:
project (str) – project name, if not provided, will be the default project
prefix (str) – the listed functions start with this prefix
owner (str) – Cloud account, the owner of the listed functions
schema (str) – schema name, if not provided, will be the default schema
- Returns:
functions
- Return type:
generator
- list_instance_queueing_infos(project=None, status=None, only_owner=None, quota_index=None)[source]
List instance queueing information.
- Parameters:
project – project name, if not provided, will be the default project
status – instance status, one of ‘Running’, ‘Suspended’, or ‘Terminated’
only_owner (bool) – True will filter the instances created by current user
quota_index (str)
- Returns:
instance queueing infos
- Return type:
list
- list_instances(project=None, start_time=None, end_time=None, status=None, only_owner=None, quota_index=None, **kw)[source]
List instances of a project, optionally filtered by start time, end time, status, and whether they were created by the current user.
- Parameters:
project – project name, if not provided, will be the default project
start_time (datetime, int or float) – the start time of filtered instances
end_time (datetime, int or float) – the end time of filtered instances
status – instance status, one of ‘Running’, ‘Suspended’, or ‘Terminated’
only_owner (bool) – True will filter the instances created by current user
quota_index (str)
- Returns:
instances
- Return type:
list
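start_time and end_time accept datetime values, so a relative window such as the last 24 hours can be computed with the standard library. The helper last_hours_window below is hypothetical; it produces naive local datetimes, and how PyODPS converts them to timestamps is an assumption worth verifying.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def last_hours_window(hours: float, now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start_time, end_time) spanning the `hours` hours before `now`.

    `now` defaults to the current local time as a naive datetime.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    end_time = now if now is not None else datetime.now()
    return end_time - timedelta(hours=hours), end_time
```

The pair could then be passed as odps.list_instances(start_time=start, end_time=end, status='Terminated', only_owner=True).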
- list_models(project=None, schema=None)[source]
List models of a project.
- Parameters:
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
models
- Return type:
list
- list_offline_models(project=None, prefix=None, owner=None)[source]
List offline models of a project, optionally filtered by prefix and owner.
- Parameters:
project – project name, if not provided, will be the default project
prefix – prefix of offline model’s name
owner – account ID
- Returns:
offline models
- Return type:
list
- list_projects(owner=None, user=None, group=None, prefix=None, max_items=None, region_id=None, tenant_id=None)[source]
List projects.
- Parameters:
owner – Cloud account, the owner of the listed projects
user – name of the user who has access to listed projects
group – name of the group listed projects belong to
prefix – prefix of names of listed projects
max_items – maximum size of the result set
- Returns:
projects in this endpoint.
- Return type:
generator
- list_quotas(region_id=None)[source]
List quotas by region id
- Parameters:
region_id (str) – Region ID
- Returns:
quotas
- list_resources(project=None, prefix=None, owner=None, schema=None)[source]
List all resources of a project.
- Parameters:
project – project name, if not provided, will be the default project
prefix (str) – the listed resources start with this prefix
owner (str) – Cloud account, the owner of the listed resources
schema (str) – schema name, if not provided, will be the default schema
- Returns:
resources
- Return type:
generator
- list_role_users(name, project=None)[source]
List users who have the specified role.
- Parameters:
name – name of the role
project – project name, if not provided, will be the default project
- Returns:
collection of User objects
- list_roles(project=None)[source]
List all roles in a project
- Parameters:
project – project name, if not provided, will be the default project
- Returns:
collection of role objects
- list_schemas(project=None, prefix=None, owner=None)[source]
List all schemas of a project.
- Parameters:
project – project name, if not provided, will be the default project
prefix (str) – the listed schemas start with this prefix
owner (str) – account ID of the owner of the listed schemas
- Returns:
schemas
- list_tables(project=None, prefix=None, owner=None, schema=None, type=None, extended=False)[source]
List all tables of a project. If prefix is provided, the listed tables will all start with this prefix. If owner is provided, the listed tables will belong to such owner.
- Parameters:
project (str) – project name, if not provided, will be the default project
prefix (str) – the listed tables start with this prefix
owner (str) – Cloud account that owns the listed tables
schema (str) – schema name, if not provided, will be the default schema
type (str) – type of the table
extended (bool) – if True, load extended information for table
- Returns:
tables in this project, filtered by the optional prefix and owner.
- Return type:
generator
- list_tables_model(prefix='', project=None)
List all TablesModel in the given project.
- Parameters:
prefix – model prefix
project (str) – project name, if you want to look up in another project
- Return type:
list[str]
- list_user_roles(name, project=None)[source]
List roles of the specified user
- Parameters:
name – user name
project – project name, if not provided, will be the default project
- Returns:
collection of Role objects
- list_users(project=None)[source]
List users in the project
- Parameters:
project – project name, if not provided, will be the default project
- Returns:
collection of User objects
- list_volume_files(volume, partition=None, project=None, schema=None)[source]
List files in a volume. In partitioned volumes, the function returns files under specified partition. In file system volumes, the function returns files under specified path.
- Parameters:
volume (str) – volume name
partition (str) – partition name for partitioned volumes, and path for file system volumes.
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
files
- Return type:
list
- Example:
>>> # List files under a partition in a partitioned volume. Two calls are equivalent.
>>> odps.list_volume_files('parted_volume', 'partition_name')
>>> odps.list_volume_files('/parted_volume/partition_name')
>>> # List files under a path in a file system volume. Two calls are equivalent.
>>> odps.list_volume_files('fs_volume', 'dir1/dir2')
>>> odps.list_volume_files('/fs_volume/dir1/dir2')
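The two equivalent call forms above differ only in whether the volume name and the partition (or path) are passed separately or joined into one absolute path. This minimal sketch shows how the joined form maps onto the separate one; split_volume_path is a hypothetical helper for illustration and is not how PyODPS necessarily parses the argument.

```python
from typing import Optional, Tuple


def split_volume_path(volume: str, partition: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Normalize both call forms into (volume_name, partition_or_path).

    '/parted_volume/partition_name' -> ('parted_volume', 'partition_name')
    ('fs_volume', 'dir1/dir2')      -> ('fs_volume', 'dir1/dir2')
    """
    if partition is None and volume.startswith("/"):
        # Drop the leading slash; the first component is the volume name,
        # everything after it is the partition or file system path.
        parts = volume.lstrip("/").split("/", 1)
        return parts[0], (parts[1] if len(parts) > 1 else None)
    return volume, partition


print(split_volume_path('/fs_volume/dir1/dir2'))
print(split_volume_path('fs_volume', 'dir1/dir2'))
```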
- list_volume_partitions(volume, project=None, schema=None)[source]
List partitions of a volume.
- Parameters:
volume (str) – volume name
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
- Returns:
partitions
- Return type:
list
- list_volumes(project=None, schema=None, owner=None)[source]
List volumes of a project.
- Parameters:
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
owner (str) – account ID
- Returns:
volumes
- Return type:
list
- list_xflows(project=None, owner=None)[source]
List xflows of a project which can be filtered by the xflow owner.
- Parameters:
project (str) – project name, if not provided, will be the default project
owner (str) – account ID
- Returns:
xflows
- Return type:
list
- move_volume_file(old_path, new_path, replication=None, project=None, schema=None)[source]
Move a file / directory object under a file system volume to another location in the same volume.
- Parameters:
old_path (str) – old path of the volume file.
new_path (str) – target path of the moved file.
replication (int) – file replication.
project (str) – project name, if not provided, will be the default project.
schema (str) – schema name, if not provided, will be the default schema
- Returns:
directory object.
- open_resource(name, project=None, mode='r+', encoding='utf-8', schema=None, type='file', stream=False, comment=None, temp=False)[source]
Open a file resource as a file-like object. This is a Pythonic way to handle file resources.
The argument mode is the open mode for the file resource. A mode containing ‘b’ opens the resource in binary mode: for instance, ‘rb’ opens the resource for binary reading, while ‘r+b’ opens it for binary reading and writing. This matters most when the file is actually binary, such as a tar or JPEG file, so make sure to open it in the correct mode. The text modes are ‘r’, ‘w’, ‘a’, ‘r+’, ‘w+’ and ‘a+’, just like Python’s built-in open function:
r means read only
w means write only; the file is truncated when opened
a means append only
r+ means read and write without constraint
w+ truncates the file first, then opens it for reading and writing
a+ allows reading and writing, but written content can only be appended to the end
- Parameters:
name (odps.models.FileResource or str) – file resource or file resource name
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
mode (str) – the mode of opening file, described as above
encoding (str) – utf-8 as default
type (str) – resource type, can be “file”, “archive”, “jar” or “py”
stream (bool) – if True, use stream to upload, False by default
comment (str) – comment of the resource
- Returns:
file-like object
- Example:
>>> with odps.open_resource('test_resource', mode='r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with odps.open_resource('test_resource', mode='wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with odps.open_resource('test_resource') as fp:  # default mode is 'r+' (read-write)
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
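The mode rules described for open_resource can be summarized as a small lookup table. This is a minimal, self-contained sketch that mirrors that description (which in turn follows the built-in open); describe_mode and ModeInfo are hypothetical names, not part of PyODPS, and PyODPS may validate modes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeInfo:
    readable: bool
    writable: bool
    truncate: bool      # existing content is discarded when the resource is opened
    append_only: bool   # writes always go to the end
    binary: bool


# (readable, writable, truncate, append_only) for each text mode described above.
_MODES = {
    "r": (True, False, False, False),
    "w": (False, True, True, False),
    "a": (False, True, False, True),
    "r+": (True, True, False, False),
    "w+": (True, True, True, False),
    "a+": (True, True, False, True),
}


def describe_mode(mode: str) -> ModeInfo:
    """Summarize a mode string such as 'r', 'w+', 'rb' or 'r+b'."""
    binary = "b" in mode
    base = mode.replace("b", "")
    if base not in _MODES:
        raise ValueError("unsupported mode: %r" % mode)
    return ModeInfo(*_MODES[base], binary=binary)


print(describe_mode('wb'))
```

For instance, describe_mode('wb') reports a write-only, truncating binary mode, which is why fp.readlines() fails in the second example.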
- open_volume_reader(volume, partition=None, file_name=None, project=None, schema=None, start=None, length=None, **kwargs)[source]
Open a volume file for read. A file-like object will be returned which can be used to read contents from volume files.
- Parameters:
volume (str) – name of the volume
partition (str) – name of the partition
file_name (str) – name of the file
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
start – start position
length – length limit
compress_option (CompressOption) – the compression algorithm, level and strategy
- Example:
>>> with odps.open_volume_reader('parted_volume', 'partition', 'file') as reader:
>>>     for line in reader:
>>>         print(line)
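This section only describes start as "start position" and length as "length limit". The minimal sketch below illustrates one plausible reading, assuming both are byte offsets into the file; it uses io.BytesIO as a stand-in for a volume file, and read_range is a hypothetical helper rather than a PyODPS function.

```python
import io
from typing import BinaryIO, Optional


def read_range(fp: BinaryIO, start: Optional[int] = None, length: Optional[int] = None) -> bytes:
    """Read at most `length` bytes beginning at byte offset `start`.

    Assumption: start/length are byte offsets, as a reading of the start and
    length arguments of open_volume_reader. None means "from the beginning"
    and "to the end" respectively.
    """
    if start is not None:
        fp.seek(start)
    if length is None:
        return fp.read()
    return fp.read(length)


# Stand-in for a volume file; a real reader would come from open_volume_reader.
sample = io.BytesIO(b"hello world")
print(read_range(sample, start=6, length=3))
```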
- open_volume_writer(volume, partition=None, project=None, schema=None, **kwargs)[source]
Write data into a volume. This function behaves differently under different types of volumes.
Under partitioned volumes, all files under a partition should be uploaded in one submission. The method returns a writer object: use its open method to open a file inside the volume and write to it, or use its write method to write to a specific file.
Under file system volumes, the method returns a file-like object.
- Parameters:
volume (str) – name of the volume
partition (str) – partition name for partitioned volumes, and path for file system volumes.
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
- Example:
>>> # Writing to partitioned volumes
>>> with odps.open_volume_writer('parted_volume', 'partition') as writer:
>>>     # both write methods are acceptable
>>>     writer.open('file1').write('some content')
>>>     writer.write('file2', 'some content')
>>> # Writing to file system volumes
>>> with odps.open_volume_writer('/fs_volume/dir1/file_name') as writer:
>>>     writer.write('some content')
- read_table(name, limit=None, start=0, step=None, project=None, schema=None, partition=None, **kw)
Read table’s records.
- Parameters:
name (odps.models.table.Table or str) – table or table name
limit – the number of records to read; if None, all records in the table are read
start – index of the record where reading starts, 0 by default
step – step between records read, 1 by default
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
partition – the partition of this table to read
columns (list) – the columns’ names which are the parts of table’s columns
compress (bool) – if True, the data will be compressed during downloading
compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
endpoint – tunnel service URL
reopen – by default, reading reuses the download session opened last time; if True, a new download session is opened. False by default.
- Returns:
records
- Return type:
generator
- Example:
>>> for record in odps.read_table('test_table', 100):
>>>     # deal with such 100 records
>>>     pass
>>> for record in odps.read_table('test_table', partition='pt=test', start=100, limit=100):
>>>     # read the `pt=test` partition, skip 100 records and read 100 records
>>>     pass
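The second read_table example skips 100 records and then reads 100. The minimal sketch below emulates those start and limit semantics on a local iterable with itertools.islice, so the arithmetic can be checked without a MaxCompute connection. It does not model step, and emulate_read_table is a hypothetical helper, not part of PyODPS.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def emulate_read_table(records: Iterable[T], limit: Optional[int] = None, start: int = 0) -> Iterator[T]:
    """Skip `start` records, then yield at most `limit` records (all remaining ones if None)."""
    stop = None if limit is None else start + limit
    return islice(records, start, stop)


# Mirrors read_table(..., start=100, limit=100) on 300 local "records".
window = list(emulate_read_table(range(300), limit=100, start=100))
print(window[0], window[-1], len(window))
```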
- run_archive_table(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)
Start running a task to archive tables.
- Parameters:
table – name of the table to archive
partition – partition to archive
project – project name, if not provided, will be the default project
hints – settings for table archive task.
priority – instance priority, 9 as default
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
- run_freeze_command(table, partition=None, command=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, quota_name=None, unique_identifier_id=None, create_callback=None)
Start running a task to freeze or restore tables.
- Parameters:
table – name of the table to freeze or restore
partition – partition to freeze or restore
command – freeze command to execute, can be freeze or restore
project – project name, if not provided, will be the default project
hints – settings for the freeze / restore task.
priority – instance priority, 9 as default
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
- run_merge_files(table, partition=None, project=None, schema=None, hints=None, priority=None, running_cluster=None, compact_type=None, force_mode=None, recent_hours=None, quota_name=None, unique_identifier_id=None, create_callback=None, **kwargs)
Start running a task to merge multiple files in tables.
- Parameters:
table – name of the table to optimize
partition – partition to optimize
project – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
hints – settings for merge task.
priority – instance priority, 9 as default
running_cluster – cluster to run this instance
compact_type – compact option for transactional table, can be major or minor.
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
- run_security_query(query, project=None, schema=None, token=None, hints=None, output_json=True)[source]
Run a security query to grant / revoke / query privileges. If the query is install package or uninstall package, returns a waitable AuthQueryInstance object; otherwise returns the result string or JSON value.
- Parameters:
query (str) – query text
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
output_json (bool) – parse json for the output
- Returns:
result string / json object
- run_sql(sql, project=None, priority=None, running_cluster=None, hints=None, aliases=None, default_schema=None, quota_name=None, unique_identifier_id=None, **kwargs)[source]
Run a given SQL statement asynchronously
- Parameters:
sql (str) – SQL statement
project (str) – project name, if not provided, will be the default project
priority (int) – instance priority, 9 as default
running_cluster (str) – cluster to run this instance
hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
aliases (dict)
quota_name (str) – name of quota to use for SQL job
unique_identifier_id (str) – unique instance ID
- Returns:
instance
- Return type:
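Because run_sql returns as soon as an instance is created, several statements can be submitted before waiting on any of them. The minimal sketch below shows that pattern. It assumes the returned instance exposes wait_for_success(), expected to raise if the instance fails; run_sql_batch, FakeInstance and FakeODPS are hypothetical names, and the fakes only stand in for a real ODPS entry object so the sketch runs offline.

```python
from typing import Iterable, List, Optional


def run_sql_batch(odps, statements: Iterable[str], hints: Optional[dict] = None) -> List:
    """Submit every statement with run_sql, then block until each instance succeeds."""
    # Submitting first lets the statements run concurrently on the service side.
    instances = [odps.run_sql(sql, hints=hints) for sql in statements]
    for instance in instances:
        # Assumption: expected to raise if the instance fails.
        instance.wait_for_success()
    return instances


# Hypothetical stand-ins so the sketch runs without a MaxCompute connection.
class FakeInstance:
    def __init__(self, sql: str):
        self.sql = sql
        self.waited = False

    def wait_for_success(self) -> None:
        self.waited = True


class FakeODPS:
    def __init__(self):
        self.submitted = []

    def run_sql(self, sql: str, hints: Optional[dict] = None) -> FakeInstance:
        self.submitted.append(sql)
        return FakeInstance(sql)


fake = FakeODPS()
done = run_sql_batch(fake, ["SELECT 1", "SELECT 2"])
```

With a real entry object, the same function would be called as run_sql_batch(odps, [...]).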
- run_sql_interactive(sql, hints=None, use_mcqa_v2=False, **kwargs)
Run SQL query in interactive mode (a.k.a. MaxCompute Query Acceleration, MCQA). It does not fall back to offline mode automatically if the query is not supported or fails.
- Parameters:
sql – the sql query.
hints – settings for sql query.
- Returns:
instance.
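Since run_sql_interactive does not fall back to offline mode by itself, a caller that wants a fallback has to implement it. The minimal sketch below tries the interactive path first and reruns the query with run_sql on any failure. It assumes that every exception from the interactive attempt (at submission or while waiting via wait_for_success()) is worth retrying offline; run_with_fallback and the fake classes are hypothetical.

```python
from typing import Optional


def run_with_fallback(odps, sql: str, hints: Optional[dict] = None):
    """Run `sql` with MCQA first; on any failure, rerun it as a regular offline job."""
    try:
        instance = odps.run_sql_interactive(sql, hints=hints)
        instance.wait_for_success()
        return instance
    except Exception:
        # Assumption: the failure is caused by MCQA, so an offline retry is reasonable.
        instance = odps.run_sql(sql, hints=hints)
        instance.wait_for_success()
        return instance


# Hypothetical stand-ins: the interactive path always fails here.
class FakeInstance:
    def __init__(self, mode: str):
        self.mode = mode

    def wait_for_success(self) -> None:
        pass


class FakeODPS:
    def run_sql_interactive(self, sql, hints=None):
        raise RuntimeError("query not supported by MCQA")

    def run_sql(self, sql, hints=None):
        return FakeInstance("offline")


result = run_with_fallback(FakeODPS(), "SELECT 1")
print(result.mode)
```

Catching every exception is deliberately broad for the sketch; real code may want to retry only on errors known to come from MCQA.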
- run_xflow(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]
Run an XFlow asynchronously with the given name, XFlow project and parameters.
- Parameters:
xflow_name (str) – XFlow name
xflow_project (str) – the project XFlow deploys
parameters (dict) – parameters
project – project name, if not provided, will be the default project
hints (dict) – execution hints
priority (int) – instance priority, 9 as default
- Returns:
instance
- Return type:
- property schema
Get or set default schema name of the ODPS object
- set_project_policy(policy, project=None)[source]
Set policy of a project
- Parameters:
policy – name of policy.
project – project name, if not provided, will be the default project
- Returns:
JSON object
- set_role_policy(name, policy, project=None)[source]
Set the policy of a role
- Parameters:
name – name of the role
policy – policy string or JSON object
project – project name, if not provided, will be the default project
- set_security_option(option_name, value, project=None)[source]
Set a security option of a project
- Parameters:
option_name – name of the security option. Please refer to ODPS options for more details.
value – value of security option to be set.
project – project name, if not provided, will be the default project.
- stop_instance(id_, project=None)[source]
Stop the running instance by given instance id.
- Parameters:
id_ – instance ID
project – project name, if not provided, will be the default project
- Returns:
None
- stop_job(id_, project=None)
Stop the running instance by given instance id.
- Parameters:
id_ – instance ID
project – project name, if not provided, will be the default project
- Returns:
None
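A common use of stop_instance is to cancel a job that runs longer than expected. The minimal sketch below polls an instance and stops it after a timeout. It assumes the instance object exposes an id attribute and an is_terminated() method; stop_if_too_slow and the fake classes are hypothetical, and the fakes exist only so the sketch runs offline.

```python
import time


def stop_if_too_slow(odps, instance, timeout_seconds: float, poll_interval: float = 5.0) -> bool:
    """Poll `instance` until it terminates; stop it if it outlives `timeout_seconds`.

    Returns True if stop_instance was called, False if the instance finished in time.
    """
    deadline = time.monotonic() + timeout_seconds
    while not instance.is_terminated():
        if time.monotonic() >= deadline:
            odps.stop_instance(instance.id)
            return True
        time.sleep(poll_interval)
    return False


# Hypothetical stand-ins so the sketch runs offline.
class FakeInstance:
    def __init__(self, id_: str, polls_until_done: int):
        self.id = id_
        self._remaining = polls_until_done

    def is_terminated(self) -> bool:
        if self._remaining <= 0:
            return True
        self._remaining -= 1
        return False


class FakeODPS:
    def __init__(self):
        self.stopped = []

    def stop_instance(self, id_, project=None):
        self.stopped.append(id_)


fake = FakeODPS()
finished = stop_if_too_slow(fake, FakeInstance("quick", 2), timeout_seconds=60, poll_interval=0)
stopped = stop_if_too_slow(fake, FakeInstance("slow", 10**9), timeout_seconds=0, poll_interval=0)
```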
- property tunnel_endpoint
Get or set tunnel endpoint of the ODPS object
- write_sql_result_to_table(table_name, sql, partition=None, partition_cols=None, create_table=False, create_partition=False, append_missing_cols=False, overwrite=False, project=None, schema=None, lifecycle=None, type_mapping=None, table_schema_callback=None, table_kwargs=None, hints=None, running_cluster=None, unique_identifier_id=None, **kwargs)
Write SQL query results into a specified table and partition. If the target table does not exist, you may specify the argument create_table=True. Columns are inserted into the target table aligned by column names. Note that column order in the target table will NOT be changed.
- Parameters:
table_name (str) – The target table name
sql (str) – The SQL query to execute
partition (str) – Target partition in the format “part=value” or “part1=value1,part2=value2”
partition_cols (list) – List of dynamic partition fields. If not provided, all partition fields of the target table are used.
create_table (bool) – Whether to create the target table if it does not exist. False by default.
create_partition (bool) – Whether to create partitions if they do not exist. False by default.
append_missing_cols (bool) – Whether to append missing columns to the target table. False by default.
overwrite (bool) – Whether to overwrite existing data. False by default.
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
lifecycle (int) – specify table lifecycle when creating tables
type_mapping (dict) – specify type mapping for columns when creating tables, can be dicts like {"column": "bigint"}. If a column does not exist in the data, it will be added as an empty column.
table_schema_callback – a function that accepts the table schema resolved from data and returns a new schema for the table to create. Only works when the target table does not exist and create_table is True.
table_kwargs (dict) – specify other kwargs for create_table()
hints (dict) – specify hints for SQL statements, will be passed through to execute_sql method
running_cluster (dict) – specify running cluster for SQL statements, will be passed through to execute_sql method
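write_sql_result_to_table aligns result columns to the target table by name and never reorders the target table's columns. The minimal sketch below illustrates that rule locally. Two parts are assumptions not stated in this section: target columns missing from the result are filled with NULL, and extra result columns are rejected unless append_missing_cols is True, in which case they go after the existing columns. align_columns is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def align_columns(target_cols: Sequence[str], result_cols: Sequence[str],
                  append_missing_cols: bool = False) -> Tuple[List[str], List[str]]:
    """Align query result columns to a target table by column name.

    Returns (final_table_columns, select_list). The target table keeps its
    existing column order; unknown result columns are appended at the end.
    """
    target_set = set(target_cols)
    result_set = set(result_cols)
    extra = [c for c in result_cols if c not in target_set]
    if extra and not append_missing_cols:
        raise ValueError("columns missing from target table: " + ", ".join(extra))
    final_cols = list(target_cols) + extra
    # Assumption: target columns the query does not produce are filled with NULL.
    select_list = [c if c in result_set else "NULL" for c in final_cols]
    return final_cols, select_list


print(align_columns(["id", "name", "score"], ["score", "id"]))
```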
- write_table(name, *block_data, **kw)
Write records or pandas DataFrame into given table.
- Parameters:
name (models.table.Table or str) – table or table name
block_data – records / DataFrame, or block IDs and records / DataFrame. If only records or a DataFrame are given, the block ID defaults to 0.
project (str) – project name, if not provided, will be the default project
schema (str) – schema name, if not provided, will be the default schema
partition – the partition of this table to write into
partition_cols (list) – columns representing dynamic partitions
append_missing_cols (bool) – Whether to append missing columns to the target table. False by default.
overwrite (bool) – if True, will overwrite existing data
create_table (bool) – if True, the table will be created if it does not exist
table_kwargs (dict) – specify other kwargs for create_table()
type_mapping (dict) – specify type mapping for columns when creating tables, can be dicts like {"column": "bigint"}. If a column does not exist in the data, it will be added as an empty column.
infer_type_with_arrow (bool) – whether to infer column types of pandas objects with Arrow when creating tables. False by default.
table_schema_callback – a function that accepts the table schema resolved from data and returns a new schema for the table to create. Only works when the target table does not exist and create_table is True.
lifecycle (int) – specify table lifecycle when creating tables
create_partition (bool) – if True, the partition will be created if it does not exist
compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
endpoint (str) – tunnel service URL
reopen (bool) – by default, writing reuses the upload session opened last time; if True, a new upload session is opened. False by default.
- Returns:
None
- Example:
Write records into a specified table.
>>> odps.write_table('test_table', data)
Write records into multiple blocks.
>>> odps.write_table('test_table', 0, records1, 1, records2)
Write into a given partition.
>>> odps.write_table('test_table', data, partition='pt=test')
Write a pandas DataFrame. Create the table if it does not exist.
>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True],
>>>     [222, 'bbb', False],
>>>     [333, 'ccc', True],
>>>     [444, '中文', False]
>>> ], columns=['num_col', 'str_col', 'bool_col'])
>>> odps.write_table('test_table', df, partition='pt=test', create_table=True, create_partition=True)
Passing more arguments when creating table.
>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True],
>>>     [222, 'bbb', False],
>>>     [333, 'ccc', True],
>>>     [444, '中文', False]
>>> ], columns=['num_col', 'str_col', 'bool_col'])
>>> # this dict will be passed to `create_table` as kwargs.
>>> table_kwargs = {"transactional": True, "primary_key": "num_col"}
>>> odps.write_table('test_table', df, partition='pt=test', create_table=True, create_partition=True,
>>>                  table_kwargs=table_kwargs)
Write with dynamic partitioning.
>>> import pandas as pd
>>> df = pd.DataFrame([
>>>     [111, 'aaa', True, 'p1'],
>>>     [222, 'bbb', False, 'p1'],
>>>     [333, 'ccc', True, 'p2'],
>>>     [444, '中文', False, 'p2']
>>> ], columns=['num_col', 'str_col', 'bool_col', 'pt'])
>>> odps.write_table('test_part_table', df, partition_cols=['pt'], create_partition=True)
- Note:
write_table treats columns of pandas object dtype as strings, as it is often hard to determine their types when creating a new table for your data. To make sure the column types meet your needs, specify the type_mapping argument, for instance, type_mapping={"col1": "array<struct<id:string>>"}.
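With partition_cols, each row is routed to a partition chosen by the values of those columns. The minimal sketch below shows this conceptually on the same rows as the dynamic partitioning example, producing specs in the "part1=value1,part2=value2" format. It assumes partition columns are carried by the partition spec rather than stored as data columns; group_by_partition is a hypothetical helper and not how write_table is implemented.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def group_by_partition(rows: Sequence[Sequence[Any]], columns: Sequence[str],
                       partition_cols: Sequence[str]) -> Dict[str, List[List[Any]]]:
    """Group rows into {partition_spec: data_rows} using the values of partition_cols."""
    part_idx = [list(columns).index(c) for c in partition_cols]
    data_idx = [i for i in range(len(columns)) if i not in part_idx]
    groups: Dict[str, List[List[Any]]] = defaultdict(list)
    for row in rows:
        # Build a spec such as 'pt=p1' (or 'a=1,b=2' for several partition columns).
        spec = ",".join("%s=%s" % (name, row[i]) for name, i in zip(partition_cols, part_idx))
        # Partition values live in the spec, so they are dropped from the data row.
        groups[spec].append([row[i] for i in data_idx])
    return dict(groups)


rows = [
    [111, 'aaa', True, 'p1'],
    [222, 'bbb', False, 'p1'],
    [333, 'ccc', True, 'p2'],
    [444, '中文', False, 'p2'],
]
parts = group_by_partition(rows, ['num_col', 'str_col', 'bool_col', 'pt'], ['pt'])
print(sorted(parts))
```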