Definitions¶
- class odps.ODPS(access_id=None, secret_access_key=None, project=None, endpoint=None, **kw)[source]¶
Main entrance to ODPS.
Convenient operations on ODPS objects are provided. Please refer to the ODPS documentation for details.
Generally, basic operations such as list, get, exist, create and delete are provided for each ODPS object. Take Table as an example.
To create an ODPS instance, access_id and secret_access_key are required and must be correct, or a SignatureNotMatch error will be thrown. If tunnel_endpoint is not set, the tunnel API will route the service URL automatically.
Parameters: - access_id – Aliyun Access ID
- secret_access_key – Aliyun Access Key
- project – default project name
- endpoint – Rest service URL
- tunnel_endpoint – Tunnel service URL
- logview_host – Logview host URL
Example:
>>> odps = ODPS('**your access id**', '**your access key**', 'default_project')
>>>
>>> for table in odps.list_tables():
>>>     # handle each table
>>>
>>> table = odps.get_table('dual')
>>>
>>> odps.exist_table('dual') is True
>>>
>>> odps.create_table('test_table', schema)
>>>
>>> odps.delete_table('test_table')
-
attach_session
(session_name, taskname=None, hints=None)[source]¶ Attach to an existing session.
Parameters: - session_name – The session name.
- taskname – The created sqlrt task name. If not provided, the default value is used; in most cases the default works.
Returns: A SessionInstance you may execute select tasks within.
-
copy_offline_model
(name, new_name, project=None, new_project=None, async_=False)[source]¶ Copy current model into a new location.
Parameters: - new_name – name of the new model
- new_project – new project name. if absent, original project name will be used
- async – if True, return the copy instance; otherwise return the newly-copied model
-
create_fs_volume
(name, project=None, **kwargs)[source]¶ Create a new-fashioned file system volume in a project.
Parameters: - name (str) – volume name
- project (str) – project name, if not provided, will be the default project
Returns: volume
Return type: odps.models.FSVolume
See also
odps.models.FSVolume
-
create_function
(name, project=None, **kwargs)[source]¶ Create a function by given name.
Parameters: - name – function name
- project – project name, if not provided, will be the default project
- class_type (str) – main class
- resources (list) – the resources that function needs to use
Returns: the created function
Return type:
Example:
>>> res = odps.get_resource('test_func.py')
>>> func = odps.create_function('test_func', class_type='test_func.Test', resources=[res, ])
See also
-
create_parted_volume
(name, project=None, **kwargs)[source]¶ Create an old-fashioned partitioned volume in a project.
Parameters: - name (str) – volume name
- project (str) – project name, if not provided, will be the default project
Returns: volume
Return type: odps.models.PartedVolume
See also
odps.models.PartedVolume
-
create_resource
(name, type=None, project=None, **kwargs)[source]¶ Create a resource by given name and given type.
Currently, the resource type can be file, jar, py, archive or table.
The file, jar, py and archive types are classified as file resources. To create a file resource, you have to provide another parameter which is a file-like object. For a table resource, the table name, project name and partition should be provided, where the partition is optional.
Parameters: - name – resource name
- type – resource type, one of file, jar, py, archive, table
- project – project name, if not provided, will be the default project
- kwargs – optional arguments; see the example below
Returns: resource, depending on the type; a file resource will be odps.models.FileResource and so on
Return type: subclasses of odps.models.Resource
Example:
>>> from odps.models.resource import *
>>>
>>> res = odps.create_resource('test_file_resource', 'file', fileobj=open('/to/path/file'))
>>> assert isinstance(res, FileResource)
>>>
>>> res = odps.create_resource('test_py_resource.py', 'py', fileobj=StringIO('import this'))
>>> assert isinstance(res, PyResource)
>>>
>>> res = odps.create_resource('test_table_resource', 'table', table_name='test_table', partition='pt=test')
>>> assert isinstance(res, TableResource)
-
create_role
(name, project=None)[source]¶ Create a role in a project
Parameters: - name – name of the role to create
- project – project name, if not provided, will be the default project
Returns: role object created
-
create_session
(session_worker_count, session_worker_memory, session_name=None, worker_spare_span=None, taskname=None, hints=None)[source]¶ Create a session.
Parameters: - session_worker_count – number of workers assigned to the session.
- session_worker_memory – amount of memory each worker consumes.
- session_name – the session name; if not specified, the instance ID is used as the name.
- worker_spare_span – in the format “00-24”; allocated workers will be reduced during this span. If not specified, this feature is disabled.
- taskname – the created sqlrt task name. If not provided, the default value is used; in most cases the default works.
- hints – extra hints provided to the session. Parameters of this method will override certain hints.
Returns: A SessionInstance you may execute select tasks within.
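A minimal usage sketch (the worker count, memory value and session name below are illustrative, not documented defaults):
>>> # create a session with 2 workers; the memory value is an assumed byte count
>>> session = odps.create_session(2, 4 * 1024 ** 3, session_name='my_session')
>>> # later, re-attach to the same session by name
>>> session = odps.attach_session('my_session')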
-
create_table
(name, schema, project=None, comment=None, if_not_exists=False, lifecycle=None, shard_num=None, hub_lifecycle=None, async_=False, **kw)[source]¶ Create a table with the given schema and other optional parameters.
Parameters: - name – table name
- schema – table schema. Can be an instance of odps.models.Schema or a string like ‘col1 string, col2 bigint’
- project – project name, if not provided, will be the default project
- comment – table comment
- if_not_exists (bool) – will not create if this table already exists, default False
- lifecycle (int) – table’s lifecycle. If absent, options.lifecycle will be used.
- shard_num (int) – table’s shard num
- hub_lifecycle (int) – hub lifecycle
- async (bool) – if True, will run asynchronously
Returns: the created Table if not async else odps instance
Return type: See also
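As a sketch, a table can be created from a schema string and removed afterwards (the names and lifecycle value are illustrative):
>>> t = odps.create_table('test_table', 'num bigint, name string', if_not_exists=True, lifecycle=7)
>>> odps.delete_table('test_table', if_exists=True)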
-
create_user
(name, project=None)[source]¶ Add a user into the project
Parameters: - name – user name
- project – project name, if not provided, will be the default project
Returns: user created
-
create_volume_directory
(volume, path=None, project=None)[source]¶ Create a directory under a file system volume.
Parameters: - volume (str) – name of the volume.
- path (str) – path of the directory to be created.
- project (str) – project name, if not provided, will be the default project.
Returns: directory object.
-
default_session
()[source]¶ Attach to the default session of your project.
Returns: A SessionInstance you may execute select tasks within.
-
delete_function
(name, project=None)[source]¶ Delete a function by given name.
Parameters: - name – function name
- project – project name, if not provided, will be the default project
Returns: None
-
delete_offline_model
(name, project=None, if_exists=False)[source]¶ Delete the offline model by given name.
Parameters: - name – offline model’s name
- if_exists – will not raise errors when the offline model does not exist, default False
- project – project name, if not provided, will be the default project
Returns: None
-
delete_resource
(name, project=None)[source]¶ Delete resource by given name.
Parameters: - name – resource name
- project – project name, if not provided, will be the default project
Returns: None
-
delete_role
(name, project=None)[source]¶ Delete a role in a project
Parameters: - name – name of the role to delete
- project – project name, if not provided, will be the default project
-
delete_table
(name, project=None, if_exists=False, async_=False, **kw)[source]¶ Delete the table with given name
Parameters: - name – table name
- project – project name, if not provided, will be the default project
- if_exists – will not raise errors when the table does not exist, default False
- async (bool) – if True, will run asynchronously
Returns: None if not async else odps instance
-
delete_user
(name, project=None)[source]¶ Delete a user from the project
Parameters: - name – user name
- project – project name, if not provided, will be the default project
-
delete_volume
(name, project=None)[source]¶ Delete volume by given name.
Parameters: - name – volume name
- project – project name, if not provided, will be the default project
Returns: None
-
delete_volume_file
(volume, path=None, recursive=False, project=None)[source]¶ Delete a file / directory object under a file system volume.
Parameters: - volume (str) – name of the volume.
- path (str) – path of the file / directory to be deleted.
- recursive (bool) – if True, recursively delete files
- project (str) – project name, if not provided, will be the default project.
Returns: directory object.
-
delete_volume_partition
(volume, partition=None, project=None)[source]¶ Delete partition in a volume by given name
Parameters: - volume (str) – volume name
- partition (str) – partition name
- project (str) – project name, if not provided, will be the default project
-
delete_xflow
(name, project=None)[source]¶ Delete xflow by given name.
Parameters: - name – xflow name
- project – project name, if not provided, will be the default project
Returns: None
-
execute_archive_table
(table, partition=None, project=None, hints=None, priority=None)[source]¶ Execute a task to archive tables and wait for termination.
Parameters: - table – name of the table to archive
- partition – partition to archive
- project – project name, if not provided, will be the default project
- hints – settings for table archive task.
- priority – instance priority, 9 as default
Returns: instance
Return type:
-
execute_merge_files
(table, partition=None, project=None, hints=None, priority=None)[source]¶ Execute a task to merge multiple files in tables and wait for termination.
Parameters: - table – name of the table to optimize
- partition – partition to optimize
- project – project name, if not provided, will be the default project
- hints – settings for merge task.
- priority – instance priority, 9 as default
Returns: instance
Return type:
-
execute_sql
(sql, project=None, priority=None, running_cluster=None, hints=None, **kwargs)[source]¶ Run a given SQL statement and block until it has executed successfully.
Parameters: - sql (str) – SQL statement
- project – project name, if not provided, will be the default project
- priority (int) – instance priority, 9 as default
- running_cluster – cluster to run this instance
- hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
Returns: instance
Return type:
Example:
>>> instance = odps.execute_sql('select * from dual')
>>> with instance.open_reader() as reader:
>>>     for record in reader:  # iterate to handle result with schema
>>>         # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')
>>> with instance.open_reader() as reader:
>>>     print(reader.raw)  # without schema, just get the raw result
See also
-
execute_sql_cost
(sql, project=None, hints=None, **kwargs)[source]¶ Parameters: - sql (str) – SQL statement
- project – project name, if not provided, will be the default project
- hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
Returns: cost info in dict format
Return type: cost: dict
Example:
>>> sql_cost = odps.execute_sql_cost('select * from dual')
>>> sql_cost.udf_num
0
>>> sql_cost.complexity
1.0
>>> sql_cost.input_size
100
-
execute_xflow
(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]¶ Run an xflow by the given name, xflow project and parameters, blocking until the xflow has executed successfully.
Parameters: - xflow_name (str) – XFlow name
- xflow_project (str) – the project XFlow deploys
- parameters (dict) – parameters
- project – project name, if not provided, will be the default project
- hints (dict) – execution hints
- priority (int) – instance priority, 9 as default
Returns: instance
Return type: See also
-
exist_function
(name, project=None)[source]¶ Check whether the function with the given name exists.
Parameters: - name – function name
- project – project name, if not provided, will be the default project
Returns: True if the function exists or False
Return type: bool
-
exist_instance
(id_, project=None)[source]¶ Check whether the instance with the given id exists.
Parameters: - id – instance id
- project – project name, if not provided, will be the default project
Returns: True if exists or False
Return type: bool
-
exist_offline_model
(name, project=None)[source]¶ Check whether the offline model with the given name exists.
Parameters: - name – offline model’s name
- project – project name, if not provided, will be the default project
Returns: True if offline model exists else False
Return type: bool
-
exist_project
(name)[source]¶ Check whether the project with the given name exists.
Parameters: name – project name Returns: True if exists or False Return type: bool
-
exist_resource
(name, project=None)[source]¶ Check whether the resource with the given name exists.
Parameters: - name – resource name
- project – project name, if not provided, will be the default project
Returns: True if exists or False
Return type: bool
-
exist_role
(name, project=None)[source]¶ Check if a role exists in a project
Parameters: - name – name of the role
- project – project name, if not provided, will be the default project
-
exist_table
(name, project=None)[source]¶ Check whether the table with the given name exists.
Parameters: - name – table name
- project – project name, if not provided, will be the default project
Returns: True if table exists or False
Return type: bool
-
exist_user
(name, project=None)[source]¶ Check if a user exists in the project
Parameters: - name – user name
- project – project name, if not provided, will be the default project
-
exist_volume
(name, project=None)[source]¶ Check whether the volume with the given name exists.
Parameters: - name (str) – volume name
- project (str) – project name, if not provided, will be the default project
Returns: True if exists or False
Return type: bool
-
exist_volume_partition
(volume, partition=None, project=None)[source]¶ Check whether the given partition exists in the volume.
Parameters: - volume (str) – volume name
- partition (str) – partition name
- project (str) – project name, if not provided, will be the default project
-
exist_xflow
(name, project=None)[source]¶ Check whether the xflow with the given name exists.
Parameters: - name – xflow name
- project – project name, if not provided, will be the default project
Returns: True if exists or False
Return type: bool
-
get_function
(name, project=None)[source]¶ Get the function by given name
Parameters: - name – function name
- project – project name, if not provided, will be the default project
Returns: the right function
Raise: odps.errors.NoSuchObject
if not exists
See also
-
get_instance
(id_, project=None)[source]¶ Get instance by given instance id.
Parameters: - id – instance id
- project – project name, if not provided, will be the default project
Returns: the right instance
Return type: Raise: odps.errors.NoSuchObject
if not exists
See also
-
get_logview_address
(instance_id, hours=None, project=None)[source]¶ Get logview address by given instance id and hours.
Parameters: - instance_id – instance id
- hours –
- project – project name, if not provided, will be the default project
Returns: logview address
Return type: str
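A usage sketch (assuming an instance created earlier in the same project):
>>> instance = odps.run_sql('select * from dual')
>>> print(odps.get_logview_address(instance.id, hours=24))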
-
get_offline_model
(name, project=None)[source]¶ Get offline model by given name
Parameters: - name – offline model name
- project – project name, if not provided, will be the default project
Returns: offline model
Return type: Raise: odps.errors.NoSuchObject
if not exists
-
get_project
(name=None)[source]¶ Get project by given name.
Parameters: name – project name, if not provided, will be the default project Returns: the right project Return type: odps.models.Project
Raise: odps.errors.NoSuchObject
if not exists
See also
-
get_project_policy
(project=None)[source]¶ Get policy of a project
Parameters: project – project name, if not provided, will be the default project Returns: JSON object
-
get_resource
(name, project=None)[source]¶ Get a resource by given name
Parameters: - name – resource name
- project – project name, if not provided, will be the default project
Returns: the right resource
Return type: Raise: odps.errors.NoSuchObject
if not exists
See also
-
get_role_policy
(name, project=None)[source]¶ Get policy object of a role
Parameters: - name – name of the role
- project – project name, if not provided, will be the default project
Returns: JSON object
-
get_security_option
(option_name, project=None)[source]¶ Get one security option of a project
Parameters: - option_name – name of the security option. Please refer to ODPS options for more details.
- project – project name, if not provided, will be the default project
Returns: option value
-
get_security_options
(project=None)[source]¶ Get all security options of a project
Parameters: project – project name, if not provided, will be the default project Returns: SecurityConfiguration object
-
get_table
(name, project=None)[source]¶ Get table by given name.
Parameters: - name – table name
- project – project name, if not provided, will be the default project
Returns: the right table
Return type: Raise: odps.errors.NoSuchObject
if not exists
See also
-
get_volume
(name, project=None)[source]¶ Get volume by given name.
Parameters: - name (str) – volume name
- project (str) – project name, if not provided, will be the default project
Returns: volume object. Return type depends on the type of the volume.
Return type: odps.models.Volume
-
get_volume_file
(volume, path=None, project=None)[source]¶ Get a file under a partition of a parted volume, or a file / directory object under a file system volume.
Parameters: - volume (str) – name of the volume.
- path (str) – path of the directory to be created.
- project (str) – project name, if not provided, will be the default project.
Returns: directory object.
-
get_volume_partition
(volume, partition=None, project=None)[source]¶ Get partition in a parted volume by given name.
Parameters: - volume (str) – volume name
- partition (str) – partition name
- project (str) – project name, if not provided, will be the default project
Returns: partitions
Return type: odps.models.VolumePartition
-
get_xflow
(name, project=None)[source]¶ Get xflow by given name
Parameters: - name – xflow name
- project – project name, if not provided, will be the default project
Returns: xflow
Return type: odps.models.XFlow
Raise: odps.errors.NoSuchObject
if not exists
See also
odps.models.XFlow
-
get_xflow_results
(instance, project=None)[source]¶ Get the results of the given xflow instance.
Parameters: - instance (odps.models.Instance) – instance of xflow
- project – project name, if not provided, will be the default project
Returns: xflow result
Return type: dict
-
get_xflow_sub_instances
(instance, project=None)[source]¶ Get sub-instances of the given xflow instance.
Parameters: - instance (odps.models.Instance) – instance of xflow
- project – project name, if not provided, will be the default project
Returns: sub instances dictionary
-
iter_xflow_sub_instances
(instance, interval=1, project=None)[source]¶ Iterate over sub-instances of an xflow, waiting until the instance finishes.
Parameters: - instance (odps.models.Instance) – instance of xflow
- interval – time interval to check
- project – project name, if not provided, will be the default project
Returns: sub instances dictionary
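The xflow helpers are typically combined: run an xflow asynchronously, then watch its sub-instances until it finishes (the xflow name and parameters below are illustrative):
>>> instance = odps.run_xflow('xflow_name', 'xflow_project', {'param': 'value'})
>>> for sub_instances in odps.iter_xflow_sub_instances(instance):
>>>     # handle each dictionary of sub-instances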
-
list_functions
(project=None)[source]¶ List all functions of a project.
Parameters: project – project name, if not provided, will be the default project Returns: functions Return type: generator
-
list_instance_queueing_infos
(project=None, status=None, only_owner=None, quota_index=None)[source]¶ List instance queueing information.
Parameters: - project – project name, if not provided, will be the default project
- status – including ‘Running’, ‘Suspended’, ‘Terminated’
- only_owner (bool) – True will filter the instances created by current user
- quota_index (str) –
Returns: instance queueing infos
Return type: list
-
list_instances
(project=None, start_time=None, end_time=None, status=None, only_owner=None, quota_index=None, **kw)[source]¶ List instances of a project by given optional conditions including start time, end time, status and whether to list only the current owner’s instances.
Parameters: - project – project name, if not provided, will be the default project
- start_time (datetime, int or float) – the start time of filtered instances
- end_time (datetime, int or float) – the end time of filtered instances
- status – including ‘Running’, ‘Suspended’, ‘Terminated’
- only_owner (bool) – True will filter the instances created by current user
- quota_index (str) –
Returns: instances
Return type: list
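A filtering sketch (the time window and status are illustrative):
>>> from datetime import datetime, timedelta
>>> start = datetime.now() - timedelta(days=1)
>>> for instance in odps.list_instances(start_time=start, status='Running', only_owner=True):
>>>     # handle each running instance created by the current user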
-
list_offline_models
(project=None, prefix=None, owner=None)[source]¶ List offline models of project by optional filter conditions including prefix and owner.
Parameters: - project – project name, if not provided, will be the default project
- prefix – prefix of offline model’s name
- owner – Aliyun account
Returns: offline models
Return type: list
-
list_projects
(owner=None, user=None, group=None, prefix=None, max_items=None)[source]¶ List projects.
Parameters: - owner – Aliyun account, the owner which listed projects belong to
- user – name of the user who has access to listed projects
- group – name of the group listed projects belong to
- prefix – prefix of names of listed projects
- max_items – the maximal size of result set
Returns: projects in this endpoint.
Return type: generator
-
list_resources
(project=None)[source]¶ List all resources of a project.
Parameters: project – project name, if not provided, will be the default project Returns: resources Return type: generator
-
list_role_users
(name, project=None)[source]¶ List users who have the specified role.
Parameters: - name – name of the role
- project – project name, if not provided, will be the default project
Returns: collection of User objects
-
list_roles
(project=None)[source]¶ List all roles in a project
Parameters: project – project name, if not provided, will be the default project Returns: collection of role objects
-
list_tables
(project=None, prefix=None, owner=None)[source]¶ List all tables of a project. If prefix is provided, the listed tables will all start with this prefix. If owner is provided, the listed tables will belong to that owner.
Parameters: - project – project name, if not provided, will be the default project
- prefix (str) – the listed tables start with this prefix
- owner (str) – Aliyun account, the owner which listed tables belong to
Returns: tables in this project, filtered by the optional prefix and owner.
Return type: generator
-
list_tables_model
(prefix='', project=None)¶ List all TablesModel in the given project.
Parameters: - prefix – model prefix
- project (str) – project name, if you want to look up in another project
Return type: list[str]
-
list_user_roles
(name, project=None)[source]¶ List roles of the specified user
Parameters: - name – user name
- project – project name, if not provided, will be the default project
Returns: collection of Role object
-
list_users
(project=None)[source]¶ List users in the project
Parameters: project – project name, if not provided, will be the default project Returns: collection of User objects
-
list_volume_files
(volume, partition=None, project=None)[source]¶ List files in a volume. In partitioned volumes, the function returns files under specified partition. In file system volumes, the function returns files under specified path.
Parameters: - volume (str) – volume name
- partition (str) – partition name for partitioned volumes, and path for file system volumes.
- project (str) – project name, if not provided, will be the default project
Returns: files
Return type: list
Example:
>>> # List files under a partition in a partitioned volume. Two calls are equivalent.
>>> odps.list_volume_files('parted_volume', 'partition_name')
>>> odps.list_volume_files('/parted_volume/partition_name')
>>> # List files under a path in a file system volume. Two calls are equivalent.
>>> odps.list_volume_files('fs_volume', 'dir1/dir2')
>>> odps.list_volume_files('/fs_volume/dir1/dir2')
-
list_volume_partitions
(volume, project=None)[source]¶ List partitions of a volume.
Parameters: - volume (str) – volume name
- project (str) – project name, if not provided, will be the default project
Returns: partitions
Return type: list
-
list_volumes
(project=None, owner=None)[source]¶ List volumes of a project.
Parameters: - project – project name, if not provided, will be the default project
- owner (str) – Aliyun account
Returns: volumes
Return type: list
-
list_xflows
(project=None, owner=None)[source]¶ List xflows of a project which can be filtered by the xflow owner.
Parameters: - project – project name, if not provided, will be the default project
- owner (str) – Aliyun account
Returns: xflows
Return type: list
-
move_volume_file
(old_path, new_path, replication=None, project=None)[source]¶ Move a file / directory object under a file system volume to another location in the same volume.
Parameters: - old_path (str) – old path of the volume file.
- new_path (str) – target path of the moved file.
- replication (int) – file replication.
- project (str) – project name, if not provided, will be the default project.
Returns: directory object.
-
open_resource
(name, project=None, mode='r+', encoding='utf-8')[source]¶ Open a file resource as a file-like object. This is an elegant and pythonic way to handle file resources.
The argument mode stands for the open mode of the file resource. Binary mode is used when ‘b’ is included: for instance, ‘rb’ opens the resource in binary read mode, while ‘r+b’ opens it in binary read-write mode. This matters most when the file is actually binary, such as a tar or jpeg file, so be sure to open the file with the correct mode. The text mode can be ‘r’, ‘w’, ‘a’, ‘r+’, ‘w+’ or ‘a+’, just like the built-in Python open function:
- r: read only
- w: write only; the file is truncated when opening
- a: append only
- r+: read+write without constraint
- w+: truncate first, then open for read+write
- a+: read+write, but written content can only be appended to the end
Parameters: - name (odps.models.FileResource or str) – file resource or file resource name
- project – project name, if not provided, will be the default project
- mode – the mode of opening file, described as above
- encoding – utf-8 as default
Returns: file-like object
Example:
>>> with odps.open_resource('test_resource', 'r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with odps.open_resource('test_resource', 'wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write('hello world')  # write bytes
>>>
>>> with odps.open_resource('test_resource') as fp:  # default as read-write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
-
open_volume_reader
(volume, partition=None, file_name=None, project=None, start=None, length=None, **kwargs)[source]¶ Open a volume file for read. A file-like object will be returned which can be used to read contents from volume files.
Parameters: - volume (str) – name of the volume
- partition (str) – name of the partition
- file_name (str) – name of the file
- project (str) – project name, if not provided, will be the default project
- start – start position
- length – length limit
- compress_option (CompressOption) – the compression algorithm, level and strategy
Example:
>>> with odps.open_volume_reader('parted_volume', 'partition', 'file') as reader:
>>>     [print(line) for line in reader]
-
open_volume_writer
(volume, partition=None, project=None, **kwargs)[source]¶ Write data into a volume. This function behaves differently under different types of volumes.
Under partitioned volumes, all files under a partition should be uploaded in one submission. The method returns a writer object; with its open method you can open a file inside the volume and write to it, or you can use its write method to write to specific files.
Under file system volumes, the method returns a file-like object.
Parameters: - volume (str) – name of the volume
- partition (str) – partition name for partitioned volumes, and path for file system volumes.
- project (str) – project name, if not provided, will be the default project
- compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
Example:
>>> # Writing to partitioned volumes
>>> with odps.open_volume_writer('parted_volume', 'partition') as writer:
>>>     # both write methods are acceptable
>>>     writer.open('file1').write('some content')
>>>     writer.write('file2', 'some content')
>>> # Writing to file system volumes
>>> with odps.open_volume_writer('/fs_volume/dir1/file_name') as writer:
>>>     writer.write('some content')
-
read_table
(name, limit=None, start=0, step=None, project=None, partition=None, **kw)[source]¶ Read table’s records.
Parameters: - name (odps.models.table.Table or str) – table or table name
- limit – the number of records to read; if None, all records in the table are read
- start – the record index to start reading from
- step – step size, default as 1
- project – project name, if not provided, will be the default project
- partition – the partition of this table to read
- columns (list) – the columns’ names which are the parts of table’s columns
- compress (bool) – if True, the data will be compressed during downloading
- compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
- endpoint – tunnel service URL
- reopen – by default, reading the table reuses the last opened download session; if set to True, a new download session is opened. Default False.
Returns: records
Return type: generator
Example:
>>> for record in odps.read_table('test_table', 100):
>>>     # deal with such 100 records
>>> for record in odps.read_table('test_table', partition='pt=test', start=100, limit=100):
>>>     # read the `pt=test` partition, skip 100 records and read 100 records
See also
-
run_archive_table
(table, partition=None, project=None, hints=None, priority=None)[source]¶ Start running a task to archive tables.
Parameters: - table – name of the table to archive
- partition – partition to archive
- project – project name, if not provided, will be the default project
- hints – settings for table archive task.
- priority – instance priority, 9 as default
Returns: instance
Return type:
-
run_merge_files
(table, partition=None, project=None, hints=None, priority=None)[source]¶ Start running a task to merge multiple files in tables.
Parameters: - table – name of the table to optimize
- partition – partition to optimize
- project – project name, if not provided, will be the default project
- hints – settings for merge task.
- priority – instance priority, 9 as default
Returns: instance
Return type:
-
run_security_query
(query, project=None, token=None)[source]¶ Run a security query to grant / revoke / query privileges
Parameters: - query – query text
- project – project name, if not provided, will be the default project
Returns: a JSON object representing the result.
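A sketch using ODPS security command syntax (the statements and names below are illustrative):
>>> odps.run_security_query('grant select on table test_table to user test_user')
>>> result = odps.run_security_query('show grants for user test_user')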
-
run_sql
(sql, project=None, priority=None, running_cluster=None, hints=None, aliases=None, **kwargs)[source]¶ Run a given SQL statement asynchronously
Parameters: - sql (str) – SQL statement
- project – project name, if not provided, will be the default project
- priority (int) – instance priority, 9 as default
- running_cluster – cluster to run this instance
- hints (dict) – settings for SQL, e.g. odps.mapred.map.split.size
- aliases (dict) –
Returns: instance
Return type: See also
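Unlike execute_sql, run_sql returns immediately; a polling sketch (wait_for_success is assumed from the instance API):
>>> instance = odps.run_sql('select * from dual')
>>> instance.wait_for_success()
>>> with instance.open_reader() as reader:
>>>     for record in reader:
>>>         # handle each record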
-
run_xflow
(xflow_name, xflow_project=None, parameters=None, project=None, hints=None, priority=None)[source]¶ Run an XFlow asynchronously with the given name, XFlow project and parameters.
Parameters: - xflow_name (str) – XFlow name
- xflow_project (str) – the project where the XFlow is deployed
- parameters (dict) – parameters
- project – project name, if not provided, will be the default project
- hints (dict) – execution hints
- priority (int) – instance priority, 9 as default
Returns: instance
Return type: odps.models.Instance
See also
-
set_project_policy
(policy, project=None)[source]¶ Set policy of a project
Parameters: - policy – name of policy.
- project – project name, if not provided, will be the default project
Returns: JSON object
-
set_role_policy
(name, policy, project=None)[source]¶ Set the policy of a role in the project
Parameters: - name – name of the role
- policy – policy string or JSON object
- project – project name, if not provided, will be the default project
-
set_security_option
(option_name, value, project=None)[source]¶ Set a security option of a project
Parameters: - option_name – name of the security option. Please refer to ODPS options for more details.
- value – value of security option to be set.
- project – project name, if not provided, will be the default project.
-
stop_instance
(id_, project=None)[source]¶ Stop a running instance by the given instance id.
Parameters: - id – instance id
- project – project name, if not provided, will be the default project
Returns: None
-
stop_job
(id_, project=None)¶ Stop a running instance by the given instance id.
Parameters: - id – instance id
- project – project name, if not provided, will be the default project
Returns: None
-
write_table
(name, *block_records, **kw)[source]¶ Write records into given table.
Parameters: - name (models.table.Table or str) – table or table name
- block_records – if only records are given, block id 0 is used by default.
- project – project name, if not provided, will be the default project
- partition – the partition of this table to write
- compress (bool) – if True, the data will be compressed during uploading
- compress_option (odps.tunnel.CompressOption) – the compression algorithm, level and strategy
- endpoint – tunnel service URL
- reopen – if True, open a new upload session instead of reusing the one opened last time; default False
Returns: None
Example: >>> odps.write_table('test_table', records) # write to block 0 as default >>> >>> odps.write_table('test_table', 0, records) # write to block 0 explicitly >>> >>> odps.write_table('test_table', 0, records1, 1, records2) # write to multi-blocks >>> >>> odps.write_table('test_table', records, partition='pt=test') # write to certain partition
See also
-
class
odps.models.
Project
(**kw)[source]¶ Project is the counterpart of a database in an RDBMS.
With a Project object, users can access properties like
name
,owner
,comment
,creation_time
,last_modified_time
, and so on. These properties are not loaded from the remote ODPS service until users access them explicitly. To check the newest status, use the
reload
method. Example: >>> project = odps.get_project('my_project') >>> project.last_modified_time # this property will be fetched from the remote ODPS service >>> project.last_modified_time # once loaded, accessing the property causes no remote call >>> project.owner # so are the other properties; they are fetched together >>> project.reload() # force-reload all properties >>> project.last_modified_time # already updated
-
class
odps.models.
Table
(**kwargs)[source]¶ Table is the counterpart of a table in an RDBMS; in addition, a table can consist of partitions.
Table’s properties behave the same as those of
odps.models.Project
, which are not loaded from the remote ODPS service until users access them. In order to write data into a table, users should call the
open_writer
method within a with statement. Meanwhile, the open_reader
method is used to read records from a table or its partition. Example: >>> table = odps.get_table('my_table') >>> table.owner # the first access loads from remote >>> table.reload() # reload to update the properties >>> >>> for record in table.head(5): >>> # check the first 5 records >>> for record in table.head(5, partition='pt=test', columns=['my_column']): >>> # only check the `my_column` column from a certain partition of this table >>> >>> with table.open_reader() as reader: >>> count = reader.count # number of records of the table or its partition >>> for record in reader[0: count]: >>> # read all data; in practice, better to split into several reads >>> >>> with table.open_writer() as writer: >>> writer.write(records) >>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer: >>> writer.write(0, gen_records(block=0)) >>> writer.write(1, gen_records(block=1)) # blocks can be written in parallel
-
create_partition
(partition_spec, if_not_exists=False, async_=False, **kw)[source]¶ Create a partition within the table.
Parameters: - partition_spec – specification of the partition.
- if_not_exists – if True, do not raise an error when the partition already exists
- async – run asynchronously if True
Returns: partition object
Return type: odps.models.partition.Partition
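A common pattern combines exist_partition, create_partition and get_partition to fetch a partition, creating it first when missing. The helper below is a hypothetical sketch; table is assumed to be an odps.models.Table.

```python
def ensure_partition(table, spec):
    """Return the partition for `spec`, creating it first when missing
    (sketch; `table` is assumed to be an odps.models.Table)."""
    if not table.exist_partition(spec):
        table.create_partition(spec, if_not_exists=True)
    return table.get_partition(spec)
```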
-
delete_partition
(partition_spec, if_exists=False, async_=False, **kw)[source]¶ Delete a partition within the table.
Parameters: - partition_spec – specification of the partition.
- if_exists – if True, do not raise an error when the partition does not exist
- async – run asynchronously if True
-
drop
(async_=False, if_exists=False, **kw)[source]¶ Drop this table.
Parameters: async – run asynchronously if True Returns: None
-
exist_partition
(partition_spec)[source]¶ Check if a partition exists within the table.
Parameters: partition_spec – specification of the partition.
-
exist_partitions
(prefix_spec=None)[source]¶ Check if partitions with provided conditions exist.
Parameters: prefix_spec – prefix of partition Returns: whether partitions exist
-
get_ddl
(with_comments=True, if_not_exists=False)[source]¶ Get DDL SQL statement for the given table.
Parameters: - with_comments – append comments for the table and each column
- if_not_exists – if True, generate a CREATE TABLE IF NOT EXISTS statement
Returns: DDL statement
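One use of get_ddl is to recreate a table's structure under another name by replaying the DDL. The sketch below is hypothetical: the plain string replacement assumes the source table name appears verbatim (and first) in the generated DDL, and o is assumed to be a connected odps.ODPS client.

```python
def clone_table_schema(o, src_name, new_name):
    """Recreate a table's structure under a new name by replaying its DDL
    (sketch; the naive replace assumes the source name occurs verbatim
    in the DDL before any other occurrence of the same string)."""
    ddl = o.get_table(src_name).get_ddl(if_not_exists=True)
    return o.execute_sql(ddl.replace(src_name, new_name, 1))
```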
-
get_partition
(partition_spec)[source]¶ Get a partition with given specifications.
Parameters: partition_spec – specification of the partition. Returns: partition object Return type: odps.models.partition.Partition
-
head
(limit, partition=None, columns=None)[source]¶ Get the head records of a table or its partition.
Parameters: - limit – maximum number of records to fetch, 10000 at most
- partition – partition of this table
- columns (list) – a subset of the table columns to retrieve
Returns: records
Return type: list
See also
-
iterate_partitions
(spec=None)[source]¶ Create an iterable object to iterate over partitions.
Parameters: spec – specification of the partition.
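A minimal sketch of iterating partitions to collect their specs (the helper name is hypothetical; it assumes str(partition) yields the partition's spec string, as the reprs in the examples above suggest):

```python
def list_partition_specs(table, prefix=None):
    """Collect spec strings of all partitions matching an optional
    prefix spec (sketch; `table` is assumed to be an odps.models.Table)."""
    return [str(p) for p in table.iterate_partitions(spec=prefix)]
```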
-
new_record
(values=None)[source]¶ Generate a record of the table.
Parameters: values (list) – the values of this record Returns: record Return type: odps.models.Record
Example: >>> table = odps.create_table('test_table', schema=Schema.from_lists(['name', 'id'], ['string', 'string'])) >>> record = table.new_record() >>> record[0] = 'my_name' >>> record[1] = 'my_id' >>> record = table.new_record(['my_name', 'my_id'])
See also
-
open_reader
(partition=None, **kw)[source]¶ Open a reader to read records from this table or its partition.
Parameters: - partition – partition of this table
- reopen (bool) – by default the reader reuses the last download session; if True, a new reader is opened.
- endpoint – the tunnel service URL
- compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
- compress_algo – compression algorithm, takes effect when compress_option is not provided; can be zlib or snappy
- compress_level – used for zlib, takes effect when compress_option is not provided
- compress_strategy – used for zlib, takes effect when compress_option is not provided
- download_id – use existing download_id to download table contents
Returns: reader; count is the total number of records, status is the tunnel status
Example: >>> with table.open_reader() as reader: >>> count = reader.count # number of records of the table or its partition >>> for record in reader[0: count]: >>> # read all data; in practice, better to split into several reads
-
open_writer
(partition=None, blocks=None, **kw)[source]¶ Open the writer to write records into this table or its partition.
Parameters: - partition – partition of this table
- blocks – block ids to open
- reopen (bool) – by default the writer reuses the last upload session; if True, a new writer is opened.
- create_partition (bool) – if true, the partition will be created if not exist
- endpoint – the tunnel service URL
- upload_id – use existing upload_id to upload data
- compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
- compress_algo – compression algorithm, takes effect when compress_option is not provided; can be zlib or snappy
- compress_level – used for zlib, takes effect when compress_option is not provided
- compress_strategy – used for zlib, takes effect when compress_option is not provided
Returns: writer, status means the tunnel writer status
Example: >>> with table.open_writer() as writer: >>> writer.write(records) >>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer: >>> writer.write(0, gen_records(block=0)) >>> writer.write(1, gen_records(block=1)) # blocks can be written in parallel
-
-
odps.models.
Schema
¶ alias of
odps.models.table.TableSchema
-
class
odps.models.table.
TableSchema
(**kwargs)[source]¶ Schema includes the columns and partitions information of an
odps.models.Table
. There are two ways to initialize a Schema object: the first is to provide columns and partitions; the second is to call the class method
from_lists
. See the examples below:Example: >>> columns = [Column(name='num', type='bigint', comment='the column')] >>> partitions = [Partition(name='pt', type='string', comment='the partition')] >>> schema = Schema(columns=columns, partitions=partitions) >>> schema.columns [<column num, type bigint>, <partition pt, type string>] >>> >>> schema = Schema.from_lists(['num'], ['bigint'], ['pt'], ['string']) >>> schema.columns [<column num, type bigint>, <partition pt, type string>]
-
class
odps.models.partition.
Partition
(**kwargs)[source]¶ A partition is a collection of rows in a table whose partition columns are equal to specific values.
In order to write data into partition, users should call the
open_writer
method within a with statement. Meanwhile, the open_reader
method is used to read records from a partition. The behavior of these methods is the same as those in the Table class, except that there is no ‘partition’ parameter.-
drop
(async_=False, if_exists=False, **kw)[source]¶ Drop this partition.
Parameters: async – run asynchronously if True Returns: None
-
open_reader
(**kw)[source]¶ Open a reader to read records from this partition.
Parameters: - reopen (bool) – by default the reader reuses the last download session; if True, a new reader is opened.
- endpoint – the tunnel service URL
- compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
- compress_algo – compression algorithm, takes effect when compress_option is not provided; can be zlib or snappy
- compress_level – used for zlib, takes effect when compress_option is not provided
- compress_strategy – used for zlib, takes effect when compress_option is not provided
Returns: reader; count is the total number of records, status is the tunnel status
Example: >>> with partition.open_reader() as reader: >>> count = reader.count # number of records of the partition >>> for record in reader[0: count]: >>> # read all data; in practice, better to split into several reads
-
-
class
odps.models.
Record
(columns=None, schema=None, values=None)[source]¶ A record generally means the data of a single line in a table.
Example: >>> schema = Schema.from_lists(['name', 'id'], ['string', 'string']) >>> record = Record(schema=schema, values=['test', 'test2']) >>> record[0] = 'test' >>> record[0] >>> 'test' >>> record['name'] >>> 'test' >>> record[0:2] >>> ('test', 'test2') >>> record[0, 1] >>> ('test', 'test2') >>> record['name', 'id'] >>> for field in record: >>> print(field) ('name', u'test') ('id', u'test2') >>> len(record) 2 >>> 'name' in record True
-
class
odps.models.
Instance
(**kwargs)[source]¶ An Instance represents a single run of an ODPS task.
status
reflects the current state of an instance. The is_terminated
method indicates whether the instance has finished. The is_successful
method indicates whether the instance ran successfully. The wait_for_success
method blocks the main process until the instance has finished. For a SQL instance, open_reader can be used to read the results.
Example: >>> instance = odps.execute_sql('select * from dual') # this SQL returns structured data >>> with instance.open_reader() as reader: >>> # handle the record >>> >>> instance = odps.execute_sql('desc dual') # this SQL does not return structured data >>> with instance.open_reader() as reader: >>> print(reader.raw) # just return the raw result
-
exception
DownloadSessionCreationError
(msg, request_id=None, code=None, host_id=None, instance_id=None)[source]¶
-
class
Task
(**kwargs)[source]¶ Task stands for each task inside an instance.
It has a name, a task type, the start to end time, and a running status.
-
class
TaskProgress
(**kwargs)[source]¶ TaskProgress represents the progress of a task.
A single TaskProgress may consist of several stages.
Example: >>> progress = instance.get_task_progress('task_name') >>> progress.get_stage_progress_formatted_string() 2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%] R2_1_Stg1_job0:0/0/1[0%]
-
class
-
get_logview_address
(hours=None)[source]¶ Get logview address of the instance object by hours.
Parameters: hours – expiration time of the logview token, in hours Returns: logview address Return type: str
-
get_sql_task_cost
()[source]¶ Get cost information of the SQL task, including input data size, number of UDFs and complexity of the SQL task.
Returns: cost info in dict format
-
get_task_cost
(task_name)[source]¶ Get task cost
Parameters: task_name – name of the task Returns: task cost Return type: Instance.TaskCost Example: >>> cost = instance.get_task_cost(instance.get_task_names()[0]) >>> cost.cpu_cost 200 >>> cost.memory_cost 4096 >>> cost.input_size 0
-
get_task_detail
(task_name)[source]¶ Get task’s detail
Parameters: task_name – task name Returns: the task’s detail Return type: list or dict according to the JSON
-
get_task_detail2
(task_name)[source]¶ Get task’s detail v2
Parameters: task_name – task name Returns: the task’s detail Return type: list or dict according to the JSON
-
get_task_info
(task_name, key)[source]¶ Get task related information.
Parameters: - task_name – name of the task
- key – key of the information item
Returns: a string of the task information
-
get_task_progress
(task_name)[source]¶ Get task’s current progress
Parameters: task_name – task_name Returns: the task’s progress Return type: odps.models.Instance.Task.TaskProgress
-
get_task_quota
(task_name)[source]¶ Get queueing info of the task. Note that the time between two calls should be larger than 30 seconds, otherwise an empty dict is returned.
Parameters: task_name – name of the task Returns: quota info in dict format
-
get_task_result
(task_name)[source]¶ Get a single task result.
Parameters: task_name – task name Returns: task result Return type: str
-
get_task_results
()[source]¶ Get all the task results.
Returns: a dict whose key is the task name and value is the task result as a string Return type: dict
-
get_task_statuses
()[source]¶ Get all tasks’ statuses
Returns: a dict whose key is the task name and value is the odps.models.Instance.Task
object Return type: dict
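A sketch of summarizing task statuses after an instance terminates (the helper name failed_tasks is hypothetical; it assumes each Task object exposes a status attribute convertible to a string):

```python
def failed_tasks(instance):
    """List names of tasks whose status reads 'Failed' (sketch; assumes
    each Task object exposes a `status` attribute)."""
    return sorted(
        name for name, task in instance.get_task_statuses().items()
        if str(task.status).lower() == 'failed'
    )
```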
-
get_task_summary
(task_name)[source]¶ Get a task’s summary, mostly used for MapReduce.
Parameters: task_name – task name Returns: summary as a dict parsed from JSON Return type: dict
-
get_task_workers
(task_name=None, json_obj=None)[source]¶ Get workers from task.
Parameters: - task_name – task name
- json_obj – json object parsed from get_task_detail2
Returns: list of workers
See also
-
get_worker_log
(log_id, log_type, size=0)[source]¶ Get logs from worker.
Parameters: - log_id – id of log, can be retrieved from details.
- log_type – type of logs. Possible log types include coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary
- size – length of the log to retrieve
Returns: log content
-
is_running
(retry=False)[source]¶ Check if this instance is still running.
Returns: True if still running else False Return type: bool
-
is_successful
(retry=False)[source]¶ Check if the instance ran successfully.
Returns: True if successful else False Return type: bool
-
is_terminated
(retry=False)[source]¶ Check if this instance has finished.
Returns: True if finished else False Return type: bool
-
open_reader
(*args, **kwargs)[source]¶ Open a reader to read records from the result of the instance. If tunnel is True, the instance tunnel will be used; otherwise the conventional routine will be used. If the instance tunnel is not available and tunnel is not specified, the method will fall back to the conventional routine. Note that the number of records returned is limited unless options.limited_instance_tunnel is set to True or limit=True is configured under instance tunnel mode; under the conventional routine the number of records returned is always limited.
Parameters: - tunnel – if true, use instance tunnel to read from the instance. if false, use conventional routine. if absent, options.tunnel.use_instance_tunnel will be used and automatic fallback is enabled.
- limit (bool) – if True, enable the limitation
- reopen (bool) – by default the reader reuses the last session; if True, a new reader is opened.
- endpoint – the tunnel service URL
- compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
- compress_algo – compression algorithm, takes effect when compress_option is not provided; can be zlib or snappy
- compress_level – used for zlib, takes effect when compress_option is not provided
- compress_strategy – used for zlib, takes effect when compress_option is not provided
Returns: reader; count is the total number of records, status is the tunnel status
Example: >>> with instance.open_reader() as reader: >>> count = reader.count # number of records of the result >>> for record in reader[0: count]: >>> # read all data; in practice, better to split into several reads
-
put_task_info
(task_name, key, value)[source]¶ Put information into a task.
Parameters: - task_name – name of the task
- key – key of the information item
- value – value of the information item
-
exception
-
class
odps.models.
Resource
(**kwargs)[source]¶ Resource is useful when writing UDFs or MapReduce tasks. This is an abstract class.
Basically, a resource can be either a file resource or a table resource. In detail, a file resource can be of type file, py, jar or archive.
-
class
odps.models.
FileResource
(**kw)[source]¶ File resource represents a file.
Use
open
method to open this resource as a file-like object.-
flush
()[source]¶ Commit the change to ODPS if any change happens. Close will do this automatically.
Returns: None
-
open
(mode='r', encoding='utf-8')[source]¶ The argument
mode
stands for the open mode of this file resource. It is a binary mode if ‘b’ is included: for instance, ‘rb’ opens the resource in binary read mode, while ‘r+b’ opens it in binary read-write mode. This is most important when the file is actually binary, such as a tar or jpeg file, so be sure to open the file in the correct mode. Basically, the text mode can be ‘r’, ‘w’, ‘a’, ‘r+’, ‘w+’, ‘a+’, just like the builtin Python open method:
r means read only
w means write only; the file is truncated when opening
a means append only
r+ means read+write without constraint
w+ truncates first, then opens for read+write
a+ can read+write; however, written content can only be appended to the end
Parameters: - mode – the mode of opening file, described as above
- encoding – utf-8 as default
Returns: file-like object
Example: >>> with resource.open('r') as fp: >>> fp.read(1) # read one unicode character >>> fp.write('test') # wrong, cannot write under read mode >>> >>> with resource.open('wb') as fp: >>> fp.readlines() # wrong, cannot read under write mode >>> fp.write(b'hello world') # write bytes >>> >>> with resource.open('r+') as fp: # open as read-write mode >>> fp.seek(5) >>> fp.truncate() >>> fp.flush()
-
read
(size=-1)[source]¶ Read the file resource, read all as default.
Parameters: size – unicode or byte length, depending on text or binary mode. Returns: unicode or bytes, depending on text or binary mode Return type: str or unicode(Py2), bytes or str(Py3)
-
readline
(size=-1)[source]¶ Read a single line.
Parameters: size – If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. When size is not 0, an empty string is returned only when EOF is encountered immediately Returns: unicode or bytes depends on text mode or binary mode Return type: str or unicode(Py2), bytes or str(Py3)
-
readlines
(sizehint=-1)[source]¶ Read as lines.
Parameters: sizehint – If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Returns: lines Return type: list
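Since open returns a file-like object, a resource can be processed with ordinary file idioms. The helper below is a hypothetical sketch; resource is assumed to be an odps.models.FileResource opened in text mode.

```python
def count_nonempty_lines(resource):
    """Count non-empty lines of a text file resource (sketch; `resource`
    is assumed to be an odps.models.FileResource)."""
    with resource.open('r') as fp:
        return sum(1 for line in fp.readlines() if line.strip())
```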
-
seek
(pos, whence=0)[source]¶ Seek to some place.
Parameters: - pos – position to seek
- whence – if set to 2, will seek to the end
Returns: None
-
truncate
(size=None)[source]¶ Truncate the file resource’s size.
Parameters: size – If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. Returns: None
-
-
class
odps.models.
ArchiveResource
(**kw)[source]¶ File resource representing a compressed archive such as .zip/.tgz/.tar.gz/.tar/.jar
-
class
odps.models.
TableResource
(**kw)[source]¶ Take a table as a resource.
-
partition
¶ Get the source table partition.
Returns: the source table partition
-
table
¶ Get the table object.
Returns: source table Return type: odps.models.Table
See also
-
-
class
odps.models.
Function
(**kwargs)[source]¶ Functions can be used as UDFs when users write SQL.
-
resources
¶ Return all the resources which this function refers to.
Returns: resources Return type: list See also
-
-
class
odps.models.
Worker
(**kwargs)[source]¶ Class providing worker information and log retrieval.
-
class
odps.models.ml.
OfflineModel
(**kwargs)[source]¶ Representing an ODPS offline model.
-
copy
(new_name, new_project=None, async_=False)[source]¶ Copy current model into a new location.
Parameters: - new_name – name of the new model
- new_project – new project name. if absent, original project name will be used
- async – if True, return the copy instance. otherwise return the newly-copied model
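A sketch of copying a model across projects using the async form and waiting for completion (the helper name copy_model is hypothetical; it assumes the client exposes get_offline_model and that o is a connected odps.ODPS client):

```python
def copy_model(o, name, new_name, new_project=None, wait=True):
    """Copy an offline model and optionally wait for completion (sketch;
    assumes the client exposes `get_offline_model`)."""
    model = o.get_offline_model(name)
    inst = model.copy(new_name, new_project=new_project, async_=True)
    if wait:
        inst.wait_for_success()
    return inst
```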
-