Model objects

class odps.models.Project(*args, **kwargs)[source]

A Project is the counterpart of a database in an RDBMS.

From a Project object, users can read properties such as name, owner, comment, creation_time, last_modified_time, and so on.

These properties are not loaded from the remote ODPS service until they are first accessed. To see the latest state, call the reload method.

Example:

>>> project = odps.get_project('my_project')
>>> project.last_modified_time  # the first access fetches this property from the remote ODPS service
>>> project.last_modified_time  # once loaded, reading the property makes no further remote call
>>> project.owner  # the same holds for the other properties; they are fetched together
>>> project.reload()  # force a refresh of all properties
>>> project.last_modified_time  # already updated
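
The lazy loading described above can be pictured with a small stand-in class. This is a minimal sketch, not the PyODPS implementation: LazyProject, fake_fetch and the remote_calls counter are hypothetical names introduced only for illustration, and the "remote" response is simulated locally.

```python
class LazyProject:
    """Hypothetical stand-in for a lazily loaded model object (not the PyODPS class)."""

    def __init__(self, name, fetch):
        self.name = name          # known locally, never needs a remote call
        self._fetch = fetch       # callable simulating the remote request
        self._loaded = None       # cache of remote properties, filled on demand
        self.remote_calls = 0     # counter kept only for illustration

    def reload(self):
        """Fetch all remote properties in one request and cache them."""
        self._loaded = self._fetch(self.name)
        self.remote_calls += 1

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails, i.e. for remote properties.
        if item.startswith("_"):
            raise AttributeError(item)
        if self._loaded is None:
            self.reload()
        try:
            return self._loaded[item]
        except KeyError:
            raise AttributeError(item) from None


def fake_fetch(name):
    # Simulated response; a real client would issue a network request here.
    return {"owner": "alice", "last_modified_time": "2015-11-19 16:39:07"}


project = LazyProject("my_project", fake_fetch)
calls_before = project.remote_calls      # nothing fetched yet
owner = project.owner                    # first access triggers one fetch
mtime = project.last_modified_time       # served from the cache
calls_after_reads = project.remote_calls
project.reload()                         # explicit refresh, one more fetch
calls_after_reload = project.remote_calls
```

All remote properties arrive in a single fetch, so reading a second property costs nothing until reload is called again.
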
class AuthQueryStatus(value)[source]
class ProjectStatus(value)[source]
class ProjectType(value)[source]
class odps.models.Table(*args, **kwargs)[source]

A Table is the counterpart of a table in an RDBMS. In addition, a table can consist of partitions.

Table properties behave like those of odps.models.Project: they are not loaded from the remote ODPS service until they are first accessed.

To write data into a table, call the open_writer method in a with statement. Likewise, the open_reader method reads records from a table or one of its partitions.

Example:

>>> table = odps.get_table('my_table')
>>> table.owner  # the first access loads the properties from the remote service
>>> table.reload()  # reload to update the properties
>>>
>>> for record in table.head(5):
>>>     print(record)  # check the first 5 records
>>> for record in table.head(5, partition='pt=test', columns=['my_column']):
>>>     print(record)  # only check the `my_column` column of one partition of this table
>>>
>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or its partition
>>>     for record in reader[0:count]:
>>>         print(record)  # reads all data; for large tables, read in several smaller slices
>>>
>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # the two blocks can be written in parallel
name

Name of the table

comment

Comment of the table

owner

Owner of the table

creation_time

Creation time of the table in local time.

last_data_modified_time

Last data modified time of the table in local time.

table_schema

Schema of the table, in TableSchema type.

type

Type of the table, can be managed_table, external_table, view or materialized_view.

size

Logical size of the table.

lifecycle

Lifecycle of the table in days.

class Type(value)[source]
add_columns(columns, if_not_exists=False, async_=False, hints=None, **inst_kw)[source]

Add columns to the table.

Parameters:
  • columns – columns to add, can be a list of Column or a string of column definitions

  • if_not_exists – if True, will not raise exception when column exists

Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> # add column by Column instance
>>> table.add_columns([Column('id2', 'string')])
>>> # add column by a string of column definitions
>>> table.add_columns("fid double, fid2 double")
change_partition_spec(old_partition_spec, new_partition_spec, async_=False, hints=None, **inst_kw)[source]

Change partition spec of specified partition of the table.

Parameters:
  • old_partition_spec – old partition spec

  • new_partition_spec – new partition spec

create_partition(partition_spec, if_not_exists=False, async_=False, hints=None, **inst_kw)[source]

Create a partition within the table.

Parameters:
  • partition_spec – specification of the partition.

  • if_not_exists

  • hints

  • async_

Returns:

partition object

Return type:

odps.models.partition.Partition
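
Partition specifications appear in this reference as strings such as 'pt=test'. The sketch below shows one way such strings could be split into column/value pairs. It assumes a comma-separated 'k1=v1,k2=v2' form for multi-level partitions and optional quotes around values. parse_partition_spec and format_partition_spec are hypothetical helpers, not PyODPS functions.

```python
from collections import OrderedDict


def parse_partition_spec(spec):
    """Parse 'k1=v1,k2=v2' into an ordered mapping of partition column to value."""
    parsed = OrderedDict()
    for part in spec.split(","):
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError("invalid partition spec item: %r" % part)
        # Strip surrounding whitespace and optional quotes around the value.
        parsed[key.strip()] = value.strip().strip("'\"")
    return parsed


def format_partition_spec(parsed):
    """Render the mapping back to the 'k1=v1,k2=v2' form."""
    return ",".join("%s=%s" % item for item in parsed.items())


single = parse_partition_spec("pt=test")
nested = parse_partition_spec("pt='20240101', region=cn")
round_trip = format_partition_spec(nested)
```

Keeping the pairs ordered matters because partition levels are hierarchical: the first column is the parent of the second.
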

delete_columns(columns, async_=False, hints=None, **inst_kw)[source]

Delete columns from the table.

Parameters:

columns – columns to delete, can be a list of column names

delete_partition(partition_spec, if_exists=False, async_=False, hints=None, **inst_kw)[source]

Delete a partition within the table.

Parameters:
  • partition_spec – specification of the partition.

  • if_exists

  • hints

  • async_

drop(async_=False, if_exists=False, hints=None, **inst_kw)[source]

Drop this table.

Parameters:
  • async_ – run asynchronously if True

  • if_exists

  • hints

Returns:

None

exist_partition(partition_spec)[source]

Check if a partition exists within the table.

Parameters:

partition_spec – specification of the partition.

exist_partitions(prefix_spec=None)[source]

Check if partitions with provided conditions exist.

Parameters:

prefix_spec – prefix of partition

Returns:

whether partitions exist

get_ddl(with_comments=True, if_not_exists=False, force_table_ddl=False)[source]

Get DDL SQL statement for the given table.

Parameters:
  • with_comments – append comment for table and each column

  • if_not_exists – generate if not exists code for generated DDL

  • force_table_ddl – force generate table DDL if object is a view

Returns:

DDL statement

get_max_partition(spec=None, skip_empty=True, reverse=False)[source]

Get partition with maximal values within certain spec.

Parameters:
  • spec – parent partition. If specified, the partition with the maximal value within that parent partition is returned

  • skip_empty – if True, will skip partitions without data

  • reverse – if True, will return minimal value

Returns:

Partition
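
How the three arguments interact can be illustrated on plain Python data. This is a sketch under stated assumptions, not the library's algorithm: partitions are modelled as (values, size) pairs, values are compared as strings in partition-column order, and pick_max_partition is a hypothetical name.

```python
def pick_max_partition(partitions, spec=None, skip_empty=True, reverse=False):
    """Pick the partition with the largest (or smallest) values.

    partitions: list of (values, size) pairs, where `values` is a dict mapping
    partition column to value and `size` is the data size (0 means empty).
    spec: optional dict of parent partition values that candidates must match.
    """
    candidates = []
    for values, size in partitions:
        if spec and any(values.get(k) != v for k, v in spec.items()):
            continue  # outside the requested parent partition
        if skip_empty and size == 0:
            continue  # ignore partitions without data
        candidates.append(values)
    if not candidates:
        return None

    # Assumption: all candidates list the same columns in the same order,
    # and values are compared as strings.
    def sort_key(values):
        return tuple(values.values())

    return min(candidates, key=sort_key) if reverse else max(candidates, key=sort_key)


parts = [
    ({"pt": "20240101", "hh": "01"}, 10),
    ({"pt": "20240101", "hh": "02"}, 0),
    ({"pt": "20240102", "hh": "01"}, 0),
]
newest_with_data = pick_max_partition(parts)
newest_any = pick_max_partition(parts, skip_empty=False)
newest_in_parent = pick_max_partition(parts, spec={"pt": "20240101"}, skip_empty=False)
oldest_any = pick_max_partition(parts, skip_empty=False, reverse=True)
```
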

get_partition(partition_spec)[source]

Get a partition with given specifications.

Parameters:

partition_spec – specification of the partition.

Returns:

partition object

Return type:

odps.models.partition.Partition

head(limit, partition=None, columns=None, use_legacy=True, timeout=None, tags=None)[source]

Get the head records of a table or its partition.

Parameters:
  • limit (int) – number of records to return, 10000 at most

  • partition – partition of this table

  • columns (list) – columns to return, a subset of the table columns

Returns:

records

Return type:

list

iter_pandas(partition=None, columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate table data in blocks as pandas DataFrame

Parameters:
  • partition – partition of this table

  • columns (list) – columns to read

  • batch_size (int) – size of DataFrame batch to read

  • start (int) – start row index from 0

  • count (int) – data count to read

  • append_partitions (bool) – if True, partition values will be appended to the output

  • quota_name (str) – name of tunnel quota to use

iterate_partitions(spec=None, reverse=False)[source]

Create an iterable object to iterate over partitions.

Parameters:
  • spec – specification of the partition.

  • reverse – output partitions in reversed order

new_record(values=None)[source]

Generate a record of the table.

Parameters:

values (list) – the values of this record

Returns:

record

Return type:

odps.models.Record

Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> record = table.new_record()
>>> record[0] = 'my_name'
>>> record[1] = 'my_id'
>>> record = table.new_record(['my_name', 'my_id'])
open_reader(partition=None, reopen=False, endpoint=None, download_id=None, timeout=None, arrow=False, columns=None, quota_name=None, async_mode=True, append_partitions=None, tags=None, **kw)[source]

Open a reader to read all records from this table or one of its partitions.

Parameters:
  • partition – partition of this table

  • reopen (bool) – by default the last opened reader is reused; set reopen to True to open a new reader.

  • endpoint – the tunnel service URL

  • download_id – use existing download_id to download table contents

  • arrow – use arrow tunnel to read data

  • columns – columns to read

  • quota_name – name of tunnel quota

  • async_mode – enable async mode to create tunnels; can be set to True if session creation takes a long time.

  • compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy

  • compress_algo – compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level – used for zlib, takes effect when compress_option is not provided

  • compress_strategy – used for zlib, takes effect when compress_option is not provided

  • append_partitions (bool) – if True, partition values will be appended to the output

Returns:

reader; its count is the total number of records and its status is the tunnel status

Example:

>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or its partition
>>>     for record in reader[0:count]:
>>>         print(record)  # reads all data; for large tables, read in several smaller slices
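
The advice to read in several smaller slices can be sketched as follows. The helper names slice_ranges and read_in_slices are hypothetical, and a plain list stands in for the reader because it supports the same reader[start:stop] slicing syntax used in the example above.

```python
def slice_ranges(count, step):
    """Yield (start, stop) pairs that cover range(count) in slices of at most `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, count, step):
        yield start, min(start + step, count)


def read_in_slices(reader, count, step):
    """Read `count` records from a sliceable reader, one slice at a time."""
    for start, stop in slice_ranges(count, step):
        for record in reader[start:stop]:
            yield record


# A plain list stands in for the reader here.
fake_reader = list(range(10))
ranges = list(slice_ranges(10, 4))
records = list(read_in_slices(fake_reader, len(fake_reader), 4))
```

With a real reader, the same generator would presumably be called as read_in_slices(reader, reader.count, step), with step chosen to suit the record size.
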
open_writer(partition=None, blocks=None, reopen=False, create_partition=False, commit=True, endpoint=None, upload_id=None, arrow=False, quota_name=None, tags=None, mp_context=None, on_exception=None, **kw)[source]

Open the writer to write records into this table or its partition.

Parameters:
  • partition – partition of this table

  • blocks – block ids to open

  • reopen (bool) – by default the last opened writer is reused; set reopen to True to open a new writer.

  • create_partition (bool) – if true, the partition will be created if not exist

  • endpoint – the tunnel service URL

  • upload_id – use existing upload_id to upload data

  • arrow – use arrow tunnel to write data

  • quota_name – name of tunnel quota

  • overwrite (bool) – if True, will overwrite existing data

  • compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy

  • compress_algo – compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level – used for zlib, takes effect when compress_option is not provided

  • compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

writer; its status is the tunnel writer status

Example:

>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # the two blocks can be written in parallel
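
The example notes that blocks can be written in parallel. The sketch below shows the general shape of that pattern with a thread pool. FakeBlockWriter and split_round_robin are hypothetical stand-ins, and whether a real writer object may be shared across threads is an assumption to verify before relying on it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBlockWriter:
    """Hypothetical stand-in that records what is written to each block id."""

    def __init__(self, blocks):
        self._lock = threading.Lock()
        self.written = {block_id: [] for block_id in blocks}

    def write(self, block_id, records):
        with self._lock:
            self.written[block_id].extend(records)


def split_round_robin(records, n_blocks):
    """Distribute records over n_blocks lists, preserving order within each block."""
    return [records[i::n_blocks] for i in range(n_blocks)]


blocks = [0, 1]
records = [["name%d" % i, "id%d" % i] for i in range(5)]
parts = split_round_robin(records, len(blocks))

writer = FakeBlockWriter(blocks)
with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    futures = [pool.submit(writer.write, b, part) for b, part in zip(blocks, parts)]
    for future in futures:
        future.result()  # re-raise any exception from a worker thread
```
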
rename(new_name, async_=False, hints=None, **inst_kw)[source]

Rename the table.

Parameters:

new_name – new table name

rename_column(old_column_name, new_column_name, comment=None, async_=False, hints=None, **inst_kw)[source]

Rename a column in the table.

Parameters:
  • old_column_name – old column name

  • new_column_name – new column name

  • comment – new column comment, optional

set_cluster_info(new_cluster_info, async_=False, hints=None, **inst_kw)[source]

Set cluster info of current table.

set_comment(new_comment, async_=False, hints=None, **inst_kw)[source]

Set comment of current table.

Parameters:

new_comment – new comment

set_lifecycle(days, async_=False, hints=None, **inst_kw)[source]

Set lifecycle of current table.

Parameters:

days – lifecycle in days

set_owner(new_owner, async_=False, hints=None, **inst_kw)[source]

Set owner of current table.

Parameters:

new_owner – account of the new owner

set_storage_tier(storage_tier, partition_spec=None, async_=False, hints=None, **inst_kw)[source]

Set storage tier of current table or specific partition.

to_df()[source]

Create a PyODPS DataFrame from this table.

Returns:

DataFrame object

to_pandas(partition=None, columns=None, start=None, count=None, n_process=1, n_thread=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read table data into pandas DataFrame

Parameters:
  • partition – partition of this table

  • columns (list) – columns to read

  • start (int) – start row index from 0

  • count (int) – data count to read

  • n_process (int) – number of processes to accelerate reading

  • append_partitions (bool) – if True, partition values will be appended to the output

  • quota_name (str) – name of tunnel quota to use

touch(partition_spec=None, async_=False, hints=None, **inst_kw)[source]

Update the last modified time of the table or specified partition.

Parameters:

partition_spec – partition spec, optional

truncate(partition_spec=None, async_=False, hints=None, **inst_kw)[source]

Truncate this table.

Parameters:
  • partition_spec – partition specs

  • hints

  • async_ – run asynchronously if True

Returns:

None

class odps.models.partition.Partition(*args, **kwargs)[source]

A partition is a collection of rows in a table whose partition columns are equal to specific values.

To write data into a partition, call the open_writer method in a with statement. Likewise, the open_reader method reads records from a partition. These methods behave the same as those of the Table class, except that they take no ‘partition’ parameter.

change_partition_spec(new_partition_spec, async_=False, hints=None)[source]

Change partition spec of current partition.

Parameters:

new_partition_spec – new partition spec

drop(async_=False, if_exists=False)[source]

Drop this partition.

Parameters:
  • async_ – run asynchronously if True

  • if_exists

Returns:

None

head(limit, columns=None)[source]

Get the head records of a partition

Parameters:
  • limit – number of records to return, 10000 at most

  • columns (list) – columns to return, a subset of the table columns

Returns:

records

Return type:

list

iter_pandas(columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate partition data in blocks as pandas DataFrame

Parameters:
  • columns (list) – columns to read

  • batch_size (int) – size of DataFrame batch to read

  • start (int) – start row index from 0

  • count (int) – data count to read

  • quota_name (str) – name of tunnel quota to use

  • append_partitions (bool) – if True, partition values will be appended to the output

open_reader(**kw)[source]

Open a reader to read all records from this partition.

Parameters:
  • reopen (bool) – by default the last opened reader is reused; set reopen to True to open a new reader.

  • endpoint – the tunnel service URL

  • compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy

  • compress_algo – compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level – used for zlib, takes effect when compress_option is not provided

  • compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

reader; its count is the total number of records and its status is the tunnel status

Example:

>>> with partition.open_reader() as reader:
>>>     count = reader.count  # number of records in the partition
>>>     for record in reader[0:count]:
>>>         print(record)  # reads all data; for large partitions, read in several smaller slices
set_storage_tier(storage_tier, async_=False, hints=None)[source]

Set storage tier of current partition.

to_df()[source]

Create a PyODPS DataFrame from this partition.

Returns:

DataFrame object

to_pandas(columns=None, start=None, count=None, n_process=1, n_thread=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read partition data into pandas DataFrame

Parameters:
  • columns (list) – columns to read

  • start (int) – start row index from 0

  • count (int) – data count to read

  • n_process (int) – number of processes to accelerate reading

  • quota_name (str) – name of tunnel quota to use

  • append_partitions (bool) – if True, partition values will be appended to the output

touch(async_=False, hints=None)[source]

Update the last modified time of the partition.

truncate(async_=False)[source]

Truncate current partition.

class odps.models.Instance(*args, **kwargs)[source]

An Instance represents an ODPS task at run time: a submitted task runs as an instance.

status reflects the current state of an instance. The is_terminated method indicates whether the instance has finished, and the is_successful method indicates whether it ran successfully. The wait_for_success method blocks the calling process until the instance has finished.

For a SQL instance, we can use open_reader to read the results.

Example:

>>> instance = odps.execute_sql('select * from dual')  # this SQL returns structured data
>>> with instance.open_reader() as reader:
>>>     for record in reader:
>>>         print(record)  # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')  # this SQL does not return structured data
>>> with instance.open_reader() as reader:
>>>     print(reader.raw)  # just return the raw result
exception DownloadSessionCreationError(msg, request_id=None, code=None, host_id=None, instance_id=None, endpoint=None, tag=None, response_headers=None, status_code=None)[source]
class Status(value)[source]
class Task(**kwargs)[source]

Task stands for each task inside an instance.

It has a name, a task type, the start to end time, and a running status.

class TaskProgress(**kwargs)[source]

TaskProgress represents the progress of a task.

A single TaskProgress may consist of several stages.

Example:

>>> progress = instance.get_task_progress('task_name')
>>> progress.get_stage_progress_formatted_string()
2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%]  R2_1_Stg1_job0:0/0/1[0%]
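
The formatted string can be split per stage with a regular expression. The pattern below is derived only from the sample line above and is an assumption about the format. The three slash-separated numbers are kept as an unnamed tuple because their meaning is not stated here. parse_stage_progress is a hypothetical helper.

```python
import re

# stage_name:a/b/c[percent%], as seen in the sample output above.
STAGE_PATTERN = re.compile(r"(\S+?):(\d+)/(\d+)/(\d+)\[(\d+)%\]")


def parse_stage_progress(line):
    """Return a list of (stage_name, counters, percent) tuples found in the line."""
    stages = []
    for match in STAGE_PATTERN.finditer(line):
        name = match.group(1)
        counters = tuple(int(group) for group in match.groups()[1:4])
        percent = int(match.group(5))
        stages.append((name, counters, percent))
    return stages


sample = "2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%]  R2_1_Stg1_job0:0/0/1[0%]"
stages = parse_stage_progress(sample)
```
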
class TaskStatus(value)[source]
class TaskSummary(*args, **kwargs)[source]
get_logview_address(hours=None, use_legacy=None)[source]

Get logview address of the instance object by hours.

Parameters:

hours

Returns:

logview address

Return type:

str

get_sql_task_cost()[source]

Get cost information of the SQL cost task, including input data size, number of UDFs, and complexity of the SQL task.

NOTE: do not call this function directly, as it cannot be applied to instances returned from SQL. Use o.execute_sql_cost instead.

Returns:

cost info in dict format

get_task_cost(task_name=None)[source]

Get task cost

Parameters:

task_name – name of the task

Returns:

task cost

Return type:

Instance.TaskCost

Example:

>>> cost = instance.get_task_cost(instance.get_task_names()[0])
>>> cost.cpu_cost
200
>>> cost.memory_cost
4096
>>> cost.input_size
0
get_task_detail(task_name=None)[source]

Get task’s detail

Parameters:

task_name – task name

Returns:

the task’s detail

Return type:

list or dict according to the JSON

get_task_detail2(task_name=None, **kw)[source]

Get task’s detail v2

Parameters:

task_name – task name

Returns:

the task’s detail

Return type:

list or dict according to the JSON

get_task_info(task_name, key, raise_empty=False)[source]

Get task related information.

Parameters:
  • task_name – name of the task

  • key – key of the information item

  • raise_empty – if True, will raise error when response is empty

Returns:

a string of the task information

get_task_names(retry=True, timeout=None)[source]

Get names of all tasks

Returns:

task names

Return type:

list

get_task_progress(task_name=None)[source]

Get task’s current progress

Parameters:

task_name – task_name

Returns:

the task’s progress

Return type:

odps.models.Instance.Task.TaskProgress

get_task_quota(task_name=None)[source]

Get queueing info of the task. Note that the time between two calls should be longer than 30 seconds; otherwise an empty dict is returned.

Parameters:

task_name – name of the task

Returns:

quota info in dict format

get_task_result(task_name=None, timeout=None, retry=True)[source]

Get a single task result.

Parameters:

task_name – task name

Returns:

task result

Return type:

str

get_task_results(timeout=None, retry=True)[source]

Get all the task results.

Returns:

a dict whose keys are task names and whose values are the task results as strings

Return type:

dict

get_task_statuses(retry=True, timeout=None, on_exception=None)[source]

Get all tasks’ statuses

Returns:

a dict whose keys are task names and whose values are odps.models.Instance.Task objects

Return type:

dict

get_task_summary(task_name=None)[source]

Get a task’s summary, mostly used for MapReduce.

Parameters:

task_name – task name

Returns:

summary as a dict parsed from JSON

Return type:

dict

get_task_workers(task_name=None, json_obj=None)[source]

Get workers from task.

Parameters:
  • task_name – task name

  • json_obj – json object parsed from get_task_detail2

Returns:

list of workers

get_worker_log(log_id, log_type, size=0)[source]

Get logs from worker.

Parameters:
  • log_id – id of log, can be retrieved from details.

  • log_type – type of logs. Possible log types are coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary

  • size – length of the log to retrieve

Returns:

log content

is_running(retry=True, blocking=False, retry_timeout=None, on_exception=None)[source]

Check whether this instance is still running.

Returns:

True if still running else False

Return type:

bool

is_successful(retry=True, retry_timeout=None, on_exception=None)[source]

Check whether the instance ran successfully.

Returns:

True if successful else False

Return type:

bool

is_terminated(retry=True, blocking=False, retry_timeout=None, on_exception=None)[source]

Check whether this instance has finished.

Returns:

True if finished else False

Return type:

bool

iter_pandas(columns=None, limit=None, batch_size=None, start=None, count=None, quota_name=None, tags=None, **kwargs)[source]

Iterate instance data in blocks as pandas DataFrame. The limit argument follows the definition in the open_reader API.

Parameters:
  • columns (list) – columns to read

  • limit (bool) – if True, enable the limitation

  • batch_size (int) – size of DataFrame batch to read

  • start (int) – start row index from 0

  • count (int) – data count to read

  • quota_name (str) – name of tunnel quota to use

open_reader(*args, **kwargs)[source]

Open the reader to read records from the result of the instance. If tunnel is True, the instance tunnel is used; otherwise the conventional routine is used. If the instance tunnel is not available and tunnel is not specified, the method falls back to the conventional routine. Note that under instance tunnel mode the number of records returned is limited when options.limited_instance_tunnel is set to True or limit=True is passed. Under the conventional routine the number of records returned is always limited.

Parameters:
  • tunnel – if True, use the instance tunnel to read from the instance. If False, use the conventional routine. If absent, options.tunnel.use_instance_tunnel is used and automatic fallback is enabled.

  • limit (bool) – if True, enable the limitation

  • reopen (bool) – by default the last opened reader is reused; set reopen to True to open a new reader.

  • endpoint – the tunnel service URL

  • compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy

  • compress_algo – compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level – used for zlib, takes effect when compress_option is not provided

  • compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

reader; its count is the total number of records and its status is the tunnel status

Example:

>>> with instance.open_reader() as reader:
>>>     count = reader.count  # number of records in the result
>>>     for record in reader[0:count]:
>>>         print(record)  # reads all data; for large results, read in several smaller slices
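
The choice between the instance tunnel and the conventional routine can be written out as a small decision function. This is a sketch of the rules as described for the tunnel parameter, not the library's code. choose_read_mode and its parameter names are hypothetical, and raising an error when the tunnel is explicitly requested but unavailable is an assumption.

```python
def choose_read_mode(tunnel=None, use_instance_tunnel_option=True, tunnel_available=True):
    """Return 'tunnel' or 'conventional'.

    `use_instance_tunnel_option` stands in for options.tunnel.use_instance_tunnel;
    `tunnel_available` models whether the instance tunnel can be used at all.
    """
    if tunnel is None:
        # Not specified: take the global option and allow automatic fallback.
        if use_instance_tunnel_option and tunnel_available:
            return "tunnel"
        return "conventional"
    if tunnel:
        if not tunnel_available:
            # Explicit request: this sketch does not fall back silently.
            raise RuntimeError("instance tunnel requested but not available")
        return "tunnel"
    return "conventional"


default_mode = choose_read_mode()
fallback_mode = choose_read_mode(tunnel_available=False)
forced_conventional = choose_read_mode(tunnel=False)
```
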
put_task_info(task_name, key, value, check_location=False, raise_empty=False)[source]

Put information into a task.

Parameters:
  • task_name – name of the task

  • key – key of the information item

  • value – value of the information item

  • check_location – raises if Location header is missing

  • raise_empty – if True, will raise error when response is empty

stop()[source]

Stop this instance.

Returns:

None

to_pandas(columns=None, limit=None, start=None, count=None, n_process=1, quota_name=None, tags=None, **kwargs)[source]

Read instance data into pandas DataFrame. The limit argument follows the definition in the open_reader API.

Parameters:
  • columns (list) – columns to read

  • limit (bool) – if True, enable the limitation

  • start (int) – start row index from 0

  • count (int) – data count to read

  • n_process (int) – number of processes to accelerate reading

  • quota_name (str) – name of tunnel quota to use

wait_for_completion(interval=1, timeout=None, max_interval=None, blocking=True, on_exception=None)[source]

Wait for the instance to complete, regardless of whether it succeeds.

Parameters:
  • interval – time interval to check

  • max_interval – if specified, next check interval will be multiplied by 2 till max_interval is reached.

  • timeout – maximum time to wait

  • blocking – whether to block waiting at server side. Note that this option does not affect client behavior.

  • on_exception – custom error handling function accepting an Exception instance as input. If return value is True, error will be raised. Otherwise retry will continue.

Returns:

None
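
The interplay of interval and max_interval can be sketched as a polling loop whose wait doubles after each check until it reaches the cap. This is an illustration of the documented doubling rule only. check_intervals and poll are hypothetical names, the timeout is assumed to be in seconds, and sleep and clock are injectable so the sketch runs without real delays.

```python
import itertools
import time


def check_intervals(interval=1, max_interval=None):
    """Yield successive polling intervals: constant, or doubling up to max_interval."""
    current = interval
    while True:
        yield current
        if max_interval is not None:
            current = min(current * 2, max_interval)


def poll(is_done, interval=1, max_interval=None, timeout=None,
         sleep=time.sleep, clock=time.monotonic):
    """Call is_done() until it returns True; raise TimeoutError after `timeout` seconds."""
    start = clock()
    for wait in check_intervals(interval, max_interval):
        if is_done():
            return
        if timeout is not None and clock() - start > timeout:
            raise TimeoutError("not finished after %s seconds" % timeout)
        sleep(wait)


doubling = list(itertools.islice(check_intervals(1, 5), 5))
constant = list(itertools.islice(check_intervals(2), 3))

# Simulated instance that finishes on the fourth check; sleeps are only recorded.
slept = []
states = iter([False, False, False, True])
poll(lambda: next(states), interval=1, max_interval=4, sleep=slept.append)
```
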

wait_for_success(interval=1, timeout=None, max_interval=None, blocking=True, on_exception=None)[source]

Wait for instance to complete, and check if the instance is successful.

Parameters:
  • interval – time interval to check

  • max_interval – if specified, next check interval will be multiplied by 2 till max_interval is reached.

  • timeout – maximum time to wait

  • blocking – whether to block waiting at server side. Note that this option does not affect client behavior.

  • on_exception – custom error handling function accepting an Exception instance as input. If return value is True, error will be raised. Otherwise retry will continue.

Returns:

None

Raise:

odps.errors.ODPSError if the instance failed
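
The on_exception contract (a True return value raises the error, anything else lets the retry continue) can be sketched with a generic retry helper. call_with_retry and max_attempts are hypothetical and not part of the documented API. Retrying when no handler is given is an assumption of this sketch.

```python
def call_with_retry(func, on_exception=None, max_attempts=3):
    """Call func(); on failure consult on_exception to decide between raising and retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            last_attempt = attempt == max_attempts
            if last_attempt or (on_exception is not None and on_exception(exc)):
                raise
            # Handler absent or returned a falsy value: try again.


attempts = []


def flaky():
    """Fail twice, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "done"


seen = []


def log_and_continue(exc):
    seen.append(type(exc).__name__)
    return False  # falsy: keep retrying


result = call_with_retry(flaky, on_exception=log_and_continue)
```

A handler that returns True for errors it considers fatal stops the retry loop at once and lets the original exception propagate.
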

class odps.models.Resource(*args, **kwargs)[source]

Resource is useful when writing UDF or MapReduce. This is an abstract class.

A resource is either a file resource or a table resource. A file resource is one of the types file, py, jar, or archive.

class Type(value)[source]
class odps.models.FileResource(*args, **kwargs)[source]

A file resource represents a file.

Use the open method to open this resource as a file-like object.

class Mode(value)[source]
close()[source]

Close this file resource.

Returns:

None

flush()[source]

Commit the change to ODPS if any change happens. Close will do this automatically.

Returns:

None

open(mode='r', encoding='utf-8', stream=False, overwrite=None)[source]

The mode argument is the open mode of this file resource. The mode is binary if it contains ‘b’. For instance, ‘rb’ opens the resource in read binary mode, while ‘r+b’ opens it in read+write binary mode. This matters most when the file is actually binary, such as a tar or jpeg file, so take care to open such a file in the correct mode.

The text modes are ‘r’, ‘w’, ‘a’, ‘r+’, ‘w+’ and ‘a+’, just like those of the builtin Python open function.

  • r means read only

  • w means write only, the file will be truncated when opening

  • a means append only

  • r+ means read+write without constraint

  • w+ truncates the file first, then opens it for read+write

  • a+ can read+write, however the written content can only be appended to the end

Parameters:
  • mode – the mode of opening file, described as above

  • encoding – utf-8 as default

  • stream – open in stream mode

  • overwrite – if True, will overwrite existing resource. True by default.

Returns:

file-like object

Example:

>>> with resource.open('r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write in read mode
>>>
>>> with resource.open('wb') as fp:
>>>     fp.readlines()  # wrong, cannot read in write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with resource.open('r+') as fp:  # open in read-write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
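
Since the mode strings are described as following the builtin open, their behaviour can be tried out on a local temporary file without any ODPS connection. The sketch below uses only the Python standard library. Whether a resource object matches the builtin behaviour in every detail is not confirmed here.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as fp:    # w: write only, truncates on open
    fp.write("hello world")

with open(path, "r+", encoding="utf-8") as fp:   # r+: read and write, no truncation
    head = fp.read(5)
    fp.seek(0)
    fp.write("HELLO")                            # overwrites the first five characters

with open(path, "a+", encoding="utf-8") as fp:   # a+: reads allowed, writes go to the end
    fp.write("!")
    fp.seek(0)
    after_append = fp.read()

with open(path, "rb") as fp:                     # b: binary mode, yields bytes
    raw = fp.read()

with open(path, "w+", encoding="utf-8") as fp:   # w+: truncates first, then read and write
    after_truncate = fp.read()
```
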
read(size=-1)[source]

Read the file resource, read all as default.

Parameters:

size – number of characters (text mode) or bytes (binary mode) to read

Returns:

unicode or bytes, depending on text mode or binary mode

Return type:

str or unicode(Py2), bytes or str(Py3)

readline(size=-1)[source]

Read a single line.

Parameters:

size – If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. When size is not 0, an empty string is returned only when EOF is encountered immediately

Returns:

unicode or bytes, depending on text mode or binary mode

Return type:

str or unicode(Py2), bytes or str(Py3)

readlines(sizehint=-1)[source]

Read as lines.

Parameters:

sizehint – If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read.

Returns:

lines

Return type:

list

seek(pos, whence=0)[source]

Seek to some place.

Parameters:
  • pos – position to seek

  • whence – if set to 2, will seek to the end

Returns:

None

tell()[source]

Tell the current position

Returns:

current position

truncate(size=None)[source]

Truncate the file resource’s size.

Parameters:

size – If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position.

Returns:

None
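
The seek, tell and truncate semantics described above mirror those of Python's standard file objects, so they can be checked on an in-memory buffer. The sketch uses io.BytesIO from the standard library as a stand-in. It does not touch an actual file resource.

```python
import io

buf = io.BytesIO(b"hello world")

buf.seek(0, 2)              # whence=2 seeks relative to the end
size = buf.tell()           # position now equals the content length

buf.seek(5)                 # whence defaults to 0, i.e. from the start
buf.truncate()              # no size given: cut at the current position
after_default = buf.getvalue()

buf.truncate(3)             # explicit size: keep at most 3 bytes
after_explicit = buf.getvalue()
```
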

write(content)[source]

Write content into the file resource

Parameters:

content – content to write

Returns:

None

writelines(seq)[source]

Write lines into the file resource.

Parameters:

seq – lines

Returns:

None

class odps.models.PyResource(*args, **kwargs)[source]

File resource representing a .py file.

class odps.models.JarResource(*args, **kwargs)[source]

File resource representing a .jar file.

class odps.models.ArchiveResource(*args, **kwargs)[source]

File resource representing a compressed file such as .zip/.tgz/.tar.gz/.tar/.jar

class odps.models.TableResource(*args, **kwargs)[source]

Take a table as a resource.

open_reader(**kwargs)[source]

Open reader on the table resource

open_writer(**kwargs)[source]

Open writer on the table resource

property partition

Get the source table partition.

Returns:

the source table partition

property table

Get the table object.

Returns:

source table

Return type:

odps.models.Table

update(table_project_name=None, table_schema_name=None, table_name=None, *args, **kw)[source]

Update this resource.

Parameters:
  • table_project_name – the source table’s project

  • table_name – the source table’s name

  • partition – the source table’s partition

Returns:

self

class odps.models.Function(*args, **kwargs)[source]

A Function can be used as a UDF when the user writes SQL.

drop()[source]

Delete this Function.

Returns:

None

property resources

Return all the resources which this function refers to.

Returns:

resources

Return type:

list

update()[source]

Update this function.

Returns:

None

class odps.models.Worker(**kwargs)[source]

Worker information class for worker information and log retrieval.

get_log(log_type, size=0)[source]

Get logs from worker.

Parameters:
  • log_type – type of logs. Possible log types are coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary

  • size – length of the log to retrieve

Returns:

log content

class odps.models.ml.OfflineModel(*args, **kwargs)[source]

Represents an ODPS offline model.

copy(new_name, new_project=None, async_=False)[source]

Copy current model into a new location.

Parameters:
  • new_name – name of the new model

  • new_project – new project name. If absent, the original project name is used

  • async_ – if True, return the copy instance. Otherwise return the newly-copied model

get_model()[source]

Get PMML text of the current model. Note that model file obtained via this method might be incomplete due to size limitations.

class odps.models.security.User(*args, **kwargs)[source]