Model objects

class odps.models.Project(*args, **kwargs)[source]

A Project is the counterpart of a database in an RDBMS.

Through a Project object, users can read properties such as name, owner, comment, creation_time, last_modified_time, and so on.

These properties are not loaded from the remote ODPS service until users access them explicitly. To check the latest state, call the reload method.

Example:

>>> project = odps.get_project('my_project')
>>> project.last_modified_time  # the first access fetches the property from the remote ODPS service
>>> project.last_modified_time  # once loaded, later accesses make no remote call
>>> project.owner  # the same holds for the other properties, which are fetched together
>>> project.reload()  # force all properties to be refreshed
>>> project.last_modified_time  # already updated
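
The load-on-first-access behaviour described above can be pictured with a small stand-alone sketch. The LazyModel class, its fetch callable and the remote_calls counter are hypothetical names used only for illustration; they are not part of PyODPS, and the real implementation may differ.

```python
class LazyModel:
    """Hypothetical stand-in for a remotely backed model object."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable returning a dict of all properties
        self._props = None       # nothing is loaded until the first access
        self.remote_calls = 0    # counts simulated round trips

    def reload(self):
        # Fetch every property in one round trip and replace the cache.
        self._props = dict(self._fetch())
        self.remote_calls += 1

    def __getattr__(self, name):
        # Only invoked for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._props is None:
            self.reload()
        try:
            return self._props[name]
        except KeyError:
            raise AttributeError(name) from None


# Two successive "remote" states of the same object (made-up values).
versions = iter([
    {"owner": "alice", "last_modified_time": 1},
    {"owner": "alice", "last_modified_time": 2},
])
project = LazyModel(lambda: next(versions))

first = project.last_modified_time   # triggers the initial fetch
owner = project.owner                # served from the cache
calls_before_reload = project.remote_calls
project.reload()                     # explicit refresh
second = project.last_modified_time
```

In this sketch a single fetch serves every property until reload is called, which mirrors the comments in the example above.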

class AuthQueryStatus(value)[source]

class ProjectStatus(value)[source]

class ProjectType(value)[source]

class odps.models.Table(*args, **kwargs)[source]

A Table is equivalent to a table in an RDBMS; in addition, a table can consist of partitions.

Like the properties of odps.models.Project, a table’s properties are not loaded from the remote ODPS service until users access them.

To write data into a table, call the open_writer method in a with statement. Likewise, the open_reader method reads records from a table or one of its partitions.

Example:

>>> table = odps.get_table('my_table')
>>> table.owner  # the first access loads the properties from the remote service
>>> table.reload()  # reload to update the properties
>>>
>>> for record in table.head(5):
>>>     print(record)  # check the first 5 records
>>> for record in table.head(5, partition='pt=test', columns=['my_column']):
>>>     print(record)  # only check the `my_column` column of one partition of this table
>>>
>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or its partition
>>>     for record in reader[0: count]:
>>>         print(record)  # reads all data; for large tables it is better to read in several slices
>>>
>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # the two blocks can be written in parallel

name: Name of the table

comment: Comment of the table

owner: Owner of the table

creation_time: Creation time of the table in local time.

last_data_modified_time: Last data modified time of the table in local time.

table_schema: Schema of the table, in TableSchema type.

type: Type of the table, can be managed_table, external_table, view or materialized_view.

size: Logical size of the table.

lifecycle: Lifecycle of the table in days.

class Type(value)[source]

add_columns(columns, if_not_exists=False, async_=False, hints=None, **inst_kw)[source]

Add columns to the table.

Parameters:

columns – columns to add, can be a list of Column or a string of column definitions
if_not_exists – if True, will not raise exception when column exists

Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> # add column by Column instance
>>> table.add_columns([Column('id2', 'string')])
>>> # add column by a string of column definitions
>>> table.add_columns("fid double, fid2 double")
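
The string form shown above lists name and type pairs separated by commas. As a rough illustration of that format only, the hypothetical helper parse_column_defs below splits such a string into pairs. It is not part of PyODPS, and it assumes simple type names that contain neither commas nor spaces.

```python
def parse_column_defs(defs):
    """Split 'name type, name type' into (name, type) pairs (illustrative only)."""
    columns = []
    for part in defs.split(","):
        # Each segment is expected to hold exactly a name and a type.
        name, col_type = part.split()
        columns.append((name, col_type.lower()))
    return columns


parsed = parse_column_defs("fid double, fid2 double")
```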

change_partition_spec(old_partition_spec, new_partition_spec, async_=False, hints=None, **inst_kw)[source]

Change partition spec of specified partition of the table.

Parameters:

old_partition_spec – old partition spec
new_partition_spec – new partition spec

create_partition(partition_spec, if_not_exists=False, async_=False, hints=None, **inst_kw)[source]

Create a partition within the table.

Parameters:

partition_spec – specification of the partition.
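if_not_exists – if True, will not raise an exception when the partition already exists
hints
async_ – run asynchronously if True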

Returns:

partition object

Return type:

odps.models.partition.Partition
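
The examples on this page write a partition specification as a string such as 'pt=test'. The sketch below shows one way to turn such a string into key/value pairs and back. The helper names are hypothetical, and the comma-separated multi-level form ('ds=...,hh=...') is an assumption about the format rather than something stated in this section.

```python
def parse_partition_spec(spec):
    """Parse 'k1=v1,k2=v2' into an ordered dict of strings (illustrative only)."""
    result = {}
    for part in spec.split(","):
        key, sep, value = part.strip().partition("=")
        if not sep or not key:
            raise ValueError("invalid partition spec segment: %r" % part)
        # Drop optional quotes around the value.
        result[key.strip()] = value.strip().strip("'\"")
    return result


def format_partition_spec(pairs):
    """Inverse of parse_partition_spec for plain string values."""
    return ",".join("%s=%s" % (key, value) for key, value in pairs.items())


single = parse_partition_spec("pt=test")
multi = parse_partition_spec("ds=20240101, hh='08'")
round_trip = format_partition_spec(multi)
```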

delete_columns(columns, async_=False, hints=None, **inst_kw)[source]

Delete columns from the table.

Parameters:: columns – columns to delete, can be a list of column names

delete_partition(partition_spec, if_exists=False, async_=False, hints=None, **inst_kw)[source]

Delete a partition within the table.

Parameters:

partition_spec – specification of the partition.
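if_exists – if True, will not raise an exception when the partition does not exist
hints
async_ – run asynchronously if True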

drop(async_=False, if_exists=False, hints=None, **inst_kw)[source]

Drop this table.

Parameters:

async_ – run asynchronously if True
if_exists
hints

Returns:

None

exist_partition(partition_spec)[source]

Check if a partition exists within the table.

Parameters:: partition_spec – specification of the partition.

exist_partitions(prefix_spec=None)[source]

Check if partitions with provided conditions exist.

Parameters:: prefix_spec – prefix of partition
Returns:: whether partitions exist

get_ddl(with_comments=True, if_not_exists=False, force_table_ddl=False)[source]

Get DDL SQL statement for the given table.

Parameters:

with_comments – append comment for table and each column
if_not_exists – generate if not exists code for generated DDL
force_table_ddl – force generate table DDL if object is a view

Returns:

DDL statement

get_max_partition(spec=None, skip_empty=True, reverse=False)[source]

Get partition with maximal values within certain spec.

Parameters:

spec – parent partitions. if specified, will return partition with maximal value within specified parent partition
skip_empty – if True, will skip partitions without data
reverse – if True, will return minimal value

Returns:

Partition
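
How skip_empty and reverse interact can be sketched on in-memory data. Everything here is a simplification under stated assumptions: partitions are modelled as (spec string, row count) pairs, "maximal" is taken to mean the largest spec string, and returning None when nothing qualifies is a choice made for this sketch only.

```python
def pick_max_partition(partitions, skip_empty=True, reverse=False):
    """Pick the largest (or smallest) spec among hypothetical (spec, rows) pairs."""
    candidates = [spec for spec, rows in partitions if rows > 0 or not skip_empty]
    if not candidates:
        return None
    return min(candidates) if reverse else max(candidates)


parts = [("pt=20240101", 10), ("pt=20240102", 5), ("pt=20240103", 0)]

latest_with_data = pick_max_partition(parts)                 # the empty partition is skipped
latest_any = pick_max_partition(parts, skip_empty=False)     # empty partitions are allowed
earliest = pick_max_partition(parts, reverse=True)           # minimal instead of maximal
```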

get_partition(partition_spec)[source]

Get a partition with given specifications.

Parameters:: partition_spec – specification of the partition.
Returns:: partition object
Return type:: odps.models.partition.Partition

head(limit, partition=None, columns=None, use_legacy=True, timeout=None, tags=None)[source]

Get the head records of a table or its partition.

Parameters:

limit (int) – number of records to return, 10000 at most
partition – partition of this table
columns (list) – columns to return, a subset of the table’s columns

Returns:

records

Return type:

list

See also

odps.models.Record

iter_pandas(partition=None, columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate table data in blocks as pandas DataFrame

Parameters:

partition – partition of this table
columns (list) – columns to read
batch_size (int) – size of DataFrame batch to read
start (int) – start row index from 0
count (int) – data count to read
append_partitions (bool) – if True, partition values will be appended to the output
quota_name (str) – name of tunnel quota to use

iterate_partitions(spec=None, reverse=False)[source]

Create an iterable object to iterate over partitions.

Parameters:

spec – specification of the partition.
reverse – output partitions in reversed order

new_record(values=None)[source]

Generate a record of the table.

Parameters:: values (list) – the values of the record
Returns:: record
Return type:: odps.models.Record
Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> record = table.new_record()
>>> record[0] = 'my_name'
>>> record[1] = 'my_id'
>>> record = table.new_record(['my_name', 'my_id'])

See also

odps.models.Record

open_reader(partition=None, reopen=False, endpoint=None, download_id=None, timeout=None, arrow=False, columns=None, quota_name=None, async_mode=True, append_partitions=None, tags=None, **kw)[source]

Open a reader to read all records from this table or one of its partitions.

Parameters:

partition – partition of this table
reopen (bool) – by default the last reader is reused; if True, a new reader is opened.
endpoint – the tunnel service URL
download_id – use existing download_id to download table contents
arrow – use arrow tunnel to read data
columns – columns to read
quota_name – name of tunnel quota
async_mode – enable async mode to create tunnels, can set True if session creation takes a long time.
compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
compress_algo – compression algorithm, takes effect when compress_option is not provided, can be zlib or snappy
compress_level – used for zlib, takes effect when compress_option is not provided
compress_strategy – used for zlib, takes effect when compress_option is not provided
append_partitions (bool) – if True, partition values will be appended to the output

Returns:

reader, whose count attribute is the total number of records and whose status attribute is the tunnel status

Example:

>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or its partition
>>>     for record in reader[0: count]:
>>>         print(record)  # reads all data; for large tables it is better to read in several slices
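
The comment in the example recommends splitting a full read into several smaller reads. A minimal sketch of that idea follows; FakeReader is a hypothetical stand-in that only mimics the count attribute and slicing used in the example, so the code runs without an ODPS connection.

```python
def iter_ranges(total, step):
    """Yield (start, stop) pairs that cover range(total) in chunks of `step`."""
    for start in range(0, total, step):
        yield start, min(start + step, total)


class FakeReader:
    """Hypothetical stand-in exposing `count` and slicing like the reader above."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.count = len(self._rows)

    def __getitem__(self, item):
        return self._rows[item]


reader = FakeReader(range(10))
chunks = [list(reader[start:stop]) for start, stop in iter_ranges(reader.count, 4)]
```

With a real reader, each slice would be processed before the next one is requested, so only one chunk needs to be held in memory at a time.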

open_writer(partition=None, blocks=None, reopen=False, create_partition=False, commit=True, endpoint=None, upload_id=None, arrow=False, quota_name=None, tags=None, mp_context=None, on_exception=None, **kw)[source]

Open the writer to write records into this table or its partition.

Parameters:

partition – partition of this table
blocks – block ids to open
reopen (bool) – by default the last writer is reused; if True, a new writer is opened.
create_partition (bool) – if True, the partition will be created if it does not exist
endpoint – the tunnel service URL
upload_id – use existing upload_id to upload data
arrow – use arrow tunnel to write data
quota_name – name of tunnel quota
overwrite (bool) – if True, will overwrite existing data
compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
compress_algo – compression algorithm, takes effect when compress_option is not provided, can be zlib or snappy
compress_level – used for zlib, takes effect when compress_option is not provided
compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

writer, whose status attribute is the tunnel writer status

Example:

>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # the two blocks can be written in parallel
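
The last comment in the example notes that blocks can be written in parallel. The sketch below shows the shape of such code with a thread pool. FakeBlockWriter and gen_records are hypothetical stand-ins defined here; whether a real writer object may be shared between threads is an assumption that should be checked against the library's own documentation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBlockWriter:
    """Hypothetical writer that stores records per block id."""

    def __init__(self, blocks):
        self._lock = threading.Lock()
        self.data = {block_id: [] for block_id in blocks}

    def write(self, block_id, records):
        rows = list(records)
        with self._lock:  # protect the shared dict in this sketch
            self.data[block_id].extend(rows)


def gen_records(block):
    """Generate three made-up records for the given block."""
    return ([block, i] for i in range(3))


writer = FakeBlockWriter(blocks=[0, 1])
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(writer.write, b, gen_records(block=b)) for b in (0, 1)]
    for future in futures:
        future.result()  # re-raise any error from a worker thread
```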

rename(new_name, async_=False, hints=None, **inst_kw)[source]

Rename the table.

Parameters:: new_name – new table name

rename_column(old_column_name, new_column_name, comment=None, async_=False, hints=None, **inst_kw)[source]

Rename a column in the table.

Parameters:

old_column_name – old column name
new_column_name – new column name
comment – new column comment, optional

set_cluster_info(new_cluster_info, async_=False, hints=None, **inst_kw)[source]: Set cluster info of current table.

set_comment(new_comment, async_=False, hints=None, **inst_kw)[source]

Set comment of current table.

Parameters:: new_comment – new comment

set_lifecycle(days, async_=False, hints=None, **inst_kw)[source]

Set lifecycle of current table.

Parameters:: days – lifecycle in days

set_owner(new_owner, async_=False, hints=None, **inst_kw)[source]

Set owner of current table.

Parameters:: new_owner – account of the new owner

set_storage_tier(storage_tier, partition_spec=None, async_=False, hints=None, **inst_kw)[source]: Set storage tier of current table or specific partition.

to_df()[source]

Create a PyODPS DataFrame from this table.

Returns:: DataFrame object

to_pandas(partition=None, columns=None, start=None, count=None, n_process=1, n_thread=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read table data into pandas DataFrame

Parameters:

partition – partition of this table
columns (list) – columns to read
start (int) – start row index from 0
count (int) – data count to read
n_process (int) – number of processes to accelerate reading
append_partitions (bool) – if True, partition values will be appended to the output
quota_name (str) – name of tunnel quota to use

touch(partition_spec=None, async_=False, hints=None, **inst_kw)[source]

Update the last modified time of the table or specified partition.

Parameters:: partition_spec – partition spec, optional

truncate(partition_spec=None, async_=False, hints=None, **inst_kw)[source]

Truncate this table.

Parameters:

partition_spec – partition specs
hints
async_ – run asynchronously if True

Returns:

None

class odps.models.partition.Partition(*args, **kwargs)[source]

A partition is a collection of rows in a table whose partition columns are equal to specific values.

To write data into a partition, call the open_writer method in a with statement. Likewise, the open_reader method reads records from a partition. These methods behave the same as those of the Table class, except that they take no ‘partition’ parameter.

change_partition_spec(new_partition_spec, async_=False, hints=None)[source]

Change partition spec of current partition.

Parameters:: new_partition_spec – new partition spec

drop(async_=False, if_exists=False)[source]

Drop this partition.

Parameters:

async – run asynchronously if True
if_exists

Returns:

None

head(limit, columns=None)[source]

Get the head records of a partition

Parameters:

limit – number of records to return, 10000 at most
columns (list) – columns to return, a subset of the table’s columns

Returns:

records

Return type:

list

See also

odps.models.Record

iter_pandas(columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate partition data in blocks as pandas DataFrame

Parameters:

columns (list) – columns to read
batch_size (int) – size of DataFrame batch to read
start (int) – start row index from 0
count (int) – data count to read
quota_name (str) – name of tunnel quota to use
append_partitions (bool) – if True, partition values will be appended to the output

open_reader(**kw)[source]

Open a reader to read all records from this partition.

Parameters:

reopen (bool) – by default the last reader is reused; if True, a new reader is opened.
endpoint – the tunnel service URL
compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
compress_algo – compression algorithm, takes effect when compress_option is not provided, can be zlib or snappy
compress_level – used for zlib, takes effect when compress_option is not provided
compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

reader, whose count attribute is the total number of records and whose status attribute is the tunnel status

Example:

>>> with partition.open_reader() as reader:
>>>     count = reader.count  # number of records in the partition
>>>     for record in reader[0: count]:
>>>         print(record)  # reads all data; for large partitions it is better to read in several slices

set_storage_tier(storage_tier, async_=False, hints=None)[source]: Set storage tier of current partition.

to_df()[source]

Create a PyODPS DataFrame from this partition.

Returns:: DataFrame object

to_pandas(columns=None, start=None, count=None, n_process=1, n_thread=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read partition data into pandas DataFrame

Parameters:

columns (list) – columns to read
start (int) – start row index from 0
count (int) – data count to read
n_process (int) – number of processes to accelerate reading
quota_name (str) – name of tunnel quota to use
append_partitions (bool) – if True, partition values will be appended to the output

touch(async_=False, hints=None)[source]: Update the last modified time of the partition.

truncate(async_=False)[source]: Truncate current partition.

class odps.models.Instance(*args, **kwargs)[source]

An Instance is the unit in which ODPS tasks run: a task submitted to ODPS runs as an instance.

The status property reflects the current state of an instance. The is_terminated method indicates whether the instance has finished, and the is_successful method indicates whether it ran successfully. The wait_for_success method blocks the calling process until the instance has finished.

For a SQL instance, open_reader can be used to read the results.

Example:

>>> instance = odps.execute_sql('select * from dual')  # this SQL returns structured data
>>> with instance.open_reader() as reader:
>>>     for record in reader:
>>>         print(record)  # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')  # this SQL does not return structured data
>>> with instance.open_reader() as reader:
>>>     print(reader.raw)  # just return the raw result

exception DownloadSessionCreationError(msg, request_id=None, code=None, host_id=None, instance_id=None, endpoint=None, tag=None, response_headers=None, status_code=None)[source]

class Status(value)[source]

class Task(**kwargs)[source]

Task stands for each task inside an instance.

It has a name, a task type, the start to end time, and a running status.

class TaskProgress(**kwargs)[source]

TaskProgress represents the progress of a task.

A single TaskProgress may consist of several stages.

Example:

>>> progress = instance.get_task_progress('task_name')
>>> progress.get_stage_progress_formatted_string()
2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%]  R2_1_Stg1_job0:0/0/1[0%]
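
A formatted line like the one above packs one entry per stage. The following sketch extracts the stage names and percentages with a regular expression. The helper parse_stage_progress is hypothetical, and because this section does not say what the three slash-separated numbers mean, they are kept as an unlabelled tuple.

```python
import re

# name:<n>/<n>/<n>[<percent>%], as it appears in the sample output above
STAGE_RE = re.compile(r"(\w+):(\d+)/(\d+)/(\d+)\[(\d+)%\]")


def parse_stage_progress(line):
    """Return one dict per stage found in a formatted progress line."""
    stages = []
    for name, a, b, c, percent in STAGE_RE.findall(line):
        stages.append({
            "stage": name,
            "counts": (int(a), int(b), int(c)),  # meaning not stated here
            "percent": int(percent),
        })
    return stages


line = "2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%]  R2_1_Stg1_job0:0/0/1[0%]"
stages = parse_stage_progress(line)
```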

class TaskStatus(value)[source]

class TaskSummary(*args, **kwargs)[source]

get_logview_address(hours=None, use_legacy=None)[source]

Get logview address of the instance object by hours.

Parameters:: hours
Returns:: logview address
Return type:: str

get_sql_task_cost()[source]

Get cost information of the SQL cost task, including input data size, number of UDFs and complexity of the SQL task.

Note: do not call this function directly, as it cannot be applied to instances returned from SQL. Use o.execute_sql_cost instead.

Returns:: cost info in dict format

get_task_cost(task_name=None)[source]

Get task cost

Parameters:: task_name – name of the task
Returns:: task cost
Return type:: Instance.TaskCost
Example:

>>> cost = instance.get_task_cost(instance.get_task_names()[0])
>>> cost.cpu_cost
200
>>> cost.memory_cost
4096
>>> cost.input_size
0

get_task_detail(task_name=None)[source]

Get task’s detail

Parameters:: task_name – task name
Returns:: the task’s detail
Return type:: list or dict according to the JSON

get_task_detail2(task_name=None, **kw)[source]

Get task’s detail v2

Parameters:: task_name – task name
Returns:: the task’s detail
Return type:: list or dict according to the JSON

get_task_info(task_name, key, raise_empty=False)[source]

Get task related information.

Parameters:

task_name – name of the task
key – key of the information item
raise_empty – if True, will raise error when response is empty

Returns:

a string of the task information

get_task_names(retry=True, timeout=None)[source]

Get names of all tasks

Returns:: task names
Return type:: list

get_task_progress(task_name=None)[source]

Get task’s current progress

Parameters:: task_name – task_name
Returns:: the task’s progress
Return type:: odps.models.Instance.Task.TaskProgress

get_task_quota(task_name=None)[source]

Get queueing info of the task. Note that the time between two calls should be longer than 30 seconds, otherwise an empty dict is returned.

Parameters:: task_name – name of the task
Returns:: quota info in dict format

get_task_result(task_name=None, timeout=None, retry=True)[source]

Get a single task result.

Parameters:: task_name – task name
Returns:: task result
Return type:: str

get_task_results(timeout=None, retry=True)[source]

Get all the task results.

Returns:: a dict whose keys are task names and whose values are the task results as strings
Return type:: dict

get_task_statuses(retry=True, timeout=None, on_exception=None, status_only=False)[source]

Get all tasks’ statuses

Parameters:: status_only – if True, allow using cached status from sync response instead of making HTTP request
Returns:: a dict whose keys are task names and whose values are odps.models.Instance.Task objects
Return type:: dict

get_task_summary(task_name=None, with_finalized=False)[source]

Get a task’s summary, mostly used for MapReduce.

Parameters:

task_name – task name
with_finalized – if True, return an empty TaskSummary with the finalized attribute when no summary body is available but the task ends.

Returns:

summary dict with extra attributes:

summary_text – plain text summary
json_summary – raw JSON string
finalized – (only when with_finalized=True) True if the task is finalized (summary won’t change), False if not yet finalized, None if the server did not return the x-odps-task-finalized header

Return type:

Instance.TaskSummary or None

get_task_workers(task_name=None, json_obj=None)[source]

Get workers from task.

Parameters:

task_name – task name
json_obj – json object parsed from get_task_detail2

Returns:

list of workers

See also

odps.models.Worker

get_worker_log(log_id, log_type, size=0)[source]

Get logs from worker.

Parameters:

log_id – id of log, can be retrieved from details.
log_type – type of logs. Possible log types are coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary
size – length of the log to retrieve

Returns:

log content

is_running(retry=True, blocking=False, retry_timeout=None, on_exception=None)[source]

Check whether this instance is still running.

Returns:: True if still running else False
Return type:: bool

is_successful(retry=True, retry_timeout=None, on_exception=None)[source]

Check whether the instance ran successfully.

Returns:: True if successful else False
Return type:: bool

is_terminated(retry=True, blocking=False, retry_timeout=None, on_exception=None)[source]

Check whether this instance has finished.

Returns:: True if finished else False
Return type:: bool

iter_pandas(columns=None, limit=None, batch_size=None, start=None, count=None, quota_name=None, tags=None, **kwargs)[source]

Iterate instance result data in blocks as pandas DataFrame. The limit argument follows the definition in the open_reader API.

Parameters:

columns (list) – columns to read
limit (bool) – if True, enable the limitation
batch_size (int) – size of DataFrame batch to read
start (int) – start row index from 0
count (int) – data count to read
quota_name (str) – name of tunnel quota to use

open_reader(*args, **kwargs)[source]

Open the reader to read records from the result of the instance. If tunnel is True, instance tunnel will be used. Otherwise conventional routine will be used. If instance tunnel is not available and tunnel is not specified, the method will fall back to the conventional routine. Note that the number of records returned is limited unless options.limited_instance_tunnel is set to True or limit=True is configured under instance tunnel mode. Otherwise the number of records returned is always limited.

Parameters:

tunnel – if true, use instance tunnel to read from the instance. if false, use conventional routine. if absent, options.tunnel.use_instance_tunnel will be used and automatic fallback is enabled.
limit (bool) – if True, enable the limitation
reopen (bool) – by default the last reader is reused; if True, a new reader is opened.
endpoint – the tunnel service URL
compress_option (odps.tunnel.CompressOption) – compression algorithm, level and strategy
compress_algo – compression algorithm, takes effect when compress_option is not provided, can be zlib or snappy
compress_level – used for zlib, takes effect when compress_option is not provided
compress_strategy – used for zlib, takes effect when compress_option is not provided

Returns:

reader, whose count attribute is the total number of records and whose status attribute is the tunnel status

Example:

>>> with instance.open_reader() as reader:
>>>     count = reader.count  # number of records in the result
>>>     for record in reader[0: count]:
>>>         print(record)  # reads all data; for large results it is better to read in several slices
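
The choice between the instance tunnel and the conventional routine, as described for the tunnel parameter, can be summarised as a small decision function. This is a sketch of the documented rules only: choose_read_route and its arguments are hypothetical, default_use_tunnel stands in for options.tunnel.use_instance_tunnel, and raising an error when the tunnel is explicitly requested but unavailable is an assumption.

```python
def choose_read_route(tunnel=None, default_use_tunnel=True, tunnel_available=True):
    """Return 'tunnel' or 'conventional' following the rules described above."""
    # An absent `tunnel` argument defers to the configured default.
    use_tunnel = default_use_tunnel if tunnel is None else bool(tunnel)
    if not use_tunnel:
        return "conventional"
    if tunnel_available:
        return "tunnel"
    if tunnel is None:
        # Automatic fallback applies only when the caller did not choose.
        return "conventional"
    # Assumption for this sketch: an explicit request does not fall back.
    raise RuntimeError("instance tunnel requested but not available")


explicit_off = choose_read_route(tunnel=False)
auto_fallback = choose_read_route(tunnel=None, tunnel_available=False)
explicit_on = choose_read_route(tunnel=True)
```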

put_task_info(task_name, key, value, check_location=False, raise_empty=False)[source]

Put information into a task.

Parameters:

task_name – name of the task
key – key of the information item
value – value of the information item
check_location – raises if Location header is missing
raise_empty – if True, will raise error when response is empty

stop()[source]

Stop this instance.

Returns:: None

to_pandas(columns=None, limit=None, start=None, count=None, n_process=1, quota_name=None, tags=None, **kwargs)[source]

Read instance data into pandas DataFrame. The limit argument follows definition of open_reader API.

Parameters:

columns (list) – columns to read
limit (bool) – if True, enable the limitation
start (int) – start row index from 0
count (int) – data count to read
n_process (int) – number of processes to accelerate reading
quota_name (str) – name of tunnel quota to use

wait_for_completion(interval=1, timeout=None, max_interval=None, blocking=True, on_exception=None)[source]

Wait for the instance to complete, regardless of whether it succeeds or fails.

Parameters:

interval – time interval to check
max_interval – if specified, next check interval will be multiplied by 2 till max_interval is reached.
timeout – maximum time to wait
blocking – whether to block waiting at server side. Note that this option does not affect client behavior.
on_exception – custom error handling function accepting an Exception instance as input. If return value is True, error will be raised. Otherwise retry will continue.

Returns:

None

wait_for_success(interval=1, timeout=None, max_interval=None, blocking=True, on_exception=None)[source]

Wait for instance to complete, and check if the instance is successful.

Parameters:

interval – time interval to check
max_interval – if specified, next check interval will be multiplied by 2 till max_interval is reached.
timeout – maximum time to wait
blocking – whether to block waiting at server side. Note that this option does not affect client behavior.
on_exception – custom error handling function accepting an Exception instance as input. If return value is True, error will be raised. Otherwise retry will continue.

Returns:

None

Raise:

odps.errors.ODPSError if the instance failed
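
The interval, max_interval and on_exception parameters describe a polling loop with a doubling delay. A minimal sketch of such a loop is given below. The function wait_until and its is_done, sleep and clock arguments are hypothetical names introduced for illustration; the real methods poll the remote service and may differ in detail, for example in how the timeout is checked.

```python
import time


def wait_until(is_done, interval=1, max_interval=None, timeout=None,
               on_exception=None, sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True (illustrative sketch only)."""
    start = clock()
    while True:
        try:
            if is_done():
                return
        except Exception as exc:
            # Raise unless a handler exists and returns a falsy value.
            if on_exception is None or on_exception(exc):
                raise
        if timeout is not None and clock() - start > timeout:
            raise TimeoutError("wait timed out")
        sleep(interval)
        if max_interval is not None:
            # Double the delay until max_interval is reached.
            interval = min(interval * 2, max_interval)


# Demo with a fake sleep that records the delays instead of waiting.
sleeps = []
state = {"calls": 0}


def finishes_on_fifth_check():
    state["calls"] += 1
    return state["calls"] >= 5


wait_until(finishes_on_fifth_check, interval=1, max_interval=4, sleep=sleeps.append)
```

In the demo the recorded delays grow from the initial interval and then stay at max_interval, which is the behaviour the parameter description suggests.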

class odps.models.Resource(*args, **kwargs)[source]

Resource is useful when writing UDFs or MapReduce jobs. This is an abstract class.

A resource can be either a file resource or a table resource. More specifically, a file resource can be of type file, py, jar or archive.

class Type(value)[source]

class odps.models.FileResource(*args, **kwargs)[source]

A file resource represents a file.

Use open method to open this resource as a file-like object.

class Mode(value)[source]

close()[source]

Close this file resource.

Returns:: None

flush()[source]

Commit the change to ODPS if any change happens. Close will do this automatically.

Returns:: None

open(mode='r', encoding='utf-8', stream=False, overwrite=None)[source]

The mode argument gives the open mode for this file resource. The mode is binary if it contains ‘b’. For instance, ‘rb’ opens the resource in read binary mode, while ‘r+b’ opens it in read+write binary mode. This matters most when the file is actually binary, such as a tar or jpeg file, so take care to open the file in the correct mode.

The text modes are ‘r’, ‘w’, ‘a’, ‘r+’, ‘w+’ and ‘a+’, just like the builtin Python open function.

r means read only
w means write only; the file is truncated when opened
a means append only
r+ means read+write without constraint
w+ truncates the file first, then opens it for read+write
a+ allows read+write, but written content can only be appended to the end

Parameters:

mode – the mode of opening file, described as above
encoding – utf-8 as default
stream – open in stream mode
overwrite – if True, will overwrite existing resource. True by default.

Returns:

file-like object

Example:

>>> with resource.open('r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with resource.open('wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with resource.open('r+') as fp:  # open in read-write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
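
Since the modes are described as working like those of the builtin open, their effect can be tried out on a local temporary file without any ODPS connection. The sketch below uses only the Python standard library; it illustrates the mode semantics and does not call the resource API itself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as fp:   # w: truncate, then write only
    fp.write("hello world")

with open(path, "a", encoding="utf-8") as fp:   # a: writes go to the end
    fp.write("!")

with open(path, "r", encoding="utf-8") as fp:   # r: read only, text mode
    appended_text = fp.read()

with open(path, "r+b") as fp:                   # r+b: read+write, binary mode
    fp.seek(5)
    fp.truncate()                               # cut the file at the current position

with open(path, "rb") as fp:                    # rb: read only, returns bytes
    final_bytes = fp.read()
```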

read(size=-1)[source]

Read the file resource, read all as default.

Parameters:: size – number of characters or bytes to read, depending on text mode or binary mode.
Returns:: str or bytes, depending on text mode or binary mode
Return type:: bytes or str

readline(size=-1)[source]

Read a single line.

Parameters:: size – If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. When size is not 0, an empty string is returned only when EOF is encountered immediately
Returns:: unicode or bytes depends on text mode or binary mode
Return type:: str or unicode(Py2), bytes or str(Py3)

readlines(sizehint=-1)[source]

Read as lines.

Parameters:: sizehint – If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read.
Returns:: lines
Return type:: list

seek(pos, whence=0)[source]

Seek to some place.

Parameters:

pos – position to seek
whence – if set to 2, will seek to the end

Returns:

None

tell()[source]

Tell the current position

Returns:: current position

truncate(size=None)[source]

Truncate the file resource’s size.

Parameters:: size – If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position.
Returns:: None

write(content)[source]

Write content into the file resource

Parameters:: content – content to write
Returns:: None

writelines(seq)[source]

Write lines into the file resource.

Parameters:: seq – lines
Returns:: None

class odps.models.PyResource(*args, **kwargs)[source]: File resource representing a .py file.

class odps.models.JarResource(*args, **kwargs)[source]: File resource representing a .jar file.

class odps.models.ArchiveResource(*args, **kwargs)[source]: File resource representing a compressed file such as .zip/.tgz/.tar.gz/.tar/jar

class odps.models.TableResource(*args, **kwargs)[source]

Take a table as a resource.

open_reader(**kwargs)[source]: Open reader on the table resource

open_writer(**kwargs)[source]: Open writer on the table resource

property partition

Get the source table partition.

Returns:: the source table partition

property table

Get the table object.

Returns:: source table
Return type:: odps.models.Table

See also

odps.models.Table

update(table_project_name=None, table_schema_name=None, table_name=None, *args, **kw)[source]

Update this resource.

Parameters:

table_project_name – the source table’s project
table_name – the source table’s name
partition – the source table’s partition

Returns:

self

class odps.models.Function(*args, **kwargs)[source]

A Function represents a UDF that users can call when writing SQL.

drop()[source]

Delete this Function.

Returns:: None

property resources

Return all the resources which this function refers to.

Returns:: resources
Return type:: list

See also

odps.models.Resource

update()[source]

Update this function.

Returns:: None

class odps.models.Worker(**kwargs)[source]

Worker information class for worker information and log retrieval.

get_log(log_type, size=0)[source]

Get logs from worker.

Parameters:

log_type – type of logs. Possible log types are coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary
size – length of the log to retrieve

Returns:

log content

class odps.models.ml.OfflineModel(*args, **kwargs)[source]

Representing an ODPS offline model.

copy(new_name, new_project=None, async_=False)[source]

Copy current model into a new location.

Parameters:

new_name – name of the new model
new_project – new project name. if absent, original project name will be used
async_ – if True, return the copy instance. otherwise return the newly-copied model

get_model()[source]: Get PMML text of the current model. Note that model file obtained via this method might be incomplete due to size limitations.

class odps.models.security.User(*args, **kwargs)[source]