Model objects
- class odps.models.Project(**kwargs)[source]
Project is the counterpart of a database in an RDBMS.
By getting a Project object, users can access properties such as name, owner, comment, creation_time, last_modified_time, and so on. These properties are not loaded from the remote ODPS service until users access them explicitly. To check the latest status, use the reload method.
- Example:
>>> project = odps.get_project('my_project')
>>> project.last_modified_time  # this property will be fetched from the remote ODPS service
>>> project.last_modified_time  # once loaded, the property will not trigger a remote call
>>> project.owner  # so it is with the other properties; they are fetched together
>>> project.reload()  # force an update of all the properties
>>> project.last_modified_time  # already updated
- class odps.models.Table(**kwargs)[source]
A Table is the counterpart of a table in an RDBMS; besides, a table can consist of partitions.
Table properties behave like those of odps.models.Project: they are not loaded from the remote ODPS service until users access them. To write data into a table, users should call the open_writer method within a with statement. Likewise, the open_reader method provides the ability to read records from a table or its partition.
- Example:
>>> table = odps.get_table('my_table')
>>> table.owner  # the first access will load from remote
>>> table.reload()  # reload to update the properties
>>>
>>> for record in table.head(5):
>>>     # check the first 5 records
>>> for record in table.head(5, partition='pt=test', columns=['my_column']):
>>>     # only check the `my_column` column from a certain partition of this table
>>>
>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records of a table or its partition
>>>     for record in reader[0: count]:
>>>         # read all data; it is actually better to split the reads into several ranges
>>>
>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # we can do this in parallel
- name
Name of the table
- comment
Comment of the table
- owner
Owner of the table
- creation_time
Creation time of the table in local time.
- last_data_modified_time
Last data modified time of the table in local time.
- table_schema
Schema of the table, in TableSchema type.
- type
Type of the table, can be managed_table, external_table, view or materialized_view.
- size
Logical size of the table.
- lifecycle
Lifecycle of the table in days.
- add_columns(columns, if_not_exists=False, async_=False, hints=None)[source]
Add columns to the table.
- Parameters:
columns -- columns to add; can be a list of Column instances or a string of column definitions
if_not_exists -- if True, will not raise an exception when a column already exists
- Example:
>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> # add a column by Column instance
>>> table.add_columns([Column('id2', 'string')])
>>> # add columns by a string of column definitions
>>> table.add_columns("fid double, fid2 double")
- change_partition_spec(old_partition_spec, new_partition_spec, async_=False, hints=None)[source]
Change the partition spec of a specified partition of the table.
- Parameters:
old_partition_spec -- old partition spec
new_partition_spec -- new partition spec
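- Example (an illustrative sketch; the table name and partition values are hypothetical):
>>> table = odps.get_table('my_table')
>>> table.change_partition_spec('pt=20230101', 'pt=20230102')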
- create_partition(partition_spec, if_not_exists=False, async_=False, hints=None)[source]
Create a partition within the table.
- Parameters:
partition_spec -- specification of the partition.
if_not_exists -- if True, will not raise an exception when the partition already exists
hints
async_ -- run asynchronously if True
- Returns:
partition object
- Return type:
odps.models.partition.Partition
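- Example (an illustrative sketch; assumes the table is partitioned by pt):
>>> partition = table.create_partition('pt=test', if_not_exists=True)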
- delete_columns(columns, async_=False, hints=None)[source]
Delete columns from the table.
- Parameters:
columns -- columns to delete; can be a list of column names
- delete_partition(partition_spec, if_exists=False, async_=False, hints=None)[source]
Delete a partition within the table.
- Parameters:
partition_spec -- specification of the partition.
if_exists -- if True, will not raise an exception when the partition does not exist
hints
async_ -- run asynchronously if True
- drop(async_=False, if_exists=False, hints=None)[source]
Drop this table.
- Parameters:
async_ -- run asynchronously if True
if_exists -- if True, will not raise an exception when the table does not exist
hints
- Returns:
None
- exist_partition(partition_spec)[source]
Check if a partition exists within the table.
- Parameters:
partition_spec -- specification of the partition.
- exist_partitions(prefix_spec=None)[source]
Check if partitions with the provided conditions exist.
- Parameters:
prefix_spec -- prefix of the partition
- Returns:
whether partitions exist
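- Example (an illustrative sketch; assumes a table partitioned first by pt and then by hh):
>>> table.exist_partition('pt=test,hh=09')  # check one exact partition
>>> table.exist_partitions('pt=test')  # check by the leading partition field only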
- get_ddl(with_comments=True, if_not_exists=False, force_table_ddl=False)[source]
Get the DDL SQL statement for the given table.
- Parameters:
with_comments -- append comments for the table and each column
if_not_exists -- generate an IF NOT EXISTS clause in the generated DDL
force_table_ddl -- force generating table DDL if the object is a view
- Returns:
DDL statement
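- Example (an illustrative sketch):
>>> print(table.get_ddl())  # print the CREATE TABLE statement for this table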
- get_max_partition(spec=None, skip_empty=True, reverse=False)[source]
Get the partition with the maximal value within a certain spec.
- Parameters:
spec -- parent partitions. If specified, will return the partition with the maximal value within the specified parent partition
skip_empty -- if True, will skip partitions without data
reverse -- if True, will return the partition with the minimal value
- Returns:
Partition
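- Example (an illustrative sketch; assumes a table partitioned by pt, optionally with a second-level field):
>>> table.get_max_partition()  # get the latest partition that contains data
>>> table.get_max_partition(skip_empty=False)  # include partitions without data
>>> table.get_max_partition(spec='pt=20230101')  # max second-level partition under pt=20230101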
- get_partition(partition_spec)[source]
Get a partition with the given specification.
- Parameters:
partition_spec -- specification of the partition.
- Returns:
partition object
- Return type:
odps.models.partition.Partition
- head(limit, partition=None, columns=None, use_legacy=True, timeout=None, tags=None)[source]
Get the head records of a table or its partition.
- Parameters:
limit (int) -- number of records, 10000 at most
partition -- partition of this table
columns (list) -- the columns, which are a subset of the table columns
- Returns:
records
- Return type:
list
- iter_pandas(partition=None, columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]
Iterate over table data in blocks as pandas DataFrames.
- Parameters:
partition -- partition of this table
columns (list) -- columns to read
batch_size (int) -- size of DataFrame batch to read
start (int) -- start row index from 0
count (int) -- data count to read
append_partitions (bool) -- if True, partition values will be appended to the output
quota_name (str) -- name of tunnel quota to use
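- Example (an illustrative sketch; assumes pandas is installed):
>>> for batch in table.iter_pandas(batch_size=10000):
>>>     print(len(batch))  # each batch is a pandas DataFrame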
- iterate_partitions(spec=None, reverse=False)[source]
Create an iterable object to iterate over partitions.
- Parameters:
spec -- specification of the partition.
reverse -- output partitions in reversed order
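- Example (an illustrative sketch):
>>> for partition in table.iterate_partitions():
>>>     print(partition.name)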
- new_record(values=None)[source]
Generate a record of the table.
- Parameters:
values (list) -- the values of this record
- Returns:
record
- Return type:
odps.models.Record
- Example:
>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> record = table.new_record()
>>> record[0] = 'my_name'
>>> record[1] = 'my_id'
>>> record = table.new_record(['my_name', 'my_id'])
- open_reader(partition=None, reopen=False, endpoint=None, download_id=None, timeout=None, arrow=False, columns=None, quota_name=None, async_mode=True, append_partitions=None, tags=None, **kw)[source]
Open a reader to read the entire set of records from this table or its partition.
- Parameters:
partition -- partition of this table
reopen (bool) -- by default the reader reuses the last one; if reopen is True, a new reader is opened.
endpoint -- the tunnel service URL
download_id -- use an existing download_id to download table contents
arrow -- use the arrow tunnel to read data
columns -- columns to read
quota_name -- name of the tunnel quota
async_mode -- enable async mode to create tunnels; can be set to True if session creation takes a long time.
compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy
compress_algo -- compression algorithm, works when compress_option is not provided; can be zlib or snappy
compress_level -- used for zlib, works when compress_option is not provided
compress_strategy -- used for zlib, works when compress_option is not provided
append_partitions (bool) -- if True, partition values will be appended to the output
- Returns:
reader; count means the full size, status means the tunnel status
- Example:
>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records of a table or its partition
>>>     for record in reader[0: count]:
>>>         # read all data; it is actually better to split the reads into several ranges
- open_writer(partition=None, blocks=None, reopen=False, create_partition=False, commit=True, endpoint=None, upload_id=None, arrow=False, quota_name=None, tags=None, mp_context=None, **kw)[source]
Open a writer to write records into this table or its partition.
- Parameters:
partition -- partition of this table
blocks -- block ids to open
reopen (bool) -- by default the writer reuses the last one; if reopen is True, a new writer is opened.
create_partition (bool) -- if True, the partition will be created if it does not exist
endpoint -- the tunnel service URL
upload_id -- use an existing upload_id to upload data
arrow -- use the arrow tunnel to write data
quota_name -- name of the tunnel quota
overwrite (bool) -- if True, will overwrite existing data
compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy
compress_algo -- compression algorithm, works when compress_option is not provided; can be zlib or snappy
compress_level -- used for zlib, works when compress_option is not provided
compress_strategy -- used for zlib, works when compress_option is not provided
- Returns:
writer; status means the tunnel writer status
- Example:
>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # we can do this in parallel
- rename_column(old_column_name, new_column_name, comment=None, async_=False, hints=None)[source]
Rename a column in the table.
- Parameters:
old_column_name -- old column name
new_column_name -- new column name
comment -- new column comment, optional
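- Example (an illustrative sketch; column names are hypothetical):
>>> table.rename_column('id', 'user_id', comment='renamed from id')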
- set_cluster_info(new_cluster_info, async_=False, hints=None)[source]
Set cluster info of current table.
- set_comment(new_comment, async_=False, hints=None)[source]
Set the comment of the current table.
- Parameters:
new_comment -- new comment
- set_lifecycle(days, async_=False, hints=None)[source]
Set the lifecycle of the current table.
- Parameters:
days -- lifecycle in days
- set_owner(new_owner, async_=False, hints=None)[source]
Set the owner of the current table.
- Parameters:
new_owner -- account of the new owner
- set_storage_tier(storage_tier, partition_spec=None, async_=False, hints=None)[source]
Set the storage tier of the current table or a specific partition.
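- Example (an illustrative sketch of the setters above; the values are hypothetical):
>>> table.set_comment('a new comment')
>>> table.set_lifecycle(30)  # table data will be recycled 30 days after the last data modification
>>> table.set_owner('new_owner_account')  # account of the new owner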
- to_pandas(partition=None, columns=None, start=None, count=None, n_process=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]
Read table data into a pandas DataFrame.
- Parameters:
partition -- partition of this table
columns (list) -- columns to read
start (int) -- start row index from 0
count (int) -- data count to read
n_process (int) -- number of processes to accelerate reading
append_partitions (bool) -- if True, partition values will be appended to the output
quota_name (str) -- name of tunnel quota to use
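- Example (an illustrative sketch; assumes pandas is installed and the table has a pt partition):
>>> df = table.to_pandas(partition='pt=test', columns=['name'], start=0, count=1000)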
- class odps.models.partition.Partition(**kwargs)[source]
A partition is a collection of rows in a table whose partition columns are equal to specific values.
To write data into a partition, users should call the open_writer method within a with statement. Likewise, the open_reader method provides the ability to read records from a partition. These methods behave the same as their counterparts in the Table class, except that there is no 'partition' parameter.
- change_partition_spec(new_partition_spec, async_=False, hints=None)[source]
Change the partition spec of the current partition.
- Parameters:
new_partition_spec -- new partition spec
- drop(async_=False, if_exists=False)[source]
Drop this partition.
- Parameters:
async_ -- run asynchronously if True
if_exists -- if True, will not raise an exception when the partition does not exist
- Returns:
None
- head(limit, columns=None)[source]
Get the head records of a partition.
- Parameters:
limit -- number of records, 10000 at most
columns (list) -- the columns, which are a subset of the table columns
- Returns:
records
- Return type:
list
- iter_pandas(columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]
Iterate over partition data in blocks as pandas DataFrames.
- Parameters:
columns (list) -- columns to read
batch_size (int) -- size of DataFrame batch to read
start (int) -- start row index from 0
count (int) -- data count to read
quota_name (str) -- name of tunnel quota to use
append_partitions (bool) -- if True, partition values will be appended to the output
- open_reader(**kw)[source]
Open a reader to read the entire set of records from this partition.
- Parameters:
reopen (bool) -- by default the reader reuses the last one; if reopen is True, a new reader is opened.
endpoint -- the tunnel service URL
compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy
compress_algo -- compression algorithm, works when compress_option is not provided; can be zlib or snappy
compress_level -- used for zlib, works when compress_option is not provided
compress_strategy -- used for zlib, works when compress_option is not provided
- Returns:
reader; count means the full size, status means the tunnel status
- Example:
>>> with partition.open_reader() as reader:
>>>     count = reader.count  # number of records of a partition
>>>     for record in reader[0: count]:
>>>         # read all data; it is actually better to split the reads into several ranges
- set_storage_tier(storage_tier, async_=False, hints=None)[source]
Set the storage tier of the current partition.
- to_pandas(columns=None, start=None, count=None, n_process=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]
Read partition data into a pandas DataFrame.
- Parameters:
columns (list) -- columns to read
start (int) -- start row index from 0
count (int) -- data count to read
n_process (int) -- number of processes to accelerate reading
quota_name (str) -- name of tunnel quota to use
append_partitions (bool) -- if True, partition values will be appended to the output
- class odps.models.Instance(**kwargs)[source]
An Instance represents a run of an ODPS task. The status property reflects the current state of an instance; the is_terminated method indicates whether the instance has finished; the is_successful method indicates whether the instance ran successfully; the wait_for_success method blocks the main process until the instance has finished.
For a SQL instance, we can use open_reader to read the results.
- Example:
>>> instance = odps.execute_sql('select * from dual')  # this SQL returns structured data
>>> with instance.open_reader() as reader:
>>>     # handle the record
>>>
>>> instance = odps.execute_sql('desc dual')  # this SQL does not return structured data
>>> with instance.open_reader() as reader:
>>>     print(reader.raw)  # just return the raw result
- exception DownloadSessionCreationError(msg, request_id=None, code=None, host_id=None, instance_id=None, endpoint=None, tag=None, response_headers=None)[source]
- class Task(**kwargs)[source]
Task stands for each task inside an instance.
It has a name, a task type, start and end times, and a running status.
- class TaskProgress(**kwargs)[source]
TaskProgress represents the progress of a task.
A single TaskProgress may consist of several stages.
- Example:
>>> progress = instance.get_task_progress('task_name')
>>> progress.get_stage_progress_formatted_string()
2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%] R2_1_Stg1_job0:0/0/1[0%]
- get_logview_address(hours=None, use_legacy=None)[source]
Get the logview address of the instance object by hours.
- Parameters:
hours -- expiration of the logview address, in hours
- Returns:
logview address
- Return type:
str
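- Example (an illustrative sketch):
>>> print(instance.get_logview_address())  # open the printed URL in a browser to inspect the instance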
- get_sql_task_cost()[source]
Get cost information of the SQL cost task, including input data size, number of UDFs, and complexity of the SQL task.
NOTE: DO NOT use this function directly, as it cannot be applied to instances returned from SQL. Use o.execute_sql_cost instead.
- Returns:
cost info in dict format
- get_task_cost(task_name=None)[source]
Get task cost.
- Parameters:
task_name -- name of the task
- Returns:
task cost
- Return type:
Instance.TaskCost
- Example:
>>> cost = instance.get_task_cost(instance.get_task_names()[0])
>>> cost.cpu_cost
200
>>> cost.memory_cost
4096
>>> cost.input_size
0
- get_task_detail(task_name=None)[source]
Get a task's detail.
- Parameters:
task_name -- task name
- Returns:
the task's detail
- Return type:
list or dict according to the JSON
- get_task_detail2(task_name=None, **kw)[source]
Get a task's detail, v2.
- Parameters:
task_name -- task name
- Returns:
the task's detail
- Return type:
list or dict according to the JSON
- get_task_info(task_name, key, raise_empty=False)[source]
Get task-related information.
- Parameters:
task_name -- name of the task
key -- key of the information item
raise_empty -- if True, will raise an error when the response is empty
- Returns:
a string of the task information
- get_task_progress(task_name=None)[source]
Get a task's current progress.
- Parameters:
task_name -- task name
- Returns:
the task's progress
- Return type:
Instance.TaskProgress
- get_task_quota(task_name=None)[source]
Get queueing info of the task. Note that the time between two calls should be larger than 30 seconds; otherwise an empty dict is returned.
- Parameters:
task_name -- name of the task
- Returns:
quota info in dict format
- get_task_result(task_name=None, timeout=None, retry=True)[source]
Get a single task result.
- Parameters:
task_name -- task name
- Returns:
task result
- Return type:
str
- get_task_results(timeout=None, retry=True)[source]
Get all the task results.
- Returns:
a dict whose keys are task names and whose values are task results as strings
- Return type:
dict
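- Example (an illustrative sketch):
>>> for task_name, result in instance.get_task_results().items():
>>>     print(task_name, result)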
- get_task_statuses(retry=True, timeout=None)[source]
Get all tasks' statuses.
- Returns:
a dict whose keys are task names and whose values are odps.models.Instance.Task objects
- Return type:
dict
- get_task_summary(task_name=None)[source]
Get a task's summary, mostly used for MapReduce.
- Parameters:
task_name -- task name
- Returns:
summary as a dict parsed from JSON
- Return type:
dict
- get_task_workers(task_name=None, json_obj=None)[source]
Get workers from the task.
- Parameters:
task_name -- task name
json_obj -- json object parsed from get_task_detail2
- Returns:
list of workers
- get_worker_log(log_id, log_type, size=0)[source]
Get logs from a worker.
- Parameters:
log_id -- id of the log, can be retrieved from details.
log_type -- type of logs. Possible log types include coreinfo, hs_err_log, jstack, pstack, stderr, stdout and waterfall_summary
size -- length of the log to retrieve
- Returns:
log content
- is_running(retry=True, blocking=False, retry_timeout=None)[source]
Check whether this instance is still running.
- Returns:
True if still running, else False
- Return type:
bool
- is_successful(retry=True, retry_timeout=None)[source]
Check whether the instance ran successfully.
- Returns:
True if successful, else False
- Return type:
bool
- is_terminated(retry=True, blocking=False, retry_timeout=None)[source]
Check whether this instance has finished.
- Returns:
True if finished, else False
- Return type:
bool
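- Example (an illustrative polling sketch built on the status checks above; wait_for_success below is usually more convenient):
>>> import time
>>> while not instance.is_terminated():
>>>     time.sleep(5)
>>> if not instance.is_successful():
>>>     raise RuntimeError('instance failed')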
- iter_pandas(columns=None, limit=None, batch_size=None, start=None, count=None, quota_name=None, tags=None, **kwargs)[source]
Iterate over instance result data in blocks as pandas DataFrames. The limit argument follows the definition of the open_reader API.
- Parameters:
columns (list) -- columns to read
limit (bool) -- if True, enable the limitation
batch_size (int) -- size of DataFrame batch to read
start (int) -- start row index from 0
count (int) -- data count to read
quota_name (str) -- name of tunnel quota to use
- open_reader(*args, **kwargs)[source]
Open a reader to read records from the result of the instance. If tunnel is True, the instance tunnel will be used; otherwise the conventional routine will be used. If the instance tunnel is not available and tunnel is not specified, the method will fall back to the conventional routine. Note that under instance tunnel mode the number of records returned is limited unless options.limited_instance_tunnel is set to False or limit=False is specified; under the conventional routine the number of records returned is always limited.
- Parameters:
tunnel -- if True, use the instance tunnel to read from the instance; if False, use the conventional routine; if absent, options.tunnel.use_instance_tunnel will be used and automatic fallback is enabled.
limit (bool) -- if True, enable the limitation
reopen (bool) -- by default the reader reuses the last one; if reopen is True, a new reader is opened.
endpoint -- the tunnel service URL
compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy
compress_algo -- compression algorithm, works when compress_option is not provided; can be zlib or snappy
compress_level -- used for zlib, works when compress_option is not provided
compress_strategy -- used for zlib, works when compress_option is not provided
- Returns:
reader; count means the full size, status means the tunnel status
- Example:
>>> with instance.open_reader() as reader:
>>>     count = reader.count  # number of records of the result
>>>     for record in reader[0: count]:
>>>         # read all data; it is actually better to split the reads into several ranges
- put_task_info(task_name, key, value, check_location=False, raise_empty=False)[source]
Put information into a task.
- Parameters:
task_name -- name of the task
key -- key of the information item
value -- value of the information item
check_location -- raise an error if the Location header is missing
raise_empty -- if True, will raise an error when the response is empty
- to_pandas(columns=None, limit=None, start=None, count=None, n_process=1, quota_name=None, tags=None, **kwargs)[source]
Read instance result data into a pandas DataFrame. The limit argument follows the definition of the open_reader API.
- Parameters:
columns (list) -- columns to read
limit (bool) -- if True, enable the limitation
start (int) -- start row index from 0
count (int) -- data count to read
n_process (int) -- number of processes to accelerate reading
quota_name (str) -- name of tunnel quota to use
- wait_for_completion(interval=1, timeout=None, max_interval=None, blocking=True)[source]
Wait for the instance to complete, regardless of whether it succeeds.
- Parameters:
interval -- time interval to check
max_interval -- if specified, the check interval will be doubled on each attempt until max_interval is reached.
timeout -- maximum wait time
blocking -- whether to block waiting at server side. Note that this option does not affect client behavior.
- 返回:
None
- wait_for_success(interval=1, timeout=None, max_interval=None, blocking=True)[source]
Wait for the instance to complete, and check if the instance is successful.
- Parameters:
interval -- time interval to check
max_interval -- if specified, the check interval will be doubled on each attempt until max_interval is reached.
timeout -- maximum wait time
blocking -- whether to block waiting at server side. Note that this option does not affect client behavior.
- Returns:
None
- Raises:
odps.errors.ODPSError if the instance failed
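- Example (an illustrative asynchronous SQL workflow; the query is hypothetical):
>>> instance = odps.run_sql('select * from dual')  # submit without waiting for completion
>>> instance.wait_for_success()  # raises odps.errors.ODPSError if the instance failed
>>> with instance.open_reader() as reader:
>>>     for record in reader:
>>>         pass  # handle each record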
- class odps.models.Resource(**kwargs)[source]
Resource is useful when writing UDFs or MapReduce. This is an abstract class.
Basically, a resource can be either a file resource or a table resource. A file resource can be of type file, py, jar, or archive.
- class odps.models.FileResource(**kwargs)[source]
A file resource represents a file.
Use the open method to open this resource as a file-like object.
- flush()[source]
Commit the changes to ODPS if any change happens. close will do this automatically.
- Returns:
None
- open(mode='r', encoding='utf-8', stream=False, overwrite=None)[source]
The argument mode stands for the open mode for this file resource. It can be a binary mode if 'b' is included. For instance, 'rb' means opening the resource in read binary mode, while 'r+b' means opening the resource in read+write binary mode. This matters most when the file is actually binary, such as a tar or jpeg file, so be sure to open the file with the correct mode.
Basically, the text mode can be 'r', 'w', 'a', 'r+', 'w+', 'a+', just like the builtin Python open method:
r means read only
w means write only; the file will be truncated when opening
a means append only
r+ means read+write without constraint
w+ will truncate first, then open for read+write
a+ can read+write; however, the written content can only be appended to the end
- Parameters:
mode -- the mode of opening the file, described as above
encoding -- utf-8 as default
stream -- open in stream mode
overwrite -- if True, will overwrite the existing resource. True by default.
- Returns:
file-like object
- Example:
>>> with resource.open('r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with resource.open('wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with resource.open('r+') as fp:  # open in read+write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
- read(size=-1)[source]
Read the file resource; reads all by default.
- Parameters:
size -- unicode character or byte length, depending on text or binary mode.
- Returns:
unicode or bytes, depending on text or binary mode
- Return type:
str or unicode (Py2); bytes or str (Py3)
- readline(size=-1)[source]
Read a single line.
- Parameters:
size -- if the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. When size is not 0, an empty string is returned only when EOF is encountered immediately.
- Returns:
unicode or bytes, depending on text or binary mode
- Return type:
str or unicode (Py2); bytes or str (Py3)
- readlines(sizehint=-1)[source]
Read as lines.
- Parameters:
sizehint -- if the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read.
- Returns:
lines
- Return type:
list
- seek(pos, whence=0)[source]
Seek to some position.
- Parameters:
pos -- position to seek to
whence -- if set to 2, will seek relative to the end
- Returns:
None
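- Example (an illustrative sketch; assumes the resource holds text):
>>> with resource.open('r') as fp:
>>>     fp.seek(5)  # move to the sixth character
>>>     rest = fp.read()  # read from there to the end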
- class odps.models.ArchiveResource(**kwargs)[source]
A file resource representing a compressed file such as .zip/.tgz/.tar.gz/.tar/.jar.
- class odps.models.TableResource(**kwargs)[source]
Take a table as a resource.
- property partition
Get the source table partition.
- Returns:
the source table partition
- property table
Get the table object.
- Returns:
source table
- Return type:
odps.models.Table
- class odps.models.Function(**kwargs)[source]
A Function can be used as a UDF when users write SQL.
- property resources
Return all the resources that this function refers to.
- Returns:
resources
- Return type:
list
- class odps.models.Worker(**kwargs)[source]
Class for retrieving worker information and logs.
- class odps.models.ml.OfflineModel(**kwargs)[source]
Represents an ODPS offline model.