Model objects

class odps.models.Project(**kwargs)[source]

Project is the counterpart of a database in an RDBMS.

By getting a Project object, users can access properties such as name, owner, comment, creation_time, last_modified_time, and so on.

These properties are not loaded from the remote ODPS service until users access them explicitly. To fetch the latest status, use the reload method.

Example:

>>> project = odps.get_project('my_project')
>>> project.last_modified_time  # this property will be fetched from the remote ODPS service
>>> project.last_modified_time  # once loaded, accessing the property involves no remote call
>>> project.owner  # the same holds for the other properties; they are fetched together
>>> project.reload()  # force an update of all properties
>>> project.last_modified_time  # already updated
class AuthQueryStatus(value)[source]
class ProjectStatus(value)[source]
class ProjectType(value)[source]
class odps.models.Table(**kwargs)[source]

Table is the counterpart of a table in an RDBMS; in addition, a table can consist of partitions.

Table properties behave like those of odps.models.Project: they are not loaded from the remote ODPS service until users access them.

To write data into a table, call the open_writer method within a with statement. Likewise, the open_reader method provides the ability to read records from a table or one of its partitions.

Example:

>>> table = odps.get_table('my_table')
>>> table.owner  # the first access loads the value from the remote service
>>> table.reload()  # reload to update the properties
>>>
>>> for record in table.head(5):
>>>     # check the first 5 records
>>> for record in table.head(5, partition='pt=test', columns=['my_column']):
>>>     # only check the `my_column` column from a certain partition of this table
>>>
>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or partition
>>>     for record in reader[0: count]:
>>>         # read all data; in practice it is better to read in several batches
>>>
>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # these two blocks can be written in parallel
name

Name of the table

comment

Comment of the table

owner

Owner of the table

creation_time

Creation time of the table in local time.

last_data_modified_time

Last data modified time of the table in local time.

table_schema

Schema of the table, as a TableSchema instance.

type

Type of the table, can be managed_table, external_table, view or materialized_view.

size

Logical size of the table.

lifecycle

Lifecycle of the table in days.

class Type(value)[source]
add_columns(columns, if_not_exists=False, async_=False, hints=None)[source]

Add columns to the table.

Parameters:
  • columns -- columns to add; can be a list of Column instances or a string of column definitions

  • if_not_exists -- if True, will not raise an exception when the column already exists

Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> # add column by Column instance
>>> table.add_columns([Column('id2', 'string')])
>>> # add column by a string of column definitions
>>> table.add_columns("fid double, fid2 double")
change_partition_spec(old_partition_spec, new_partition_spec, async_=False, hints=None)[source]

Change the partition spec of a specified partition of the table.

Parameters:
  • old_partition_spec -- old partition spec

  • new_partition_spec -- new partition spec
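
Example (a minimal sketch; the partition specs are for illustration only):

>>> table.change_partition_spec('pt=test', 'pt=test2')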

create_partition(partition_spec, if_not_exists=False, async_=False, hints=None)[source]

Create a partition within the table.

Parameters:
  • partition_spec -- specification of the partition

  • if_not_exists

  • hints

  • async_

Returns:

partition object

Return type:

odps.models.partition.Partition
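
Example (a minimal sketch, assuming the table is partitioned by a string field pt):

>>> table.create_partition('pt=test', if_not_exists=True)
>>> table.exist_partition('pt=test')
True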

delete_columns(columns, async_=False, hints=None)[source]

Delete columns from the table.

Parameters:

columns -- columns to delete; can be a list of column names
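
Example (a minimal sketch, continuing the add_columns example above):

>>> table.delete_columns(['fid', 'fid2'])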

delete_partition(partition_spec, if_exists=False, async_=False, hints=None)[source]

Delete a partition within the table.

Parameters:
  • partition_spec -- specification of the partition

  • if_exists

  • hints

  • async_

drop(async_=False, if_exists=False, hints=None)[source]

Drop this table.

Parameters:
  • async_ -- run asynchronously if True

  • if_exists

  • hints

Returns:

None

exist_partition(partition_spec)[source]

Check if a partition exists within the table.

Parameters:

partition_spec -- specification of the partition

exist_partitions(prefix_spec=None)[source]

Check if partitions matching the provided prefix exist.

Parameters:

prefix_spec -- prefix of the partition spec

Returns:

whether such partitions exist

get_ddl(with_comments=True, if_not_exists=False, force_table_ddl=False)[source]

Get the DDL SQL statement for the given table.

Parameters:
  • with_comments -- append comments for the table and each column

  • if_not_exists -- add an IF NOT EXISTS clause to the generated DDL

  • force_table_ddl -- force generating table DDL if the object is a view

Returns:

DDL statement
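
Example (a minimal sketch):

>>> print(odps.get_table('my_table').get_ddl())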

get_max_partition(spec=None, skip_empty=True, reverse=False)[source]

Get the partition with the maximal partition value within a certain spec.

Parameters:
  • spec -- parent partition spec; if specified, the partition with the maximal value within this parent partition is returned

  • skip_empty -- if True, partitions without data are skipped

  • reverse -- if True, the partition with the minimal value is returned

Returns:

Partition
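
Example (a minimal sketch, assuming the table is partitioned by fields pt1 and pt2; the values are for illustration only):

>>> table.get_max_partition()  # get the latest partition that contains data
>>> table.get_max_partition(skip_empty=False)  # get the latest partition even if it is empty
>>> table.get_max_partition('pt1=20240101')  # get the latest sub-partition under pt1=20240101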

get_partition(partition_spec)[source]

Get a partition with the given specification.

Parameters:

partition_spec -- specification of the partition

Returns:

partition object

Return type:

odps.models.partition.Partition

head(limit, partition=None, columns=None, use_legacy=True, timeout=None, tags=None)[source]

Get the head records of a table or one of its partitions.

Parameters:
  • limit (int) -- number of records, at most 10000

  • partition -- partition of this table

  • columns (list) -- the columns to fetch, a subset of the table columns

Returns:

records

Return type:

list

iter_pandas(partition=None, columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate over table data in blocks as pandas DataFrames; see the sketch after the parameter list.

Parameters:
  • partition -- partition of this table

  • columns (list) -- columns to read

  • batch_size (int) -- size of each DataFrame batch to read

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • append_partitions (bool) -- if True, partition values will be appended to the output

  • quota_name (str) -- name of the tunnel quota to use
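
Example (a minimal sketch, assuming pandas is installed):

>>> for batch in table.iter_pandas(batch_size=10000):
>>>     print(len(batch))  # each batch is a pandas DataFrame with at most 10000 rows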

iterate_partitions(spec=None, reverse=False)[source]

Create an iterable object to iterate over partitions.

Parameters:
  • spec -- specification of the partition

  • reverse -- output partitions in reversed order
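
Example (a minimal sketch):

>>> for partition in table.iterate_partitions():
>>>     print(partition)  # each item is an odps.models.partition.Partition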

new_record(values=None)[source]

Generate a record of the table.

Parameters:

values (list) -- values of this record

Returns:

record

Return type:

odps.models.Record

Example:

>>> table = odps.create_table('test_table', schema=TableSchema.from_lists(['name', 'id'], ['string', 'string']))
>>> record = table.new_record()
>>> record[0] = 'my_name'
>>> record[1] = 'my_id'
>>> record = table.new_record(['my_name', 'my_id'])
open_reader(partition=None, reopen=False, endpoint=None, download_id=None, timeout=None, arrow=False, columns=None, quota_name=None, async_mode=True, append_partitions=None, tags=None, **kw)[source]

Open a reader to read all records from this table or one of its partitions.

Parameters:
  • partition -- partition of this table

  • reopen (bool) -- by default the reader reuses the last one; if True, a new reader is opened

  • endpoint -- the tunnel service URL

  • download_id -- use an existing download_id to download table contents

  • arrow -- use the arrow tunnel to read data

  • columns -- columns to read

  • quota_name -- name of the tunnel quota

  • async_mode -- enable async mode when creating tunnels; can be set to True if session creation takes a long time

  • compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy

  • compress_algo -- compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level -- used for zlib, effective when compress_option is not provided

  • compress_strategy -- used for zlib, effective when compress_option is not provided

  • append_partitions (bool) -- if True, partition values will be appended to the output

Returns:

a reader whose count attribute is the total number of records and whose status attribute reflects the tunnel status

Example:

>>> with table.open_reader() as reader:
>>>     count = reader.count  # number of records in the table or partition
>>>     for record in reader[0: count]:
>>>         # read all data; in practice it is better to read in several batches
open_writer(partition=None, blocks=None, reopen=False, create_partition=False, commit=True, endpoint=None, upload_id=None, arrow=False, quota_name=None, tags=None, mp_context=None, **kw)[source]

Open a writer to write records into this table or one of its partitions.

Parameters:
  • partition -- partition of this table

  • blocks -- block ids to open

  • reopen (bool) -- by default the writer reuses the last one; if True, a new writer is opened

  • create_partition (bool) -- if True, the partition will be created if it does not exist

  • endpoint -- the tunnel service URL

  • upload_id -- use an existing upload_id to upload data

  • arrow -- use the arrow tunnel to write data

  • quota_name -- name of the tunnel quota

  • overwrite (bool) -- if True, will overwrite existing data

  • compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy

  • compress_algo -- compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level -- used for zlib, effective when compress_option is not provided

  • compress_strategy -- used for zlib, effective when compress_option is not provided

Returns:

a writer whose status attribute reflects the tunnel writer status

Example:

>>> with table.open_writer() as writer:
>>>     writer.write(records)
>>> with table.open_writer(partition='pt=test', blocks=[0, 1]) as writer:
>>>     writer.write(0, gen_records(block=0))
>>>     writer.write(1, gen_records(block=1))  # these two blocks can be written in parallel
rename(new_name, async_=False, hints=None)[source]

Rename the table.

Parameters:

new_name -- new table name

rename_column(old_column_name, new_column_name, comment=None, async_=False, hints=None)[source]

Rename a column in the table.

Parameters:
  • old_column_name -- old column name

  • new_column_name -- new column name

  • comment -- new column comment, optional
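
Example (a minimal sketch; the names are for illustration only):

>>> table.rename('my_table_renamed')
>>> table.rename_column('id2', 'id3', comment='renamed column')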

set_cluster_info(new_cluster_info, async_=False, hints=None)[source]

Set the cluster info of the current table.

set_comment(new_comment, async_=False, hints=None)[source]

Set the comment of the current table.

Parameters:

new_comment -- new comment

set_lifecycle(days, async_=False, hints=None)[source]

Set the lifecycle of the current table.

Parameters:

days -- lifecycle in days

set_owner(new_owner, async_=False, hints=None)[source]

Set the owner of the current table.

Parameters:

new_owner -- account of the new owner

set_storage_tier(storage_tier, partition_spec=None, async_=False, hints=None)[source]

Set the storage tier of the current table or a specific partition.

to_df()[source]

Create a PyODPS DataFrame from this table.

Returns:

DataFrame object

to_pandas(partition=None, columns=None, start=None, count=None, n_process=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read table data into a pandas DataFrame.

Parameters:
  • partition -- partition of this table

  • columns (list) -- columns to read

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • n_process (int) -- number of processes used to accelerate reading

  • append_partitions (bool) -- if True, partition values will be appended to the output

  • quota_name (str) -- name of the tunnel quota to use
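
Example (a minimal sketch, assuming pandas is installed):

>>> df = table.to_pandas(partition='pt=test', columns=['my_column'], count=1000)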

touch(partition_spec=None, async_=False, hints=None)[source]

Update the last modified time of the table or a specified partition.

Parameters:

partition_spec -- partition spec, optional

truncate(partition_spec=None, async_=False, hints=None)[source]

Truncate this table.

Parameters:
  • partition_spec -- partition specs

  • hints

  • async_ -- run asynchronously if True

Returns:

None

class odps.models.partition.Partition(**kwargs)[source]

A partition is a collection of rows in a table whose partition columns are equal to specific values.

To write data into a partition, call the open_writer method within a with statement. Likewise, the open_reader method provides the ability to read records from a partition. These methods behave the same as their counterparts in the Table class, except that there is no 'partition' parameter.
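
Example (a minimal sketch, assuming my_table has a partition pt=test):

>>> partition = odps.get_table('my_table').get_partition('pt=test')
>>> for record in partition.head(5):
>>>     # check the first 5 records of this partition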

change_partition_spec(new_partition_spec, async_=False, hints=None)[source]

Change the partition spec of the current partition.

Parameters:

new_partition_spec -- new partition spec

drop(async_=False, if_exists=False)[source]

Drop this partition.

Parameters:
  • async_ -- run asynchronously if True

  • if_exists

Returns:

None

head(limit, columns=None)[source]

Get the head records of a partition.

Parameters:
  • limit -- number of records, at most 10000

  • columns (list) -- the columns to fetch, a subset of the table columns

Returns:

records

Return type:

list

iter_pandas(columns=None, batch_size=None, start=None, count=None, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Iterate over partition data in blocks as pandas DataFrames.

Parameters:
  • columns (list) -- columns to read

  • batch_size (int) -- size of each DataFrame batch to read

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • quota_name (str) -- name of the tunnel quota to use

  • append_partitions (bool) -- if True, partition values will be appended to the output

open_reader(**kw)[source]

Open a reader to read all records from this partition.

Parameters:
  • reopen (bool) -- by default the reader reuses the last one; if True, a new reader is opened

  • endpoint -- the tunnel service URL

  • compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy

  • compress_algo -- compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level -- used for zlib, effective when compress_option is not provided

  • compress_strategy -- used for zlib, effective when compress_option is not provided

Returns:

a reader whose count attribute is the total number of records and whose status attribute reflects the tunnel status

Example:

>>> with partition.open_reader() as reader:
>>>     count = reader.count  # number of records in the partition
>>>     for record in reader[0: count]:
>>>         # read all data; in practice it is better to read in several batches
set_storage_tier(storage_tier, async_=False, hints=None)[source]

Set the storage tier of the current partition.

to_df()[source]

Create a PyODPS DataFrame from this partition.

Returns:

DataFrame object

to_pandas(columns=None, start=None, count=None, n_process=1, quota_name=None, append_partitions=None, tags=None, **kwargs)[source]

Read partition data into a pandas DataFrame.

Parameters:
  • columns (list) -- columns to read

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • n_process (int) -- number of processes used to accelerate reading

  • quota_name (str) -- name of the tunnel quota to use

  • append_partitions (bool) -- if True, partition values will be appended to the output

touch(async_=False, hints=None)[source]

Update the last modified time of the partition.

truncate(async_=False)[source]

Truncate the current partition.

class odps.models.Instance(**kwargs)[source]

An Instance represents the running of an ODPS task.

The status property reflects the current state of an instance. The is_terminated method indicates whether the instance has finished, and the is_successful method indicates whether it ran successfully. The wait_for_success method blocks the main process until the instance has finished.

For a SQL instance, use open_reader to read the results.

Example:

>>> instance = odps.execute_sql('select * from dual')  # this SQL returns structured data
>>> with instance.open_reader() as reader:
>>>     # handle each record
>>>
>>> instance = odps.execute_sql('desc dual')  # this SQL does not return structured data
>>> with instance.open_reader() as reader:
>>>     print(reader.raw)  # just return the raw result
exception DownloadSessionCreationError(msg, request_id=None, code=None, host_id=None, instance_id=None, endpoint=None, tag=None, response_headers=None)[source]
class Status(value)[source]
class Task(**kwargs)[source]

Task represents a single task inside an instance.

It has a name, a task type, start and end times, and a running status.

class TaskProgress(**kwargs)[source]

TaskProgress represents the progress of a task.

A single TaskProgress may consist of several stages.

Example:

>>> progress = instance.get_task_progress('task_name')
>>> progress.get_stage_progress_formatted_string()
2015-11-19 16:39:07 M1_Stg1_job0:0/0/1[0%]  R2_1_Stg1_job0:0/0/1[0%]
class TaskStatus(value)[source]
class TaskSummary(*args, **kwargs)[source]
get_logview_address(hours=None, use_legacy=None)[source]

Get the Logview address of the instance object, valid for the given number of hours.

Parameters:

hours

Returns:

Logview address

Return type:

str
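
Example (a minimal sketch):

>>> print(instance.get_logview_address())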

get_sql_task_cost()[source]

Get cost information of a SQL cost task, including input data size, number of UDFs and complexity of the SQL task.

Note: do NOT call this method on instances returned from ordinary SQL execution; use o.execute_sql_cost instead.

Returns:

cost info in dict format

get_task_cost(task_name=None)[source]

Get the cost of a task.

Parameters:

task_name -- name of the task

Returns:

task cost

Return type:

Instance.TaskCost

Example:

>>> cost = instance.get_task_cost(instance.get_task_names()[0])
>>> cost.cpu_cost
200
>>> cost.memory_cost
4096
>>> cost.input_size
0
get_task_detail(task_name=None)[source]

Get the detail of a task.

Parameters:

task_name -- task name

Returns:

the task's detail

Return type:

list or dict according to the JSON

get_task_detail2(task_name=None, **kw)[source]

Get the detail of a task, version 2.

Parameters:

task_name -- task name

Returns:

the task's detail

Return type:

list or dict according to the JSON

get_task_info(task_name, key, raise_empty=False)[source]

Get task-related information.

Parameters:
  • task_name -- name of the task

  • key -- key of the information item

  • raise_empty -- if True, will raise an error when the response is empty

Returns:

a string of the task information

get_task_names(retry=True, timeout=None)[source]

Get the names of all tasks.

Returns:

task names

Return type:

list

get_task_progress(task_name=None)[source]

Get the current progress of a task.

Parameters:

task_name -- task name

Returns:

the task's progress

Return type:

odps.models.Instance.Task.TaskProgress

get_task_quota(task_name=None)[source]

Get queueing info of the task. Note that the interval between two calls should be larger than 30 seconds; otherwise an empty dict is returned.

Parameters:

task_name -- name of the task

Returns:

quota info in dict format

get_task_result(task_name=None, timeout=None, retry=True)[source]

Get the result of a single task.

Parameters:

task_name -- task name

Returns:

task result

Return type:

str

get_task_results(timeout=None, retry=True)[source]

Get all the task results.

Returns:

a dict whose keys are task names and whose values are task results as strings

Return type:

dict

get_task_statuses(retry=True, timeout=None)[source]

Get the statuses of all tasks.

Returns:

a dict whose keys are task names and whose values are odps.models.Instance.Task objects

Return type:

dict
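
Example (a minimal sketch, assuming each Task object exposes its running status via a status attribute):

>>> for name, task in instance.get_task_statuses().items():
>>>     print(name, task.status)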

get_task_summary(task_name=None)[source]

Get the summary of a task, mostly used for MapReduce.

Parameters:

task_name -- task name

Returns:

summary as a dict parsed from JSON

Return type:

dict

get_task_workers(task_name=None, json_obj=None)[source]

Get workers from a task.

Parameters:
  • task_name -- task name

  • json_obj -- JSON object parsed from get_task_detail2

Returns:

list of workers

get_worker_log(log_id, log_type, size=0)[source]

Get logs from a worker.

Parameters:
  • log_id -- id of the log, which can be retrieved from task details

  • log_type -- type of the log. Possible log types include coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary

  • size -- length of the log to retrieve

Returns:

log content
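
Example (a minimal sketch, assuming the instance has workers whose stderr logs are available; combines get_task_workers with per-worker log retrieval):

>>> workers = instance.get_task_workers(instance.get_task_names()[0])
>>> print(workers[0].get_log('stderr'))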

is_running(retry=True, blocking=False, retry_timeout=None)[source]

Check if this instance is still running.

Returns:

True if still running, else False

Return type:

bool

is_successful(retry=True, retry_timeout=None)[source]

Check if the instance ran successfully.

Returns:

True if successful, else False

Return type:

bool

is_terminated(retry=True, blocking=False, retry_timeout=None)[source]

Check if this instance has finished.

Returns:

True if finished, else False

Return type:

bool

iter_pandas(columns=None, limit=None, batch_size=None, start=None, count=None, quota_name=None, tags=None, **kwargs)[source]

Iterate over instance result data in blocks as pandas DataFrames. The limit argument follows the definition in the open_reader API; see the sketch after the parameter list.

Parameters:
  • columns (list) -- columns to read

  • limit (bool) -- if True, enable the limitation on the number of records returned

  • batch_size (int) -- size of each DataFrame batch to read

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • quota_name (str) -- name of the tunnel quota to use
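
Example (a minimal sketch, assuming the instance results come from a SQL statement that returns structured data):

>>> for batch in instance.iter_pandas(batch_size=10000):
>>>     print(len(batch))  # each batch is a pandas DataFrame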

open_reader(*args, **kwargs)[source]

Open a reader to read records from the result of the instance. If tunnel is True, the instance tunnel is used; otherwise the conventional routine is used. If the instance tunnel is not available and tunnel is not specified, the method falls back to the conventional routine. Note that under instance tunnel mode the number of records returned is limited unless options.limited_instance_tunnel is set to True or limit=True is configured, while under the conventional routine the number of records returned is always limited.

Parameters:
  • tunnel -- if True, use the instance tunnel to read from the instance; if False, use the conventional routine; if absent, options.tunnel.use_instance_tunnel is used and automatic fallback is enabled

  • limit (bool) -- if True, enable the limitation on the number of records returned

  • reopen (bool) -- by default the reader reuses the last one; if True, a new reader is opened

  • endpoint -- the tunnel service URL

  • compress_option (odps.tunnel.CompressOption) -- compression algorithm, level and strategy

  • compress_algo -- compression algorithm, used when compress_option is not provided; can be zlib or snappy

  • compress_level -- used for zlib, effective when compress_option is not provided

  • compress_strategy -- used for zlib, effective when compress_option is not provided

Returns:

a reader whose count attribute is the total number of records and whose status attribute reflects the tunnel status

Example:

>>> with instance.open_reader() as reader:
>>>     count = reader.count  # number of records in the result
>>>     for record in reader[0: count]:
>>>         # read all data; in practice it is better to read in several batches
put_task_info(task_name, key, value, check_location=False, raise_empty=False)[source]

Put information into a task.

Parameters:
  • task_name -- name of the task

  • key -- key of the information item

  • value -- value of the information item

  • check_location -- if True, raises an error when the Location header is missing

  • raise_empty -- if True, will raise an error when the response is empty

stop()[source]

Stop this instance.

Returns:

None

to_pandas(columns=None, limit=None, start=None, count=None, n_process=1, quota_name=None, tags=None, **kwargs)[source]

Read instance result data into a pandas DataFrame. The limit argument follows the definition in the open_reader API.

Parameters:
  • columns (list) -- columns to read

  • limit (bool) -- if True, enable the limitation on the number of records returned

  • start (int) -- start row index, counted from 0

  • count (int) -- number of rows to read

  • n_process (int) -- number of processes used to accelerate reading

  • quota_name (str) -- name of the tunnel quota to use
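
Example (a minimal sketch, assuming pandas is installed):

>>> df = odps.execute_sql('select * from dual').to_pandas()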

wait_for_completion(interval=1, timeout=None, max_interval=None, blocking=True)[source]

Wait for the instance to complete, regardless of the outcome.

Parameters:
  • interval -- time interval between checks

  • max_interval -- if specified, the check interval will be doubled after each check until max_interval is reached

  • timeout -- maximum time to wait

  • blocking -- whether to block waiting at the server side. Note that this option does not affect client behavior.

Returns:

None

wait_for_success(interval=1, timeout=None, max_interval=None, blocking=True)[source]

Wait for the instance to complete, and check if the instance is successful.

Parameters:
  • interval -- time interval between checks

  • max_interval -- if specified, the check interval will be doubled after each check until max_interval is reached

  • timeout -- maximum time to wait

  • blocking -- whether to block waiting at the server side. Note that this option does not affect client behavior.

Returns:

None

Raises:

odps.errors.ODPSError if the instance failed
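
Example (a minimal sketch of asynchronous execution with explicit waiting; run_sql submits the SQL without waiting for it to finish):

>>> from odps.errors import ODPSError
>>> instance = odps.run_sql('select * from dual')
>>> try:
>>>     instance.wait_for_success()
>>> except ODPSError:
>>>     print(instance.get_task_results())  # inspect why the instance failed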

class odps.models.Resource(**kwargs)[source]

Resource is useful when writing UDFs or MapReduce jobs. This is an abstract class.

Basically, a resource can be either a file resource or a table resource. File resources are further divided into the file, py, jar and archive types.

class Type(value)[source]
class odps.models.FileResource(**kwargs)[source]

FileResource represents a file.

Use the open method to open this resource as a file-like object.

class Mode(value)[source]
close()[source]

Close this file resource.

Returns:

None

flush()[source]

Commit changes to ODPS if any change happened. close will do this automatically.

Returns:

None

open(mode='r', encoding='utf-8', stream=False, overwrite=None)[source]

The mode argument stands for the open mode of this file resource. Binary mode is selected when 'b' is present: for instance, 'rb' opens the resource in binary read mode, while 'r+b' opens it in binary read+write mode. This matters most when the file is actually binary, such as a tar or jpeg file, so be sure to open the file in the correct mode.

The text modes are 'r', 'w', 'a', 'r+', 'w+' and 'a+', just like the builtin Python open method:

  • r means read only

  • w means write only; the file is truncated when opened

  • a means append only

  • r+ means read+write without constraint

  • w+ truncates the file first and then opens it for read+write

  • a+ allows read+write, but written content can only be appended to the end

Parameters:
  • mode -- the open mode, as described above

  • encoding -- utf-8 by default

  • stream -- open in stream mode

  • overwrite -- if True, will overwrite the existing resource. True by default.

Returns:

file-like object

Example:

>>> with resource.open('r') as fp:
>>>     fp.read(1)  # read one unicode character
>>>     fp.write('test')  # wrong, cannot write under read mode
>>>
>>> with resource.open('wb') as fp:
>>>     fp.readlines()  # wrong, cannot read under write mode
>>>     fp.write(b'hello world')  # write bytes
>>>
>>> with resource.open('r+') as fp:  # open in read+write mode
>>>     fp.seek(5)
>>>     fp.truncate()
>>>     fp.flush()
read(size=-1)[source]

Read the file resource; reads all content by default.

Parameters:

size -- number of characters (text mode) or bytes (binary mode) to read

Returns:

unicode or bytes, depending on text or binary mode

Return type:

str or unicode (Py2), bytes or str (Py3)

readline(size=-1)[source]

Read a single line.

Parameters:

size -- if present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. When size is not 0, an empty string is returned only when EOF is encountered immediately.

Returns:

unicode or bytes, depending on text or binary mode

Return type:

str or unicode (Py2), bytes or str (Py3)

readlines(sizehint=-1)[source]

Read as lines.

Parameters:

sizehint -- if the optional sizehint argument is present, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read, instead of reading up to EOF.

Returns:

lines

Return type:

list

seek(pos, whence=0)[source]

Seek to a position.

Parameters:
  • pos -- position to seek to

  • whence -- if set to 2, will seek relative to the end

Returns:

None

tell()[source]

Tell the current position.

Returns:

current position

truncate(size=None)[source]

Truncate the file resource to the given size.

Parameters:

size -- if the optional size argument is present, the file is truncated to (at most) that size; size defaults to the current position.

Returns:

None

write(content)[source]

Write content into the file resource.

Parameters:

content -- content to write

Returns:

None

writelines(seq)[source]

Write lines into the file resource.

Parameters:

seq -- lines to write

Returns:

None

class odps.models.PyResource(**kwargs)[source]

File resource representing a .py file.

class odps.models.JarResource(**kwargs)[source]

File resource representing a .jar file.

class odps.models.ArchiveResource(**kwargs)[source]

File resource representing a compressed file such as .zip/.tgz/.tar.gz/.tar/.jar.

class odps.models.TableResource(**kwargs)[source]

Take a table as a resource.

open_reader(**kwargs)[source]

Open a reader on the table resource.

open_writer(**kwargs)[source]

Open a writer on the table resource.

property partition

Get the source table partition.

Returns:

the source table partition

property table

Get the table object.

Returns:

source table

Return type:

odps.models.Table

update(table_project_name=None, table_schema_name=None, table_name=None, *args, **kw)[source]

Update this resource.

Parameters:
  • table_project_name -- the source table's project

  • table_name -- the source table's name

  • partition -- the source table's partition

Returns:

self
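
Example (a minimal sketch, assuming a table resource named table_resource already exists; the names are for illustration only):

>>> resource = odps.get_resource('table_resource')
>>> resource = resource.update(table_name='my_table')  # point the resource to another table
>>> with resource.open_reader() as reader:
>>>     for record in reader:
>>>         # handle each record of the source table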

class odps.models.Function(**kwargs)[source]

A Function is a UDF that can be used when writing SQL.

drop()[source]

Delete this Function.

Returns:

None

property resources

Return all the resources which this function refers to.

Returns:

resources

Return type:

list

update()[source]

Update this function.

Returns:

None
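
Example (a minimal sketch, assuming a function test_function and a resource my_udf.py already exist; the names are for illustration only):

>>> function = odps.get_function('test_function')
>>> print(function.resources)  # resources the function refers to
>>> function.resources = [odps.get_resource('my_udf.py')]
>>> function.update()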

class odps.models.Worker(**kwargs)[source]

Worker carries information about a worker and supports log retrieval.

get_log(log_type, size=0)[source]

Get logs from the worker.

Parameters:
  • log_type -- type of the log. Possible log types include coreinfo, hs_err_log, jstack, pstack, stderr, stdout, waterfall_summary

  • size -- length of the log to retrieve

Returns:

log content

class odps.models.ml.OfflineModel(**kwargs)[source]

Represents an ODPS offline model.

copy(new_name, new_project=None, async_=False)[source]

Copy the current model into a new location.

Parameters:
  • new_name -- name of the new model

  • new_project -- new project name; if absent, the original project name is used

  • async_ -- if True, return the copy instance; otherwise return the newly-copied model

get_model()[source]

Get the PMML text of the current model. Note that the model file obtained via this method may be incomplete due to size limitations.
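
Example (a minimal sketch; the model names are for illustration only):

>>> model = odps.get_offline_model('my_model')
>>> pmml = model.get_model()  # PMML text, possibly truncated for large models
>>> new_model = model.copy('my_model_copy')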

class odps.models.security.User(**kwargs)[source]