Configuration

PyODPS provides a set of configuration options, which are accessible through odps.options. For example:

from odps import options
# configure lifecycle for all output tables (option lifecycle)
options.lifecycle = 30
# handle string type as bytes when downloading with Tunnel (option tunnel.string_as_binary)
options.tunnel.string_as_binary = True
# get more records when sorting the DataFrame with MaxCompute
options.df.odps.sort.limit = 100000000

The following sections list the configurable PyODPS options:

General configurations

| Option | Description | Default value |
| --- | --- | --- |
| endpoint | MaxCompute endpoint | None |
| default_project | Default project | None |
| logview_host | LogView host name | None |
| logview_hours | LogView retention time, in hours | 24 |
| local_timezone | Time zone to use. None means PyODPS takes no action, True means local time, and False means UTC. A time zone from the pytz package can also be used. | None |
| lifecycle | Lifecycle of all tables | None |
| verify_ssl | Whether to verify the server's SSL certificate | True |
| temp_lifecycle | Lifecycle of temporary tables | 1 |
| biz_id | User ID | None |
| verbose | Whether to print logs | False |
| verbose_log | Log receiver | None |
| chunk_size | Size of the write buffer | 65536 |
| retry_times | Number of request retries | 4 |
| pool_connections | Number of cached connections in the connection pool | 10 |
| pool_maxsize | Maximum capacity of the connection pool | 10 |
| connect_timeout | Connection timeout | 120 |
| read_timeout | Read timeout | 120 |
| api_proxy | Proxy address for API requests | None |
| data_proxy | Proxy address for data transfer | None |
| completion_size | Maximum number of items listed for object completion | 10 |
| table_auto_flush_time | Interval between data submissions when uploading with table.open_writer | 150 |
| display.notebook_widget | Whether to use interactive widgets | True |
| sql.settings | Global hints for MaxCompute SQL | None |
| sql.use_odps2_extension | Whether to enable the MaxCompute 2.0 language extension | None |
| sql.always_enable_schema | Whether to enable the schema level in all scenarios | None |

Data upload/download configurations

| Option | Description | Default value |
| --- | --- | --- |
| tunnel.endpoint | Tunnel endpoint | None |
| tunnel.use_instance_tunnel | Use Instance Tunnel to obtain the execution result | True |
| tunnel.limit_instance_tunnel | Limit the number of results obtained by Instance Tunnel | None |
| tunnel.string_as_binary | Use bytes instead of unicode in the string type | False |
| tunnel.quota_name | Name of the tunnel quota to use | False |
| tunnel.block_buffer_size | Buffer size for block tunnel writers | 20 * 1024 ** 2 |

DataFrame configurations

| Option | Description | Default value |
| --- | --- | --- |
| interactive | Whether PyODPS is running in an interactive environment | Determined by detection |
| df.analyze | Whether to enable functions that are not built into MaxCompute | True |
| df.optimize | Whether to enable overall DataFrame optimization | True |
| df.optimizes.pp | Whether to enable DataFrame predicate pushdown optimization | True |
| df.optimizes.cp | Whether to enable DataFrame column pruning optimization | True |
| df.optimizes.tunnel | Whether to enable DataFrame tunnel optimization | True |
| df.quote | Whether to quote field and table names with backticks (`) in the MaxCompute SQL backend | True |
| df.image | Name of the image used to run DataFrame | None |
| df.libraries | Third-party libraries (resource names) used to run DataFrame | None |
| df.supersede_libraries | Whether uploaded package resources supersede the versions provided by MaxCompute | True |
| df.odps.sort.limit | Limit on the number of records returned when a sort is performed | 10000 |