Configuration
PyODPS provides a series of configuration options, which are accessible through `odps.options`. Here is a simple example:
```python
from odps import options

# set the lifecycle of all output tables (option lifecycle)
options.lifecycle = 30
# handle the string type as bytes when downloading data with Tunnel (option tunnel.string_as_binary)
options.tunnel.string_as_binary = True
# retrieve more records when sorting a DataFrame backed by MaxCompute (option df.odps.sort.limit)
options.df.odps.sort.limit = 100000000
```
The following tables list the configurable PyODPS options:
General configurations
| Option | Description | Default value |
| --- | --- | --- |
| endpoint | MaxCompute endpoint | None |
| default_project | Default project | None |
| logview_host | LogView host name | None |
| logview_hours | LogView holding time (hours) | 24 |
| local_timezone | Time zone to use. None means PyODPS takes no action, True means local time, and False means UTC. Time zones from the pytz package can also be used. | None |
| lifecycle | Lifecycle of all output tables (days) | None |
| verify_ssl | Whether to verify the SSL certificate of the server | True |
| temp_lifecycle | Lifecycle of temporary tables (days) | 1 |
| biz_id | User ID | None |
| verbose | Whether to print logs | False |
| verbose_log | Log receiver | None |
| chunk_size | Size of the write buffer (bytes) | 65536 |
| retry_times | Number of request retries | 4 |
| pool_connections | Number of cached connections in the connection pool | 10 |
| pool_maxsize | Maximum capacity of the connection pool | 10 |
| connect_timeout | Connection timeout (seconds) | 120 |
| read_timeout | Read timeout (seconds) | 120 |
| api_proxy | Proxy address for API requests | None |
| data_proxy | Proxy address for data transfer | None |
| completion_size | Limit on the number of items listed for object completion | 10 |
| table_auto_flush_time | Interval (seconds) at which buffered data is submitted when writing to tables with `write_table` | 150 |
| display.notebook_widget | Whether to use interactive notebook plugins | True |
| sql.settings | Global hints for MaxCompute SQL | None |
| sql.use_odps2_extension | Enable MaxCompute 2.0 language extensions | None |
| sql.always_enable_schema | Enable schema-level addressing in all scenarios | None |
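To illustrate how several of the general options above might be combined, here is a minimal sketch; the credentials, project, and endpoint are placeholders, not real values:

```python
from odps import ODPS, options

# retry failed requests more aggressively and fail connections faster
options.retry_times = 6
options.connect_timeout = 60        # seconds
# give every output table a 7-day lifecycle
options.lifecycle = 7
# pass global hints to all MaxCompute SQL statements
options.sql.settings = {"odps.sql.allow.fullscan": "true"}

# placeholder credentials for illustration only
o = ODPS("<access_id>", "<secret_access_key>",
         project="my_project",
         endpoint="https://service.example.com/api")
```

Options set on `odps.options` are global and apply to every subsequently created entry object and request.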
Data upload/download configurations
| Option | Description | Default value |
| --- | --- | --- |
| tunnel.endpoint | Tunnel endpoint | None |
| tunnel.use_instance_tunnel | Whether to use Instance Tunnel to obtain execution results | True |
| tunnel.limit_instance_tunnel | Limit the number of results obtained through Instance Tunnel | None |
| tunnel.string_as_binary | Use bytes instead of unicode for the string type | False |
| tunnel.quota_name | Name of the Tunnel quota to use | None |
| tunnel.block_buffer_size | Buffer size for block Tunnel writers (bytes) | 20 * 1024 ** 2 |
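A sketch of typical Tunnel-related settings follows; the endpoint and quota name are hypothetical placeholders:

```python
from odps import options

# point data transfer at an explicit Tunnel endpoint (placeholder URL)
options.tunnel.endpoint = "https://dt.example.com"
# use a dedicated Tunnel quota (placeholder name)
options.tunnel.quota_name = "my_tunnel_quota"
# download string-typed columns as bytes rather than unicode
options.tunnel.string_as_binary = True
# double the default 20 MiB write buffer for block writers
options.tunnel.block_buffer_size = 40 * 1024 ** 2
```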
DataFrame configurations
| Option | Description | Default value |
| --- | --- | --- |
| interactive | Whether running in an interactive environment | Detected from the runtime environment |
| df.analyze | Whether to enable functions not built into MaxCompute | True |
| df.optimize | Whether to enable overall DataFrame optimization | True |
| df.optimizes.pp | Whether to enable DataFrame predicate pushdown | True |
| df.optimizes.cp | Whether to enable DataFrame column pruning | True |
| df.optimizes.tunnel | Whether to enable DataFrame Tunnel optimization | True |
| df.quote | Whether to quote field and table names with backquotes in the MaxCompute SQL backend | True |
| df.image | Image name used when running a DataFrame | None |
| df.libraries | Third-party libraries (resource names) used when running a DataFrame | None |
| df.supersede_libraries | Whether uploaded package resources supersede the versions provided by MaxCompute | True |
| df.odps.sort.limit | Maximum number of records retrieved when sorting a DataFrame backed by MaxCompute | 10000 |
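As a final sketch, the DataFrame options might be tuned like this; the resource name `my_package.zip` is a hypothetical example:

```python
from odps import options

# disable predicate pushdown, e.g. to compare generated SQL while debugging
options.df.optimizes.pp = False
# allow sorted results larger than the default 10000-record limit
options.df.odps.sort.limit = 1000000
# attach a previously uploaded archive resource (placeholder name)
options.df.libraries = ["my_package.zip"]
```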