User experience enhancement
Command line
PyODPS provides an enhanced command line experience. Once you have set up your account the first time, you no longer need to re-enter account information at logon.
from odps.inter import setup, enter, teardown
Configure your account as follows:
import os
# Make sure environment variable ALIBABA_CLOUD_ACCESS_KEY_ID already set to Access Key ID of user
# while environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET set to Access Key Secret of user.
# Not recommended to hardcode Access Key ID or Access Key Secret in your code.
setup(
os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
'**your-project**',
endpoint='**your-endpoint**',
)
When the ``room``parameter has not been specified, the ``default``room is set.
You can then call the enter method to create a Room object in any Python interactive interface, as follows:
room = enter()
You can retrieve the ODPS object as follows:
o = room.odps
o.get_table('dual')
odps.Table
name: odps_test_sqltask_finance.`dual`
schema:
c_int_a : bigint
c_int_b : bigint
c_double_a : double
c_double_b : double
c_string_a : string
c_string_b : string
c_bool_a : boolean
c_bool_b : boolean
c_datetime_a : datetime
c_datetime_b : datetime
Note
Note: The ODPS object is not automatically updated when you change the setup of the room. You need to call enter() again to retrieve the new Room object.
You can store commonly used MaxCompute tables or resources in the room.
room.store('stored-table', o.get_table('dual'), desc='Simple Table Store Example')
You can call the display
method to print the stored objects in a table format:
room.display()
default | desc |
---|---|
name | |
stored-table | Simple Table Store Example |
iris | Anderson's Iris data |
You can use room['stored-table']
or room.iris
to retrieve the stored objects.
room['stored-table']
odps.Table
name: odps_test_sqltask_finance.`dual`
schema:
c_int_a : bigint
c_int_b : bigint
c_double_a : double
c_double_b : double
c_string_a : string
c_string_b : string
c_bool_a : boolean
c_bool_b : boolean
c_datetime_a : datetime
c_datetime_b : datetime
You can call the drop method to delete objects.
room.drop('stored-table')
room.display()
default | desc |
---|---|
name | |
iris | Anderson's Iris data |
You can delete a room by calling teardown. When no parameter has been specified, the default room is deleted.
teardown()
IPython
PyODPS provides IPython plugins to make it easy to perform MaxCompute operations.
Some commands are provided for command line enhancement. You can load the plugins as follows:
%load_ext odps
%enter
<odps.inter.Room at 0x11341df10>
Now the global o and odps variables can be retrieved, as follows:
o.get_table('dual')
odps.get_table('dual')
odps.Table
name: odps_test_sqltask_finance.`dual`
schema:
c_int_a : bigint
c_int_b : bigint
c_double_a : double
c_double_b : double
c_string_a : string
c_string_b : string
c_bool_a : boolean
c_bool_b : boolean
c_datetime_a : datetime
c_datetime_b : datetime
%stores
default | desc |
---|---|
name | |
iris | Anderson's Iris data |
Object name completion
PyODPS enhances the code completion feature that is provided by IPython. When you are writing statements such as o.get_xxx
, the object name is automatically completed.
For example, enter the following statement in IPython. Note that <tab> is not the actual input character. When the entry is complete, move the cursor to the end of the statement and click the Tab button.
o.get_table(<tab>
If you know the first few characters of the object name, for example:
o.get_table('tabl<tab>
IPython automatically enters a table name starting with tabl.
This feature can also complete the names of objects from different projects.
o.get_table(project='project_name', name='tabl<tab>
o.get_table('tabl<tab>', project='project_name')
A list is given when multiple matching objects exist. The maximum length of the list is set by options.completion_size
. The default is 10.
SQL statements
PyODPS provides SQL plugins to execute MaxCompute SQL statements. The following is a single-line SQL statement.
%sql select * from pyodps_iris limit 5
sepallength | sepalwidth | petallength | petalwidth | name | |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
You can use %%sql
to execute multi-line SQL statements.
%%sql
select * from pyodps_iris
where sepallength < 5
limit 5
sepallength | sepalwidth | petallength | petalwidth | name | |
---|---|---|---|---|---|
0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
3 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
4 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
To execute parameterized SQL statements, you can use :parameter
to specify the parameter.
In [1]: %load_ext odps
In [2]: mytable = 'dual'
In [3]: %sql select * from :mytable
|==========================================| 1 / 1 (100.00%) 2s
Out[3]:
c_int_a c_int_b c_double_a c_double_b c_string_a c_string_b c_bool_a \
0 0 0 -1203 0 0 -1203 True
c_bool_b c_datetime_a c_datetime_b
0 False 2012-03-30 23:59:58 2012-03-30 23:59:59
For SQL runtime parameters, you can use %set
to set a global parameter, or use set within a SQL Cell to set a local parameter.
In [17]: %%sql
set odps.sql.mapper.split.size = 16;
select * from pyodps_iris;
The example above sets a local parameter.
In [18]: %set odps.sql.mapper.split.size = 16
The example above sets a global parameter.
Upload pandas DataFrame to MaxCompute tables
PyODPS provides the following command to upload pandas DataFrame objects to MaxCompute tables:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
%persist df pyodps_pandas_df
The first parameter df
is the variable name. pyodps_pandas_df
is the MaxCompute table name.