User experience enhancement

Command line

PyODPS provides an enhanced command line experience. Once you have set up your account the first time, you no longer need to re-enter account information at logon.

from odps.inter import setup, enter, teardown

Configure your account as follows:

import os
# Make sure environment variable ALIBABA_CLOUD_ACCESS_KEY_ID already set to Access Key ID of user
# while environment variable ALIBABA_CLOUD_ACCESS_KEY_SECRET set to Access Key Secret of user.
# Not recommended to hardcode Access Key ID or Access Key Secret in your code.
setup(
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_ID'),
    os.getenv('ALIBABA_CLOUD_ACCESS_KEY_SECRET'),
    '**your-project**',
    endpoint='**your-endpoint**',
)

When the ``room``parameter has not been specified, the ``default``room is set.

You can then call the enter method to create a Room object in any Python interactive interface, as follows:

room = enter()

You can retrieve the ODPS object as follows:

o = room.odps

o.get_table('dual')

odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime

Note

Note: The ODPS object is not automatically updated when you change the setup of the room. You need to call enter() again to retrieve the new Room object.

You can store commonly used MaxCompute tables or resources in the room.

room.store('stored-table', o.get_table('dual'), desc='Simple Table Store Example')

You can call the displaymethod to print the stored objects in a table format:

room.display()

default	desc
name
stored-table	Simple Table Store Example
iris	Anderson's Iris data

You can use room['stored-table']or room.iristo retrieve the stored objects.

room['stored-table']

odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime

You can call the drop method to delete objects.

room.drop('stored-table')

room.display()

default	desc
name
iris	Anderson's Iris data

You can delete a room by calling teardown. When no parameter has been specified, the default room is deleted.

teardown()

IPython

PyODPS provides IPython plugins to make it easy to perform MaxCompute operations.

Some commands are provided for command line enhancement. You can load the plugins as follows:

%load_ext odps

%enter

<odps.inter.Room at 0x11341df10>

Now the global o and odps variables can be retrieved, as follows:

o.get_table('dual')
odps.get_table('dual')

odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime

%stores

default	desc
name
iris	Anderson's Iris data

Object name completion

PyODPS enhances the code completion feature that is provided by IPython. When you are writing statements such as o.get_xxx, the object name is automatically completed.

For example, enter the following statement in IPython. Note that <tab> is not the actual input character. When the entry is complete, move the cursor to the end of the statement and click the Tab button.

o.get_table(<tab>

If you know the first few characters of the object name, for example:

o.get_table('tabl<tab>

IPython automatically enters a table name starting with tabl.

This feature can also complete the names of objects from different projects.

o.get_table(project='project_name', name='tabl<tab>
o.get_table('tabl<tab>', project='project_name')

A list is given when multiple matching objects exist. The maximum length of the list is set by options.completion_size. The default is 10.

SQL statements

PyODPS provides SQL plugins to execute MaxCompute SQL statements. The following is a single-line SQL statement.

%sql select * from pyodps_iris limit 5

	sepallength	sepalwidth	petallength	petalwidth	name
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa

You can use %%sql to execute multi-line SQL statements.

%%sql

select * from pyodps_iris
where sepallength < 5
limit 5

	sepallength	sepalwidth	petallength	petalwidth	name
0	4.9	3.0	1.4	0.2	Iris-setosa
1	4.7	3.2	1.3	0.2	Iris-setosa
2	4.6	3.1	1.5	0.2	Iris-setosa
3	4.6	3.4	1.4	0.3	Iris-setosa
4	4.4	2.9	1.4	0.2	Iris-setosa

To execute parameterized SQL statements, you can use :parameterto specify the parameter.

In [1]: %load_ext odps

In [2]: mytable = 'dual'

In [3]: %sql select * from :mytable
|==========================================|   1 /  1  (100.00%)         2s
Out[3]:
   c_int_a  c_int_b  c_double_a  c_double_b  c_string_a  c_string_b c_bool_a  \
0        0        0       -1203           0           0       -1203     True

  c_bool_b         c_datetime_a         c_datetime_b
0    False  2012-03-30 23:59:58  2012-03-30 23:59:59

For SQL runtime parameters, you can use %set to set a global parameter, or use set within a SQL Cell to set a local parameter.

In [17]: %%sql
         set odps.sql.mapper.split.size = 16;
         select * from pyodps_iris;

The example above sets a local parameter.

In [18]: %set odps.sql.mapper.split.size = 16

The example above sets a global parameter.

Upload pandas DataFrame to MaxCompute tables

PyODPS provides the following command to upload pandas DataFrame objects to MaxCompute tables:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))

%persist df pyodps_pandas_df

The first parameter dfis the variable name. pyodps_pandas_dfis the MaxCompute table name.