User experience enhancement

Command line

PyODPS provides an enhanced command line experience. Once you have set up your account the first time, you no longer need to re-enter account information at logon.

from odps.inter import setup, enter, teardown

Configure your account as follows:

setup('**your-access_id**', '**your-access-key**', '**your-project**', endpoint='**your-endpoint**')

When the ``room``parameter has not been specified, the ``default``room is set.

You can then call the enter method to create a Room object in any Python interactive interface, as follows:

room = enter()

You can retrieve the ODPS object as follows:

o = room.odps
o.get_table('dual')
odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime

Note

Note: The ODPS object is not automatically updated when you change the setup of the room. You need to call enter() again to retrieve the new Room object.

You can store commonly used MaxCompute tables or resources in the room.

room.store('stored-table', o.get_table('dual'), desc='Simple Table Store Example')

You can call the displaymethod to print the stored objects in a table format:

room.display()
default desc
name
stored-table Simple Table Store Example
iris Anderson's Iris data

You can use room['stored-table']or room.iristo retrieve the stored objects.

room['stored-table']
odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime

You can call the drop method to delete objects.

room.drop('stored-table')
room.display()
default desc
name
iris Anderson's Iris data

You can delete a room by calling teardown. When no parameter has been specified, the default room is deleted.

teardown()

IPython

PyODPS provides IPython plugins to make it easy to perform MaxCompute operations.

Some commands are provided for command line enhancement. You can load the plugins as follows:

%load_ext odps
%enter
<odps.inter.Room at 0x11341df10>

Now the global o and odps variables can be retrieved, as follows:

o.get_table('dual')
odps.get_table('dual')
odps.Table
  name: odps_test_sqltask_finance.`dual`
  schema:
    c_int_a                 : bigint
    c_int_b                 : bigint
    c_double_a              : double
    c_double_b              : double
    c_string_a              : string
    c_string_b              : string
    c_bool_a                : boolean
    c_bool_b                : boolean
    c_datetime_a            : datetime
    c_datetime_b            : datetime
%stores
default desc
name
iris Anderson's Iris data

Object name completion

PyODPS enhances the code completion feature that is provided by IPython. When you are writing statements such as o.get_xxx, the object name is automatically completed.

For example, enter the following statement in IPython. Note that <tab> is not the actual input character. When the entry is complete, move the cursor to the end of the statement and click the Tab button.

o.get_table(<tab>

If you know the first few characters of the object name, for example:

o.get_table('tabl<tab>

IPython automatically enters a table name starting with tabl.

This feature can also complete the names of objects from different projects.

o.get_table(project='project_name', name='tabl<tab>
o.get_table('tabl<tab>', project='project_name')

A list is given when multiple matching objects exist. The maximum length of the list is set by options.completion_size. The default is 10.

SQL statements

PyODPS provides SQL plugins to execute MaxCompute SQL statements. The following is a single-line SQL statement.

%sql select * from pyodps_iris limit 5
sepallength sepalwidth petallength petalwidth name
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa

You can use %%sql to execute multi-line SQL statements.

%%sql

select * from pyodps_iris
where sepallength < 5
limit 5
sepallength sepalwidth petallength petalwidth name
0 4.9 3.0 1.4 0.2 Iris-setosa
1 4.7 3.2 1.3 0.2 Iris-setosa
2 4.6 3.1 1.5 0.2 Iris-setosa
3 4.6 3.4 1.4 0.3 Iris-setosa
4 4.4 2.9 1.4 0.2 Iris-setosa

To execute parameterized SQL statements, you can use :parameterto specify the parameter.

In [1]: %load_ext odps

In [2]: mytable = 'dual'

In [3]: %sql select * from :mytable
|==========================================|   1 /  1  (100.00%)         2s
Out[3]:
   c_int_a  c_int_b  c_double_a  c_double_b  c_string_a  c_string_b c_bool_a  \
0        0        0       -1203           0           0       -1203     True

  c_bool_b         c_datetime_a         c_datetime_b
0    False  2012-03-30 23:59:58  2012-03-30 23:59:59

For SQL runtime parameters, you can use %set to set a global parameter, or use set within a SQL Cell to set a local parameter.

In [17]: %%sql
         set odps.sql.mapper.split.size = 16;
         select * from pyodps_iris;

The example above sets a local parameter.

In [18]: %set odps.sql.mapper.split.size = 16

The example above sets a global parameter.

Upload pandas DataFrame to MaxCompute tables

PyODPS provides the following command to upload pandas DataFrame objects to MaxCompute tables:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('abc'))
%persist df pyodps_pandas_df

The first parameter dfis the variable name. pyodps_pandas_dfis the MaxCompute table name.

Jupyter Notebook

PyODPS enhances features such as result exploration and progress display in Jupyter Notebook.

Result exploration

PyODPS provides a data exploration feature for SQL Cell and DataFrame in Jupyter Notebook. You can use interactive data exploration tools to browse data and create graphs.

If the execution result is a DataFrame object, PyODPS reads the result and displays it in a paged table. Click a page number or the pre/next button to browse the data.

_images/dfview-data-grid.png

You can choose different graphs to present data on the top of the page. In addition to the data table, a bar chart, pie chart, line chart, and scatter plot are also provided. The following example is a scatter plot created from the default fields, which are the first three fields.

_images/dfview-scatter.png

In a graph mode, you can click the Configuration button on the top right corner to change the settings of a graph. For example, select the name for the grouping column, select the petallength for the X-axis, and select the petalwidth for the Y-axis. As you can see in the following graph, this dimension setting makes it easy to understand the data:

_images/dfview-scatter-opts.png

For bar charts and pie charts, you can select an aggregate function for the value fields. The default function is sum for bar charts, and count for pie charts. You can click the function name after the value fields to change it.

For line charts, the values on the X-axis cannot be null. The graph may not meet expectations.

_images/dfview-pie-aggsel.png

After finishing your graph, click the Download button to save it.

Note

Note: You must have Pandas and ipywidgets properly installed to use this feature.

Progress bar

The execution of large jobs usually takes a long time. Therefore, PyODPS provides progress bars to show the progress of the execution. When DataFrame objects, machine learning jobs, and SQL statements starting with %sql are executing in Jupyter Notebook, a list of these jobs and their overall progress are displayed as follows:

_images/progress_example.png

When you click a job name, a dialog box pops up, displaying the progress of each task in the job.

_images/task_details_dialog.png

After the execution has been completed, a message pops up, displaying whether the job has been successful.

_images/exec_notify.png