Frequently asked questions

How to check the version of PyODPS in use

import odps
print(odps.__version__)

Installation failure/error

For more information, see PyODPS installation FAQ (Chinese version only).

Project not found error

This error is caused by an incorrect Endpoint configuration. For more information, see MaxCompute activation and service connections by region. Also check that the parameters of the ODPS object are passed in the correct positions.

How to manually specify the Tunnel Endpoint

You can create your MaxCompute (ODPS) entrance object with an extra `tunnel_endpoint` parameter, as shown in the following code. Remove the asterisks when you fill in your own values.

from odps import ODPS

o = ODPS('**your-access-id**', '**your-secret-access-key**', '**your-default-project**',
         endpoint='**your-end-point**', tunnel_endpoint='**your-tunnel-endpoint**')

An error occurred while reading data: “project is protected”. How can I deal with this error?

The project's security policy forbids reading data directly from its tables. To retrieve all the data, you can apply one of the following solutions:

  • Contact the Project Owner to add exceptions.
  • Use DataWorks or another masking tool to mask the data and export it to an unprotected project before reading it.

To retrieve part of the data, you can apply the following solutions:

  • Use o.execute_sql('select * from <table_name>').open_reader()
  • Use DataFrame: o.get_table('<table_name>').to_df()
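
The first option above can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: `preview_rows` and its `limit` parameter are hypothetical names introduced for illustration, and `o` is assumed to be an ODPS entrance object that you have already created.

```python
from itertools import islice


def preview_rows(o, table_name, limit=100):
    """Return up to `limit` records of `table_name` read through a SQL instance.

    `o` is assumed to be an existing ODPS entrance object; `preview_rows`
    and `limit` are hypothetical names used only in this sketch.
    """
    # execute_sql waits for the query to finish and returns the instance.
    instance = o.execute_sql('select * from %s' % table_name)
    # The instance reader is iterable and yields one record per row.
    with instance.open_reader() as reader:
        return list(islice(reader, limit))
```

For example, `preview_rows(o, 'your_table', limit=10)` would return at most ten records.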

An error occurred while using IPython and Jupyter: ImportError. How can I deal with this error?

If running `from odps import errors` does not fix the error, run `pip install -U jupyter` to install the IPython component.

I can only retrieve a maximum of 10,000 records by calling open_reader on the result of a SQL command. How can I retrieve more than 10,000 records?

Use `create table as select ...` to save the SQL execution result to a table, and then use `table.open_reader` to read the data.
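
A minimal sketch of this workaround follows. The helper name `read_all_rows` and the `tmp_table` argument are hypothetical, and `o` is assumed to be an existing ODPS entrance object. The sketch also assumes that you want the intermediate table dropped once reading is finished, which the answer above does not require.

```python
def read_all_rows(o, query, tmp_table):
    """Yield every record produced by `query` by way of an intermediate table.

    `o` is assumed to be an existing ODPS entrance object; `read_all_rows`
    and `tmp_table` are hypothetical names used only in this sketch.
    """
    # Save the SQL result to a table instead of reading it from the instance.
    o.execute_sql('create table %s as %s' % (tmp_table, query))
    try:
        # Read the saved result through the table reader.
        with o.get_table(tmp_table).open_reader() as reader:
            for record in reader:
                yield record
    finally:
        # Assumption: the intermediate table is no longer needed afterwards.
        o.delete_table(tmp_table, if_exists=True)
```

Because the helper is a generator, the intermediate table is dropped only after the caller has consumed (or closed) the iterator.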

An error occurred while uploading pandas DataFrame to MaxCompute (ODPS): ODPSError: ODPS entrance should be provided. How can I deal with this error?

You need to make the ODPS object available globally in one of the following three ways:

  • Use the room mechanism: %enter configures the global ODPS object.
  • Call the to_global method on the ODPS object.
  • Pass the ODPS object through the odps parameter: DataFrame(pd_df).persist('your_table', odps=odps).

How can I use max_pt in DataFrame?

Use the odps.df.func module to call the built-in functions of MaxCompute.

from odps.df import func
df = o.get_table('your_table').to_df()
df[df.ds == func.max_pt('your_project.your_table')]  # ds is a partition column

Error “table lifecycle is not specified in mandatory mode” occurred when persisting DataFrame to table

Your project requires that every table be created with a lifecycle, so you need to run the following code each time you run your own code.

from odps import options
options.lifecycle = 7  # or your expected lifecycle in days

Error “Please add put { “odps.sql.submit.mode” : “script”} for multi-statement query in settings” occurred when executing SQL scripts

For more information, see set runtime parameters.
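
As a hedged illustration, the setting named in the error message can be passed through the `hints` argument of `execute_sql`. The helper name `run_sql_script` is hypothetical, and `o` is assumed to be an existing ODPS entrance object.

```python
def run_sql_script(o, script):
    """Submit a multi-statement SQL script in script mode.

    `o` is assumed to be an existing ODPS entrance object; `run_sql_script`
    is a hypothetical name used only in this sketch.
    """
    # The setting requested by the error message, applied to this call only.
    hints = {'odps.sql.submit.mode': 'script'}
    return o.execute_sql(script, hints=hints)
```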

How to enumerate rows in PyODPS DataFrame

PyODPS DataFrame does not support iterating over every row. It is designed for handling huge amounts of data, so row-by-row enumeration is inefficient and discouraged. We recommend using the `apply` or `map_reduce` methods of DataFrame to parallelize the work instead. Details can be found in this article. If you are sure that your code cannot be parallelized with these methods, and the cost of enumeration is tolerable, you can use `to_pandas` to convert your DataFrame into a pandas DataFrame, or persist your DataFrame into a MaxCompute table and read it via the `read_table` method or table tunnel.
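
The last fallback (persist, then read with `read_table`) might look like the sketch below. The helper name `for_each_row` and the `handle_row` callback are hypothetical, `o` is assumed to be an existing ODPS entrance object, and `df` is assumed to be a PyODPS DataFrame. This is only worth using when the parallel methods above do not fit.

```python
def for_each_row(o, df, table_name, handle_row):
    """Persist `df` to `table_name`, then call `handle_row` on each record.

    `o` is assumed to be an existing ODPS entrance object and `df` a PyODPS
    DataFrame; `for_each_row` and `handle_row` are hypothetical names.
    Returns the number of records processed.
    """
    # Materialize the DataFrame as a MaxCompute table first.
    df.persist(table_name)
    processed = 0
    # read_table yields the table's records one at a time on the client.
    for record in o.read_table(table_name):
        handle_row(record)
        processed += 1
    return processed
```

Every record travels to the client one by one, so the running time grows with the size of the table.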