
The Complete Guide to Managing Datastore on Google App Engine for Java using Bulk Loader

>> 14 February 2011


You can download a pdf copy here: http://apps.java-javafx.com/download.jsp

The bulk loader tool can upload and download data to and from your application's datastore, or the development server's datastore.

The bulk loader tool communicates with your application running on App Engine, or on the development server, using remote_api, a request handler included with the App Engine runtime environment. This way the bulk loader can access the datastore through the remote application.

Data Format

CSV, XML, and text files (the format is customized using a configuration file). Without a configuration file, the default is a special format.

SDK Version

1.3.8 or later

Configuration File

A configuration file can be generated automatically using the bulk loader tool. This file must then be edited to suit your needs.

Installing a Bulk Loader

Please note that you must have Python 2.5 or later installed and configured.
The bulk loader is available via the "appcfg.py" command. To get "appcfg.py", you must download and install the Python SDK of Google App Engine.

  • Download Python SDK
  • Unzip it. This gives you a "google_appengine" folder
  • You can call "appcfg.py" directly from the "google_appengine" folder or add this folder to your path
  • A quick test is to type in a terminal: ./appcfg.py (Linux).
       You'll get some usage messages for this program
  • Add RemoteApi to your URI mapping. In war/WEB-INF/web.xml, add these lines:
     web.xml

     <servlet>
       <servlet-name>RemoteApi</servlet-name>
       <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
     </servlet>

     <servlet-mapping>
       <servlet-name>RemoteApi</servlet-name>
       <url-pattern>/remote_api</url-pattern>
     </servlet-mapping>


  • Add a security constraint (only admin has access to remote_api). Add the following lines to web.xml:
     <security-constraint>
       <web-resource-collection>
         <web-resource-name>remote-api</web-resource-name>
         <url-pattern>/remote_api</url-pattern>
       </web-resource-collection>
       <auth-constraint>
         <role-name>admin</role-name>
       </auth-constraint>
     </security-constraint>



  • Upload your application to Google App Engine Server
  • I advise you to create a folder for your data and configuration files (something like Data) inside the "google_appengine" folder, to keep them separate from the other content.

Note:
The handler servlet class is :    com.google.apphosting.utils.remoteapi.RemoteApiServlet.
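
Before uploading the application, it can help to confirm that web.xml really maps the remote_api servlet. The following is only a minimal sketch in Python (the language the bulk loader itself runs on). The function names `local_name` and `remote_api_paths` are hypothetical helpers, not part of the SDK, and the embedded XML is a trimmed copy of the servlet entries described in this section.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the servlet entries from web.xml (assumption: no other
# servlets are needed for this illustration).
WEB_XML = """<web-app>
  <servlet>
    <servlet-name>RemoteApi</servlet-name>
    <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>RemoteApi</servlet-name>
    <url-pattern>/remote_api</url-pattern>
  </servlet-mapping>
</web-app>"""

REMOTE_API_CLASS = "com.google.apphosting.utils.remoteapi.RemoteApiServlet"


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}servlet'."""
    return tag.rsplit("}", 1)[-1]


def remote_api_paths(web_xml_text, servlet_class=REMOTE_API_CLASS):
    """Return the URL patterns mapped to the remote_api servlet class."""
    root = ET.fromstring(web_xml_text)
    servlet_names = set()
    mappings = []
    for element in root:
        # Collect the direct children as {tag: text} for easy lookup.
        children = {local_name(c.tag): (c.text or "").strip() for c in element}
        tag = local_name(element.tag)
        if tag == "servlet" and children.get("servlet-class") == servlet_class:
            servlet_names.add(children.get("servlet-name"))
        elif tag == "servlet-mapping":
            mappings.append((children.get("servlet-name"), children.get("url-pattern")))
    return [pattern for name, pattern in mappings if name in servlet_names]


print(remote_api_paths(WEB_XML))
```

The path returned here is the value to use for [remote_api_path] in the commands that follow.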

Download and Upload Entities without configuration

Download
Commands:

All Entities of all kinds

appcfg.py download_data --application=<app-id> --url=http://<appname>.appspot.com/[remote_api_path] --filename=<data-filename>

Only one kind
appcfg.py download_data --application=<app-id> --kind=<kind> --url=http://<appname>.appspot.com/[remote_api_path] --filename=<data-filename>

Download only one kind from development server
appcfg.py download_data --application=<app-id> --kind=<kind> --url=http://localhost:8888/[remote_api_path] --filename=<data-filename>
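
The three download commands differ only in the URL and in whether --kind is present. As a rough sketch, a small Python helper can assemble them. The helper name `build_download_command` is hypothetical and not part of the SDK. It only uses the flags shown in the commands above and does not run anything.

```python
def build_download_command(app_id, url, filename, kind=None):
    """Assemble an appcfg.py download_data command as a list of arguments.

    kind=None means "all entities of all kinds".
    """
    command = ["appcfg.py", "download_data", "--application=" + app_id]
    if kind is not None:
        command.append("--kind=" + kind)
    command.append("--url=" + url)
    command.append("--filename=" + filename)
    return command


# One kind from the application server, using this guide's example values.
print(" ".join(build_download_command(
    "gaej-12examples",
    "http://gaej-12examples.appspot.com/remote_api",
    "Data/Person",
    kind="Person",
)))
```

Returning a list rather than a single string makes it straightforward to pass the result to subprocess without worrying about shell quoting.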

Note:
The resulting data file is in a special format, suitable for backup and restore.
Downloading all entities of all kinds only works on App Engine, and does not work with the development server (downloading one kind works fine). Uploading all entities of all kinds can be done on the development server.

When data are downloaded, the entities are stored along with their original keys. When the data is uploaded, the original keys are used. If an entity exists in the datastore with the same key as an entity being uploaded, the entity in the datastore is replaced.
You can use upload_data to replace the data in the app from which it was dumped, or you can use it to upload the data to a different application. Please note that in a different application the keys may not be the same as in the original application.
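
The replace-by-key behaviour described above can be pictured with a toy model. This is only an illustrative sketch. A plain Python dictionary stands in for the datastore, the `upload` function is hypothetical, and the sample entities are made up. It is not how the bulk loader is implemented.

```python
def upload(datastore, dumped_entities):
    """Store each dumped entity under its original key.

    An entity already stored under the same key is replaced;
    entities with other keys are left untouched.
    """
    for key, properties in dumped_entities.items():
        datastore[key] = dict(properties)
    return datastore


# Toy datastore: keys are (kind, id) pairs, values are property dicts.
datastore = {
    ("Person", 1): {"firstName": "Old", "name": "VALUE"},
    ("Person", 7): {"firstName": "Someone", "name": "ELSE"},
}
# A dump that contains an entity with the same key as an existing one.
dump = {("Person", 1): {"firstName": "Kaesar", "name": "ALNIJRES"}}

upload(datastore, dump)
print(datastore[("Person", 1)])  # the dumped version replaced the old one
print(len(datastore))            # the entity with key 7 is still there
```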


Example

Download entities of one kind without configuration.
In Datastore I have one kind "Person", with only one entity (or record)
package com.java_javafx.ka;

import com.google.appengine.api.datastore.Key;
import javax.jdo.annotations.*;



/**
 * @author Kaesar ALNIJRES
 *
 */
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Person {
   
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    @Persistent
    private String name="";
   
    @Persistent
    private String firstName="";

    /**
     * @return the key
     */
    public Key getKey() {
        return key;
    }

    /**
     * @param key the key to set
     */
    public void setKey(Key key) {
        this.key = key;
    }

    /**
     * @return the name
     */
    public String getName() {
        return name;
    }

    /**
     * @param name the name to set
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return the firstName
     */
    public String getFirstName() {
        return firstName;
    }

    /**
     * @param firstName the firstName to set
     */
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

   

   
}


Development datastore with Person

In a terminal (Linux)
Go to the "google_appengine" folder and type :

./appcfg.py download_data --application=gaej-12examples --kind=Person --url=http://gaej-12examples.appspot.com/remote_api --filename=Data/Person

Note:
  • gaej-12examples is my own application ID
  • This is a direct call to "appcfg.py", so don't forget to prefix it with the current directory (./) on Linux
  • In --filename=Data/Person, Data is a folder I created inside "google_appengine" to keep the data. Give the file any name you want.

Downloading data from application server

Example

Download one kind from development server

In a terminal (linux) type :

./appcfg.py download_data --application=gaej-12examples --kind=Person --url=http://localhost:8888/remote_api --filename=Data/dev_Personne



Downloading one kind from development server

Trying to download all kinds from development server returns an error

Upload Data

Command

To upload data to the app's datastore from a file created by appcfg.py download_data, run the following command:

All entities of all kinds

appcfg.py upload_data --application=<app-id> --filename=<data-filename> --url=http://<appname>.appspot.com/[remote_api_path]

Only one kind

appcfg.py upload_data --application=<app-id> --kind=<kind> --filename=<data-filename> --url=http://<appname>.appspot.com/[remote_api_path]

Loading Data Into the Development Server (all kinds)

appcfg.py upload_data --application=<app-id> --filename=<data-filename> --url=http://localhost:8888/[remote_api_path]

Loading Data Into the Development Server (one kind)

appcfg.py upload_data --application=<app-id> --kind=<kind> --filename=<data-filename> --url=http://localhost:8888/[remote_api_path]

Example :
This uploads the kind downloaded in the previous example.

In a terminal (Linux) type :

./appcfg.py upload_data --application=gaej-12examples --kind=Person --filename=Data/Person --url=http://gaej-12examples.appspot.com/remote_api

Note:
Remember the downloaded data is "Person" in "Data" folder


Uploading to application server, kind Person

Example

Uploading to development server

In a terminal (linux) type


./appcfg.py upload_data --application=gaej-12examples --kind=Person --filename=Data/Person --url=http://localhost:8888/remote_api

Note:
For localhost, enter admin as "Email" and leave the password empty.


Datastore Viewer before uploading the data (development server)

Uploading data to development server
Now data are in datastore

Bulk Loader Configuration File

You can configure the bulk loader with a configuration file that specifies the format of uploaded and downloaded data. You can specify for instance that the downloaded data be in CSV format.
The bulk loader can generate a configuration file based on your app's datastore. This configuration file must then be edited to suit your needs. The most important part of this file is the "transformers" section, which describes how the data should be transformed on input and output. The generated file is based on the datastore statistics, so the statistics must be up to date to get a correct result.

Generate an automatic configuration file

Command

appcfg.py create_bulkloader_config --filename=<file name>.yaml --url=http://<appname>.appspot.com/remote_api

Note:
Generating a configuration file only works on App Engine, and not on the development server.


Example
Using the same datastore as in the previous tests, create a configuration file.

In a terminal (linux) type :

./appcfg.py create_bulkloader_config --filename=Data/ka_config.yaml --url=http://gaej-12examples.appspot.com/remote_api

Create a configuration file

Edit the configuration file 

As mentioned above, the interesting part here is the "transformers" section, with one set for each kind. You need to specify the input or output format for each kind in the "connector" item (in place of the "TODO:" comment). The "connector" field must be filled in before you can use this file. A connector can be csv or simplexml (import and export), or simpletext (export only).
It's also possible to set options for each connector, and to give a property a different external name.
For automatic key generation on import, you can remove the entire __key__ property from the property map. The key will then be omitted on export.
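
Since the file cannot be used until every kind has a connector, a quick check before running the bulk loader can save a failed attempt. The sketch below is an assumption-laden illustration. `kinds_missing_connector` is a hypothetical helper, and it scans the text line by line instead of parsing YAML, so it only understands the simple layout of the generated file.

```python
import re


def kinds_missing_connector(config_text):
    """Return the kinds whose 'connector:' entry is empty or only a comment."""
    missing = []
    current_kind = None
    for line in config_text.splitlines():
        kind_match = re.match(r"\s*-\s*kind:\s*(\S+)", line)
        if kind_match:
            current_kind = kind_match.group(1)
            continue
        # 'connector_options:' does not match because of the trailing ':' rule.
        connector_match = re.match(r"\s*connector:\s*(.*)$", line)
        if connector_match and current_kind is not None:
            # Drop a trailing '# ...' comment before testing for a value.
            value = connector_match.group(1).split("#", 1)[0].strip()
            if not value:
                missing.append(current_kind)
    return missing


# Reduced sample: the first kind is as generated, the second has been edited.
SAMPLE = """transformers:
- kind: Person
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
  connector_options:
- kind: Location
  connector: csv
"""

print(kinds_missing_connector(SAMPLE))
```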



ka_config.yaml (before editing)

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Person
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: firstName
      external_name: firstName
      # Type: String Stats: 1 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 1 properties of this type in this kind.


ka_config.yaml (after editing)

Only the "connector" is specified. All default options stay the same.


# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Person
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: firstName
      external_name: firstName
      # Type: String Stats: 1 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 1 properties of this type in this kind.


Download and Upload Entities Using Configuration File


Using a configuration file for downloading and uploading data lets us choose the format of the data to import or export. It's also possible to customize some options for properties (connector and connector options).

Download Data

Command :

Download a kind

appcfg.py download_data --application=<app-id> --kind=<kind> --filename=<data-filename>.csv --url=http://<appname>.appspot.com/[remote_api_path] --config_file=<config-filename>.yaml

Note:
  • kind argument is required for this operation
  • Although it's not possible to create a configuration file using the development server, it's possible to download data from it using a configuration file created from the application server

Example:

Download data from app's server, using config file
In a terminal (linux) type:


./appcfg.py download_data --config_file=Data/ka_config.yaml --filename=Data/data.csv --kind=Person --url=http://gaej-12examples.appspot.com/remote_api


Download data (csv)


data.csv
firstName,key,name
Kaesar,1,ALNIJRES
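
The exported file is ordinary CSV whose header row carries the external_name values from the configuration file, so it can be read with any CSV tool. Here is a small sketch using Python's standard csv module. `read_entities` is a hypothetical helper, and the text is the data.csv content shown above.

```python
import csv
import io

# Content of data.csv as exported in this example.
DATA_CSV = "firstName,key,name\nKaesar,1,ALNIJRES\n"


def read_entities(csv_text):
    """Parse exported CSV text into one dictionary per entity."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


for entity in read_entities(DATA_CSV):
    # Every value is read as a string, including the key.
    print(entity["key"], entity["firstName"], entity["name"])
```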

Uploading
To upload data to the app's datastore from a file downloaded using download_data, use upload_data together with a configuration file.

Command

Upload a kind to the Application Server

appcfg.py upload_data --application=<app-id> --filename=<data-filename>.csv --url=http://<appname>.appspot.com/[remote_api_path] --config_file=<config-filename>.yaml --kind=<kind-name>

Upload a kind to the development server

appcfg.py upload_data --application=<app-id> --filename=<data-filename>.csv --url=http://localhost:8888/[remote_api_path] --config_file=<config-filename>.yaml --kind=<kind-name>

Note:
  • kind argument is required
  • Application ID is required to upload to the development server
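
The two requirements in the note can be checked before the command is run. The following sketch is only an illustration. `build_upload_command` and `is_dev_server` are hypothetical helpers, and treating localhost and 127.0.0.1 as "the development server" is an assumption of this example.

```python
from urllib.parse import urlparse


def is_dev_server(url):
    """Assumption: a URL on localhost or 127.0.0.1 is the development server."""
    return urlparse(url).hostname in ("localhost", "127.0.0.1")


def build_upload_command(filename, url, config_file, kind, app_id=None):
    """Assemble an appcfg.py upload_data command that uses a configuration file."""
    if not kind:
        raise ValueError("the kind argument is required with a configuration file")
    if is_dev_server(url) and not app_id:
        raise ValueError("the application ID is required for the development server")
    command = ["appcfg.py", "upload_data",
               "--filename=" + filename,
               "--url=" + url,
               "--config_file=" + config_file]
    if app_id:
        command.append("--application=" + app_id)
    command.append("--kind=" + kind)
    return command


print(" ".join(build_upload_command(
    "Data/data2.csv",
    "http://localhost:8888/remote_api",
    "Data/ka_config.yaml",
    "Person",
    app_id="gaej-12examples",
)))
```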

Example

Uploading to the development server using a configuration file and data file

In a terminal (linux) type :

./appcfg.py upload_data --filename=Data/data2.csv --url=http://localhost:8888/remote_api --config_file=Data/ka_config.yaml --application=gaej-12examples --kind=Person

Upload and add Entities

It's possible, using a configuration file and a data file to add entities

Example:

We upload data to application server using the downloaded file (edited), and the configuration file

Edit Data/data.csv
Add one entity to our kind Person

firstName,key,name
Kaesar,1,ALNIJRES
no,2,nobody
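
The same edit can be scripted instead of done by hand. This is a minimal sketch with Python's csv module. `add_entity` is a hypothetical helper, and refusing a row whose key already exists is a design choice of this example, made so that an existing entity is not silently replaced on upload.

```python
import csv
import io


def add_entity(csv_text, new_row):
    """Return the CSV text with new_row appended, keeping the header order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    if any(row["key"] == new_row["key"] for row in rows):
        raise ValueError("key %s is already used" % new_row["key"])
    rows.append(new_row)
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()


original = "firstName,key,name\nKaesar,1,ALNIJRES\n"
updated = add_entity(original, {"firstName": "no", "key": "2", "name": "nobody"})
print(updated)
```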


In a terminal (linux) type :

./appcfg.py upload_data --config_file=Data/ka_config.yaml --filename=Data/data.csv --url=http://gaej-12examples.appspot.com/remote_api --kind=Person

Datastore with the new entity



Create a new Kind and Entities

Using the bulk loader, a configuration file and a data file, it's possible to create a new kind on the datastore and populate it with entities.
In this case we upload new data using an edited config_file

Example :

We want to add a new kind "Location", which contains a "street" and a "city", and upload it to the development server (this also works with the application server).

1-First we add "Location" with its properties to our config_file, in the "transformers:" section

ka_config.yaml

# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
#  * Fill in connector and connector_options
#  * Review the property_map.
#    - Ensure the 'external_name' matches the name of your CSV column,
#      XML tag, etc.
#    - Check that __key__ property is what you want. Its value will become
#      the key name on import, and on export the value will be the Key
#      object.  If you would like automatic key generation on import and
#      omitting the key on export, you can remove the entire __key__
#      property from the property map.

# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users

transformers:

- kind: Person
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string

    - property: firstName
      external_name: firstName
      # Type: String Stats: 2 properties of this type in this kind.

    - property: name
      external_name: name
      # Type: String Stats: 2 properties of this type in this kind.
#---------------------------------------------
- kind: Location
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: street
      external_name: street

    - property: city
      external_name: city

2-Provide a CSV file with "street" and "city" columns (something like location.csv)


location.csv

street,city
myStreet,myCity
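
The generated configuration file asks you to make sure each external_name matches a CSV column name. A rough way to check that before uploading is sketched below. `header_mismatch` is a hypothetical helper, and the list of external names is typed in by hand from the property_map instead of being read from the YAML file.

```python
import csv
import io


def header_mismatch(csv_text, external_names):
    """Compare the CSV header row with the external names of a kind.

    Returns (missing, unexpected): names absent from the header, and
    header columns that no property in the property_map refers to.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [name for name in external_names if name not in header]
    unexpected = [column for column in header if column not in external_names]
    return missing, unexpected


LOCATION_CSV = "street,city\nmyStreet,myCity\n"
# External names copied by hand from the Location property_map.
print(header_mismatch(LOCATION_CSV, ["street", "city"]))
```

Two empty lists mean the header and the property_map agree.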


In a terminal (linux) type:

./appcfg.py upload_data --filename=Data/location.csv --url=http://localhost:8888/remote_api --config_file=Data/ka_config.yaml --application=gaej-12examples --kind=Location

Upload data file to create the "Location"

Location is now in Datastore
