The Complete Guide to Managing Datastore on Google App Engine for Java using Bulk Loader
>> 14 February 2011
You can download a pdf copy here: http://apps.java-javafx.com/download.jsp
The bulk loader tool uploads data to, and downloads data from, your application's datastore, whether the application runs on App Engine or on the development server.
The tool communicates with your running application through remote_api, a request handler included with the App Engine runtime environment. This gives the bulk loader access to the datastore through the remote application.
Data Format
SDK Version: 1.3.8 or later
Configuration File
Installing a Bulk Loader
The bulk loader is available via the "appcfg.py" command. To have "appcfg.py", you must download and install the Python SDK of Google App Engine.
- Download Python SDK
- Unzip it. This creates a "google_appengine" folder
- You can call "appcfg.py" directly from the "google_appengine" folder or add this folder to your path
- As a quick test, type ./appcfg.py in a terminal (Linux).
- Add RemoteApi to your URI mapping. In war/WEB-INF/web.xml, add these lines
<servlet>
<servlet-name>RemoteApi</servlet-name>
<servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>RemoteApi</servlet-name>
<url-pattern>/remote_api</url-pattern>
</servlet-mapping>
- Add a security constraint (only admin has access to remote_api). Add the following lines to web.xml
<security-constraint>
<web-resource-collection>
<web-resource-name>remote-api</web-resource-name>
<url-pattern>/remote_api</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>admin</role-name>
</auth-constraint>
</security-constraint>
- Upload your application to Google App Engine Server
- I advise you to create a folder for your data and configuration files (something like Data) inside the "google_appengine" folder, to keep them separate from the other content.
Note:
The handler servlet class is : com.google.apphosting.utils.remoteapi.RemoteApiServlet.
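Before uploading the application, it can be worth checking that web.xml really maps the remote_api servlet. The following is a minimal Python sketch, not part of the App Engine SDK: the function name remote_api_paths and the inline sample are assumptions made for illustration. It reads a web.xml text and returns the URL patterns mapped to the handler class named above.

```python
import xml.etree.ElementTree as ET

# Handler class named in this guide.
SERVLET_CLASS = "com.google.apphosting.utils.remoteapi.RemoteApiServlet"


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def child_text(element, name):
    """Return the stripped text of the first direct child called `name`, or None."""
    for child in element:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return None


def remote_api_paths(web_xml_text):
    """Return the URL patterns mapped to the remote_api servlet (hypothetical helper)."""
    root = ET.fromstring(web_xml_text)
    # First pass: names of the servlets that use the remote_api handler class.
    names = set()
    for el in root.iter():
        if local(el.tag) == "servlet" and child_text(el, "servlet-class") == SERVLET_CLASS:
            name = child_text(el, "servlet-name")
            if name:
                names.add(name)
    # Second pass: URL patterns of the mappings that point at those servlets.
    paths = []
    for el in root.iter():
        if local(el.tag) == "servlet-mapping" and child_text(el, "servlet-name") in names:
            paths.append(child_text(el, "url-pattern"))
    return paths


# Illustrative sample; a real project would read war/WEB-INF/web.xml instead.
SAMPLE = """<web-app>
  <servlet>
    <servlet-name>RemoteApi</servlet-name>
    <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>RemoteApi</servlet-name>
    <url-pattern>/remote_api</url-pattern>
  </servlet-mapping>
</web-app>"""

print(remote_api_paths(SAMPLE))
```

An empty list means the servlet or its mapping is missing, in which case the bulk loader would have no URL to talk to.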
Download and Upload Entities without configuration
Download
Commands:
All Entities of all kinds
appcfg.py download_data --application=<app-id> --url=http://<appname>.appspot.com/[remote_api_path] --filename=<data-filename>
Only one kind
appcfg.py download_data --application=<app-id> --kind=<kind> --url=http://<appname>.appspot.com/[remote_api_path] --filename=<data-filename>
Download only one kind from development server
appcfg.py download_data --application=<app-id> --kind=<kind> --url=http://localhost:8888/[remote_api_path] --filename=<data-filename>
Note: The resulting data file is in a special format, suitable for backup and restore.
Downloading all entities of all kinds only works on App Engine; it does not work with the development server (downloading one kind works fine). Uploading all entities of all kinds does work on the development server.
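The command variants above differ only in the URL and in whether --kind is present. As a rough illustration, here is a small Python sketch that assembles the argument list. The function name bulkloader_command is hypothetical, and it uses only the flags shown in this guide (--application, --kind, --url, --filename).

```python
def bulkloader_command(action, app_id, url, filename, kind=None):
    """Build the appcfg.py argument list for download_data or upload_data.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not run anything.
    """
    if action not in ("download_data", "upload_data"):
        raise ValueError("action must be 'download_data' or 'upload_data'")
    args = ["appcfg.py", action, "--application=" + app_id]
    if kind:
        # Without --kind the command applies to all entities of all kinds.
        args.append("--kind=" + kind)
    args.append("--url=" + url)
    args.append("--filename=" + filename)
    return args


# One kind from the development server (placeholder values).
print(" ".join(bulkloader_command(
    "download_data",
    "my-app-id",
    "http://localhost:8888/remote_api",
    "Data/Person",
    kind="Person",
)))
```

The returned list could be handed to subprocess.call to run the tool; that step is left out here because it needs the SDK installed.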
When data are downloaded, the entities are stored along with their original keys. When the data is uploaded, the original keys are used. If an entity exists in the datastore with the same key as an entity being uploaded, the entity in the datastore is replaced.
You can use upload_data to replace the data in the app from which it was dumped, or you can use it to upload the data to a different application. Note that when uploading to a different application, the keys may not be the same as in the original application.
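The replace-on-same-key behaviour described above can be pictured with a plain dictionary. This is only a toy model written for this guide, not how the datastore is implemented: the function name upload and the sample entities are assumptions.

```python
def upload(datastore, entities):
    """Toy model of an upload: `datastore` maps key -> properties.

    An uploaded entity whose key already exists replaces the stored one;
    any other entity is added. Returns the keys that were replaced.
    """
    replaced = []
    for key, properties in entities:
        if key in datastore:
            replaced.append(key)
        datastore[key] = dict(properties)
    return replaced


# Illustrative data only.
datastore = {1: {"firstName": "Kaesar", "name": "ALNIJRES"}}
replaced = upload(datastore, [
    (1, {"firstName": "K.", "name": "ALNIJRES"}),  # same key: replaces
    (2, {"firstName": "no", "name": "nobody"}),    # new key: added
])
print(replaced, sorted(datastore))
```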
Example
Download entities of one kind without configuration.
In Datastore I have one kind "Person", with only one entity (or record)
package com.java_javafx.ka;

import com.google.appengine.api.datastore.Key;
import javax.jdo.annotations.*;

/**
 * @author Kaesar ALNIJRES
 *
 */
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Person {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    @Persistent
    private String name = "";
    @Persistent
    private String firstName = "";

    /**
     * @return the key
     */
    public Key getKey() {
        return key;
    }

    /**
     * @param key the key to set
     */
    public void setKey(Key key) {
        this.key = key;
    }

    /**
     * @return the name
     */
    public String getName() {
        return name;
    }

    /**
     * @param name the name to set
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return the firstName
     */
    public String getFirstName() {
        return firstName;
    }

    /**
     * @param firstName the firstName to set
     */
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
}
Development datastore with Person
Go to the "google_appengine" folder and type :
./appcfg.py download_data --application=gaej-12examples --kind=Person --url=http://gaej-12examples.appspot.com/remote_api --filename=Data/Person
Note:
- gaej-12examples, is my own application ID
- This is a direct call to "appcfg.py", don't forget to use the current directory (.) for Linux
- In --filename=Data/Person, "Data" is a folder I created inside "google_appengine" to keep the data. You can give the file any name you want.
Downloading data from application server
Example
Download one kind from development server
In a terminal (linux) type :
./appcfg.py download_data --application=gaej-12examples --kind=Person --url=http://localhost:8888/remote_api --filename=Data/dev_Personne
Downloading one kind from development server
Trying to download all kinds from development server returns an error
Upload Data
To upload data to the app's datastore from a file created by appcfg.py download_data, run the following command:
All entities of all kinds
appcfg.py upload_data --application=<app-id> --filename=<data-filename> --url=http://<appname>.appspot.com/[remote_api_path]
Only one kind
appcfg.py upload_data --application=<app-id> --kind=<kind> --filename=<data-filename> --url=http://<appname>.appspot.com/[remote_api_path]
Loading Data Into the Development Server (all kinds)
appcfg.py upload_data --application=<app-id> --filename=<data-filename> --url=http://localhost:8080/[remote_api_path]
Loading Data Into the Development Server (one kind)
appcfg.py upload_data --application=<app-id> --kind=<kind> --filename=<data-filename> --url=http://localhost:8080/[remote_api_path]
Example :
Upload the kind downloaded in the previous example
In a terminal (Linux) type :
./appcfg.py upload_data --application=gaej-12examples --kind=Person --filename=Data/Person --url=http://gaej-12examples.appspot.com/remote_api
Note:
Remember the downloaded data is "Person" in "Data" folder
Uploading to application server, kind Person
Example
Uploading to development server
In a terminal (linux) type
./appcfg.py upload_data --application=gaej-12examples --kind=Person --filename=Data/Person --url=http://localhost:8888/remote_api
Note:
For localhost, enter admin when prompted for "Email" and leave the password empty
Datastore Viewer before uploading the data (development server)
Uploading data to development server
Now data are in datastore
Bulk Loader Configuration File
You can configure the bulk loader with a configuration file that specifies the format of uploaded and downloaded data. You can specify for instance that the downloaded data be in CSV format.
The bulk loader can generate a configuration file based on your app's datastore. This configuration file must then be edited to suit your needs. The most important part of this file is the "transformers" section (how the data should be transformed on input and output). The generated file is based on the datastore statistics, which must be up to date to get a correct result.
Generate an automatic configuration file
Command
appcfg.py create_bulkloader_config --filename=<file name>.yaml --url=http://<appname>.appspot.com/remote_api
Note:
Generate a configuration file only works on App Engine, and not on the development server.
Example
Using the same datastore as in the previous tests, create a configuration file
In a terminal (linux) type :
./appcfg.py create_bulkloader_config --filename=Data/ka_config.yaml --url=http://gaej-12examples.appspot.com/remote_api
Create a configuration file
Edit the configuration file
As mentioned above, the interesting part here is the "transformers" section, with one set for each kind. For each kind you need to specify the input or output format, namely the "connector" item (in place of the "TODO:" comment). Every "connector" field must be filled in before you can use this file. A connector can be csv or simplexml (import and export), or simpletext (export only).
It's also possible to give some options to each connector, like for example, giving a different external name for a property.
For an automatic key generation on import you can remove the entire __key__ property from the property map. The key will be omitted on export.
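The edit described above (filling in the "connector" item for a kind) can also be scripted. The following Python sketch works on the text of a generated file line by line. The function name set_connector and the shortened sample text are assumptions for illustration, and the sketch assumes each kind starts with a "- kind:" line, as in the generated files.

```python
def set_connector(config_text, kind, connector):
    """Return config_text with the 'connector:' line of `kind` filled in.

    Hypothetical helper: a plain line-based edit, not a YAML parser.
    """
    current_kind = None
    out = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- kind:"):
            # Remember which kind the following lines belong to.
            current_kind = stripped[len("- kind:"):].strip()
        elif current_kind == kind and stripped.startswith("connector:"):
            # Keep the original indentation, drop the TODO comment.
            indent = line[:len(line) - len(line.lstrip())]
            line = indent + "connector: " + connector
        out.append(line)
    return "\n".join(out)


# Shortened sample in the shape of a generated file (illustrative only).
GENERATED = """transformers:
- kind: Person
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
  connector_options:
  property_map:
    - property: name
      external_name: name"""

print(set_connector(GENERATED, "Person", "csv"))
```

The "connector_options:" line is left alone because it does not start with "connector:".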
ka_config.yaml (before editing)
# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
# * Fill in connector and connector_options
# * Review the property_map.
# - Ensure the 'external_name' matches the name of your CSV column,
# XML tag, etc.
# - Check that __key__ property is what you want. Its value will become
# the key name on import, and on export the value will be the Key
# object. If you would like automatic key generation on import and
# omitting the key on export, you can remove the entire __key__
# property from the property map.
# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users
transformers:
- kind: Person
  connector: # TODO: Choose a connector here: csv, simplexml, etc...
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: firstName
      external_name: firstName
      # Type: String Stats: 1 properties of this type in this kind.
    - property: name
      external_name: name
      # Type: String Stats: 1 properties of this type in this kind.
ka_config.yaml (after editing)
Only a "connector" is specified. All default options stay the same
# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
# * Fill in connector and connector_options
# * Review the property_map.
# - Ensure the 'external_name' matches the name of your CSV column,
# XML tag, etc.
# - Check that __key__ property is what you want. Its value will become
# the key name on import, and on export the value will be the Key
# object. If you would like automatic key generation on import and
# omitting the key on export, you can remove the entire __key__
# property from the property map.
# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users
transformers:
- kind: Person
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: firstName
      external_name: firstName
      # Type: String Stats: 1 properties of this type in this kind.
    - property: name
      external_name: name
      # Type: String Stats: 1 properties of this type in this kind.
Download and Upload Entities Using Configuration File
Using a configuration file for downloading and uploading data lets us choose the format of the data to import or export. It's also possible to customize some options for properties (connector and connector options).
Download Data
Command :
Download a kind
appcfg.py download_data --application=<app-id> --kind=<kind-name> --filename=<data-filename>.csv --url=http://<appname>.appspot.com/[remote_api_path] --config_file=<config-filename>.yaml
Note:
- kind argument is required for this operation
- Although it's not possible to create a configuration file using the development server, it's possible to download data from it using a configuration file created from the application server
Example:
Download data from app's server, using config file
In a terminal (linux) type:
./appcfg.py download_data --config_file=Data/ka_config.yaml --filename=Data/data.csv --kind=Person --url=http://gaej-12examples.appspot.com/remote_api
Download data (csv)
data.csv
firstName,key,name
Kaesar,1,ALNIJRES
Uploading
To upload data to the app's datastore from a file downloaded with download_data, pass both the data file and the configuration file.
Command
Upload a kind to the Application Server
appcfg.py upload_data --application=<app-id> --filename=<data-filename>.csv --url=http://<appname>.appspot.com/[remote_api_path] --config_file=<config-filename>.yaml --kind=<kind-name>
Upload a kind to the development server
appcfg.py upload_data --application=<app-id> --filename=<data-filename>.csv --url=http://localhost:8888/[remote_api_path] --config_file=<config-filename>.yaml --kind=<kind-name>
Note:
- kind argument is required
- Application ID is required to upload to the development server
Example
Uploading to the development server using a configuration file and data file
In a terminal (linux) type :
./appcfg.py upload_data --filename=Data/data2.csv --url=http://localhost:8888/remote_api --config_file=Data/ka_config.yaml --application=gaej-12examples --kind=Person
Upload and add Entities
Using a configuration file and a data file, it's also possible to add entities to an existing kind
Example:
We upload data to application server using the downloaded file (edited), and the configuration file
Edit Data/data.csv
Add one entity to our kind Person
firstName,key,name
Kaesar,1,ALNIJRES
no,2,nobody
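The manual edit above can also be done programmatically. The following Python sketch appends one row to CSV text with the standard csv module. The function name add_row is hypothetical, and the sketch assumes the first line is the header produced by the download.

```python
import csv
import io


def add_row(csv_text, row):
    """Return csv_text with `row` (a dict keyed by column name) appended.

    Hypothetical helper for illustration; the header of the existing file
    decides the column order.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    if set(row) != set(fieldnames):
        raise ValueError("row columns must match the header: %s" % fieldnames)
    rows = list(reader)
    rows.append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


downloaded = "firstName,key,name\nKaesar,1,ALNIJRES\n"
edited = add_row(downloaded, {"firstName": "no", "key": "2", "name": "nobody"})
print(edited)
```

Rejecting rows whose columns differ from the header is a design choice of this sketch; it catches a misspelled column name before the file is uploaded.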
In a terminal (linux) type :
./appcfg.py upload_data --config_file=Data/ka_config.yaml --filename=Data/data.csv --url=http://gaej-12examples.appspot.com/remote_api --kind=Person
Datastore with the new entity
Create a new Kind and Entities
Using the bulk loader, a configuration file and a data file, it's possible to create a new kind on the datastore and populate it with entities.
In this case we upload new data using an edited config_file
Example :
We want to add a new kind "Location", which contains a "street" and a "city", and upload it to the development server (this also works with the application server)
1- First we add "Location" with its properties to the "transformers:" section of our config file
ka_config.yaml
# Autogenerated bulkloader.yaml file.
# You must edit this file before using it. TODO: Remove this line when done.
# At a minimum address the items marked with TODO:
# * Fill in connector and connector_options
# * Review the property_map.
# - Ensure the 'external_name' matches the name of your CSV column,
# XML tag, etc.
# - Check that __key__ property is what you want. Its value will become
# the key name on import, and on export the value will be the Key
# object. If you would like automatic key generation on import and
# omitting the key on export, you can remove the entire __key__
# property from the property map.
# If you have module(s) with your model classes, add them here. Also
# change the kind properties to model_class.
python_preamble:
- import: base64
- import: re
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.ext.bulkload.bulkloader_wizard
- import: google.appengine.ext.db
- import: google.appengine.api.datastore
- import: google.appengine.api.users
transformers:
- kind: Person
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: __key__
      external_name: key
      export_transform: transform.key_id_or_name_as_string
    - property: firstName
      external_name: firstName
      # Type: String Stats: 2 properties of this type in this kind.
    - property: name
      external_name: name
      # Type: String Stats: 2 properties of this type in this kind.
#---------------------------------------------
- kind: Location
  connector: csv
  connector_options:
    # TODO: Add connector options here--these are specific to each connector.
  property_map:
    - property: street
      external_name: street
    - property: city
      external_name: city
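An entry like the "Location" one above follows a regular pattern, so it can be generated. The following Python sketch is only an illustration: the function name transformer_entry is hypothetical, the output is built with plain string formatting (no YAML library), and it assumes every property uses its own name as external_name.

```python
def transformer_entry(kind, properties, connector="csv"):
    """Return the text of a 'transformers:' entry for one kind.

    Hypothetical helper: each property is exported and imported under
    its own name, and no __key__ property is emitted.
    """
    lines = [
        "- kind: " + kind,
        "  connector: " + connector,
        "  connector_options:",
        "  property_map:",
    ]
    for name in properties:
        lines.append("    - property: " + name)
        lines.append("      external_name: " + name)
    return "\n".join(lines)


print(transformer_entry("Location", ["street", "city"]))
```

The printed text is meant to be pasted under the existing "transformers:" line of the configuration file.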
2- Provide a CSV file with "street" and "city" columns (something like location.csv)
location.csv
street,city
myStreet,myCity
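The column names in the CSV header have to line up with the external_name values in the configuration file. A quick way to check this before uploading is sketched below in Python. The function name header_mismatch is hypothetical, and the list of external names is typed in by hand here rather than read from the YAML file.

```python
import csv
import io


def header_mismatch(csv_text, external_names):
    """Compare the CSV header with the expected external names.

    Returns (missing, extra): names expected but absent from the header,
    and header columns that no property maps to. Hypothetical helper.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [name for name in external_names if name not in header]
    extra = [column for column in header if column not in external_names]
    return missing, extra


location_csv = "street,city\nmyStreet,myCity\n"
print(header_mismatch(location_csv, ["street", "city"]))
```

Two empty lists mean the header and the property map agree.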
In a terminal (linux) type:
./appcfg.py upload_data --filename=Data/location.csv --url=http://localhost:8888/remote_api --config_file=Data/ka_config.yaml --application=gaej-12examples --kind=Location
Upload data file to create the "Location"
Location is now in Datastore