I am trying to use an early experimental release of the mapper implementation to empty the datastore. This solution was proposed in a similar SO question.
This is the AppEngineMapper I am currently using. It just deletes each entity it is given.
public class EmptyFixesMapper extends AppEngineMapper<Key, Entity, NullWritable, NullWritable> {

    public EmptyFixesMapper() {
    }

    @Override
    public void taskSetup(Context context) {
    }

    @Override
    public void taskCleanup(Context context) {
    }

    @Override
    public void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
    }

    @Override
    public void cleanup(Context context) {
        getAppEngineContext(context).flush();
    }

    @Override
    public void map(Key key, Entity value, Context context) {
        log.warning("Mapping key: " + key);
        DatastoreMutationPool mutationPool =
                this.getAppEngineContext(context).getMutationPool();
        mutationPool.delete(value.getKey());
    }
}
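For context, my understanding is that the mutation pool buffers the deletes and sends them in batches, and that the flush() call in cleanup() sends whatever is left over. The stand-alone sketch below only illustrates that idea. BatchingDeletePool, its batch size of 3, and the string keys are hypothetical names I made up for illustration; this is not the library's DatastoreMutationPool and it does not talk to the datastore.

```java
import java.util.ArrayList;
import java.util.List;

public class MutationPoolSketch {

    /** Hypothetical stand-in for a batching delete pool (not the real library class). */
    static class BatchingDeletePool {
        private final int batchSize;
        private final List<String> pending = new ArrayList<>();
        private final List<List<String>> sentBatches = new ArrayList<>();

        BatchingDeletePool(int batchSize) {
            this.batchSize = batchSize;
        }

        /** Queue a key for deletion; send a batch once enough keys are queued. */
        void delete(String key) {
            pending.add(key);
            if (pending.size() >= batchSize) {
                flush();
            }
        }

        /** Send any queued keys as one batch (here: just record the batch). */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            sentBatches.add(new ArrayList<>(pending));
            pending.clear();
        }

        List<List<String>> sentBatches() {
            return sentBatches;
        }

        int pendingCount() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        BatchingDeletePool pool = new BatchingDeletePool(3);
        for (int i = 1; i <= 7; i++) {
            pool.delete("Fix:" + i); // plays the role of map() calling mutationPool.delete(...)
        }
        System.out.println(pool.sentBatches().size() + " batches sent, "
                + pool.pendingCount() + " pending");

        pool.flush(); // plays the role of cleanup() calling flush()
        System.out.println(pool.sentBatches().size() + " batches sent, "
                + pool.pendingCount() + " pending");
    }
}
```

With seven keys and a batch size of three, one key is still queued until the final flush, which is why I call flush() in cleanup().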
This is my mapreduce.xml configuration file:
<configurations>
  <configuration name="Empty Entities">
    <property>
      <name>mapreduce.map.class</name>
      <value>com.google.appengine.demos.mapreduce.EmptyFixesMapper</value>
    </property>
    <property>
      <name>mapreduce.inputformat.class</name>
      <value>com.google.appengine.tools.mapreduce.DatastoreInputFormat</value>
    </property>
    <property>
      <name human="Entity Kind to Map Over">mapreduce.mapper.inputformat.datastoreinputformat.entitykind</name>
      <value template="optional">Fix</value>
    </property>
  </configuration>
...
When I open the mapreduce control panel at mydomain/mapreduce/status, I can launch the tasks, but they never complete. The status page shows a field reading "0/0 shards".
I can also see that some tasks are created in the App Engine default task queue, with a lot of retries.
And finally, in my GAE application logs I see:
1. 09-11 03:23AM 08.556 /mapreduce/mapperCallback 500 10081ms 0cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine)
0.1.0.2 - - [11/Sep/2010:03:23:18 -0700] "POST /mapreduce/mapperCallback HTTP/1.1" 500 0 "http://xxx.appspot.com/mapreduce/command/start_job" "AppEngine-Google; (+http://code.google.com/appengine)" "xxx.appspot.com" ms=10081 cpu_ms=0 api_cpu_ms=0 cpm_usd=0.000057 queue_name=default task_name=worker-attempt-1284198892815-0001-m-000002-1--0
2. W 09-11 03:23AM 18.638
Request was aborted after waiting too long to attempt to service your request. This may happen sporadically when the App Engine serving cluster is under unexpectedly high or uneven load. If you see this message frequently, please contact the App Engine team.
What could be happening? I'm sure I've followed the steps described in the getting started guide, and I have fewer than 1000 entities in the datastore...