Subtask 3 - Internal Reward

status 12 Dec 2011 - 12 Dec 2011 09:07

INRIA analysed logs from VU experiments and provided code for a preference-based fitness model using only the accelerometer. We will commence testing this week.
Finalised testing of QI vs distance as the internal reward function in Webots; although a distance-based internal reward significantly outperforms QI, QI does seem to provide a viable alternative. Distance seems to result in a more twisty path.
Further testing in robo3d (courtesy of Jean-Marc).
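The "twisty path" observation could be quantified, for instance, as the ratio of net displacement to total path length over a logged trajectory. A minimal sketch (the metric and function names are our illustration, not part of the experiments):

```python
import math

def straightness(path):
    """Ratio of net displacement to total path length.

    1.0 = perfectly straight; values near 0 indicate a twisty path.
    `path` is a list of (x, y) positions, e.g. GPS samples from Webots.
    """
    if len(path) < 2:
        return 1.0
    total = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    if total == 0:
        return 1.0
    return math.dist(path[0], path[-1]) / total

# A straight path scores 1.0; a zig-zag path scores lower.
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
zigzag = [(0, 0), (1, 1), (2, 0), (3, 1)]
```

Comparing this score across distance-rewarded and QI-rewarded runs would make the "more twisty" claim measurable.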

status 5 Dec 2011 - 05 Dec 2011 09:16

VU is generating logs to analyse for preference-based learning.
Ran tests with QI as the internal reward (based on GPS at first, now enhanced with accelerometer and joint force-feedback, though not separately tuned); this leads to results comparable to using distance as the reward.
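Since the enhanced reward was not separately tuned, the simplest combination is an equally weighted average of per-channel scores. The sketch below is our own illustration (channel names and the normalised-weighted-sum form are assumptions, not the actual QI code):

```python
def combined_reward(channel_scores, weights=None):
    """Combine per-sensor reward signals into one internal reward.

    `channel_scores` maps a channel name (e.g. GPS, accelerometer,
    joint force-feedback) to its score for the current evaluation.
    Equal weights are the untuned default; tuning or evolving the
    weights would correspond to the 'weighted QI' idea.
    """
    if weights is None:
        weights = {name: 1.0 for name in channel_scores}
    total_w = sum(weights[name] for name in channel_scores)
    return sum(weights[name] * s for name, s in channel_scores.items()) / total_w

# Hypothetical per-channel scores for one evaluation window.
r = combined_reward({"gps": 0.8, "accelerometer": 0.5, "force_feedback": 0.2})
```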

Internal reward (task 3) progress week 2 - 21 Nov 2011 01:35

Work is underway to test QI as an alternative to distance for the internal reward for lifetime learning.

Secondly, we will endeavour to generate robot traces (i.e., sensori-motor logs) labelled as containing more or less desirable behaviour, where 'desirable' may mean walking far or perhaps walking 'naturally'. These logs can then be used as a basis for preference-based learning and indirectly define a fitness function over desirable sensori-motor states.
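One minimal way such labelled traces could induce a fitness function: fit a logistic model mapping sensori-motor features to the probability that a state comes from a 'desirable' trace, then use that probability as the internal reward. This is a sketch under our own assumptions (feature vectors and the plain gradient-descent fit are illustrative, not the actual preference-learning method):

```python
import math

def fit_preference_model(states, labels, lr=0.1, epochs=500):
    """Fit P(desirable | state) with logistic regression.

    `states`: equal-length feature vectors (e.g. accelerometer statistics
    per time window); `labels`: 1 for desirable traces, 0 otherwise.
    """
    w = [0.0] * len(states[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(states, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-likelihood
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def fitness(state, model):
    """Learned preference probability, usable as an internal reward."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, state)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

The appeal is that, like QI, such a model needs only on-board sensor logs at run time, not ground-truth distance.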

Sub-taskforce 3: internal reward - 12 Nov 2011 09:55

Plan for the next couple of weeks

In addition to earlier trials with distance travelled as the internal reward for the adaptation of the organism-mode controllers, VU will perform trials with QI as the internal reward and, if time permits, with a combination of QI and distance. The methods will be compared in terms of the area of an arena explored as well as the distance travelled. The VU team will try combinations of sensory input (distance sensors, GPS measurements) and actuator positions as input for the QI calculations.
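The "area explored" criterion can be approximated by counting the distinct grid cells a logged trajectory enters. A sketch (the grid resolution is a free parameter we chose arbitrarily):

```python
def cells_visited(path, cell=0.25):
    """Number of distinct grid cells a trajectory enters.

    `path` is a list of (x, y) GPS samples; `cell` is the grid
    resolution in metres. More cells = more of the arena explored.
    """
    return len({(int(x // cell), int(y // cell)) for x, y in path})

# A small looping path covers fewer cells than a sweep across the arena.
loop = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.0, 0.1)] * 3
sweep = [(0.3 * i, 0.0) for i in range(10)]
```

Reporting both `cells_visited` and total distance per run would cover the two comparison criteria above.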

Christopher remarks:
"[I]f QI/curiosity works then I am fine with that choice because it can work on the robot without hacks. I see distance as the fallback plan since it's easy to implement on the simulator"

We are still discussing further alternatives related to weighted QI (possibly evolving the weights, for instance).
Tags: sub-task-3 workplan

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License