Internal reward (task 3) progress week 2
First, work is underway to test QI, in place of distance, as the internal reward for lifetime learning.
Second, we will try to generate robot traces (i.e., sensori-motor logs) labelled as containing more or less desirable behaviour, where 'desirable' might mean walking far or walking 'naturally'. These logs can then serve as a basis for preference-based learning, and so indirectly define a fitness function over desirable sensori-motor states.
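As a rough illustration of how labelled traces could indirectly define a fitness function, the sketch below fits a linear score to pairwise preferences with a Bradley-Terry style logistic model. Everything specific here is an assumption made for the example and not taken from the project: the two channels (forward speed and body wobble), the per-channel averaging in `trace_features`, the linear form of `fitness`, and the synthetic traces produced by `make_trace`. It is one possible reading of preference-based learning, not the method actually planned.

```python
import math
import random


def trace_features(trace):
    """Summarise a trace (list of per-step channel readings) by channel means."""
    n_steps = len(trace)
    n_channels = len(trace[0])
    return [sum(step[i] for step in trace) / n_steps for i in range(n_channels)]


def fitness(weights, trace):
    """Hypothetical linear fitness: weighted sum of the trace's mean channels."""
    return sum(w * f for w, f in zip(weights, trace_features(trace)))


def fit_preferences(pairs, dim, lr=0.5, epochs=200):
    """Fit weights so that P(better > worse) = sigmoid(fitness difference).

    `pairs` is a list of (preferred_trace, other_trace) tuples.
    Plain stochastic gradient ascent on the log-likelihood.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            f_better = trace_features(better)
            f_worse = trace_features(worse)
            diff = [fb - fo for fb, fo in zip(f_better, f_worse)]
            margin = sum(w * d for w, d in zip(weights, diff))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of log(p) with respect to the weights is (1 - p) * diff.
            for i in range(dim):
                weights[i] += lr * (1.0 - p) * diff[i]
    return weights


def make_trace(rng, speed, wobble, steps=20):
    """Synthetic stand-in for a robot log: [forward speed, body wobble] per step."""
    return [[speed + rng.gauss(0, 0.05), wobble + rng.gauss(0, 0.05)]
            for _ in range(steps)]


# Synthetic labels (assumed): fast, steady traces are preferred over slow, wobbly ones.
rng = random.Random(0)
pairs = []
for _ in range(30):
    good = make_trace(rng, speed=rng.uniform(0.6, 1.0), wobble=rng.uniform(0.0, 0.3))
    bad = make_trace(rng, speed=rng.uniform(0.0, 0.4), wobble=rng.uniform(0.4, 0.9))
    pairs.append((good, bad))

weights = fit_preferences(pairs, dim=2)
```

Once fitted, `fitness(weights, new_trace)` scores an unlabelled trace, so the preference labels alone determine which sensori-motor states count as desirable. A richer feature summary or a non-linear model could be swapped in without changing the overall scheme.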
page revision: 2, last edited: 01 Dec 2011 10:04