Exploration & Observation Fractions
It's always struck me as a surprising limitation of BECCA that the exploration and observation fractions in the planner are fixed constants. It seems to me that for practically all tasks, these ratios ought to be adjusted based on what the agent has learned.
While playing with BECCA tonight, it occurred to me that the observation step provides valuable information that could be used to weight the exploration fraction. That is, observation allows the agent to measure the stochasticity of its environment, i.e., "if I do nothing, and my inputs are the same as before, is my reward the same as before?" If not, we increase our stochasticity measure. If taking no action (observation) leads to no change in inputs or rewards, stochasticity is low.
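To make the idea concrete, here is a rough sketch of how such a measure might be tracked. None of these names come from BECCA; `StochasticityEstimator`, `rate`, and `tol` are hypothetical, and I'm assuming inputs arrive as a numeric array and reward as a float. It keeps a running average of how often a do-nothing step is followed by a change in inputs or reward.

```python
import numpy as np


class StochasticityEstimator:
    """Hypothetical helper (not part of BECCA): running estimate in [0, 1]
    of how often inputs or reward change across a no-action step."""

    def __init__(self, rate=0.1, tol=1e-6):
        self.rate = rate    # assumed smoothing rate of the moving average
        self.tol = tol      # assumed threshold below which values count as "the same"
        self.value = 0.0    # start by assuming a deterministic world

    def observe(self, prev_inputs, prev_reward, inputs, reward):
        """Call only for steps where the agent took no action."""
        prev_inputs = np.asarray(prev_inputs, dtype=float)
        inputs = np.asarray(inputs, dtype=float)
        inputs_changed = bool(np.any(np.abs(inputs - prev_inputs) > self.tol))
        reward_changed = abs(reward - prev_reward) > self.tol
        surprise = 1.0 if (inputs_changed or reward_changed) else 0.0
        # Exponential moving average: drift toward 1 on surprises, toward 0 otherwise.
        self.value += self.rate * (surprise - self.value)
        return self.value
```

A moving average is just one choice; it forgets old evidence at a rate set by `rate`, which seemed useful if the environment's noisiness itself changes over time.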
When stochasticity is high, we should explore more, because it's more likely that the agent will stumble on a strategy that works even better, or that its current strategy will cease to be effective.
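One simple way the weighting could work, assuming the stochasticity estimate is scaled to [0, 1]: interpolate linearly between a minimum and a maximum exploration fraction. The function name and the `low`/`high` bounds below are made up for illustration and are not BECCA's actual constants.

```python
def exploration_fraction(stochasticity, low=0.05, high=0.5):
    """Hypothetical mapping: more measured stochasticity, more exploration.

    `low` and `high` are assumed bounds, not values taken from BECCA.
    """
    # Clamp so an out-of-range estimate cannot push the fraction outside the bounds.
    s = min(max(stochasticity, 0.0), 1.0)
    return low + (high - low) * s
```

The planner would then compare a uniform random draw against this value instead of against a fixed constant. A linear map is the simplest thing that could work; a steeper or saturating curve might turn out to behave better in practice.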
Is there anything like this in BECCA already that I'm not aware of?
Is there any reason not to weight the exploration fraction based on this kind of stochasticity measure?