Abstract:
Planning is a fundamental cognitive process that enables individuals to navigate complex decision-making scenarios, from daily activities to long-term strategic objectives. It can be conceptualised as search through a tree that expands exponentially with the number of available options and the depth of foresight. While computers can leverage ever-increasing processing power to explore vast decision spaces and find optimal solutions, the human mind operates within strict cognitive limitations. Despite these constraints, people routinely plan and make effective decisions, whether arranging the next meal or managing a long-term career. This raises a crucial question: how do individuals plan so efficiently given their cognitive limitations? Prior research suggests that humans rely on heuristics and adaptive planning strategies, yet the mechanisms by which these strategies are learned remain unclear. This dissertation posits that they are acquired through experience via a process known as metacognitive reinforcement learning. To test this proposition, two primary sets of hypotheses were investigated.
The first set of hypotheses asserted that experience-driven learning governs the adaptation of both the amount of planning and the choice of planning strategy, as well as the discovery of novel planning strategies. To test these hypotheses, three experiments were conducted. Experiment 1 examined how people regulate the amount of planning, while Experiment 2 explored the adaptation of planning strategies across three different environmental structures. Experiment 3 focused on the discovery of novel planning strategies through learning, as one potential account of where planning strategies originate. Findings from these experiments consistently demonstrated that participants adjusted their planning in accordance with the structure of the environment, supporting the prediction of experience-driven adaptation.
The second set of hypotheses evaluated whether metacognitive reinforcement learning explains the observed adaptation better than alternative theories such as mental habit formation and rational strategy selection learning. To this end, a set of computational models was employed, encompassing purely model-based, purely model-free, and hybrid metacognitive reinforcement learning mechanisms, the last combining both model-based and model-free features. Across all three experiments, metacognitive reinforcement learning consistently accounted for participants' adaptation better than the alternative theories. Moreover, individual differences in learning were observed: learners best described by metacognitive reinforcement learning outperformed those best fitted by the mental habit formation model.
To explore whether people employ further mechanisms to facilitate planning, three additional cognitive mechanisms - intrinsically generated pseudo-rewards, the subjective effort of planning, and deliberation about when to terminate planning - were integrated into one of the hybrid metacognitive reinforcement learning models, resulting in eight model variants. Results indicated that participants best described by the variant combining pseudo-rewards, subjective effort, and learned termination values performed better than their counterparts.
Furthermore, Experiments 4 and 5 validated the methodological approach and examined the effectiveness of model-based metacognitive reinforcement learning. Experiment 4 demonstrated that individuals engaged in model-based learning when direct interaction with the environment was restricted, although hybrid learning proved more effective. Experiment 5 confirmed the robustness of the findings by replicating them in a more naturalistic planning environment.
Overall, this dissertation provides compelling evidence that metacognitive reinforcement learning offers a plausible explanation of the adaptability of human planning. The findings contribute to a deeper understanding of the computational principles underlying metacognitive learning and have implications for both cognitive science and artificial intelligence, informing the development of more efficient and human-like planning algorithms.