Many routine tasks require an agent to perform a series of sequential actions, either to reach a desired end state or to carry out a task of indefinite duration as efficiently as possible, and almost every task requires the consideration of multiple objectives, which are frequently in conflict with one another. However, existing methods for determining that series of actions, or policy, when considering multiple objectives suffer from a number of issues. Some are unable to find many elements in the set of optimal policies, some depend on domain knowledge provided by an expert, and others have difficulty selecting actions as the number of objectives increases. These issues limit the ability of autonomous agents to successfully complete tasks in complex, uncertain environments that are not well understood at the start of the task.

This dissertation proposes the use of voting methods developed in the field of social choice theory to determine optimal policies for sequential decision-making problems with many objectives, addressing limitations in methods that rely on scalarization functions and Pareto dominance to create policies. Voting methods are evaluated for action selection and policy evaluation within a model-free reinforcement learning algorithm for episodic problems ranging from two to six objectives in deterministic and stochastic environments, and are compared to state-of-the-art methods that use Pareto dominance for these tasks. The results of this analysis show that certain voting methods avoid the shortcomings of existing methods, allowing an agent to find multiple optimal policies in an initially unknown environment without any guidance from an external assistant.
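To make the idea concrete, the following is a minimal, hypothetical sketch of voting-based action selection: each objective "votes" by ranking the available actions according to its own value estimates, and a Borda count aggregates the rankings. The function name, the `q_values` structure, and the Borda rule as the specific voting method are illustrative assumptions, not the dissertation's actual implementation.

```python
# Hypothetical sketch of voting-based action selection in a
# multi-objective setting. Here q_values[obj][action] holds a
# per-objective value estimate for each action; this layout and the
# choice of the Borda rule are illustrative assumptions.

def borda_select(q_values):
    """Pick an action by Borda count across objectives.

    Each objective ranks the actions from best to worst by its own
    value estimates; an action receives (n_actions - 1 - rank) points
    per objective, and the action with the highest total score wins.
    """
    n_actions = len(q_values[0])
    scores = [0] * n_actions
    for obj_vals in q_values:
        # Rank the actions for this objective, best first.
        ranked = sorted(range(n_actions),
                        key=lambda a: obj_vals[a], reverse=True)
        for rank, action in enumerate(ranked):
            scores[action] += n_actions - 1 - rank
    return max(range(n_actions), key=lambda a: scores[a])

# Two objectives, three actions: objective 0 most prefers action 2,
# objective 1 most prefers action 1; action 1 wins on total score.
print(borda_select([[0.1, 0.5, 0.9], [0.2, 0.8, 0.1]]))  # prints 1
```

Unlike a scalarization function, this aggregation needs no hand-tuned weights from an expert, which is one motivation the abstract gives for turning to social choice methods.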