## Problem solving

Constraint satisfaction is the art of finding values for variables in a way that satisfies various constraints. Many everyday problems can be usefully couched in these terms. Initial symbolic Artificial Intelligence (AI) approaches attempted to solve constraint-satisfaction problems in a hard fashion – either the constraints were perfectly satisfied or there was no solution. Due to its limited modeling capacity, this approach was not particularly appealing to computational modelers in psychology who intuited that the problems people face often have no perfect solutions that satisfy all constraints. Consequently, psychological approaches more often turned to the soft constraint satisfaction of artificial neural networks. Newer AI work on constraint satisfaction has made considerable progress in building computational systems that optimize solutions to hard problems even when it is impossible to satisfy all constraints.

Hanjie puzzles (also called nonograms) are good examples of constraint-satisfaction problems. Hanjie solvers need to determine whether each cell of a rectangular array is empty or filled, given some constraints. These constraints take the form of series of numbers at the head of each line (row or column) indicating the size of blocks of contiguous filled cells found on that line. Here for example, the first row requires 2 blocks of 2 filled cells, whereas row 5 contains no block of filled cells. Blocks have to be separated by at least one empty cell. At the beginning, the state of all cells is unknown (portrayed in grey), and the goal is to determine if each cell is empty (white) or filled (black), while satisfying all of the numerical constraints. On the right is a solution that satisfies all of the block-size constraints. My colleague Dr. Frédéric Dandurand has created a fascinating program that automatically solves such Hanjie puzzles. His “smart solver” uses heuristics to determine the ordering of which lines (rows or columns) to solve. The program allows comparison with a brute-force depth-first search technique.

Although the ability to solve problems has received a lot of attention, the learning of problem-solving skills has been relatively neglected. We studied learning to solve Hanjie problems by comparing the performance of three algorithms used to select the order in which to solve lines (rows or columns) with a reinforcement-learning (RL) based solver. The RL solver used a measure of reduction of distance to goal as a reward. We found that RL solvers learn near-optimal solutions that outperform a heuristic solver based on the explicit, verbal rules often given to Hanjie players. Only RL solvers that used a neural-network to remember the quality of actions generalized their knowledge to generate good solutions. In contrast, storing this quality information in a lookup table did not enable generalization.

Hanjie puzzles (also called nonograms) are good examples of constraint-satisfaction problems. Hanjie solvers need to determine whether each cell of a rectangular array is empty or filled, given some constraints. These constraints take the form of series of numbers at the head of each line (row or column) indicating the size of blocks of contiguous filled cells found on that line. Here for example, the first row requires 2 blocks of 2 filled cells, whereas row 5 contains no block of filled cells. Blocks have to be separated by at least one empty cell. At the beginning, the state of all cells is unknown (portrayed in grey), and the goal is to determine if each cell is empty (white) or filled (black), while satisfying all of the numerical constraints. On the right is a solution that satisfies all of the block-size constraints. My colleague Dr. Frédéric Dandurand has created a fascinating program that automatically solves such Hanjie puzzles. His “smart solver” uses heuristics to determine the ordering of which lines (rows or columns) to solve. The program allows comparison with a brute-force depth-first search technique.

Although the ability to solve problems has received a lot of attention, the learning of problem-solving skills has been relatively neglected. We studied learning to solve Hanjie problems by comparing the performance of three algorithms used to select the order in which to solve lines (rows or columns) with a reinforcement-learning (RL) based solver. The RL solver used a measure of reduction of distance to goal as a reward. We found that RL solvers learn near-optimal solutions that outperform a heuristic solver based on the explicit, verbal rules often given to Hanjie players. Only RL solvers that used a neural-network to remember the quality of actions generalized their knowledge to generate good solutions. In contrast, storing this quality information in a lookup table did not enable generalization.