This paper proposes a model for trail detection and tracking that builds upon the observation that trails are salient structures in the robot's visual field. Due to the complexity of natural environments, the straightforward application of bottom-up visual saliency models is not sufficiently robust to predict the location of trails. As in other detection tasks, robustness can be increased by modulating the saliency computation with a priori knowledge about which pixel-wise visual features are most representative of the object being sought. Instead, this paper proposes using the object's overall layout as the primary cue, since it is more stable and predictable in natural trails. Bearing in mind computational parsimony and detection robustness, this knowledge is specified in terms of perception-action rules, which control the behavior of simple agents that act as a swarm to compute the saliency map of the input image. For the purpose of tracking, multiframe evidence about the trail location is accumulated with a motion-compensated dynamic neural field. In addition, to reduce ambiguity between the trail and trail-like distractors, a simple appearance model is learned online and used to influence the agents' activity. Experimental results on a large data set show that the model achieves a success rate on the order of 97% at 20 Hz. The model is also shown to be robust in situations where previous models would fail, such as when the trail does not emerge from the lower part of the image or when it is considerably interrupted.
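To illustrate how a motion-compensated dynamic neural field can accumulate multiframe evidence, the following is a minimal sketch and not the implementation used in this work. It assumes a one-dimensional field over image columns, an Amari-style update with a difference-of-Gaussians lateral kernel, and motion compensation reduced to an integer column shift per frame. All function names (`gaussian_bump`, `lateral_kernel`, `field_step`, `run_field`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_bump(width, center, sigma=2.0, amp=1.0):
    """Per-frame evidence: a Gaussian peak of height `amp` at column `center`."""
    x = np.arange(width)
    return amp * np.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def lateral_kernel(size=31, sigma_exc=2.0, sigma_inh=6.0, a_exc=0.3, a_inh=0.15):
    """Difference of Gaussians: short-range excitation, longer-range inhibition.

    The amplitudes are illustrative and kept small relative to the resting
    level used in `field_step`, so that a peak fades once its input stops.
    """
    x = np.arange(size) - size // 2
    exc = a_exc * np.exp(-(x ** 2) / (2.0 * sigma_exc ** 2))
    inh = a_inh * np.exp(-(x ** 2) / (2.0 * sigma_inh ** 2))
    return exc - inh


def field_step(u, stimulus, kernel, shift=0, tau=5.0, h=-0.5):
    """One Euler step of the field, after compensating for camera motion.

    `shift` is the (assumed known) image displacement in columns since the
    previous frame; `tau` is the time constant and `h` the resting level.
    """
    # Motion compensation: move the previous activity to where it should
    # appear in the current frame (wraps around at the borders).
    u = np.roll(u, shift)
    # Only positively activated units interact laterally (step nonlinearity).
    active = (u > 0).astype(float)
    lateral = np.convolve(active, kernel, mode="same")
    # Leaky integration of resting level, current evidence, and interaction.
    return u + (-u + h + stimulus + lateral) / tau


def run_field(stimuli, shifts, kernel):
    """Integrate a sequence of per-frame evidence vectors into one field."""
    u = np.zeros(len(stimuli[0]))
    for stimulus, shift in zip(stimuli, shifts):
        u = field_step(u, stimulus, kernel, shift=shift)
    return u


if __name__ == "__main__":
    width = 64
    # Persistent trail evidence at column 30 in every frame.
    frames = [gaussian_bump(width, 30) for _ in range(12)]
    # A stronger trail-like distractor at column 10, visible in one frame only.
    frames[5] = frames[5] + gaussian_bump(width, 10, amp=2.0)
    u = run_field(frames, [0] * 12, lateral_kernel())
    print("single-frame peak:", int(np.argmax(frames[5])))
    print("field peak:", int(np.argmax(u)))
```

In this toy setting the distractor dominates the raw evidence in the one frame where it appears, while the field keeps its peak on the persistently supported location. That is the kind of temporal filtering the tracking stage relies on, although the actual field dimensionality, kernel, and motion model may differ.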