Sketching is a natural way for users of tablet computers to annotate video scenes. However, when these annotations are made in real time and overlaid on the video, their context can be lost as the annotated scene changes. We propose an approach to maintaining the annotations' context that uses object tracking to create anchors to which further annotations can be attached. To this end, the annotator can use different tracking methods, including a Kinect sensor and/or the TLD object tracking algorithm. We also discuss the challenges involved in designing an interface that supports associating video annotations with tracked objects in real time. In particular, we discuss our alternative approaches to selecting moving objects in live video, which we call "Hold and Overlay" and "Hold and Speed Up". Finally, we report the results of a set of preliminary tests.