With the ever-increasing flow of information, the need for computer-automated tools for handling information grows correspondingly. News shows and other television broadcasts carry vast amounts of information, but generally in forms that are not reachable through conventional information retrieval techniques. In recent years, researchers around the world have given considerable attention to the problem of extracting semantic information from television and re-representing it in a form suitable for automatic indexing and searching. Detailed content information in a television broadcast is typically found in teletext subtitles (where available), in text embedded in the video image, and in the spoken dialogue. Extracting it involves techniques from signal processing, image analysis, artificial intelligence, speech recognition, and related fields. A reliable filter system for use in a region such as Scandinavia, where only a few television broadcasts carry teletext subtitles, must take advantage of information in all three forms, or modalities. This report contains a survey of current research projects in this area and a theoretical design of a modular, multi-modal, content-based television filter, based on the findings of the survey. A prototype of a module for extracting and recognizing image-embedded text has been designed and implemented in Matlab. The prototype operates on DCT-compressed video frames (e.g. JPEG, MPEG) and uses statistics, heuristics, image processing, and neural network technology. When applied to subtitles embedded in the image, the prototype achieves a total character error rate of 5%.
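To illustrate one idea behind operating on DCT-compressed frames, the sketch below (in Python rather than the prototype's Matlab, and not the report's actual algorithm) classifies an 8x8 block as a possible text region by its AC-coefficient energy: sharp text edges concentrate energy in the AC terms, while smooth background blocks do not. In a real compressed stream the DCT coefficients are already available in the bitstream; here they are computed from pixels purely for illustration, and the threshold value is a hypothetical placeholder that a real system would tune.

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via the orthonormal DCT basis matrix."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row of the orthonormal basis
    return C @ block @ C.T

def ac_energy(block):
    """Sum of squared AC coefficients (all coefficients except the DC term)."""
    d = dct2(block.astype(float))
    return float(np.sum(d ** 2) - d[0, 0] ** 2)

def is_text_candidate(block, threshold=1000.0):
    """Flag a block as a possible text region when its AC energy is high.
    The threshold is an illustrative value, not one from the report."""
    return ac_energy(block) > threshold
```

For example, a flat gray block has essentially zero AC energy and is rejected, while a block of alternating black and white columns (a crude stand-in for a character stroke pattern) is flagged as a text candidate.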