Energy-based segmentation methods are well established in document binarization because they yield a globally optimal solution. However, their performance degrades when bleed-through and background textures are prevalent in the digital image. The current approach to text binarization introduces a clustering algorithm as a preprocessing stage for an energy-based segmentation method. The clustering algorithm produces a coarse estimate of the background and foreground pixels. These estimates then serve as a prior for the source and sink points of a graph cut implementation, which efficiently finds the minimum-energy solution of an objective function that separates the background from the foreground. The binary image thus obtained is used to refine the edge map that guides the graph cut algorithm.
A theoretical framework for the classifier will be presented, along with a comprehensive topological setting that enables better visualization of the underlying clusters. A comparative evaluation of the current approach with respect to state-of-the-art binarization methods will also be presented.
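To make the pipeline described above concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The clustering algorithm is not specified here, so a two-cluster k-means on gray-level intensity stands in for it. A plain Edmonds-Karp max-flow stands in for the graph cut solver. A constant smoothness weight replaces the edge map, and the edge-map refinement step is omitted. The names `two_means`, `source_side_of_min_cut`, `binarize`, and `smooth` are hypothetical.

```python
from collections import defaultdict, deque


def two_means(values, iters=20):
    """1-D k-means with k=2 (assumed stand-in for the clustering stage).

    Returns (dark_center, bright_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        dark = [v for v in values if abs(v - lo) <= abs(v - hi)]
        bright = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not bright:  # uniform image: nothing to separate
            break
        new_lo, new_hi = sum(dark) / len(dark), sum(bright) / len(bright)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def source_side_of_min_cut(cap, s, t):
    """Edmonds-Karp max-flow on residual capacities `cap` (modified in place).

    Returns the set of nodes still reachable from s once no augmenting
    path remains, i.e. the source side of a minimum cut."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # breadth-first search
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:  # walk back from sink to source
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] = cap[w].get(u, 0) + flow


def binarize(img, smooth=40.0):
    """Label each pixel 1 (foreground, dark text) or 0 (background)."""
    dark, bright = two_means([v for row in img for v in row])
    cap = defaultdict(dict)
    h, w = len(img), len(img[0])
    for r in range(h):
        for c in range(w):
            p, v = (r, c), img[r][c]
            cap["S"][p] = abs(v - bright)  # cost of calling p background
            cap[p]["T"] = abs(v - dark)    # cost of calling p foreground
            for q in ((r + 1, c), (r, c + 1)):  # 4-connected neighbours
                if q[0] < h and q[1] < w:
                    # assumed constant smoothness instead of an edge map
                    cap[p][q] = cap[q][p] = smooth
    fg = source_side_of_min_cut(cap, "S", "T")
    return [[int((r, c) in fg) for c in range(w)] for r in range(h)]
```

In this sketch the cluster centers act as soft priors on the terminal links: a pixel close to the dark center is expensive to cut away from the source, and a pixel close to the bright center is expensive to cut away from the sink. On a small image containing two dark text pixels and one isolated mid-gray pixel (a crude proxy for bleed-through), nearest-center labeling alone would mark the mid-gray pixel as text, whereas the smoothness term in the cut assigns it to the background.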