Wednesday, May 20, 2009

Reading Characters from an Image


A friend of mine asked me to find a way of reading characters from an image. My googling turned up only very expensive software and bulky code. Eventually I came across the amazing OCR component that ships with the Microsoft Office package.

Microsoft Office Document Imaging 2003 (MODI) adds programmability features to the document scanning and viewing tools that Microsoft Office 2002 (XP) included for the first time. Programmers can take advantage of a simple object model built around the Document and Image (page) objects to display and read a scanned document as easily as a paper document, perform optical character recognition (OCR), search for text within scanned documents, copy and export scanned text and images, combine multiple pages into a single compressed file, and reorganize scanned document pages as easily as rearranging papers in a folder.

Reference

  • http://www.microsoft.com/downloads/details.aspx?FamilyId=8F93E445-B1CF-4477-A373-E17417D616BC&displaylang=en
  • http://msdn.microsoft.com/en-us/library/aa167607(office.11).aspx

How To Code


  1. First, add a reference to MODI to your .NET application.

  2. Add the following namespace: using MODI;

  3. The code for the application is as follows:


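// browse for the image
OpenFileDialog imageDialog = new OpenFileDialog();
// a filter is a "description|pattern" pair; patterns are separated by ';'
imageDialog.Filter = "Image files (*.jpeg;*.jpg;*.gif;*.bmp)|*.jpeg;*.jpg;*.gif;*.bmp";
if (imageDialog.ShowDialog() != DialogResult.OK)
{
    // the user cancelled the dialog (assumes this code sits in a void method such as a click handler)
    return;
}
string filename = imageDialog.FileName;

// load the image and run OCR on it
MODI.Document doc = new MODI.Document();
doc.Create(filename);
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

// the recognized text of the first page lives in its Layout
MODI.Image im = (MODI.Image)doc.Images[0];
MODI.Layout imagelayout = im.Layout;

string strResult = string.Empty;
// read the text word by word, separating the words with a space
for (int i = 0; i < imagelayout.Words.Count; i++)
{
    MODI.Word w = (MODI.Word)imagelayout.Words[i];
    strResult += w.Text + " ";
}
doc.Close(false);
// show the result in a message box
MessageBox.Show(strResult);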
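
The same steps can also be wrapped in a small reusable method. What follows is only a sketch under a few assumptions: the class name OcrSketch and the method name ReadText are made up for illustration, the project already references MODI, and the Layout.Text property is used to fetch the recognized text of a page in one call instead of looping over the words. The try/finally block is there so the document gets closed even when OCR throws, which in my understanding can happen when no text is found in the image.

```csharp
using System;
using System.Text;

static class OcrSketch
{
    // Hypothetical helper: returns the recognized text of every page in the image file.
    public static string ReadText(string path)
    {
        MODI.Document doc = new MODI.Document();
        try
        {
            doc.Create(path);
            doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);

            StringBuilder text = new StringBuilder();
            // a multi-page file (for example a TIFF) has one Image per page
            for (int i = 0; i < doc.Images.Count; i++)
            {
                MODI.Image page = (MODI.Image)doc.Images[i];
                text.AppendLine(page.Layout.Text);
            }
            return text.ToString();
        }
        finally
        {
            // close without saving, whether or not OCR succeeded
            doc.Close(false);
        }
    }
}
```

With this in place, the click handler shrinks to picking a file and calling MessageBox.Show(OcrSketch.ReadText(imageDialog.FileName)).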



2 comments:

  1. Is it possible to read handwritten characters from the image using MODI?
    I tried it but had no luck.

  2. This is really an informative article, I came across a nice Java OCR component. I hope you guys are going to like it. Here are some details
    Aspose.OCR for Java is a character recognition component that allows developers to add OCR functionality in their Java web applications, web services and Windows applications. It provides a simple set of classes for controlling character recognition tasks. It helps developers to work with image files from within their Java applications.
    Complete Details are available here: http://www.aspose.com/categories/java-components/aspose.ocr-for-java/default.aspx

    Many Thanks
