Wednesday, December 22, 2010

Document Layout Analysis

The Problem
Create a program that analyzes a document image and determines the document's physical layout and, to some extent, its logical layout.


The Test Document Images




The Solution
My solution is a simple and quite restrictive bottom-up analyzer of a document's physical layout.
Bottom-up methods depend on local information, such as black (ON) pixels and connected components, to analyze and build up the parts of the document, as opposed to top-down methods, which use black/white stripes/spaces/gaps to split the document into smaller parts for analysis. Many variations of each of the two methods have been proposed, and hybrid methods are also available, but (based on the literature I have read) none of them works on every type of document layout.
The solution presented in this blog uses three levels/passes of simple connected component (CC) analysis, described below:
1. In the first level, the images were converted to binary images using a naive thresholding method with a predefined gray threshold value of 220.
2. After binarization, connected components (CCs) were extracted from the binary image and placed on a list.
3. After producing the CCs, the components were sorted with respect to their Y and X coordinates in the binary image, with greater priority given to the Y value than to the X, making the component with the least X and Y values (the topmost-leftmost component) the first on the list. Small components, like the dot of the letter "i", periods, commas, hyphens, and the like, plus some unfiltered noise, were placed near the end of the list so that they would not hinder later processing.
4. After sorting, the Euclidean distance of each adjacent pair of components on the list was computed, grouped based on the range of the Y values of the components, and then stored in a group-distance list.
5. Each group-distance list was sorted, and for each group the 20th percentile was computed. This value serves as the maximum allowable distance (MAD) between adjacent connected components, determining which of them will be linked into one bigger component.
6. The building up of a bigger component starts by analyzing the sorted list of CCs again. The distance of each adjacent pair of components is measured and compared to the computed MAD. If the distance of the pair is acceptable, the component with the lesser X and Y values is linked to the bigger component being formed. Once the distance of a pair goes beyond the MAD value, the component with the lesser X and Y values is linked to the component currently being formed, the building of this bigger component stops, and a new bigger component is started from the other half of the pair. The process repeats until no pair of components can be produced. Linking components is achieved by drawing a solid rectangular binary box that covers all the linked components in the binary image produced in step 1. This linking is shown as a transparent rectangular box drawn on the color image (presented in the Sample Results section below).
7. After linking all the smaller components, the second-level processing follows, which is just a repetition of steps 2-6, changing only the percentile value in step 5 to 80.
8. The third-level processing follows, which is also a repetition of steps 2-6, changing only the percentile value in step 5 to 10.
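The distance-and-percentile machinery of steps 4 to 6 can be sketched in a few lines of C++. This is a minimal illustration only: the Box struct, the centroid distance, and the function names are assumptions rather than the actual implementation, and the Y-range grouping of step 4 is omitted for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Box { int x, y, w, h; };  // bounding box of a connected component

// Euclidean distance between the centers of two component boxes.
double centerDistance(const Box& a, const Box& b) {
    double dx = (a.x + a.w / 2.0) - (b.x + b.w / 2.0);
    double dy = (a.y + a.h / 2.0) - (b.y + b.h / 2.0);
    return std::sqrt(dx * dx + dy * dy);
}

// Maximum allowable distance (MAD): the p-th percentile of the
// distances between adjacent components in the sorted list.
double computeMAD(std::vector<Box> comps, double percentile) {
    // Sort top-to-bottom, then left-to-right (Y has priority over X).
    std::sort(comps.begin(), comps.end(), [](const Box& a, const Box& b) {
        return a.y != b.y ? a.y < b.y : a.x < b.x;
    });
    std::vector<double> dists;
    for (size_t i = 1; i < comps.size(); i++)
        dists.push_back(centerDistance(comps[i - 1], comps[i]));
    if (dists.empty()) return 0.0;
    std::sort(dists.begin(), dists.end());
    size_t idx = static_cast<size_t>(percentile / 100.0 * (dists.size() - 1));
    return dists[idx];
}
```

Adjacent components whose distance falls below the value returned by computeMAD would then be linked into one bigger component, as in step 6.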



Sample Results

First Pass (Forming of Words)

 
Second Pass (Forming of Lines/Sentences)


Third Pass (Forming of Paragraphs)


Final Remarks
The solution presented is far from perfect. As seen in the images above, there are components that either failed to be linked or were linked incorrectly. Also, no preprocessing was applied to the images, such as skew correction, noise removal, or background removal.

Tuesday, December 7, 2010

Document Form Reader - Solution Formulation

How I came up with the solution
This activity was actually my first image processing program, and there were many important image processing techniques I failed to use. The solution I will present later is the result of trial-and-error experiments with the simple, naive image processing techniques I tried, which I will briefly present first.


The Naive Approach
This approach is very simple. It starts by loading all the content of the fields.csv file into an array named items with size 39. The array elements are based on a structure datatype called pixels, defined as follows:


       struct pixels { pixelData pix [7]; };


where pixelData is another structured type, defined as


       struct pixelData { int X; int Y; };


The pixels structure represents a group of seven circle coordinates, while pixelData represents an actual coordinate in terms of X and Y. This means that when you read one line (14 values) from the text file, the values can be stored in a single pixels variable. Using this idea, the array items was created with 39 indices to store the values from each of the 39 lines of the text file.
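Reading one line of 14 values into a single pixels variable could look like the following sketch (the structs are repeated so the snippet is self-contained; parseLine and the whitespace-separated line format are assumptions for illustration):

```cpp
#include <sstream>
#include <string>

struct pixelData { int X; int Y; };
struct pixels { pixelData pix[7]; };

// Parse one line of 14 integers (7 X/Y pairs) into a pixels value.
pixels parseLine(const std::string& line) {
    pixels p{};
    std::istringstream in(line);
    for (int j = 0; j < 7; j++)
        in >> p.pix[j].X >> p.pix[j].Y;
    return p;
}
```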


After reading the whole file, the images were read one at a time using the readJpeg function from ImageLab and loaded as an RGBImage named inputImage. After loading, each image was preprocessed by converting it to a binary image named tmpBin using a predetermined threshold value of 220. No noise removal/reduction or image enhancement techniques were applied. After producing the binary image, the content of the items array was used to locate the circles one at a time. For example, the first circle in every image is the first circle of section A.1 of the document, based on the discussion in the Field Location subsection of the preceding blog. This means that the coordinates of this circle are found on the first line of the text file, which is mapped to the first index of the array items (index 0). Since the circle is also the first among the circles of A.1, it is mapped to the first index of the pix member. In code it looks something like:
     
     //The X and Y value of the coordinate of the very first circle in every document image
     items[0].pix[0].X
     items[0].pix[0].Y


To check whether this circle is shaded, the pixel with the same coordinates in tmpBin is checked. If the pixel has a value of 1, the circle is shaded; otherwise it is not. In code:


     if(tmpBin(items[0].pix[0].X,items[0].pix[0].Y)==1){
            //procedure for putting a red box around the circle
     }else{
            //procedure for putting a green box around the circle
     }


This can be extended to all the other circles with a simple nested loop.
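As a sketch, the nested loop might count the shaded circles like this (countShaded and the vector-of-vectors binary image are illustrative assumptions; the real code draws colored boxes instead of counting):

```cpp
#include <vector>

struct pixelData { int X; int Y; };
struct pixels { pixelData pix[7]; };

// Count the shaded circles by checking, for every stored coordinate,
// whether the corresponding pixel in the binary image is 1 (ON).
int countShaded(const std::vector<std::vector<int>>& tmpBin,
                const pixels items[], int count) {
    int shaded = 0;
    for (int i = 0; i < count; i++)      // each group of 7 circles
        for (int j = 0; j < 7; j++)      // each circle in the group
            if (tmpBin[items[i].pix[j].Y][items[i].pix[j].X] == 1)
                shaded++;                // here: draw a red box instead
    return shaded;
}
```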


However, due to the observed skew, some circles were misaligned, so the given coordinates sometimes fell outside the boundaries of the circles, missing the shade. It also happened that a given coordinate pointed to the border of a circle, and since the color of a shade and the border of a circle are indistinguishable in a binary image, circles with no shade were identified as shaded.


Another reason is the amount and form of the shading of the circles. Some circles have very light shades near the center that fall above the predetermined threshold, while some have no shade at all near the center.


The Connected Component Approach
This approach improves only the process of finding the exact location of the circles. It makes use of binary image connected component analysis, performed after the image is converted to a binary image.


Connected component analysis is applied only to a horizontal rectangular strip of the whole binary image. Each strip must contain seven circles, which means that the procedure creates 39 strips per image and processes one strip at a time.
Sample Rectangular Strip
The strip is computed from the coordinates of the first and last circles of two consecutive groups of circles (e.g. the first and last circles of A.1 and A.2). The process of creating the strip, using A.1 and A.2, is as follows:
1. Subtract the Y value of the first circle of A.1 from the Y value of the first circle of A.2 and divide the result by two. This gives the vertical distance from the circle centers of A.1 to the midline between the two groups. This value will be named nMoves.
2. Locate the upper left corner pixel of the strip by moving nMoves pixels to the left from the center of the first circle of A.1 and then nMoves pixels upward.
3. Locate the upper right corner pixel of the strip by moving nMoves pixels to the right from the center of the last circle of A.1 and then nMoves pixels upward.
4. Locate the lower left corner pixel of the strip by moving nMoves pixels to the left from the center of the first circle of A.1 and then nMoves pixels downward.
5. Locate the lower right corner pixel of the strip by moving nMoves pixels to the right from the center of the last circle of A.1 and then nMoves pixels downward.
6. Copy all pixels covered by the four corners, from the upper left down to the lower right pixel, into a temporary binary image strip. The strip will now contain the seven circles of A.1.
The process can be repeated for the next groups of circles (e.g. A.2, A.3, B.1). Step 1 can be skipped on the repetitions; the value of nMoves from the first computation can be reused instead. The height of the temporary binary image strip is twice the value of nMoves, while its width is the number of pixels from the upper left corner to the upper right corner. It should be noted that the circles copied onto the strip do not retain the coordinate values of their source; only the arrangement of the circles on the strip links them back to the original image.
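The corner arithmetic of the steps above reduces to a little arithmetic on the circle centers, roughly like this (computeStrip and StripBounds are hypothetical names for illustration):

```cpp
struct pixelData { int X; int Y; };

struct StripBounds { pixelData topLeft, bottomRight; };

// Compute the strip corners from the first and last circle centers of a
// group (A.1) and the first circle center of the next group (A.2).
StripBounds computeStrip(pixelData firstA1, pixelData lastA1, pixelData firstA2) {
    // Half the vertical distance between corresponding circles of the two
    // consecutive groups: the strip extends this far in every direction.
    int nMoves = (firstA2.Y - firstA1.Y) / 2;
    StripBounds s;
    s.topLeft     = { firstA1.X - nMoves, firstA1.Y - nMoves };
    s.bottomRight = { lastA1.X + nMoves,  lastA1.Y + nMoves };
    return s;
}
```

The resulting strip is 2 * nMoves pixels tall, matching the height described above.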

After creating the strip, a connected component object named ccTmp is created using the ConnectedComponents class in ImageLab. The analyzeBinary method of ccTmp is called with the temporary strip as argument to produce individual components (circles) with number labels ranging from 0 to 6.


After analyzing the binary image, the strip is decomposed into seven components, each representing a circle. But before checking whether each component (circle) is shaded, the connected components are first sorted and relabeled in ascending order based on their X coordinates. This is needed because components are not labeled in any particular order when they are created, so there is no assurance that the component with label 0 is the leftmost circle on the strip, and that would break the connection to the original source. Arrangement is the only basis for knowing which circle is being processed.
     
After fixing the labels, each component undergoes dilation using a (3 by 3) cross-like structuring element to ensure that the shades of the circles have fewer holes, especially in the center. The checking of shades is similar to the naive procedure above.
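A plain version of that dilation, without ImageLab, might look like this (the Binary alias and dilateCross are illustrative assumptions): a pixel turns ON if it or any of its 4-neighbours is ON, which is exactly the effect of a 3 by 3 cross-shaped structuring element.

```cpp
#include <vector>

using Binary = std::vector<std::vector<int>>;

// Dilate a binary image with a 3x3 cross-shaped structuring element:
// a pixel becomes 1 if it, or any 4-neighbour, is 1.
Binary dilateCross(const Binary& img) {
    int h = img.size(), w = img[0].size();
    Binary out(h, std::vector<int>(w, 0));
    const int dy[] = {0, -1, 1, 0, 0};  // center, up, down
    const int dx[] = {0, 0, 0, -1, 1};  // left, right
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            for (int k = 0; k < 5; k++) {
                int ny = y + dy[k], nx = x + dx[k];
                if (ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny][nx] == 1)
                    out[y][x] = 1;
            }
    return out;
}
```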



The Final Procedure in Code

#include <iostream>
#include <fstream>
#include <cstdio>
#include <cstdlib>
#include "lib/image.h"
#include "lib/jpegio.h"
#include "lib/tools.h"
#include "lib/color.h"
#include "lib/binary.h"


#define MAX_IMAGES  30
#define COLOR_RED   COLOR_RGB(255,0,0)
#define COLOR_GREEN COLOR_RGB(0,255,0)
#define COLOR_BLUE  COLOR_RGB(0,0,255)


using namespace std;
struct pixelData{
    int X;
    int Y;
};
struct pixels{
    pixelData pix[7];
};
void loadCoordinates(char fName[], pixels items[], int count){
    ifstream coordinates;

    coordinates.open(fName, ios::in);
    if (coordinates.is_open()){
        for(int i = 0; i < count; i++){
            for(int j = 0; j < 7; j++){
                coordinates >> items[i].pix[j].X >> items[i].pix[j].Y;
            }
        }
    }
    coordinates.close();
}
bool nextImage(RGBImage& img, char file[]){
    static int current = 0;
    char filePath[16];

    current++;
    if(current <= MAX_IMAGES){
        if(current < 10){
            sprintf(file, "000%d", current);
        }else if(current < 100){
            sprintf(file, "00%d", current);
        }else{
            sprintf(file, "0100");
        }
        sprintf(filePath, "forms/%s.jpg", file);
        readJpeg(img, filePath);
        return true;
    }else{
        return false;
    }
}
void boxPixelOfImage(RGBImage& img, pixelData pixel,int radius,int color, int thickness){
     drawBoxRGB(img,pixel.X-radius,pixel.Y-radius,pixel.X+radius,pixel.Y+radius,color,thickness);
}
void RGBToBinary(const RGBImage& imgRGB, Image<unsigned char>& imgBin, unsigned char threshold){
    int width = imgRGB.width(), height = imgRGB.height();
    unsigned char grayValue;

    imgBin.resize(width, height);
    for(int row = 0; row < height; row++){
        for(int col = 0; col < width; col++){
            grayValue = (RED(imgRGB(col,row)) + GREEN(imgRGB(col,row)) + BLUE(imgRGB(col,row)))/3;
            if(grayValue <= threshold)
                imgBin(col,row) = 1;
            else
                imgBin(col,row) = 0;
        }
    }
}
void BinaryToGray(const Image<unsigned char>& imgBin, RGBImage& imgGray, int grayColor){
    int width = imgBin.width(), height = imgBin.height();

    imgGray.resize(width, height);
    for(int row = 0; row < height; row++){
        for(int col = 0; col < width; col++){
            if(imgBin(col,row) == 1)
                imgGray(col,row) = grayColor;
            else
                imgGray(col,row) = COLOR_RGB(255,255,255);
        }
    }
}
void createStripFromImage(RGBImage img, RGBImage& strip, pixelData topLeft, pixelData bottomRight){
    strip.resize((bottomRight.X - topLeft.X + 1), (bottomRight.Y - topLeft.Y + 1));
    for(int i = topLeft.Y; i <= bottomRight.Y; i++){
        for(int j = topLeft.X; j <= bottomRight.X; j++){
            strip(j - topLeft.X, i - topLeft.Y) = img(j,i);
        }
    }
}
void printComponentData(ConnectedComponents& conComponent){
    int h = conComponent.mBbox.height(), numComp = conComponent.getNumComponents();
    for(int i = 0; i < numComp; i++){
        printf("\nComponent[%d] = [", i);
        for(int j = 0; j < h; j++){
            printf(" %d ", conComponent.mBbox(i,j));
        }
        printf("]\n");
    }
}
void updateComponentLabel(ConnectedComponents& conComponent, int bBoxLoc, int oldLabel, int newLabel){
    Image<int> &Bbox = conComponent.mBbox, &TagImage = conComponent.mTagImage;
    int xstart = Bbox(bBoxLoc,0), ystart = Bbox(bBoxLoc,1);
    int w = Bbox(bBoxLoc,2), h = Bbox(bBoxLoc,3);

    for (int x = 0; x < w; x++){
        for (int y = 0; y < h; y++){
            if (TagImage(xstart+x, ystart+y) == oldLabel){
                TagImage(xstart+x, ystart+y) = newLabel;
            }
        }
    }
}
void sortComponentsByX(ConnectedComponents& conComponent){
    int h = conComponent.mBbox.height(), numComp = conComponent.getNumComponents();
    Image<int> tmp(1,h), &Bbox = conComponent.mBbox;
    int i, j, cur;

    // insertion sort on the X coordinate of each bounding box,
    // relabeling the tag image as components move
    if(numComp > 1){
        for(i = 1; i < numComp; i++){
            for(cur = 0; cur < h; cur++){
                tmp(0,cur) = Bbox(i,cur);
            }
            for(j = i; (j > 0 && tmp(0,0) <= Bbox(j-1,0)); j--){
                updateComponentLabel(conComponent, j-1, j-1, j);
                for(int cur = 0; cur < h; cur++){
                    Bbox(j,cur) = Bbox(j-1,cur);
                }
            }
            if(j < i){
                for(cur = 0; cur < h; cur++){
                    Bbox(j,cur) = tmp(0,cur);
                }
                updateComponentLabel(conComponent, j, i, j);
            }
        }
    }
}
void markImages(pixels items[], int count, unsigned char GrayThreshHold){
    RGBImage inputImage, tmpStrip;
    Image<unsigned char> tmpComponent, tmpBin, tmpStruct1;
    pixelData tmpPixel, tmpBounds[2];
    ConnectedComponents ccTmp;
    char file[5], filePath[255];
    int radius, numComponents, cursor;

    radius = (items[1].pix[0].Y - items[0].pix[0].Y)/2;

    // 3x3 cross-like structuring element for the dilation step
    tmpStruct1.resize(3,3);
    tmpStruct1.setAll(0);
    tmpStruct1(1,0) = 1; tmpStruct1(0,1) = 1;
    tmpStruct1(2,1) = 1; tmpStruct1(1,2) = 1;

    while(nextImage(inputImage,file)){
        for(int i = 0; i < count; i++){
            // strip bounds: first and last circle of the group, padded by radius
            tmpBounds[0].X = items[i].pix[0].X - radius;
            tmpBounds[0].Y = items[i].pix[0].Y - radius;
            tmpBounds[1].X = items[i].pix[6].X + radius;
            tmpBounds[1].Y = items[i].pix[6].Y + radius;
            createStripFromImage(inputImage, tmpStrip, tmpBounds[0], tmpBounds[1]);
            RGBToBinary(tmpStrip, tmpBin, 175); // note: hardcoded threshold; the GrayThreshHold parameter is not used here
            ccTmp.analyzeBinary(tmpBin, EIGHT_CONNECTED);
            sortComponentsByX(ccTmp);
            numComponents = ccTmp.getNumComponents();
            cursor = 0;
            for(int j = 0; j < numComponents; j++){
                tmpComponent = ccTmp.getComponentBinary(j);
                // ignore small components (noise); real circles are at least 10x10
                if(tmpComponent.width() >= 10 && tmpComponent.height() >= 10){
                    tmpPixel = items[i].pix[cursor];
                    tmpComponent = binaryDilation(tmpComponent, tmpStruct1, 1, 1);
                    // a shaded circle has an ON pixel at its center after dilation
                    if(tmpComponent(tmpComponent.width()/2, tmpComponent.height()/2) == 1){
                        boxPixelOfImage(inputImage, tmpPixel, 10, COLOR_RED, 2);
                    }else{
                        boxPixelOfImage(inputImage, tmpPixel, 10, COLOR_GREEN, 2);
                    }
                    cursor++;
                }
            }
        }
        sprintf(filePath, "output/%s.jpg", file);
        writeJpeg(inputImage, filePath, 100);
    }
}
int main(int argc, char *argv[]){
    pixels items[39]; 


    loadCoordinates("fields39.csv",items,39);
    markImages(items,39,220);
}




The Result