How I Came Up with the Solution
This activity was actually my first image processing program, and there were many important image processing techniques I failed to use. The solution I will present later is the result of trial-and-error experiments with simple, naive image processing techniques, which I will briefly present first.
The Naive Approach
This approach is very simple. It starts by loading the entire content of the fields.csv file into an array named items of size 39. The array is based on a structure type called pixels, defined as follows:
struct pixels { pixelData pix[7]; };
where pixelData is another structure type, defined as
struct pixelData { int X; int Y; };
The pixels structure represents a group of seven circle coordinates, while pixelData represents a single coordinate in terms of X and Y. This means that one line of the text file (14 values) can be stored in a single pixels variable. Using this idea, the items array was created with 39 indices to store the values from each of the 39 lines of the file.
After reading the whole file, the images were read one at a time using the readJpeg function from ImageLab and loaded into an RGBImage named inputImage. Each loaded image was preprocessed by converting it to a binary image named tmpBin using a predetermined threshold value of 220. No noise removal/reduction or image enhancement techniques were applied. After producing the binary image, the content of the items array was used to locate the circles one at a time. For example, the first circle in every image is the first circle of section A.1 of the document, based on the discussion in the Field Location subsection of the preceding post. This means the coordinate of this circle is found on the first line of the text file, which is mapped to the first index of the items array (index 0). Since the circle is also the first among the circles of A.1, it is mapped to the first index of the pix member. In code, it looks something like this:
//The X and Y value of the coordinate of the very first circle in every document image
items[0].pix[0].X
items[0].pix[0].Y
To check whether this circle is shaded, the pixel at the same coordinate in tmpBin is examined. If the pixel has a value of 1, the circle is shaded; otherwise it is not. In code:
if(tmpBin(items[0].pix[0].X, items[0].pix[0].Y) == 1){
    //procedure for putting a red box around the circle
}else{
    //procedure for putting a green box around the circle
}
This can be extended to all the other circles with a simple nested loop.
However, due to the observed skew, some circles were misaligned, so the given coordinates sometimes fell outside the boundary of a circle and missed the shading. It also happened that a given coordinate pointed at the border of a circle; since shading and a circle's border are indistinguishable in a binary image, circles with no shading were identified as shaded.
Another source of error is the amount and form of shading. Some circles have very light shading near the center that falls above the predetermined threshold, while others have no shading at all near the center.
The Connected Component Approach
This approach improved only the process of finding the exact location of the circles, by making use of binary-image Connected Component Analysis. The analysis is performed after the image has been converted to a binary image.
Connected Component Analysis is applied only to a horizontal rectangular strip of the whole binary image. Each strip must contain seven circles, which means the procedure creates 39 strips per image and processes them one at a time.
The strip is computed from the coordinates of the first and last circles of two consecutive groups of circles (e.g. the first and last circles of A.1 and A.2). The process of creating the strip, using A.1 and A.2, is as follows:
[Figure: Sample Rectangular Strip]
1. Subtract the Y value of the first circle of A.1 from the Y value of the first circle of A.2 and divide the result by two. This gives the Y distance to the pixel that lies midway between the two circles; this value is named nMoves.
2. Locate the upper left corner pixel of the strip by moving nMoves pixels to the left from the center of the first circle of A.1 and then nMoves pixels upward.
3. Locate the upper right corner pixel of the strip by moving nMoves pixels to the right from the center of the last circle of A.1 and then nMoves pixels upward.
4. Locate the lower left corner pixel of the strip by moving nMoves pixels to the left from the center of the first circle of A.1 and then nMoves pixels downward.
5. Locate the lower right corner pixel of the strip by moving nMoves pixels to the right from the center of the last circle of A.1 and then nMoves pixels downward.
6. Copy all pixels covered by the four corners, from the upper left down to the lower right pixel, into a temporary binary image strip. The strip now contains the seven circles of A.1.
The process is repeated for the next groups of circles (e.g. A.2, A.3, B.1). Step 1 can be skipped on the repetitions, reusing the value of nMoves from the first computation. The height of the strip is twice nMoves, while its width is the number of pixels from the upper left corner to the upper right corner. Note that the circles copied onto the strip do not retain the coordinates of their source; only the arrangement of the circles on the strip links them back to the original image.
After creating the strip, a connected-component object named ccTmp is created using the ConnectedComponents class of ImageLab. The analyzeBinary method of ccTmp is called with the temporary strip as its argument, producing individual components (circles) with number labels ranging from 0 to 6.
Before each component (circle) is checked for shading, the connected components are first sorted and relabeled in ascending order of their X coordinates. This is needed because labels are not assigned in any particular order during the analysis, so there is no assurance that the component labeled 0 is the leftmost circle on the strip, and a wrong order would break the link back to the original image; arrangement is the only basis for knowing which circle is being processed.
After fixing the labels, each component undergoes dilation with a 3x3 cross-shaped structuring element to ensure that the shading of a circle has fewer holes, especially at the center. The check for shading is then similar to the naive procedure above.
The Final Procedure in Code
#include <iostream>
#include <fstream>
#include <cstdio>
#include <cstdlib>
#include "lib/image.h"
#include "lib/jpegio.h"
#include "lib/tools.h"
#include "lib/color.h"
#include "lib/binary.h"

#define MAX_IMAGES 30
#define COLOR_RED COLOR_RGB(255,0,0)
#define COLOR_GREEN COLOR_RGB(0,255,0)
#define COLOR_BLUE COLOR_RGB(0,0,255)

using namespace std;

struct pixelData{
    int X;
    int Y;
};

struct pixels{
    pixelData pix[7];
};
// Read count lines of seven (X,Y) pairs from the coordinate file into items.
void loadCoordinates(char fName[], pixels items[], int count){
    ifstream coordinates;
    coordinates.open(fName, ios::in);
    if (coordinates.is_open()){
        for(int i = 0; i < count; i++){
            for(int j = 0; j < 7; j++){
                coordinates >> items[i].pix[j].X >> items[i].pix[j].Y;
            }
        }
    }
    coordinates.close();
}
// Load the next form image (forms/0001.jpg, forms/0002.jpg, ...);
// returns false once MAX_IMAGES have been processed.
bool nextImage(RGBImage& img, char file[]){
    static int current = 0;
    char filePath[16];
    current++;
    if(current <= MAX_IMAGES){
        if(current < 10){
            sprintf(file, "000%d", current);
        }else if(current < 100){
            sprintf(file, "00%d", current);
        }else{
            sprintf(file, "0100");
        }
        sprintf(filePath, "forms/%s.jpg", file);
        readJpeg(img, filePath);
        return true;
    }else{
        return false;
    }
}
// Draw a square box of the given color around a circle center.
void boxPixelOfImage(RGBImage& img, pixelData pixel, int radius, int color, int thickness){
    drawBoxRGB(img, pixel.X-radius, pixel.Y-radius, pixel.X+radius, pixel.Y+radius, color, thickness);
}

// Threshold an RGB image into a binary image (1 = dark foreground).
void RGBToBinary(const RGBImage& imgRGB, Image<unsigned char>& imgBin, unsigned char threshold){
    int width = imgRGB.width(), height = imgRGB.height();
    unsigned char grayValue;
    imgBin.resize(width, height);
    for(int row = 0; row < height; row++){
        for(int col = 0; col < width; col++){
            grayValue = (RED(imgRGB(col,row)) + GREEN(imgRGB(col,row)) + BLUE(imgRGB(col,row)))/3;
            if(grayValue <= threshold)
                imgBin(col,row) = 1;
            else
                imgBin(col,row) = 0;
        }
    }
}
// Render a binary image back to RGB for visualization.
void BinaryToGray(const Image<unsigned char>& imgBin, RGBImage& imgGray, int grayColor){
    int width = imgBin.width(), height = imgBin.height();
    imgGray.resize(width, height);
    for(int row = 0; row < height; row++){
        for(int col = 0; col < width; col++){
            if(imgBin(col,row) == 1)
                imgGray(col,row) = grayColor;
            else
                imgGray(col,row) = COLOR_RGB(255,255,255);
        }
    }
}

// Copy the rectangular region between topLeft and bottomRight into strip.
void createStripFromImage(RGBImage img, RGBImage& strip, pixelData topLeft, pixelData bottomRight){
    strip.resize((bottomRight.X-topLeft.X+1), (bottomRight.Y-topLeft.Y+1));
    for(int i = topLeft.Y; i <= bottomRight.Y; i++){
        for(int j = topLeft.X; j <= bottomRight.X; j++){
            strip(j-topLeft.X, i-topLeft.Y) = img(j,i);
        }
    }
}
// Debug helper: print the bounding-box data of every component.
void printComponentData(ConnectedComponents& conComponent){
    int h = conComponent.mBbox.height(), numComp = conComponent.getNumComponents();
    for(int i = 0; i < numComp; i++){
        printf("\nComponent[%d] = [", i);
        for(int j = 0; j < h; j++){
            printf(" %d ", conComponent.mBbox(i,j));
        }
        printf("]\n");
    }
}

// Rewrite oldLabel to newLabel inside one component's bounding box.
void updateComponentLabel(ConnectedComponents& conComponent, int bBoxLoc, int oldLabel, int newLabel){
    Image<int> &Bbox = conComponent.mBbox, &TagImage = conComponent.mTagImage;
    int xstart = Bbox(bBoxLoc,0), ystart = Bbox(bBoxLoc,1);
    int w = Bbox(bBoxLoc,2), h = Bbox(bBoxLoc,3);
    for (int x = 0; x < w; x++){
        for (int y = 0; y < h; y++){
            if (TagImage(xstart+x, ystart+y) == oldLabel){
                TagImage(xstart+x, ystart+y) = newLabel;
            }
        }
    }
}
// Insertion-sort the components (and their labels) by X coordinate,
// so that label 0 is the leftmost circle on the strip.
void sortComponentsByX(ConnectedComponents& conComponent){
    int h = conComponent.mBbox.height(), numComp = conComponent.getNumComponents();
    Image<int> tmp(1,h), &Bbox = conComponent.mBbox;
    int i, j, cur;
    if(numComp > 1){
        for(i = 1; i < numComp; i++){
            for(cur = 0; cur < h; cur++){
                tmp(0,cur) = Bbox(i,cur);
            }
            for(j = i; (j > 0 && tmp(0,0) <= Bbox(j-1,0)); j--){
                updateComponentLabel(conComponent, j-1, j-1, j);
                for(int cur = 0; cur < h; cur++){
                    Bbox(j,cur) = Bbox(j-1,cur);
                }
            }
            if(j < i){
                for(cur = 0; cur < h; cur++){
                    Bbox(j,cur) = tmp(0,cur);
                }
                updateComponentLabel(conComponent, j, i, j);
            }
        }
    }
}
// Main procedure: for every image, binarize each strip of seven circles,
// sort its components left to right, dilate them, and box each circle
// red (shaded) or green (unshaded).
void markImages(pixels items[], int count, unsigned char GrayThreshHold){
    RGBImage inputImage, tmpStrip;
    Image<unsigned char> tmpComponent, tmpBin, tmpStruct1;
    pixelData tmpPixel, tmpBounds[2];
    ConnectedComponents ccTmp;
    char file[5], filePath[255];
    int radius, numComponents, cursor;
    double dFrequency;
    radius = (items[1].pix[0].Y - items[0].pix[0].Y)/2;
    // 3x3 cross-shaped structuring element for dilation.
    tmpStruct1.resize(3,3);
    tmpStruct1.setAll(0);
    tmpStruct1(1,0) = 1; tmpStruct1(0,1) = 1;
    tmpStruct1(2,1) = 1; tmpStruct1(1,2) = 1;
    while(nextImage(inputImage, file)){
        for(int i = 0; i < count; i++){
            tmpBounds[0].X = items[i].pix[0].X - radius;
            tmpBounds[0].Y = items[i].pix[0].Y - radius;
            tmpBounds[1].X = items[i].pix[6].X + radius;
            tmpBounds[1].Y = items[i].pix[6].Y + radius;
            createStripFromImage(inputImage, tmpStrip, tmpBounds[0], tmpBounds[1]);
            RGBToBinary(tmpStrip, tmpBin, 175);
            ccTmp.analyzeBinary(tmpBin, EIGHT_CONNECTED);
            sortComponentsByX(ccTmp);
            numComponents = ccTmp.getNumComponents();
            cursor = 0;
            for(int j = 0; j < numComponents; j++){
                tmpComponent = ccTmp.getComponentBinary(j);
                if(tmpComponent.width() >= 10 && tmpComponent.height() >= 10){
                    tmpPixel = items[i].pix[cursor];
                    tmpComponent = binaryDilation(tmpComponent, tmpStruct1, 1, 1);
                    if(tmpComponent(tmpComponent.width()/2, tmpComponent.height()/2) == 1){
                        boxPixelOfImage(inputImage, tmpPixel, 10, COLOR_RED, 2);
                    }else{
                        boxPixelOfImage(inputImage, tmpPixel, 10, COLOR_GREEN, 2);
                    }
                    cursor++;
                }
            }
        }
        sprintf(filePath, "output/%s.jpg", file);
        writeJpeg(inputImage, filePath, 100);
    }
}

int main(int argc, char *argv[]){
    pixels items[39];
    loadCoordinates("fields39.csv", items, 39);
    markImages(items, 39, 220);
}
The Result