Extract Data From Bank Statements

This is a Chase bank statement with 3 columns.

Solution for the 3-Column Format

OCR returns the extracted text as a single string, with no record of which column each piece came from. I was using OCR.Space, so I needed a way to classify each piece of text by column, and I used a Naive Bayes classifier to do it.
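Here is a minimal from-scratch sketch of the idea: a multinomial Naive Bayes classifier with add-one smoothing that labels short OCR fragments as a date, description, or amount. The feature scheme (lowercased words plus a collapsed "shape" token such as `d/d` for `01/15`), the class names, and the tiny training set are all assumptions for illustration, not the original implementation.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Turn an OCR fragment into bag-of-features tokens.

    Each word yields its lowercase form plus a "shape" token where runs of
    letters become 'a' and runs of digits become 'd' (e.g. "1,250.00" -> "d,d.d").
    """
    feats = []
    for word in text.split():
        feats.append("word:" + word.lower())
        shape = re.sub(r"[A-Za-z]+", "a", word)  # letters first, so 'd' is not re-replaced
        shape = re.sub(r"[0-9]+", "d", shape)
        feats.append("shape:" + shape)
    return feats


class NaiveBayesColumnClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        self.total = len(labels)
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum over features of log P(feature | label)
            score = math.log(count / self.total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(text):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training fragments, as they might come out of OCR.
train_texts = ["01/02", "01/15", "12/30",
               "PAYROLL DEPOSIT", "CARD PURCHASE GROCERY", "ATM WITHDRAWAL",
               "1,250.00", "-45.20", "300.00"]
train_labels = ["date"] * 3 + ["description"] * 3 + ["amount"] * 3
clf = NaiveBayesColumnClassifier().fit(train_texts, train_labels)
```

With this toy data, `clf.predict("02/07")` returns `"date"` and `clf.predict("ONLINE TRANSFER")` returns `"description"`, because the shape tokens generalise beyond the exact training strings.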

My data looks like this.

Final Results

I successfully saved the data into CSV files on the first attempt.
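A minimal sketch of getting from classified fragments to a CSV file, assuming the fragments arrive in reading order as `(label, text)` pairs and that every transaction starts with a date. The column names and the grouping rule are assumptions for illustration; real statements may need extra handling for balances or multi-line amounts.

```python
import csv
import io

# Hypothetical column names for a 3-column statement.
FIELDNAMES = ["date", "description", "amount"]


def group_into_rows(labeled_fragments):
    """Group (label, text) pairs into one dict per transaction.

    A new row starts at every "date" fragment; description pieces that wrap
    across lines are joined with spaces. Fragments before the first date
    are ignored.
    """
    rows = []
    for label, text in labeled_fragments:
        if label == "date":
            rows.append({"date": text, "description": "", "amount": ""})
        elif rows and label == "description":
            rows[-1]["description"] = (rows[-1]["description"] + " " + text).strip()
        elif rows and label == "amount":
            rows[-1]["amount"] = text
    return rows


def write_transactions_csv(rows, fileobj):
    """Write transaction dicts to CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


fragments = [("date", "01/02"), ("description", "PAYROLL"), ("description", "DEPOSIT"),
             ("amount", "1,250.00"), ("date", "01/15"),
             ("description", "CARD PURCHASE"), ("amount", "-45.20")]
buffer = io.StringIO()
write_transactions_csv(group_into_rows(fragments), buffer)
```

The `csv` module quotes the amount `"1,250.00"` automatically because it contains a comma. When writing to a real file instead of a `StringIO`, open it with `newline=""` as the `csv` documentation recommends.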

The Next Big Challenge

Some time later I lost my job during the coronavirus crisis, which gave me plenty of time to keep working on this problem. I was optimistic that if I improved the solution, I could get my job back. I discussed the problem with someone on LinkedIn, who promised to mentor me on the project. The next challenge was to extract data from bank statement PDFs with 5 columns.

Bank Statement with 5 Columns


Mask RCNN was the final solution. I won't go into the technical details of the architecture here; other Medium articles explain how it works. Once again this meant labeling data, this time for Mask RCNN, so I used the VIA (VGG Image Annotator) tool: https://www.robots.ox.ac.uk/~vgg/software/via/via_demo.html. Now I was making progress!
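VIA can export annotations as JSON. Here is a minimal sketch of reading such an export into per-image polygons that could then be rasterised into Mask RCNN training masks. It assumes the VIA 2.x JSON export layout (a `regions` list whose `shape_attributes` hold `rect` or `polygon` geometry) and a region attribute named `label` holding the column name; that attribute name is hypothetical and depends on how the VIA project was set up.

```python
import json


def load_via_regions(via_json_text, label_key="label"):
    """Parse a VIA JSON annotation export into polygons per image.

    Returns {filename: [(label, xs, ys), ...]} where xs/ys are the polygon's
    vertex coordinates. Rectangles are converted to their four corners so
    every region can be turned into a mask the same way.
    """
    data = json.loads(via_json_text)
    result = {}
    for entry in data.values():
        regions = entry.get("regions", [])
        if isinstance(regions, dict):  # older VIA 1.x exports store regions in a dict
            regions = list(regions.values())
        polygons = []
        for region in regions:
            shape = region["shape_attributes"]
            label = region.get("region_attributes", {}).get(label_key, "")
            if shape["name"] == "polygon":
                xs, ys = shape["all_points_x"], shape["all_points_y"]
            elif shape["name"] == "rect":
                x, y, w, h = shape["x"], shape["y"], shape["width"], shape["height"]
                xs, ys = [x, x + w, x + w, x], [y, y, y + h, y + h]
            else:
                continue  # circles, ellipses, points, etc. are skipped in this sketch
            polygons.append((label, xs, ys))
        result[entry["filename"]] = polygons
    return result
```

Rectangles are usually enough for statement columns, but keeping everything as polygons means hand-drawn regions around skewed scans go through the same code path.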

The Magic of Mask RCNN

Finally, I got the results I was looking for.
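Once the detector returns a region per column, the OCR words still have to be placed into those columns. A minimal sketch, assuming word-level OCR output as `(text, left, top, width, height)` tuples and detected column boxes as `(label, x_min, y_min, x_max, y_max)` in the same pixel coordinates; both tuple layouts are hypothetical stand-ins, not the exact formats of OCR.Space or any Mask RCNN implementation.

```python
def assign_words_to_columns(words, column_boxes):
    """Map OCR words to detected column regions by their centre point.

    words: list of (text, left, top, width, height).
    column_boxes: list of (label, x_min, y_min, x_max, y_max).
    Returns {label: [text, ...]} with each column's words in reading order
    (top to bottom, then left to right). Words outside every box are dropped.
    """
    columns = {box[0]: [] for box in column_boxes}
    for text, left, top, width, height in words:
        cx, cy = left + width / 2, top + height / 2
        for label, x0, y0, x1, y1 in column_boxes:
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                columns[label].append((top, left, text))
                break  # first matching box wins
    return {label: [t for _, _, t in sorted(items)] for label, items in columns.items()}


boxes = [("date", 0, 0, 100, 1000), ("description", 101, 0, 500, 1000),
         ("amount", 501, 0, 700, 1000)]
words = [("DEPOSIT", 200, 50, 80, 20), ("01/02", 10, 50, 50, 20),
         ("PAYROLL", 110, 50, 80, 20), ("1,250.00", 550, 50, 90, 20),
         ("01/15", 10, 90, 50, 20), ("footer", 800, 950, 40, 20)]
result = assign_words_to_columns(words, boxes)
```

Using the word's centre rather than its full box keeps words that slightly overhang a column edge from being dropped. In this example `"footer"` falls outside every box and is discarded.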


Won the Battle but Lost the War!
