Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 


Optical Character Recognition

 
Deltron Z
Expert Cheater
Reputation: 1

Joined: 14 Jun 2009
Posts: 164

Posted: Sat May 15, 2010 5:09 pm    Post subject: Optical Character Recognition

I've made a simple captcha reader (for this captcha), but it doesn't work very well for rotated digits.
My algorithm is simple: recolor the image to black and white, detect the rectangle surrounding each digit (by finding fully white columns and, once I have two, scanning the range between them for fully white rows until I have the exact rectangle), and then simply compare it with a very small database (10 images containing 0~9, not rotated). The best match is added to the result string (shown as the title of the window).
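
The segmentation step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the C# source from the attachment; the function names, the threshold of 128 and the list-of-rows image representation are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 ints) into rows of 0/1, where 1 is ink.
    The threshold value is an assumption, not taken from the original tool."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def column_is_white(img, x):
    """A column is 'fully white' when no row has ink at position x."""
    return all(row[x] == 0 for row in img)


def split_columns(img):
    """Return (start, end) column ranges, end exclusive, separated by fully white columns."""
    width = len(img[0]) if img else 0
    spans, start = [], None
    for x in range(width):
        if not column_is_white(img, x):
            if start is None:
                start = x              # first inked column of a new glyph
        elif start is not None:
            spans.append((start, x))   # white column closes the current glyph
            start = None
    if start is not None:
        spans.append((start, width))   # glyph touching the right edge
    return spans


def bounding_boxes(img):
    """Return (left, top, right, bottom) per glyph; right and bottom are exclusive."""
    boxes = []
    for left, right in split_columns(img):
        # Within the column range, keep only the rows that contain ink.
        rows = [y for y, row in enumerate(img) if any(row[left:right])]
        boxes.append((left, rows[0], right, rows[-1] + 1))
    return boxes


def crop(img, box):
    """Cut the glyph out of the image so it can be compared with the database."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]
```

Note that this kind of split only works while a fully white column separates neighbouring digits; touching or overlapping digits would come out as one box.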

Now, it works pretty well for easy captchas, but I need some help or ideas for rotated digits and, furthermore, for digits in sizes different from those in my database. I've added a simple "hidden function" for testing which expands the database by adding rotated samples to it. (The first text box is the angle to start with, the second is the angle to end with and the third is the difference in rotation angle between two consecutive samples. With it, the reader may correctly get one digit that the normal database fails to detect, but then fail to detect the rest.)
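
One way such a database expansion could look is sketched below in Python. The names `rotate` and `expand_database` are hypothetical, the images are assumed to be rows of 0/1 values (1 = ink), and nearest-neighbour sampling is just the simplest choice, not necessarily what the tool does.

```python
import math


def rotate(img, degrees):
    """Rotate a 0/1 grid about its centre using nearest-neighbour inverse mapping.
    The canvas keeps its size, so ink rotated past the edges is lost."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rad = math.radians(degrees)
    cos_a, sin_a = math.cos(rad), math.sin(rad)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each destination pixel back to the source pixel it came from.
            dx, dy = x - cx, y - cy
            sx = int(round(cx + dx * cos_a + dy * sin_a))
            sy = int(round(cy - dx * sin_a + dy * cos_a))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def expand_database(templates, start, end, step):
    """templates maps a label to its grid. Returns a list of (label, angle, rotated grid)
    for every angle from start to end (inclusive) in increments of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    samples = []
    for label, grid in templates.items():
        angle = start
        while angle <= end:
            samples.append((label, angle, rotate(grid, angle)))
            angle += step
    return samples
```

For example, `expand_database(templates, -10, 10, 5)` would produce five samples per digit. The cost is that every extra sample is one more image to compare against, and one more chance for a rotated 1 to resemble a 7.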

I think my problem is that I'm comparing pixels instead of shapes. Comparing shapes should also solve the rotated digits problem, I guess; I just don't know how to implement such an algorithm.
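
To make the weakness of pixel comparison concrete, here is a small Python sketch. The scoring rule (fraction of agreeing pixels) and the names `match_score` and `best_match` are assumptions for illustration; the attached program may score matches differently.

```python
def match_score(a, b):
    """Fraction of pixels that agree between two equally sized 0/1 grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same dimensions")
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total if total else 0.0


def best_match(glyph, templates):
    """Return (label, score) of the template with the highest match score."""
    scored = ((label, match_score(glyph, grid)) for label, grid in templates.items())
    return max(scored, key=lambda pair: pair[1])
```

With two toy 3x3 templates, a "1" drawn as a centre column and a "7" drawn as a top bar plus right column, the exact "1" scores 1.0 against its own template. Shift that same stroke one pixel to the right and it agrees with the "1" template on only 3 of 9 pixels but with the "7" template on 7 of 9, so the result flips even though the shape has not changed.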

What I've done so far is attached. Don't bother checking the source because it's ugly (and even scary!), with lots of duplicates and inefficient algorithms, because I wanted to finish this today as fast as I could. (OK, fine! So I'm lazy! :? )

Any suggestions for improvement? If you check the Database folder you'll notice the images aren't all the same size. Maybe that's one of the causes of the bad recognition? (The database is external for testing. Eventually I'm gonna stick it somewhere inside the resources or something like that, perhaps expanding the database dynamically by adding good results!)
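
If differing image sizes are part of the problem, one common remedy is to scale every cropped glyph and every database image to the same dimensions before comparing. A minimal nearest-neighbour version in Python is sketched below; `resize_nearest` is a hypothetical helper and the target size is up to the caller.

```python
def resize_nearest(img, new_w, new_h):
    """Scale a grid (list of rows) to new_w x new_h by nearest-neighbour sampling."""
    if new_w <= 0 or new_h <= 0:
        raise ValueError("target size must be positive")
    h, w = len(img), len(img[0])
    # Each destination pixel copies the source pixel at the proportional position.
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]
```

Scaling changes stroke thickness and aspect ratio, so thin digits such as 1 get stretched; whether that helps or hurts the match rate would have to be checked against real samples.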

P.S. The digits 5 and 6 are very similar, so it may recognize 5 as 6 and vice versa. The same goes for 1 and 7, and 7, which is a little similar to 2, may also produce a false result. The same applies to the sets {0, 6, 8, 9}, {1, 2, 9} or {0, 1, 8}, and perhaps more.

P.S.2. I've added a second file, CaptchaReader 2.exe, which is a test: an attempt to fix the problem of differently sized database images without actually resizing them to the same size and risking many false results.


Edit:
Can't attach rar:
RapidShare
SendSpace
MegaUpload
FileFlyer
MediaFire

Very, very ugly code, with many duplicates and, of course, inefficient.

Currently, it only works for this specific captcha.

TODO:
-Better support for rotated characters.
-Support characters of different sizes.
-Support characters other than digits.
-Clean the code, rewrite the algorithms efficiently and perhaps turn it into a library. :)
-Currently coded in C# for easy access to the pixels of different formats; I plan to rewrite it at some point, maybe in C. Anyhow, it's gonna be cross-platform for sure.
Deltron Z
Expert Cheater
Reputation: 1

Joined: 14 Jun 2009
Posts: 164

Posted: Mon May 17, 2010 2:58 am

Well, if nobody knows then at least I'll share my progress. :D
As I said, since I'm comparing pixels instead of shapes, it's not very dynamic, but I've made a database creator to help me read more captchas. All you have to do is specify the type of captcha from your custom database and you're good to go. I might even try to recognize captchas automatically.
I'm still working on the captcha reader itself. I've got to clean the code first so I can understand what I did there, and then I have to think of new algorithms for pixel matching, because I don't have a clue how to start finding the shape of a character.
One thing that's bothering me is that a free OCR gave better results than my own captcha reader, and mine is supposed to read that one specifically. I guess the free OCR won't read harder captchas, though, and I'm planning for mine to be able to read as many as it can. :)
If someone has a better idea than creating a database for each captcha, and it's codeable by one person (which is me), I'm listening. :roll:

Only pictures this time:
Successful recognition:




And the database creator:




Now... combining them both, with classes which I'm gonna put the messy code in after cleaning it. :D
1929394839292057839194958
Grandmaster Cheater Supreme
Reputation: 130

Joined: 22 Dec 2006
Posts: 1509

Posted: Mon May 17, 2010 5:09 am

Not bad. Just read through that. Pretty good job.
Deltron Z
Expert Cheater
Reputation: 1

Joined: 14 Jun 2009
Posts: 164

Posted: Tue May 18, 2010 5:39 am

:)
I'm having trouble devising an algorithm to find the skeleton of a character. I thought that if I ran from one black edge to the other horizontally, and then again vertically, then by calculating the distance between them I could calculate an average to form a skeleton.
Not much of a success, but I guess that's just because I'm cheap on memory and tried to save some by running the algorithm horizontally and vertically separately, at the same time as the method that finds the bounding rectangle (drawn in red). Note to myself: take the longer side to form a square and apply the rotation about that square's center; this way I prevent data loss. That's least important, since I want my database creator to work first on simple CAPTCHAs with no rotation (though with twisted fonts and different colors, and perhaps different fonts).
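
One reading of the edge-to-edge idea above is: for every run of black pixels in a row, mark its middle pixel, then do the same for every column. The Python sketch below follows that interpretation; it is a guess at the approach, not the actual code, and the names are made up for the example.

```python
def run_midpoints(line):
    """Yield the middle index of every consecutive run of 1s in a sequence."""
    start = None
    for i, value in enumerate(list(line) + [0]):   # trailing 0 closes a run at the edge
        if value and start is None:
            start = i
        elif not value and start is not None:
            yield (start + i - 1) // 2
            start = None


def midpoint_skeleton(img):
    """Mark the midpoint of each horizontal run and each vertical run of ink."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in run_midpoints(img[y]):
            out[y][x] = 1
    for x in range(w):
        column = [img[r][x] for r in range(h)]
        for y in run_midpoints(column):
            out[y][x] = 1
    return out
```

On a plain vertical bar three pixels wide, the horizontal pass gives the expected centre line, but the vertical pass adds a short horizontal bar across the middle, so the result is a cross rather than a line. That kind of artefact may be part of why the output does not look like a clean skeleton.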

What I've got:

And how it's supposed to be: (sort of something like this)

The idea is that the skeleton can be put inside the character and matched; it is basically just a thin line running through the center of the character.
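
A standard way to get such a thin centre line is a thinning algorithm. The sketch below uses the Zhang-Suen method, which is a different technique from the edge-averaging approach described in this post: it repeatedly peels boundary pixels that can be removed without breaking the stroke. It is written in Python on rows of 0/1 values (1 = ink) and treats anything outside the image as background.

```python
def zhang_suen(img):
    """Thin a 0/1 grid with the Zhang-Suen algorithm and return the new grid."""
    grid = [list(row) for row in img]
    h, w = len(grid), len(grid[0])
    # Neighbour offsets P2..P9: clockwise, starting from the pixel directly above.
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

    def neighbours(y, x):
        return [grid[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy, dx in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(h):
                for x in range(w):
                    if not grid[y][x]:
                        continue
                    p = neighbours(y, x)
                    ink = sum(p)
                    # Number of 0 -> 1 transitions walking once around the pixel.
                    transitions = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        edge_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        edge_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= ink <= 6 and transitions == 1 and edge_ok:
                        to_clear.append((y, x))
            # Delete only after the whole pass, so every test saw the same image.
            for y, x in to_clear:
                grid[y][x] = 0
            changed = changed or bool(to_clear)
    return grid
```

Thinning is sensitive to noise: a single stray pixel on the edge of a stroke can grow into a spur on the skeleton, so it is usually applied after the image has been cleaned up.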

I've read that neural networks can be used to create a good database. Can anyone explain further? :P
That way I could read CAPTCHAs with many different fonts or twisted characters. Right now I can only match pixels against a copy of the original character, so twisting gives me bad results. Neural networks sound more dynamic. :D
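
As a very rough illustration of the idea, the simplest neural building block is a single perceptron: instead of storing template images, it learns one weight per pixel from labelled examples. The Python sketch below is a toy under stated assumptions (flattened 0/1 pixel lists, two classes labelled 0 and 1, hypothetical function names). A real digit reader would need one output per digit, many training samples per digit and most likely hidden layers.

```python
def predict(weights, bias, pixels):
    """Return 1 if the weighted sum of the pixels is positive, otherwise 0."""
    score = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if score > 0 else 0


def train_perceptron(samples, epochs=100, rate=1):
    """samples is a list of (pixels, label) pairs with label 0 or 1.
    Returns (weights, bias) after the classic perceptron update rule."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in samples:
            error = label - predict(weights, bias, pixels)
            if error:
                mistakes += 1
                # Nudge each weight towards the correct answer for this sample.
                weights = [w + rate * error * p for w, p in zip(weights, pixels)]
                bias += rate * error
        if mistakes == 0:
            break   # every training sample is classified correctly
    return weights, bias
```

After training, pixels that appear only in one class end up with weights pointing towards that class, while pixels shared by both classes carry little weight. That is what makes the approach more tolerant of small distortions than counting matching pixels against a single stored image.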
NINTENDO
Grandmaster Cheater Supreme
Reputation: 0

Joined: 02 Nov 2007
Posts: 1371

Posted: Wed May 19, 2010 2:00 am

this this
_________________
Intel over amd yes.
nwongfeiying
Grandmaster Cheater
Reputation: 2

Joined: 25 Jun 2007
Posts: 695

Posted: Wed May 19, 2010 5:30 pm

Interesting project to say the least. I congratulate you on your progress.
Deltron Z
Expert Cheater
Reputation: 1

Joined: 14 Jun 2009
Posts: 164

Posted: Fri May 21, 2010 1:53 pm

NINTENDO wrote:
this this

I can't read this one. My algorithm filters out the background and, once it reaches a character, uses a fill algorithm to fill a new blank image with black pixels (actually something slightly different, but similar). Since the red overrides the text, my algorithm can't recognize the text. There's also black noise which I can't filter, and the only way to recognize a character is to match enough pixels correctly, so if there's a lot of black noise it won't work.
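
The fill step mentioned above is described only loosely, so the following Python sketch is an approximation: a plain 4-connected flood fill that copies the ink region containing a seed pixel into a blank image of the same size. The function name and the 0/1 grid representation are assumptions.

```python
from collections import deque


def extract_component(img, seed_y, seed_x):
    """Copy the 4-connected ink region containing the seed into a blank image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    if not img[seed_y][seed_x]:
        return out                      # seed is background: nothing to copy
    queue = deque([(seed_y, seed_x)])
    out[seed_y][seed_x] = 1
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            # Visit in-bounds ink pixels that have not been copied yet.
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not out[ny][nx]:
                out[ny][nx] = 1
                queue.append((ny, nx))
    return out
```

This also shows why noise is a problem for this approach: any black noise pixel that touches the character belongs to the same connected region and gets copied along with it, while a line drawn over the text can merge several characters into one region.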
All times are GMT - 6 Hours