Cheat Engine The Official Site of Cheat Engine
Deltron Z Expert Cheater
Reputation: 1
Joined: 14 Jun 2009 Posts: 164
Posted: Sat May 15, 2010 5:09 pm. Post subject: Optical Character Recognition
I've made a simple CAPTCHA reader (for this CAPTCHA), but it doesn't work very well for rotated digits.
My algorithm is simple. First, it converts the image to black and white. Next, it finds the rectangle surrounding each digit: it looks for fully white columns, and once it has two, it scans the rows within that range for fully white rows until it has the exact rectangle. Finally, it compares the digit against a very small database (10 images of the digits 0 to 9, none rotated). The best match is appended to the result string, which is shown as the window title.
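The steps above (threshold, split on fully white columns, trim fully white rows, pick the best-matching template) could look roughly like the minimal Python sketch below. It is not the author's C# code. The grid representation (lists of 0/1 rows), the threshold of 128 and the names `binarize`, `split_columns`, `crop_rows`, `score` and `read_digits` are all hypothetical. A template of a different size from the glyph is compared by padding the smaller grid with white pixels, top-left aligned, which is only a crude stand-in for proper size handling.

```python
# Hypothetical sketch: images are lists of rows; 1 = black (ink), 0 = white.

def binarize(gray, threshold=128):
    """Map grayscale pixels (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def split_columns(img):
    """Return (start, end) column ranges separated by fully white columns."""
    width = len(img[0])
    spans, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in img)
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def crop_rows(img, x0, x1):
    """Inside columns [x0, x1), drop the fully white rows above and below the glyph."""
    rows = [row[x0:x1] for row in img]
    inked = [y for y, row in enumerate(rows) if any(row)]
    return rows[inked[0]:inked[-1] + 1]


def score(glyph, template):
    """Fraction of agreeing pixels; the smaller grid is padded with white (top-left aligned)."""
    h = max(len(glyph), len(template))
    w = max(len(glyph[0]), len(template[0]))

    def px(grid, y, x):
        return grid[y][x] if y < len(grid) and x < len(grid[0]) else 0

    same = sum(px(glyph, y, x) == px(template, y, x) for y in range(h) for x in range(w))
    return same / (h * w)


def read_digits(gray, templates):
    """templates: dict mapping a character to its binary template grid."""
    img = binarize(gray)
    result = ""
    for x0, x1 in split_columns(img):
        glyph = crop_rows(img, x0, x1)
        result += max(templates, key=lambda ch: score(glyph, templates[ch]))
    return result
```

Every glyph is scored against every template, so the cost grows linearly with the size of the database.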
It works pretty well for easy CAPTCHAs, but I need some help or ideas for rotated digits, and also for digits whose size differs from the ones in my database. I've added a simple "hidden function" for testing that expands the database with rotated samples. The first text box is the start angle, the second is the end angle, and the third is the angle step between two consecutive samples. It sometimes recognizes a digit that the normal database misses, but then fails on the rest.
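A sketch of the "hidden function" idea (start angle, end angle, step) in Python, assuming glyphs are binary grids of 0/1 rows. `rotate`, `crop_to_ink` and `expand_database` are hypothetical names, and nearest-neighbour rotation is just one simple choice. The output canvas is as wide as the glyph's diagonal plus a 2-pixel margin, so rotated corners stay on the canvas, and the result is cropped back to the ink.

```python
import math


def crop_to_ink(grid):
    """Crop a binary grid to the bounding box of its ink (1) pixels."""
    rows = [y for y, row in enumerate(grid) if any(row)]
    cols = [x for x in range(len(grid[0])) if any(row[x] for row in grid)]
    if not rows:
        return [[0]]
    return [row[cols[0]:cols[-1] + 1] for row in grid[rows[0]:rows[-1] + 1]]


def rotate(glyph, degrees):
    """Nearest-neighbour rotation of a binary grid about its centre."""
    h, w = len(glyph), len(glyph[0])
    side = math.ceil(math.hypot(h, w)) + 2  # diagonal plus a small margin
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c = (side - 1) / 2
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0] * side for _ in range(side)]
    for y in range(side):
        for x in range(side):
            # Inverse mapping: rotate each output pixel back to find its source pixel.
            # floor(v + 0.5) rounds half up, avoiding round()'s half-to-even behaviour.
            dx, dy = x - c, y - c
            sx = math.floor(cos_a * dx + sin_a * dy + cx + 0.5)
            sy = math.floor(-sin_a * dx + cos_a * dy + cy + 0.5)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = glyph[sy][sx]
    return crop_to_ink(out)


def expand_database(templates, start, end, step):
    """Return (char, glyph) pairs for every angle from start to end (inclusive)."""
    if step <= 0:
        raise ValueError("step must be positive")
    samples = []
    angle = start
    while angle <= end:
        for ch, glyph in templates.items():
            samples.append((ch, rotate(glyph, angle)))
        angle += step
    return samples
```

A small step multiplies the database size (and the matching time) quickly: -30 to 30 degrees in steps of 5 already gives 13 samples per digit.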
I think my problem is that I'm comparing pixels instead of shapes. Comparing shapes should also solve the rotated-digit problem, I guess; I just don't know how to implement such an algorithm.
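One rotation-tolerant way to compare shape rather than raw pixel positions is a ring projection: measure how much of the glyph's ink falls into concentric bands around its centroid. Rotating the glyph barely changes those proportions, and dividing distances by the largest one makes the profile roughly independent of size as well. This is a swapped-in technique, not something from the thread; `ring_profile`, `profile_distance` and the default of 4 rings are illustrative assumptions. Its weakness is that different shapes can share a profile (a 6 and a 9 are 180-degree rotations of each other, for example), so it would more likely complement pixel matching than replace it.

```python
import math


def ring_profile(glyph, rings=4):
    """Fraction of ink pixels in each of `rings` concentric bands around the ink centroid."""
    ink = [(y, x) for y, row in enumerate(glyph) for x, v in enumerate(row) if v]
    if not ink:
        return [0.0] * rings
    cy = sum(y for y, _ in ink) / len(ink)
    cx = sum(x for _, x in ink) / len(ink)
    dists = [math.hypot(y - cy, x - cx) for y, x in ink]
    radius = max(dists) or 1.0  # a single pixel has radius 0; avoid dividing by zero
    counts = [0] * rings
    for d in dists:
        band = min(int(d / radius * rings), rings - 1)  # the outermost pixel goes in the last band
        counts[band] += 1
    return [c / len(ink) for c in counts]


def profile_distance(a, b):
    """Euclidean distance between two ring profiles (smaller = more similar)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

To classify a glyph, compute its profile once and pick the template whose stored profile has the smallest `profile_distance`.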
What I've done so far is attached. Don't bother checking the source, because it's ugly (even scary!), full of duplicates and inefficient algorithms, since I wanted to finish this today as fast as I could. (OK, fine! So I'm lazy!)
Any suggestions for improvement? If you check the Database folder, you'll notice the images aren't all the same size. Maybe that's one of the causes of the bad recognition? (The database is external for testing. Eventually I'm gonna move it into the resources or something like that, and perhaps expand it dynamically by adding good results.)
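One common way to handle database images of different sizes is to scale both the cropped glyph and every template to the same fixed grid at comparison time, leaving the stored images untouched. A minimal nearest-neighbour version, assuming binary grids; the 16x16 target size and the names `resize_nearest` and `best_match` are hypothetical choices.

```python
def resize_nearest(glyph, out_h, out_w):
    """Nearest-neighbour resample of a binary grid to out_h rows x out_w columns."""
    h, w = len(glyph), len(glyph[0])
    return [[glyph[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def best_match(glyph, templates, size=(16, 16)):
    """Scale the glyph and every template to `size`; return the key with most agreeing pixels."""
    g = resize_nearest(glyph, *size)

    def agreement(template):
        t = resize_nearest(template, *size)
        return sum(a == b for row_g, row_t in zip(g, t) for a, b in zip(row_g, row_t))

    return max(templates, key=lambda ch: agreement(templates[ch]))
```

Scaling to a fixed grid also distorts the aspect ratio (a thin "1" is stretched to a square), which is partly why 1 and 7 become easier to confuse; keeping the aspect ratio and padding instead is a possible variation.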
P.S. The digits 5 and 6 are very similar, so it may recognize a 5 as a 6 and vice versa. The same goes for 1 and 7, and 7, which looks a little like 2, may also produce a false result. The same applies to the sets {0, 6, 8, 9}, {1, 2, 9} and {0, 1, 8}, and perhaps more.
P.S. 2: I've added a second file, CaptchaReader 2.exe, which is a test: an attempt to fix the problem of differently sized database images without actually resizing them to the same size and risking many false results.
Edit:
I can't attach the RAR file here, so here are download mirrors:
RapidShare
SendSpace
MegaUpload
FileFlyer
MediaFire
Very, very ugly code, with many duplicates and, of course, inefficient.
Currently, it only works for this specific CAPTCHA.
TODO:
-Better support for rotated characters.
-Support for characters of different sizes.
-Support for characters other than digits.
-Clean up the code, rewrite the algorithms efficiently and perhaps turn it into a library.
-Currently coded in C# for easy access to pixels in different image formats; I plan to rewrite it at some point, maybe in C. Either way, it's gonna be cross-platform for sure.
Deltron Z Expert Cheater
Reputation: 1
Joined: 14 Jun 2009 Posts: 164
Posted: Mon May 17, 2010 2:58 am
Well, if nobody knows, then at least I'll share my progress.
As I said, since I'm comparing pixels instead of shapes, it's not very dynamic. But I've made a database creator to help me read more CAPTCHAs: all you have to do is pick the type of CAPTCHA from your custom database and you're good to go. I might even try to recognize the CAPTCHA type automatically.
I'm still working on the CAPTCHA reader itself. I have to clean up the code first so I can understand what I did there, and then I have to think of new algorithms for pixel matching, because I have no clue how to start finding the shape of a character.
One thing that's bothering me is that a free OCR program gave better results than my own CAPTCHA reader, even though mine is supposed to read that CAPTCHA specifically. Still, I guess that OCR won't read harder CAPTCHAs, and I'm planning for mine to read as many as it can.
If someone has a better idea than creating a database for each CAPTCHA, and it's codeable by one person (which is me), I'm listening.
Only pictures this time:
Successful recognition:
And the database creator:
Next: combining both of them into classes, which is where I'm gonna put the messy code once it's cleaned up.
1929394839292057839194958 Grandmaster Cheater Supreme
Reputation: 130
Joined: 22 Dec 2006 Posts: 1509
Posted: Mon May 17, 2010 5:09 am
Not bad. Just read through that. Pretty good job.
Deltron Z Expert Cheater
Reputation: 1
Joined: 14 Jun 2009 Posts: 164
Posted: Tue May 18, 2010 5:39 am
I'm having trouble devising an algorithm to find the skeleton of a character. My idea was to run from one black edge to the other horizontally, and then again vertically; from the positions of the two edges I could take their average (the midpoint) to form a skeleton.
Not much of a success, but I guess that's just because I'm stingy with memory: I tried to save some by running the algorithm horizontally and vertically separately, in the same pass as the method that finds the bounding rectangle (shown in red). Note to self: take the longer side to form a square and rotate around that square's center; this way I prevent data loss. This is the least important part, since I want my database creator to work first on simple CAPTCHAs with no rotation (but with twisted fonts and different colors, and perhaps different fonts).
What I've got:
And how it's supposed to look (something like this):
That way the skeleton can be placed inside the character and still match. It's basically just a thin line running through the center of the character's strokes.
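The midpoint idea above is one way to build a skeleton; a standard alternative is thinning, which repeatedly peels boundary pixels off the character until only one-pixel-wide strokes remain. Below is a sketch of the Zhang-Suen thinning algorithm, a well-known published method swapped in here (it is not the author's approach). It assumes a binary grid with 1 = ink and treats pixels outside the grid as background; `zhang_suen_thin` is a hypothetical name.

```python
def zhang_suen_thin(glyph):
    """Zhang-Suen thinning of a binary grid (1 = ink). Returns a new grid."""
    img = [row[:] for row in glyph]  # work on a copy; the input is left untouched
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    def neighbours(y, x):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [px(y - 1, x), px(y - 1, x + 1), px(y, x + 1), px(y + 1, x + 1),
                px(y + 1, x), px(y + 1, x - 1), px(y, x - 1), px(y - 1, x - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(h):
                for x in range(w):
                    if not img[y][x]:
                        continue
                    n = neighbours(y, x)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of ink neighbours
                    a = sum(n[i] == 0 and n[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Delete all marked pixels at once so the pass sees a consistent image.
            for y, x in to_clear:
                img[y][x] = 0
            if to_clear:
                changed = True
    return img
```

Thinning shortens stroke ends slightly: a solid bar 3 pixels tall and 7 wide comes out as a 4-pixel horizontal line. Templates would need to be thinned the same way before their skeletons are compared.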
I've read that neural networks can be used to create a good database. Can anyone explain further?
That way I could read CAPTCHAs with many different fonts or twisted characters. Right now I can only match pixels against a copy of the original character, so twisting gives me bad results. Neural networks sound more dynamic.
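The simplest neural network that could act as a learned "database" is a single-layer, multi-class perceptron: one weight per pixel (plus a bias) for each character, adjusted whenever a training sample is misclassified. A minimal sketch, assuming all glyphs are already scaled to the same size; `flatten`, `predict`, `train_perceptron` and the epoch limit of 20 are hypothetical. Practical OCR networks usually add hidden layers and many training samples per character, which this does not attempt.

```python
def flatten(glyph):
    """Turn a 2D binary grid into a feature vector; the trailing 1 is a bias input."""
    return [v for row in glyph for v in row] + [1]


def predict(weights, glyph):
    """Return the label whose weight vector gives the highest score."""
    x = flatten(glyph)
    return max(weights, key=lambda label: sum(w * v for w, v in zip(weights[label], x)))


def train_perceptron(samples, epochs=20):
    """samples: list of (label, glyph) pairs, all glyphs the same size."""
    labels = sorted({label for label, _ in samples})
    size = len(flatten(samples[0][1]))
    weights = {label: [0] * size for label in labels}
    for _ in range(epochs):
        mistakes = 0
        for label, glyph in samples:
            guess = predict(weights, glyph)
            if guess != label:
                mistakes += 1
                x = flatten(glyph)
                # Pull the correct class towards this sample, push the wrong guess away.
                weights[label] = [w + v for w, v in zip(weights[label], x)]
                weights[guess] = [w - v for w, v in zip(weights[guess], x)]
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights
```

Unlike a fixed template, the learned weights can tolerate small variations: training on several twisted or noisy samples per character spreads the weights over the pixels those samples share.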
NINTENDO Grandmaster Cheater Supreme
Reputation: 0
Joined: 02 Nov 2007 Posts: 1371
Posted: Wed May 19, 2010 2:00 am
this this
[Attached image, 168.86 KB]
nwongfeiying Grandmaster Cheater
Reputation: 2
Joined: 25 Jun 2007 Posts: 695
Posted: Wed May 19, 2010 5:30 pm
Interesting project, to say the least. I congratulate you on your progress.
Deltron Z Expert Cheater
Reputation: 1
Joined: 14 Jun 2009 Posts: 164
Posted: Fri May 21, 2010 1:53 pm
NINTENDO wrote: "this this"
I can't read this one. My algorithm filters out the background, and once it reaches a character, it uses a fill algorithm to copy the character into a new blank image as black pixels (actually something slightly different, but similar). Since the red overlaps the text, my algorithm can't recognize it. There's also black noise that I can't filter out, and the only way to recognize the text despite it is to match enough pixels correctly, so if there's a lot of black noise it won't work.
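A generic version of the fill step, for illustration: starting from a seed pixel on a character, a breadth-first flood fill copies every 4-connected pixel of similar colour into a new blank image. The author describes their own method only as "similar", so this is not that code; `extract_component`, the RGB-tuple pixel representation and the tolerance of 40 are assumptions. It also reproduces the failure described above: where a red line crosses the whole width of a stroke, the fill stops at the red pixels and the character is split into pieces.

```python
from collections import deque


def extract_component(pixels, seed, tolerance=40):
    """Grow a region from `seed` over 4-connected pixels whose colour is within
    `tolerance` (per channel) of the seed colour; return it as a binary grid."""
    h, w = len(pixels), len(pixels[0])
    sy, sx = seed
    ref = pixels[sy][sx]
    out = [[0] * w for _ in range(h)]
    out[sy][sx] = 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not out[ny][nx]:
                if max(abs(a - b) for a, b in zip(pixels[ny][nx], ref)) <= tolerance:
                    out[ny][nx] = 1
                    queue.append((ny, nx))
    return out
```

Black noise that touches a character is a different problem: the fill happily absorbs it, because it has the same colour as the text.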