Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 


Solving Substitution Ciphers, Manual versus Semi-Automated

 
Post new topic   Reply to topic    Cheat Engine Forum Index -> General programming -> Crackmes
View previous topic :: View next topic  
Author Message
thenicelordj
Cheater
Reputation: 0

Joined: 24 Mar 2007
Posts: 48
Location: I'm in your VCR

PostPosted: Thu May 15, 2008 6:23 am    Post subject: Solving Substitution Ciphers, Manual versus Semi-Automated Reply with quote

Semi-automated cracking of monoalphabetic and polyalphabetic substitution ciphers

Since a few of the 'crackme' challenges in this topic have been based upon substitution ciphers I thought I'd spend a little time to cover a cracking strategy that seems to elude most people. What I present here is a method of breaking substitution ciphers in any language or format without the usual human approach... that is, without guessing at the likely sequences which help determine the plaintext.

First though, some examples of the classical approach...


Example 1, Monoalphabet with spaces preserved

In a Monoalphabetic substitution cipher you have an alphabet used for the plaintext (For example A-Z). Thus all plaintext messages are expressed in this alphabet. In the example given (uppercase letters) the following is a valid message :

I AM GOING TO THE SHOPS

Now, we choose 26 symbols that will form the encrypted text. It is important to note that the symbols are not always chosen from the same alphabet. Usually though, they are just a rearrangement of the initial alphabet, like this...

CODE
Plain Message : [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
Crypt Message : [FQRPTLVDWYHZIJSKMXNOAUBGCE]

Thus the message [I AM GOING TO THE SHOPS]
becomes [W FI VSWJV OS ODT NDSKN]


Now the usual approach 'frequency distribution' won't work too well. Basically, the principle that E is the most common letter in english text breaks down because the sample is not representative. There is only one E. There are three O's though, so we're already starting badly by mistaking O's for E's The next most common letter in english test is 'T', but as you see here this is ambiguous as T, S and even G have 2 letters. Still, this is enough to guess that ODT might mean THE and thus make a guess at the true symbol for 'E'

But lets put frequency distribution approaches aside for a moment, just realise that frequency distribution will only get you a few characters at best and requires careful application to avoid the 'E/O' problem we encountered above. Ultimately, whatever the results of frequency analysis give us, we must eventually solve this as a logic problem.

Of course, we have a LOT of clues, firstly the spaces are preserved, so we immediately can guess that the lone 'W' represents either an I or an A. We also know that 'OS' represents an 'is, am, it, an, or, be, to, etc...' ... experience tells us that ODT is probably 'and' or 'the', but could also be 'but', 'our'.

What do we see in 'OS ODT' well, the initial letter is the same.

So, 'to the' and 'on our' seem very likely, 'be bad' is a less likely possibility... 'an ass' is obviously ruled out because of the double s... etc...

Continuing with this thinking, and testing various possibilities we eventually get to a point where we can solve the problem. Essentially we are playing a variation of the childrens game 'hangman'. Trying to guess at the gaps by seeing patterns in structure and a few available letters.

After a few minutes we will probably get to the solution.


Example 2, Monoalphabetic with spaces encrypted

Lets say that instead of using 26 symbols in our alphabetic substitution cipher we choose 27. The new one is the space character itself.

CODE
Plain Message : [ABCDEFGHIJKLMNOPQRSTUVWXYZ ]
Crypt Message : [FERPTLVDWYHZIJ SKMXNOAUBGCQ]

Thus the message [I AM GOING TO THE SHOPS]
becomes [WQFIQV WJVQN QNDTQXD SH]


This hides a lot of the clues. The initial approach would probably have the frequency distribution approach misinterpret Q (the spaces) for the letter E or T. In the logic phase they would probably assert that SH is 'it' because of its relationship to the larger preceeding word and the fact that it occurs at the end of the message (maeaning that 'is', 'as', 'am', etc... would be dangling, and thus gramatically unlikely)

Of course, after some time spent analysing this the cracker may realise the mistake and treat Q as a SPACE (The most common SYMBOL in english text) and suddenly everything starts falling into place. They then solve it as before.


The automated solution to all monoalphabetic ciphers

It is an incredibly complex problem to code a program which will emulate your thought processes as you try to crack a monoalpabetic cipher. Describing gramatical relationships alone is a difficult task. Now, you might think, easy, I'll just write a program which tries every possible combination... that is an incredibly wasteful and simplistic approach but you still have a problem, how does the program 'know' when it hits a valid solution ? Do you want it to prompt you whenever it gets a string with the word 'the' in it ? That is going to log countless possibles! Do you want to check everything against a wordlist? What if the majority of the message is names ?

HERBERT GRIMSHAW IS COOLTASTIC

what we need is a system for identifying english-looking structures. A good way to do this is with trigrams or quadrigrams. We will work with quadrigrams (4 letter sequences).

We create an array 27x27x27x27 (A four dimensional array) and then head over to project gutenberg (free online literature) for training. Find a selection of reasonably contemporary piece and download them as text. Strip off the license headers and feed these files through an algorithm.

We read the book from beginning to end using a sliding window. We ignore all punctuation and symbols and merge any whitespace into a single space. This gives us a stream of characters separated only by single spaces. We move through this stream one character at a time keeping a buffer of the last four symbols read.

CODE
Thus, for a book which begins 'It was a cold, dark, stormy night...' our buffer reads the following...

[IT W]
[T WA]
[ WAS]
[WAS ]
[AS A]
[S A ]
[ A C]
[A CO]
[ COL]
[COLD]
[OLD ]
[LD D]


Each buffer represents a quadrigram. Now, for each buffer we do the following. Turn each character into a symbol. A=0, B=1, C=2, D=3..to..Y=25, Z=26, (SPACE)=27. This is easy, just deduct 65 (The value of 'A') from each character in the buffer... if the result is negative then set it to 27

So, the first buffer 'IT(space)W' becomes '8, 19, 27, 23'

Remember our 4 dimensional array ? This counts the occurrences of each unique quadragram pattern. So, when we see [IT W] we increment the value at that position in the four dimensional array:

CODE
In C this looks like 'buffer[8][19][27][23]'
in VB it would look Like 'buffer(8,19,27,23)'


Keep doing this for a few books. If you want to build a quadrigram for breaking spanish messages you'd use modern spanish texts to train your quadrigrams, if russian you would use a different alphabet and russian texts to train on. By the time you're finished you have an array of quadrigrams and their likelyhood. Ones that have never occurred score badly. Ones that occur regularly score highly.

Thus, en english, Q followed by anything other than a U will score very badly indeed...

CODE
For example, the message O QID ITPAA'

[O QI] scores highly negative
[ QID] scores highly negative
[QID ] scores highly negative
[ID I] moderately positive
[D IT] moderately positive
[ ITP] scores highly negative
[ITPA] scores highly negative
[TPAA] scores highly negative
[TPAA] scores highly negative


The result, this is rejected as english text.


CODE
Other constructs score highly twice... 'Is the round ball under the chair'

[IS T] scores high
[S TH] scores high
[ THE] scores very high
[THE ] scores very high
[HE R] high
[E RO] high
[ ROU] moderate
[ROUN] moderate-high
[OUND] moderate-high
[UND ] moderate
[ND B] moderate-high


Such a sentence would very quickly pass the threshold for strong english constructions.


I hope you realize the simplicity and power of the quadragram approach for automated cracking of mono alphabetic substitution ciphers. It can be trained on any language or alphabet you wish, from English to Russian, from dwarvish to klingon. It can also be trained on protocols such as HTML or IRC and structured documents and file formats.

Whilst it will not 'know' when it has the correct answer you can league-table the best results for display, normally the answer is in the first 20 places and, with structured documents such as HTML, is almost always the top result.

Not only does it simplify monoalphabetic substitution but it helps considerably to speed up polyalphabetics and other non-substitution ciphers such as RC4. And, the best thing, you can build a discrimination library of languages, protocols, dialects, formal and informal speech. Anything.

Alphabets with upper AND lower case are even more discriminating. Alphabets with periods will discriminate against 'is' at the end of a sentence thus demonstrating a very simplistic awareness of gramatical structure. Endings with 'ing' 'er' 'ist' score higher too than the same constructs in the middle of a word. It really does take the work out of recognising solutions and discerning between possible languages.

So, stop breaking substitution ciphers by hand, and give this a go. See if you can write a program which breaks monoalphabetics in a semi-automated fashion. If you manage this, try the problem of breaking polyalphabetics (much more of a challenge)

The other thing you should take from this is that substitution ciphers are NOT valid encryption these days, at best they are logic puzzles. Fun to break if you're bored, but ultimately useless.

_________________
I'm like a rubix cube, the more you play with me the harder I get.
Back to top
View user's profile Send private message
oib111
I post too much
Reputation: 0

Joined: 02 Apr 2007
Posts: 2947
Location: you wanna know why?

PostPosted: Thu May 15, 2008 2:52 pm    Post subject: Reply with quote

x0r wrote:
Thank you for your huge effort.

http://www.criticalsecurity.net/index.php?showtopic=26999


I suspect a copy paste.... Laughing

_________________


8D wrote:

cigs dont make people high, which weed does, which causes them to do bad stuff. like killing
Back to top
View user's profile Send private message AIM Address Yahoo Messenger MSN Messenger
thenicelordj
Cheater
Reputation: 0

Joined: 24 Mar 2007
Posts: 48
Location: I'm in your VCR

PostPosted: Fri May 16, 2008 5:13 am    Post subject: Reply with quote

oib111 wrote:
x0r wrote:
Thank you for your huge effort.

http://www.criticalsecurity.net/index.php?showtopic=26999


I suspect a copy paste.... Laughing


Indeed I copied my earlier post from cricticalsecurity. I'm DToX. Sure I could've mentioned it but it's funnier when someone finds the original thread and then accuse me. Wink

_________________
I'm like a rubix cube, the more you play with me the harder I get.
Back to top
View user's profile Send private message
Jake!
Grandmaster Cheater Supreme
Reputation: 4

Joined: 03 Jan 2007
Posts: 1354

PostPosted: Tue May 20, 2008 8:51 pm    Post subject: Reply with quote

x0r wrote:
thenicelordj wrote:
Indeed I copied my earlier post from cricticalsecurity. I'm DToX. Sure I could've mentioned it but it's funnier when someone finds the original thread and then accuse me. Wink

Oh sure, now please make a post on criticalsecurity.net verifying this claim Smile


Bitch got denied.
Back to top
View user's profile Send private message
Overload
Master Cheater
Reputation: 0

Joined: 08 Feb 2008
Posts: 293

PostPosted: Tue May 20, 2008 9:01 pm    Post subject: Reply with quote

Link wrote:
x0r wrote:
thenicelordj wrote:
Indeed I copied my earlier post from cricticalsecurity. I'm DToX. Sure I could've mentioned it but it's funnier when someone finds the original thread and then accuse me. Wink

Oh sure, now please make a post on criticalsecurity.net verifying this claim Smile


Bitch got denied.


Bitch got owned. Laughing
Back to top
View user's profile Send private message MSN Messenger
Display posts from previous:   
Post new topic   Reply to topic    Cheat Engine Forum Index -> General programming -> Crackmes All times are GMT - 6 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
You cannot attach files in this forum
You cannot download files in this forum


Powered by phpBB © 2001, 2005 phpBB Group

CE Wiki   IRC (#CEF)   Twitter
Third party websites