Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 


Assembler Help

 
KryziK
Expert Cheater
Reputation: 3

Joined: 16 Aug 2009
Posts: 199

Posted: Sat Sep 12, 2015 9:06 pm    Post subject: Assembler Help

Hey all, I'm working on writing a small assembler for a subset of the Intel instructions, and I had a few questions:

1. In the CE source, in assemblerunit.pas, I noticed the following line:
Code:
(mnemonic:'ADD';opcode1:eo_iw;paramtype1:par_AX;paramtype2:par_imm16;bytes:2;bt1:$66;bt2:$05)


I've read through some of the manual and found that this seemingly misplaced 0x66 byte might correspond to VEX. Is this true? I've read that VEX extends the instruction set and allows for more operands. How important are these instructions right now? Could I probably get by without including them for the time being? I just want to start with the most used instructions, like add, jmp, mov, etc.

2. How does the assembler know which instruction to choose? For example, if I had a jmp instruction, how does the assembler know which jump rule to pick? Does it calculate the size of the immediate operand and see if it's larger than an 8-bit immediate (and if so, use the 32-bit immediate)? Or perhaps it depends on how large the offset between the jump instruction and the jump target?

These are just two instruction examples, but the concepts apply for many more instructions, so any information would be helpful. I've written a simplified MIPS assembler, but that one didn't have any duplicate mnemonics with different operands, so I am a bit lost in the Cheat Engine source code trying to find how it takes a set of tokens and grabs the correct opcodes array in order to build the output bytes.

3. Some instructions in the Intel manual are not in the Cheat Engine source. I was wondering what the reason for only including specific ones was. For example, Cheat Engine has 3 definitions for jmp, but the Intel manual has 11. I'm trying not to copy from Cheat Engine, but how would I know which ones I need or not?

Thank you!

Update: This link is very helpful for understanding the Mod R/M and SIB bytes: http://www.c-jump.com/CIS77/CPU/x86/index.html


Last edited by KryziK on Wed Oct 21, 2015 10:00 pm; edited 1 time in total
Dark Byte
Site Admin
Reputation: 457

Joined: 09 May 2003
Posts: 25262
Location: The netherlands

Posted: Sun Sep 13, 2015 2:28 am

1. The 0x66 prefix is the operand-size override: it switches between 16-bit and 32-bit operation. (0x67 does the same for addressing, which CE does not support.)

2. The preferred instruction is at the top of the list, but if the instruction can't be encoded that way it'll go for an alternate one (e.g. when the jump offset doesn't fit in a signed byte, -128 to 127, it can't use the 2-byte jmp).

3. A lot of those entries are duplicates with a minor difference (e.g. the manual shows a different entry for 32-bit and 64-bit mode; CE just deals with that afterwards by setting a REX prefix).
Besides that, some newer instructions, like VEX, aren't handled yet.
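
A minimal sketch of the 0x66 point, assuming 32-bit code where the default operand size is 32 bits. The function name encode_add_acc_imm is made up for illustration and is not CE's code; it only shows that the 16-bit form of ADD is the same opcode (05) with the 0x66 prefix in front and a shorter immediate.

```python
import struct

OPERAND_SIZE_PREFIX = 0x66  # operand-size override: 32-bit default -> 16-bit


def encode_add_acc_imm(bits: int, imm: int) -> bytes:
    """Encode ADD AX, imm16 or ADD EAX, imm32 for 32-bit code (hypothetical helper)."""
    if bits == 16:
        # Same opcode 05, but prefixed and followed by a 2-byte immediate.
        return bytes([OPERAND_SIZE_PREFIX, 0x05]) + struct.pack("<H", imm & 0xFFFF)
    if bits == 32:
        # No prefix: opcode 05 followed by a 4-byte immediate.
        return bytes([0x05]) + struct.pack("<I", imm & 0xFFFFFFFF)
    raise ValueError("bits must be 16 or 32")


print(encode_add_acc_imm(16, 0x1234).hex(" "))      # 66 05 34 12
print(encode_add_acc_imm(32, 0x12345678).hex(" "))  # 05 78 56 34 12
```

This matches the table entry from the first post (bt1:$66; bt2:$05 with a 16-bit immediate).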

_________________
Do not ask me about online cheats. I don't know any and wont help finding them.

Like my help? Join me on Patreon so i can keep helping
KryziK
Posted: Sun Sep 13, 2015 11:38 am

2. So, for my second question, I see that CE finds the range of instructions in the array opcodes with the matching mnemonic, but what I really wanted to know was how CE determines whether or not the entry in opcodes is the right one. So, it starts at the first one, which is preferred, how does it determine whether the operands match what the user has typed? My guess was in the original post: It checks the size of the immediate that the user has typed, and then just makes sure that it fits into the size specified in opcodes.

Hopefully that was clear enough to understand. I'm just trying to figure out how CE "weeds out" the opcodes that don't match what the user typed. It's hard to read through so many if statements and such.

3. I see, in the Intel manual, some instructions that are 64 bit, such as:
Code:
REX.W + FF /5           JMP m16:64

I understand that these can be omitted because the REX prefix can be added later on. However, for jmp, there are a total of 10 entries in the Intel manual that do not have this REX prefix in front of them. So, I am just trying to figure out why there are only 3 entries total in CE's opcodes array. Without copying CE's opcodes array, how would I determine which entries from the Intel manual I really needed?

Thanks for your reply!
Dark Byte
Posted: Sun Sep 13, 2015 1:12 pm

2. Yes, you're right.
It checks the size of the immediate and then goes through the list.
First the 2-byte jmp gets checked to see if it matches, and then the 5-byte one.

If the offset doesn't fit in a signed byte (-128 to 127), the 2-byte jmp fails, so it goes for the 5-byte jmp.

The assembler array contains the parameters each entry expects.
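
The "first entry that fits wins" idea can be sketched roughly as below. This is not CE's code: JMP_FORMS, fits_signed and encode_jmp are hypothetical names, and the table only holds the two relative forms (EB rel8 and E9 rel32). The offset is measured from the end of the jump instruction, so it has to be recomputed for each candidate length.

```python
# (name, opcode bytes, immediate size in bytes), preferred form first
JMP_FORMS = [
    ("jmp rel8", b"\xEB", 1),   # 2-byte short jump
    ("jmp rel32", b"\xE9", 4),  # 5-byte near jump
]


def fits_signed(value: int, size: int) -> bool:
    """True if value fits in a signed integer of `size` bytes."""
    bits = size * 8
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def encode_jmp(address: int, target: int) -> bytes:
    """Pick the first jmp form whose immediate can hold the relative offset."""
    for _name, opcode, imm_size in JMP_FORMS:
        length = len(opcode) + imm_size
        rel = target - (address + length)  # relative to the next instruction
        if fits_signed(rel, imm_size):
            return opcode + rel.to_bytes(imm_size, "little", signed=True)
    raise ValueError("target out of range for the listed jmp forms")


print(encode_jmp(0x1000, 0x1010).hex(" "))  # eb 0e
print(encode_jmp(0x1000, 0x2000).hex(" "))  # e9 fb 0f 00 00
```

A real assembler would first filter entries by operand type (register, memory, immediate) and only then by size; the sketch skips that step because both entries take the same kind of operand.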


3. Take that example you posted there. CE has no real use for that one (with or without REX prefix), as changing the code segment is generally a bad idea. (Sure, you could do tricks like executing 64-bit code in a 32-bit process, but come on... for CE?)

As for jmp rel16, for some reason it zeroes the upper 16 bits of EIP, so for all practical purposes it's useless.

The jmp r/m# instructions are just one instruction. It may be shown 3 times, but they are all the same instruction. How it gets handled depends on the CPU state (e.g. in 64-bit execution mode the ModR/M can be encoded as RIP-relative).

KryziK
Posted: Sun Sep 13, 2015 5:59 pm

Thanks for your replies! I'll take what you have said into account when coding and reading the Intel manual.

Unfortunately, with classes and a job, who knows if I'll have enough time to work on this for very long. But, for now, I'll try!

Thanks again, DB. You're the best. <3
KryziK
Posted: Wed Sep 23, 2015 10:51 pm

DB,

Could you explain modR/M and how the /digit opcode works?

For example, the PUSH instruction (on 4-271 of Intel Manual):

Code:
FF /6    PUSH r/m32       M

Code:
M      ModRM:r/m (r)


I was under the impression that /0 through /7 directly related to a register. So are these 3 "FF /6" instructions referring to DH/SI/ESI? If not, what do they refer to?

I am trying to create a function to build the modR/M value based on an input. It looks like you check if there are any "[]"s to determine whether the input is a register or a memory operand. Is this true?

Going back to the PUSH instruction, if I wanted to say the following:
Code:
push [esi]


Then the bytes turn out to be "FF 36". So, that means that I first look for ESI in the top section (for a value of 110) and then find where it matches on the left side ([ESI] for a value of mod 00 and r/m 110). Is this sort of how it works? What is the difference between all of the following (2-operand examples):

Code:

Op/En     Operand 1                 Operand 2
RM        ModRM:reg (r, w)          ModRM:r/m (r)
MR        ModRM:r/m (r, w)          ModRM:reg (r)


Does the "reg" or "r/m" part say whether to look up that operand as a column or row in the mentioned table? What is the "r" and "r, w" part for?

Also, what is the difference between an opcode type of /digit (/0 - /7) and /r? Both are registers, are they not? Does the numbered type just force it to be a specific register rather than leaving it open?

Thank you for your time. I was making progress until I realized that modR/M was a special thing that had to be "built".
Dark Byte
Posted: Thu Sep 24, 2015 4:00 am

For instructions that don't need 2 parameters, the reg field of the ModR/M byte can be part of the instruction.

In the case of push that's 110, so the real instruction is 11111111 110 (the opcode byte FF plus the reg field).

ModR/M is too complex to explain here, but the Intel guide with all the instructions has a chapter on how the ModR/M and SIB bytes are built (and the offsets, and several other special cases).

As for the r,w stuff, no idea, I never looked at those.
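
To make the /digit idea concrete, here is a rough sketch that packs the three ModR/M fields (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) and uses it for "push [reg]" encoded as FF /6. The names modrm, REG32 and encode_push_mem are illustrative assumptions, and only the simplest memory form (mod 00, no SIB, no displacement) is covered.

```python
# 32-bit register numbers as used in the ModR/M reg and r/m fields
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_push_mem(base: str) -> bytes:
    """Encode PUSH [base] as FF /6 with mod=00 (hypothetical helper).

    With mod=00, r/m=100 means "SIB byte follows" and r/m=101 means
    "disp32 only", so [esp] and [ebp] need extra bytes and are rejected here.
    """
    if base in ("esp", "ebp"):
        raise NotImplementedError("needs a SIB byte or displacement")
    # The reg field is not a register here: it holds the opcode extension 6.
    return bytes([0xFF, modrm(0b00, 6, REG32[base])])


print(encode_push_mem("esi").hex(" "))  # ff 36
print(encode_push_mem("eax").hex(" "))  # ff 30
```

So the 0x36 in "FF 36" is 00 110 110: mod 00, the fixed digit 6 in the reg field, and ESI (6) in the r/m field.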

KryziK
Posted: Thu Sep 24, 2015 12:14 pm

Could you explain that first part again? What do you mean the reg field can be part of the instruction? Why is push "11111111 110"? That comes out to "FF 06", which is an inc instruction according to CE. I'm sure that I'm misunderstanding something here.

Also, could you answer my last question about /digit vs /r (eo_reg vs eo_reg0/1/2/3/4/5/6/7)? I can tell that this difference is important in understanding what you just explained, because:
Code:
• /digit — A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register
or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

is basically what it sounds like you are saying. From the table, I can't tell how you would get 0x36 as the ModR/M byte from just "[ESI]" as one parameter unless you look up the row with "[ESI]" and the column with "ESI".

Idea: Is it the eo_reg6 part that tells you to go to the column of value 110 (DH/SI/ESI/MM6/XMM6)? If so, does that mean eo_reg would be any register, not a specific one, and so you'd have to parse that register for the register (column) to go to? So something like "add edx,ebx" would be eo_reg with params par_r32 and par_rm32, and so you'd go to the column of EDX (010) and the row of EBX (011) to get the ModR/M byte of 11 010 011 aka 0xD3 (for a total of 03 D3)?
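
The "add edx,ebx" guess can be checked with a small sketch, assuming the 03 /r form (ADD r32, r/m32) with both operands as registers (mod 11). REG32 and encode_add_r32_r32 are hypothetical names used only for this illustration.

```python
# 32-bit register numbers as used in the ModR/M reg and r/m fields
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_add_r32_r32(dst: str, src: str) -> bytes:
    """Encode ADD dst, src using 03 /r (ADD r32, r/m32), register-direct."""
    mod = 0b11            # r/m field names a register, not memory
    reg = REG32[dst]      # /r: the reg field holds a real register (the destination)
    rm = REG32[src]       # the r/m field holds the source register
    return bytes([0x03, (mod << 6) | (reg << 3) | rm])


print(encode_add_r32_r32("edx", "ebx").hex(" "))  # 03 d3
```

With /r the reg field is filled in from an operand; with /digit it is a fixed value from the opcode table and only the r/m field comes from an operand.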

Let me know. I really am not asking for spoon-fed answers; I am trying to grasp these concepts on my own (as you can tell by my thought process in this post, assuming it's correct).

Thank you so much!
Dark Byte
Posted: Sat Sep 26, 2015 3:11 am

Bits 3, 4 and 5 make up the reg field, so if those are 110 (6) it'd be a push.

Well, I guess an alternate way of looking at it is to assume FF is a single instruction (e.g. WRT) where the register specifies what kind of operation is done,
e.g.:
WRT [1234], EAX = INC [1234]
WRT [1234], ESI = PUSH [1234]


eo_reg6 means that bits 3-5 should be set to 6.
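
Going the other way, a rough decoder sketch shows why "FF 06" and "FF 36" are different instructions even though both start with FF. The GROUP_FF table follows the Intel opcode-extension listing for FF (/0 INC, /1 DEC, /2 CALL, /3 far CALL, /4 JMP, /5 far JMP, /6 PUSH); the names split_modrm and ff_operation are made up for this example.

```python
# Operation selected by the reg field (bits 3-5) when the opcode byte is FF
GROUP_FF = {0: "inc", 1: "dec", 2: "call", 3: "call far",
            4: "jmp", 5: "jmp far", 6: "push"}


def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (mod, reg, rm)."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def ff_operation(modrm_byte: int) -> str:
    """Return the mnemonic that FF + this ModR/M byte encodes."""
    _mod, reg, _rm = split_modrm(modrm_byte)
    if reg not in GROUP_FF:
        raise ValueError("FF /7 is not handled in this sketch")
    return GROUP_FF[reg]


print(ff_operation(0x06))  # inc   (00 000 110 -> inc [esi])
print(ff_operation(0x36))  # push  (00 110 110 -> push [esi])
```

Both bytes have the same mod and r/m fields ([esi]); only the reg field differs, which is the eo_reg0 versus eo_reg6 distinction.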

KryziK
Posted: Wed Oct 21, 2015 9:57 pm

Dark Byte,

Thanks for the reply. I have been working on my code ever since your reply. I found this resource which ended up being very helpful:

http://www.c-jump.com/CIS77/CPU/x86/index.html

It provides an easy to understand look at the MOD R/M and SIB bytes, with examples and everything. I'll add the link to the OP, too.

I'm getting closer to finishing! Thanks again.
All times are GMT - 6 Hours