Posted: Mon Jul 25, 2016 9:44 am Post subject: Where is a memory address's format stored at?
I've tried Googling a bit for this, but I can't seem to come across a "plain English" explanation of this. Put simply, how does a program know how to treat any given memory address? I understand that at compile time, the program is compiled with size and format in mind for everything, but where does it store such information so that at runtime, it knows when a particular address is a 4-byte int vs. 4-byte long?
Or better yet, how does CE know what format a memory address is when it scans?
I may well be thinking about this too deeply, but I'm just not understanding how a program (whether the program itself or a program like CE that can analyze said program) "knows" how to treat each of its memory addresses!
Posted: Mon Jul 25, 2016 11:44 am Post subject:
CE doesn't know what format an address is. It relies on the user to tell it instead (or, if you scan with the "All" value type, it just tries every possible combination).
If you're talking about Dissect Data, then it's either based on guessing (address alignment and whether the value looks human-readable or not) or, if debugging information is available (.NET/Mono, .pdb), it can get the info from there.
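To make the "guessing" idea concrete, here is a rough sketch in C of one possible heuristic. This is not CE's actual algorithm: the name guess_type, the enum, and the thresholds are all made up for illustration, and it assumes 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result labels; not taken from CE. */
enum guess { GUESS_INT32, GUESS_FLOAT };

/* Guess how to display 4 raw bytes. Assumes 32-bit IEEE 754 floats.
   The thresholds are arbitrary illustration values, not CE's real rules. */
enum guess guess_type(const unsigned char bytes[4])
{
    int32_t as_int;
    float as_float;
    memcpy(&as_int, bytes, sizeof as_int);     /* same bytes read as an integer */
    memcpy(&as_float, bytes, sizeof as_float); /* same bytes read as a float */

    /* Small integers are common in games (health, ammo, counters). */
    if (as_int > -100000 && as_int < 100000)
        return GUESS_INT32;

    /* Otherwise prefer float if its magnitude looks "human readable". */
    float mag = as_float < 0 ? -as_float : as_float;
    if (mag > 0.0001f && mag < 1000000.0f)
        return GUESS_FLOAT;

    return GUESS_INT32;
}
```

With these made-up thresholds, the bytes of the float 7.0 are guessed as a float (as an integer they would read as 1088421888), while the bytes of the integer 100 are guessed as an integer.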
When you program, you can define which data type you want to use for your variable, such as int, short int, long int, unsigned int, char etc., depending on the language. This is a feature of statically typed languages, but if you have been using very high-level/managed languages then I can understand your confusion.
CE just guesses; of course CE doesn't know what the proper data type is.
In memory all data is the same: a string is no different from an int unless you treat it as such. A string is a collection of chars (one byte each), and a 4-byte int is a collection of 4 bytes. To make this clear for yourself, open CE's memory viewer and, in the hex view, change the display type from byte hex to any of the other data types; you can see that all of them are basically just bytes. That's how they are stored in memory.
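If you want to see the same thing from inside a program rather than in CE's memory viewer, a small helper like the following can dump any object as raw bytes. It's only a sketch; hex_bytes is a name invented here.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the n bytes at p into out as space-separated hex, e.g. "41 42 00".
   out must have room for at least 3 * n chars, and n must be at least 1. */
void hex_bytes(const void *p, size_t n, char *out)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        out += sprintf(out, i + 1 < n ? "%02X " : "%02X", b[i]);
}
```

Passing the string "ABC" with a length of 4 gives "41 42 43 00". Passing the address of an int or a float works exactly the same way, because the helper never needs to know the type.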
When you program, you can define which data type you want to use for your variable. Such as int, short int, long int, unsigned int, char etc. depending on the language.
I understand this part. My confusion comes in not understanding how the program itself, after compilation, knows to assign a particular format to an address it allocates. Where is this information stored in the program? Like, when you run the program and it loads into RAM, where does it check within itself to know that a particular address needs to be, say, a 4-byte long instead of a 4-byte int?
Ah, that's within each function. For example, strcmp expects strings (pointers to chars), so whatever you pass to it will be treated as a string. If you pass that same string to, say, a custom function expecting ints, it will treat those chars as ints. This is how typecasts work.
A strongly typed language's compiler enforces these rules, i.e. if you define a string, you can't use it as an int unless you typecast it, but it is stored in memory just the same as any other data type.
So in memory, values aren't stored as "a 4-byte" or "a double"; they are stored as a collection of bytes. Functions define how they are used. Of course, strict rules are followed at compile time to avoid anarchy and unexpected results! If your function expects a string but you accidentally pass it an int, the results can even be disastrous. A 0 byte in a string signifies the end of the string, but it's just another value in an int.
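Here is a small sketch in C of that last point: the same four bytes handed to a function that expects a string and to one that expects a 4-byte integer. The names sample, length_as_string and value_as_u32 are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* The same 4 bytes: 'A', 'B', 0, 'C'. */
static const unsigned char sample[4] = { 'A', 'B', 0, 'C' };

/* A function that expects a string stops at the first 0 byte. */
size_t length_as_string(const unsigned char *p)
{
    return strlen((const char *)p);
}

/* A function that expects a 4-byte integer uses all four bytes;
   the 0 is just part of the value. */
uint32_t value_as_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

length_as_string(sample) returns 2 because the string "ends" at the third byte, while value_as_u32(sample) returns 0x43004241 on a little-endian machine such as x86, with the 0 byte sitting in the middle of the number.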
Hope this makes sense. When you are gamehacking, you can modify a string one byte at a time, and the same goes for a 4- or 8-byte value. Look at doubles: a double is 8 bytes, i.e. two DWORDs, so if you want to modify one with 32-bit mov instructions, you modify those two DWORDs to reach your desired value. But you can also modify it one byte at a time.
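As a sketch of the "two DWORDs" idea in C (function names invented here; assumes 64-bit IEEE 754 doubles stored with the same byte order as integers):

```c
#include <stdint.h>
#include <string.h>

/* Build a double from its two 32-bit halves.
   Assumes 64-bit IEEE 754 doubles with the same byte order as integers. */
double double_from_dwords(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Overwrite a single byte of a double in place (index 0..7 in memory order). */
void patch_double_byte(double *d, size_t index, unsigned char value)
{
    unsigned char raw[sizeof *d];
    memcpy(raw, d, sizeof raw);
    raw[index] = value;
    memcpy(d, raw, sizeof raw);
}
```

For example, double_from_dwords(0, 0x3FF00000) gives 1.0 and double_from_dwords(0, 0x40000000) gives 2.0, so changing only the high DWORD is enough to turn one into the other.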
Compilers do lay out data in a specific order, though, for example in the case of classes in object-oriented programming. The fields are stored close together, but you can still treat them separately and modify them a byte at a time. So the format info isn't stored anywhere; the functions themselves decide how to use the data (speaking of debugging time; at compile time this is all enforced by strict rules).
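A quick sketch of what that looks like in C. The struct and its fields are hypothetical, not from any real game; the point is only that a field is reachable as base address plus offset, without the type being present at runtime.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical game object; the fields are made up for illustration. */
struct player {
    int   health;
    float speed;
    char  name[8];
};

/* Read the health field using only base address + offset,
   the way a memory editor would. */
int health_via_offset(const void *base)
{
    int value;
    memcpy(&value, (const unsigned char *)base + offsetof(struct player, health),
           sizeof value);
    return value;
}

/* Overwrite the speed field the same way. */
void set_speed_via_offset(void *base, float new_speed)
{
    memcpy((unsigned char *)base + offsetof(struct player, speed),
           &new_speed, sizeof new_speed);
}
```

This is essentially what you do by hand in a structure dissector: you pick a base address and decide for yourself that offset 0 is a 4-byte value and the next field is a float.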
Ah, that's within each function.
Why on earth did my brain not put that together?
Right, so then we get into value types and reference types between stack/heap/global, any addresses of which all have their size/format defined from their respective functions, correct?
See my edit above, the forums should have a notification of some sort.
Anyway, I'm not sure I understand your question correctly. What makes this all very clear for me is to think of it the way it actually is once you remove all the prettiness that compilers and high-level languages add. It is ultimately just a collection of on/off states; one level up, 1s and 0s; one level above that, the assembly you know. So at debugging time, assembly is what we are dealing with.
What do you know about assembly? There are bytes that encode opcodes/instructions, which make it all happen. So, how do different instructions treat a certain data type and the different register types? If this is all clear to you, then you already know how everything is stored in memory.
Bytes are stored in the file and then loaded into memory. Those bytes translate to opcodes/instructions, right? THAT IS IT! That is all that is stored. You can treat the bytes encoding instructions as a data type, and you can treat the values in the .data section as a data type. You can treat anything in memory as a data type: as a pointer, a reference type, a float, a double, etc.
I am oversimplifying things, but you get it now, right?
A function in your source code translates to instructions in memory. Say you wrote a function that expects a float; then, for example, this instruction will be in memory:
fld [game.exe+92]
game.exe+92 is expected to contain a float value. But suppose you wrote another function where you used game.exe+92 as a 4-byte value, say:
mov [game.exe+92], 1
then game.exe+92 will be treated as a 4-byte value! Both are true. A float value of, say, -8.826972961 is stored in memory as 48 3B 0D C1, but those same bytes can just as easily be a 4-byte value (C10D3B48), a single byte (48), a pointer (to C10D3B48), or a reference type.
So the format isn't defined anywhere; the value is just stored in memory as bytes, and the functions (instructions) use it the way you told them to at compile time.
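The byte example can be checked with a few lines of C. This is just a sketch; the names raw, as_dword, as_float and as_byte are invented here, and it assumes 32-bit IEEE 754 floats stored with the same byte order as integers (true on x86).

```c
#include <stdint.h>
#include <string.h>

/* The four bytes from the post, in memory order. */
static const unsigned char raw[4] = { 0x48, 0x3B, 0x0D, 0xC1 };

/* Read them as a little-endian 4-byte integer (what a 32-bit mov sees on x86). */
uint32_t as_dword(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read the same bits as a float (what fld sees).
   Assumes 32-bit IEEE 754 floats with the same byte order as integers. */
float as_float(const unsigned char *p)
{
    uint32_t bits = as_dword(p);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Read only the first byte. */
unsigned char as_byte(const unsigned char *p)
{
    return p[0];
}
```

as_dword(raw) gives 0xC10D3B48, as_float(raw) gives roughly -8.826973, and as_byte(raw) gives 0x48. Nothing in raw itself says which of the three is "right".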
You're thinking about this from a high-level perspective far too much. Value types are very useful for sanity checks when developing a program, but when you get down to it, a value type is really just an abstraction over bytes in memory. In other words, every value type is stored in memory as bytes. You're free to interpret those bytes any way you want, be it 4-byte, float, string, or something you make up (i.e. custom value types). There is absolutely nothing you can do to conclusively distinguish an address's value type just from looking at its value.
You can make an educated guess of an address's value type by looking at how the program accesses that address (e.g. fld dword ptr [eax] probably means [eax] is a float), but you still won't know for certain. When you look at the core aspects of reverse engineering, the only thing that's important is what the program does with a value. In order to quickly determine this, most people will make the assumption that a program will only treat a single value as a single type, which doesn't always have to be true. Take this C code for example:
Code:
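#include <stdio.h>

/* Relies on a little-endian machine with 32-bit IEEE 754 floats and 32-bit ints. */
int main(void) {
    float a = 9.375f;
    /* Loop while the top byte of a is nonzero; each division lowers the exponent. */
    for (; *((char*)&a + 3); a /= 8192.0f) {
        printf("%d\n", *((char*)&a + 3) / 6);  /* top byte read as a 1-byte integer */
    }
    *(int*)&a += 0x4346;      /* same 4 bytes modified as an integer */
    printf("%s", (char*)&a);  /* same 4 bytes printed as a string */
    return 0;
}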
This code counts down from 10 to 1 and prints a message to the screen using a single 4-byte address. That address is a float when it is declared and initialized as such, a boolean when used as the condition in the for loop, a float again in the for loop's increment statement, a 1-byte value in the print statement within the for loop, a 4-byte integer after the for loop, and a string at the final print statement. If you were just looking at the disassembly of this, you wouldn't know what to make of that variable's type. Again, the variable's type isn't important; what the code is doing is the important thing.
When CE scans for a particular value type, it looks through all memory you've specified. The only thing that changes between a 4-byte scan and a float scan is how CE interprets the bytes in memory. This is why you can change the value type in the found list on the fly (as long as the types are of the same length). When you dissect a structure CE has no way of gathering information on (mono discussed next), CE guesses the value types based on what looks correct. For example, the 4-byte value 1088421888 would be better interpreted as the float 7.0.
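To illustrate that only the interpretation changes between scan types, here is a minimal exact-match scanner sketched in C. It is not how CE is implemented; the function names are invented, it only checks 4-byte-aligned offsets, and it assumes 32-bit IEEE 754 floats.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the offset of the first 4-byte-aligned match of pattern in mem,
   or -1 if there is none. */
static long scan4(const unsigned char *mem, size_t len, const unsigned char pattern[4])
{
    for (size_t off = 0; off + 4 <= len; off += 4)
        if (memcmp(mem + off, pattern, 4) == 0)
            return (long)off;
    return -1;
}

/* "4-byte" scan: the target is given as an integer. */
long scan_int32(const unsigned char *mem, size_t len, int32_t target)
{
    unsigned char pattern[4];
    memcpy(pattern, &target, sizeof pattern);
    return scan4(mem, len, pattern);
}

/* "Float" scan: the target is given as a float; the memory walk is identical. */
long scan_float(const unsigned char *mem, size_t len, float target)
{
    unsigned char pattern[4];
    memcpy(pattern, &target, sizeof pattern);
    return scan4(mem, len, pattern);
}
```

Scanning a buffer for the float 7.0 and for the 4-byte value 1088421888 finds the same offset, because both targets turn into the same four bytes. A real scanner would typically also allow rounding or ranges for floats, which this sketch leaves out.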
That's not to say this information can't exist. Some languages or compilers might store this information somewhere, usually for debugging purposes. Object-oriented languages can store information about an object's class at the start of the object, mono software can provide information about different classes and fields (CE makes use of this), and compilers can keep information on variables around if asked to. However, since this information is useless with regards to the execution of a bug-free program, it is commonly omitted for privacy and efficiency.
Thanks DB, STN, and Parkour! This is crystal clear for me now.
The convolution in my head stems from a weird amalgamation of things I've been studying at the same time lately from low-level and high-level (C#, specifically)--the error being what you led with in your reply, Parkour.
Thanks again for your detailed replies, everyone!!!
Posted: Mon Jul 25, 2016 3:11 pm Post subject: Re: Where is a memory address's format stored at?
h3x1c wrote:
how does a program know how to treat any given memory address? I understand that at compile time...
Everything above is true for many programming languages. But there are other languages...
For example, Lua 5.3 (and the customized Lua in games): the type of a variable can change dynamically while its value stays at the same address.
So there must be a few additional bytes that hold the variable's type.
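A simplified sketch in C of what such "additional bytes" can look like. These are not Lua's real definitions; the names and the set of types are made up, but dynamically typed runtimes generally keep a type tag next to each value in this spirit.

```c
/* Hypothetical tagged value; names are made up, not Lua's real definitions. */
enum tag { TAG_NIL, TAG_NUMBER, TAG_STRING };

struct tagged_value {
    enum tag tag;          /* the extra bytes that record the current type */
    union {
        double number;
        const char *string;
    } as;
};

void set_number(struct tagged_value *v, double n)
{
    v->tag = TAG_NUMBER;
    v->as.number = n;
}

void set_string(struct tagged_value *v, const char *s)
{
    v->tag = TAG_STRING;
    v->as.string = s;
}

/* The runtime checks the tag before deciding how to read the payload. */
const char *type_name(const struct tagged_value *v)
{
    switch (v->tag) {
    case TAG_NUMBER: return "number";
    case TAG_STRING: return "string";
    default:         return "nil";
    }
}
```

Here the variable keeps the same address while set_number and set_string change both the payload and the tag, so in this case the format really is stored in memory, right next to the value.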