Cheat Engine Forum Index Cheat Engine
The Official Site of Cheat Engine
 


[tutv2(updated)]Analyzing Data to Produce a parser[By BanMe]

 
BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Wed Mar 18, 2009 5:47 pm    Post subject: [tutv2(updated)]Analyzing Data to Produce a parser[By BanMe]

OK people, today I'm going to do a tutorial in AutoIt3 (because I like the new features and it's easy to code in). In this tutorial I'm going to show you how to turn data into code by analyzing the data, looking for logical patterns in it, and then applying those patterns to code. This has nothing to do with "hacking games", just simple data analysis.

Please visit this website to get the entire sample of the data that we will be analyzing throughout this tutorial:

http://www.drf.com/misc/charts/SAR20060727.chart

Please also visit this one, as it describes each structure's layout (note: the layout does not tell you the general order in which the structures appear in the data):

http://www.drf.com/misc/charts/drfchartfields.pdf

At first glance the data looks like a complete jumble, but on closer inspection we start to see certain things that help break it down into usable parts.

DataSample1

    "H","USA","SAR","20060727","9","D","Saratoga" "R","1","TB","AOC"," "," ","4U"," ","51000","0","51000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","30000","30000"," ","1650","F","T","M","11"," "," "," ","0100","0135","0100"," ","OC 30k/N1X -N","Firm","","",""," ","","","83","342.58","0","0","0","0","0"," ","","150542","Good","Cloudy","","","Y" "R","2","TB","MSW"," ","F","02"," ","47000","0","47000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","550","F","D","D","11"," "," "," ","0135","0207","0138"," ","Md Sp Wt 47k","Fast","","17","89"," ","","","83","106.09","2173","4582","5924","0","0"," ","","401768","Good","Cloudy","","","Y" "R","3","TB","ALW"," "," ","3U"," ","53000","0","53000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","900","F","T","T","9"," "," "," ","0207","0239","0208"," ","Alw 53000N2X","Firm","","3","97"," ","","","83","146.96","2449","4862","11206","13532","0"," ","","451183","Good","Cloudy","","","Y" "R","4","TB","MSW"," ","B","3U"," ","48000","0","48000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","800","F","T","I","11"," "," "," ","0239","0311","0240"," ","Md Sp Wt 48k","Firm","","14","88"," ","","","83","135.91","2403","4861","11278","0","0"," ","","500705","Good","Cloudy","9","","Y"


Here is the description of the first structure (easily relatable to code):

    Field# Field Name Data Type Description
    1 Record type Character Record type code "H" for card Header
    2 Country code Character Three letter country code either USA or CAN
    3 Track code Character Two or three letter track code
    4 Race date Character Race date in CCYYMMDD format
    5 Number races Numeric
    6 Day/evening flag Character "D" for Day Racing, "E" for Evening Racing; Note that "E" is only used when a track runs two cards of racing on one day.
    7 Track Name Character
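
As a quick illustration of how this layout maps onto the data, here is a minimal sketch that pairs the seven field names from the table with the header values from DataSample1. The variable names ($aNames, $aValues) are made up for this example, and it assumes the header text has already been cut out of the file.

```autoit
; Field names copied from the header layout; sample values copied from DataSample1.
Local $aNames[7] = ["Record type", "Country code", "Track code", "Race date", _
        "Number races", "Day/evening flag", "Track Name"]
Local $aValues = StringSplit('"H","USA","SAR","20060727","9","D","Saratoga"', ",")

; StringSplit returns the element count in [0], the fields start at [1].
For $i = 1 To $aValues[0]
    ; strip the surrounding quotes before printing "name: value"
    ConsoleWrite($aNames[$i - 1] & ": " & StringReplace($aValues[$i], '"', "") & @CRLF)
Next
```

Only "Number races" is numeric in the layout, so that is the one value you would run through Int() or Number() before doing any arithmetic with it.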


Looking at this, we see it needs 7 elements, 6 string types and 1 integer type, and that all structures begin with a RecordType field. With this info we would build something similar to this:
Code:

Func GetRawCSV()
   Local $rc_FilePath,$rc_Info,$rc_Header,$Element
   Local $rc_Temp,$rc_Array
   Local $rc_hdrDef = "char[1];char[3];char[3];char[10];int;char[1];char[10];"
   ;the 64-field race data layout is not spelled out in this tutorial, build it from drfchartfields.pdf
   Local $rc_rdDef
   $rc_FilePath = FileOpenDialog("Select a Import File",@ScriptDir,"All (*.*)")
   If @error Then
      MsgBox(0,"Failed to select file","Failed")
      Exit
   Else
      $rc_Info = FileOpen($rc_FilePath,0)
      $rc_Temp = FileRead($rc_Info)
      FileClose($rc_Info)
      $rc_Array = StringSplit($rc_Temp,",");[0] holds the number of elements
      $rc_Header = DllStructCreate($rc_hdrDef)
      For $Element = 1 To 7;the header has 7 fields
         DllStructSetData($rc_Header,$Element,$rc_Array[$Element])
      Next
      $rc_Temp = DllStructCreate($rc_rdDef)
      For $Element = $Element To 71;7 header fields + 64 fields in the next structure = 71
         DllStructSetData($rc_Temp,$Element - 7,$rc_Array[$Element]);struct fields are numbered from 1
      Next
   EndIf
EndFunc


and this is a bad way to do things..
I hope you ask yourself "why?", because I'm going to try to answer that in the next few paragraphs. This method does not account for header dynamics, i.e. it doesn't use the RecordType field to its fullest advantage, and it relies on you knowing the number of elements in each structure in order to parse it. It may also hit an overflow if the number of elements across all structures goes above 65,536. Finally, the code is not robust enough to deal with more than one header per parsing pass; we want it to keep cycling until it reaches the end of the file.

So the faults in this type of logic include:

1 Not using the data to help you write more robust code.

2 Not thinking through the consequences of a really big file containing multiple cards with 100k elements or more in total.

3 Counting on hard-coded definitions and user knowledge, with no dynamic variables that could help it do its task more efficiently and safely.


So how do we combat these "snags" in our logic? Well, let's go back to the data and look at it again; maybe it will provide the necessary hint.


    "H","USA","SAR","20060727","9","D","Saratoga" "R",...


Looking at this, we can see the first 7 elements plus the first element of the next structure (RecordType). Notice there is no comma between them, only a space.
So now we can search for that space, cut off everything up to and including it, and split that piece into fields. Another benefit of this newly found separator is that we no longer have to keep adding onto one variable, which avoids a possible overflow due to too many elements, and we get the added bonus of more dynamic code.

Code:

Func GetRawCSV()
   Local $rc_FilePath,$rc_Info,$rc_Header,$Element
   Local $rc_Temp,$rc_StrLoc,$rc_TempStr,$rc_Array
   Local $rc_hdrDef = "char[1];char[3];char[3];char[10];int;char[1];char[10];"
   $rc_FilePath = FileOpenDialog("Select a CSV File",@ScriptDir & "\","All (*.*)")
   If @error Then
      MsgBox(0,"Failed to select file","Failed")
      Return -1
   Else;no error occurred, open the file.
      $rc_Info = FileOpen($rc_FilePath,0)
      $rc_Temp = FileRead($rc_Info)
      FileClose($rc_Info)
      Do
         $rc_StrLoc  = StringInStr($rc_Temp," ");the next space ends the current structure
         If $rc_StrLoc = 0 Then $rc_StrLoc = StringLen($rc_Temp);no space left, take the rest
         $rc_TempStr = StringLeft($rc_Temp,$rc_StrLoc)
         $rc_Temp    = StringTrimLeft($rc_Temp,$rc_StrLoc)
         $rc_Array   = StringSplit($rc_TempStr,",")
         If @error Then;no comma was found in this piece
            MsgBox(0,"Code Error","Failed Splitting Text")
            Return -1
         EndIf
         If $rc_Array[0] > 1 Then
            Select
               Case $rc_Array[1] = '"H"'
                  $rc_Header = DllStructCreate($rc_hdrDef)
                  For $Element = 1 To $rc_Array[0]
                     DllStructSetData($rc_Header,$Element,$rc_Array[$Element])
                  Next
               Case $rc_Array[1] = '"R"'
               Case $rc_Array[1] = '"E"'
               Case $rc_Array[1] = '"A"'
               Case $rc_Array[1] = '"C"'
               Case $rc_Array[1] = '"F"'
               Case $rc_Array[1] = '"S"'
               Case Else
                  MsgBox(0,"Done","Done Parse Text")
                  Return 0
            EndSelect
         Else
            MsgBox(0,"Code Error","Returned 0 element array")
            Return -1
         EndIf
      Until StringLen($rc_Temp) = 0;stop once the whole file has been consumed
   EndIf
EndFunc
GetRawCSV()


Kind regards BanMe

p.s. please excuse all the edits i've made (touch up..you know)
i wrote this on the fly..without any help from anyone..(just the helpfile..)

_________________
don't +rep me..i do not wish to have "status" or "recognition" from you or anyone.. thank you.


Last edited by BanMe on Tue Mar 24, 2009 4:28 pm; edited 2 times in total
Spawnfestis
GO Moderator
Reputation: 0

Joined: 02 Nov 2007
Posts: 1746
Location: Pakistan

Posted: Thu Mar 19, 2009 6:06 am

Apart from AutoIt3 not being a programming language, the tutorial itself seems OK.
I don't see why anyone would want to touch AutoIt for features other than the easy botting, really. It's just an automation language, look at the name of it.
It's not fast, it's not better, and it's definitely not more logical than any programming language that would take just as long to learn the basics of.

Eh, good anyway. :(

BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Thu Mar 19, 2009 11:24 am

thank you for your kind words and your opinions :D
I did not do any planning for this tutorial; it is actually the result of a job I've been asked to do: analyzing horse race data to find the choice picks. So this tutorial was written on the fly. I didn't think AutoIt3 was capable of something like this, but the release version now includes a whole host of automation features, from Internet Explorer automation to Excel automation, all with increasingly detailed documentation on how to use them. I feel that using AutoIt3 for the grunt work (data reaping/GUI) and a C++ DLL for the finer calculations is an easy way to get a task done (the quickest way to definable results).

I plan on continuing this tutorial through that phase of development, as well as on to the complete C++ rework of the code, hopefully showing code relationships in definable terms so that others can convert not only AutoIt3 to C++ but any coding/scripting language to their own language of choice.

regards BanMe

SXGuy
I post too much
Reputation: 0

Joined: 19 Sep 2006
Posts: 3551

Posted: Thu Mar 19, 2009 2:28 pm

There's already a C++ / AutoIt converter.
BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Thu Mar 19, 2009 5:29 pm

the point would be "not" to have to use a converter..
I am unable to find this supposed converter with google or dogpile..

regards BanME

SXGuy
I post too much
Reputation: 0

Joined: 19 Sep 2006
Posts: 3551

Posted: Fri Mar 20, 2009 3:28 am

That's because it's in the AutoIt forums, you won't find it on Google.
LolSalad
Grandmaster Cheater
Reputation: 1

Joined: 26 Aug 2007
Posts: 988
Location: Australia

Posted: Fri Mar 20, 2009 3:58 am

SXGuy wrote:
That's because it's in the AutoIt forums, you won't find it on Google.


Huh? Most of this is Autoit forums... why would they be excluded from the Google crawler?

BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Fri Mar 20, 2009 2:13 pm

Maybe you could be so nice as to provide a link, because I am completely unable to find this. All I can find is the AutoIt3 devs saying that it would be almost impossible.
SXGuy
I post too much
Reputation: 0

Joined: 19 Sep 2006
Posts: 3551

Posted: Fri Mar 20, 2009 5:12 pm

I'm sorry, I can't provide a link. I saw the thread about a year ago while researching AutoIt functions, and haven't had the need to use it either.

Maybe they removed it, or maybe it ended up being a load of bollox, I don't know. I just remember a thread about someone who wrote a program to convert it all from C++ to AutoIt.
BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Fri Mar 20, 2009 5:52 pm

thank you for the reply :]

I read every thread that had convert AutoIt to C++, or C++ to AutoIt, in it. I couldn't find anything like a converter.

smartz993
I post too much
Reputation: 2

Joined: 20 Jun 2006
Posts: 2013
Location: USA

Posted: Fri Mar 20, 2009 6:23 pm

Maybe he got confused with this:

http://www.mmowned.com/forums/general-programs/73637-use-autoit-c-programs.html
BanMe
Master Cheater
Reputation: 0

Joined: 29 Nov 2005
Posts: 375
Location: Farmington NH, USA

Posted: Mon Mar 23, 2009 4:15 pm    Post subject: [Tutv2]Analyzing Data To Produce a Parser[By BanMe]

maybe but it doesn't matter :]

On with the last part of the AutoIt tutorial. After analyzing the code and actually running it, it runs into problems with blank fields (" ") and with fields in some structures that hold a long string of text containing commas. So we cannot just split the data at each space as we did above.
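
To make the blank-field problem concrete, here is a small sketch with a shortened, made-up sample in the same format as the chart data: two structures separated by one space, where the second structure contains a blank (" ") field. The variable names are only for this example.

```autoit
; Two structures (H and R) separated by a single space.
; The third field of the R structure is a blank: " "
Local $sData = '"H","USA","SAR" "R","1"," ","4U"'

; Splitting at every space also cuts the blank field in half.
Local $aChunks = StringSplit($sData, " ")
ConsoleWrite("Chunks found: " & $aChunks[0] & @CRLF) ; 3 chunks instead of the 2 structures
For $i = 1 To $aChunks[0]
    ConsoleWrite($i & ": " & $aChunks[$i] & @CRLF)
Next
```

The R structure comes back as two broken chunks, so the record type check at the start of each chunk no longer lines up with the data.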

so back to the data we go..


    "H","USA","SAR","20060727","9","D","Saratoga" "R","1","TB","AOC"," "," ","4U"," ","51000","0","51000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","30000","30000"," ","1650","F","T","M","11"," "," "," ","0100","0135","0100"," ","OC 30k/N1X -N","Firm","","",""," ","","","83","342.58","0","0","0","0","0"," ","","150542","Good","Cloudy","","","Y"


lets break down what we see..

First we see a start quote, then character(s), then an end quote, and finally a comma. We also know there is a space after each structure.
So if we take this data as a string and split it at each comma, we end up with an array that looks like this:


    [0] Number Of Elements
    [1] "H"
    [2] "USA"
    [3] "SAR"
    [4] "20060727"
    [5] "9"
    [6] "D"
    [7] "Saratoga" "R"


Strings with commas in them,
like "Came up gamely,Tired,Came again and again",
break down into pieces that look like this (usually, but not all the time):


    [0] 1
    [1] "Came Up Gamely



    [0] 1
    [1] Tired



    [0] 1
    [1] Came again and again"


and so on, with each piece still containing a start and/or end quote.

We can use this constant to our advantage: by splitting each piece at the quote we end up with arrays that look like this:


    [0] Number Of Elements = 3
    [1]
    [2] H
    [3]


Next Element looks like this:


    [0] Number Of Elements = 3
    [1]
    [2] USA
    [3]


and the pieces of a comma string end up looking like this:

    [0] 2
    [1]
    [2] Came up gamely



    [0] 1
    [1] Tired



    [0] 2
    [1] Came again and again
    [2]


So there are a few benefits to doing it this way. When a structure ends in a space and the piece also includes the next RecordType, you end up with an array like this:


    [0] Number Of Elements = 5
    [1]
    [2] Saratoga
    [3]
    [4] R
    [5]


This method has another benefit: it gives a mostly predictable pattern that we can use to rebuild a comma-separated string element, detect where it most likely ends, and then add the rebuilt string to the structure.


Here is the code representing this new analysis of the data.
Please forgive me, some debugging code is still in there, but it is far more workable than the previous version, and it is going to be the last AutoIt version I produce as a prototype framework for the parser :]

Code:

#include <Array.au3>
Func GetRawCSV()
   Local $rc_FilePath,$rc_FileHandle,$Element,$Index
   Local $rc_String,$rc_StringTemp = "",$rc_Array,$rc_ArrayTemp,$rc_Element = 1
   Local $CurrentObj[1],$hdr[8],$rdata[64]
   Local $FirstExecute = 0,$thdr = 1
   Local $lecheck = 2;2 = not inside a comma string, 0 = comma string just started, 1 = collecting its pieces
   $rc_FilePath = FileOpenDialog("Select a File To Parse",@ScriptDir & "\","All (*.*)")
   If @error Then
      MsgBox(0,"Failed to select file","Failed")
      Return -1
   Else;no error occurred, open the file.
      $rc_FileHandle = FileOpen($rc_FilePath,0)
      $rc_String = FileRead($rc_FileHandle)
      $rc_Array = StringSplit($rc_String,",")
      For $Element = 1 To $rc_Array[0]
         $rc_ArrayTemp = StringSplit($rc_Array[$Element],'"')
         If $rc_ArrayTemp[0] = 3 Then;it's the start of the header or an element in a structure
            If $lecheck < 2 Then;a comma string was being collected and is now complete
               _ArrayDisplay($rc_ArrayTemp);debug
               _ArrayPush($CurrentObj,$rc_StringTemp,1);push the rebuilt string
               $rc_Element += 1
               $lecheck = 2
            EndIf
            If $rc_Element = 1 And StringCompare($rc_ArrayTemp[2],"H",1) = 0 Then
               ReDim $CurrentObj[8];first element is an H (header), resize CurrentObj to header size
            EndIf
            _ArrayPush($CurrentObj,$rc_ArrayTemp[2],1);push the element to the top of the list..
            $rc_Element += 1;increment the counter..
         Else;$rc_ArrayTemp[0] is above 3 (5) or below 3
            If $rc_ArrayTemp[0] <= 2 Then;pieces of an element that had commas in its text..
               If $lecheck > 1 Then $lecheck = 0;first piece of a new comma string
               _ArrayDisplay($rc_ArrayTemp);debug
               If $lecheck = 1 Then
                  $rc_StringTemp &= "," & $rc_ArrayTemp[1];put back the comma the split removed
               ElseIf $lecheck = 0 Then
                  $rc_StringTemp = $rc_ArrayTemp[$rc_ArrayTemp[0]];text after the start quote
                  $lecheck = 1
               EndIf
            Else
               _ArrayPush($CurrentObj,$rc_ArrayTemp[2],1);finish off the structure by adding its last element...
               ;_ArrayDisplay($CurrentObj);display finished structure
               $rc_Element += 1;..
               _ArrayDisplay($rc_ArrayTemp);debug, show all arrays to be analyzed..
               Select
                  Case $rc_ArrayTemp[4] = "R"
                     If $thdr > 0 Then;after FirstExecute this is the number of RaceData structures left..
                        If $FirstExecute = 0 Then;test..
                           For $Index = 0 To UBound($CurrentObj) - 1;reverse the array
                              $hdr[$Index] = _ArrayPop($CurrentObj);pop the last element into the header structure (first was pushed down to last)
                           Next
                           $thdr = Int($hdr[5]);set thdr to the number of RaceData structs in the data..
                           Dim $CurrentObj[$thdr*64];reset the current structure so we can hold all that data...
                           ReDim $rdata[$thdr*64]
                           $FirstExecute = 1;done firstexecute
                        EndIf
                        _ArrayPush($CurrentObj,$rc_ArrayTemp[4],1);set the first element to the 4th element of the last array returned..
                        $thdr = $thdr - 1;1 RaceData structure started..
                        $rc_Element += 1;..
                     EndIf
                  Case $rc_ArrayTemp[4] = "E"
                     MsgBox(0,"landed At E..","")
                  Case $rc_ArrayTemp[4] = "A"
                     MsgBox(0,"landed At A..","")
                  Case $rc_ArrayTemp[4] = "C"
                     MsgBox(0,"landed At C..","")
                  Case $rc_ArrayTemp[4] = "F"
                     MsgBox(0,"landed At F..","")
                  Case $rc_ArrayTemp[4] = "S"
                     MsgBox(0,"landed At S..","")
                  Case Else
                     _ArrayDisplay($rc_ArrayTemp);debug
                     MsgBox(0,"landed here..","");..default case
               EndSelect
            EndIf
         EndIf
      Next
      FileClose($rc_FileHandle);cleanup..
   EndIf
EndFunc
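
For comparison, the same split into structures and fields can also be sketched as a character-by-character scan that tracks whether it is inside quotes. This is not the approach used in the listing above, just an alternative under a few assumptions: _SplitRecords and _StoreRecord are hypothetical helper names, every field is quoted as in the sample data, and the "|" character (used here to join the fields of one structure) never appears in the data.

```autoit
; Returns an array where [0] is the number of structures found and
; [1]..[n] each hold one structure with its fields joined by "|".
Func _SplitRecords($sData)
    Local $aRecords[1] = [0]
    Local $sRecord = "", $sField = ""
    Local $iFields = 0, $bInQuotes = False, $sChar
    For $i = 1 To StringLen($sData)
        $sChar = StringMid($sData, $i, 1)
        If $sChar = '"' Then
            If $bInQuotes Then
                ; end quote: the field is complete, add it to the structure
                If $iFields > 0 Then $sRecord &= "|"
                $sRecord &= $sField
                $iFields += 1
                $sField = ""
            EndIf
            $bInQuotes = Not $bInQuotes
        ElseIf $bInQuotes Then
            $sField &= $sChar ; commas and spaces inside quotes are data
        ElseIf StringIsSpace($sChar) And $iFields > 0 Then
            ; whitespace outside quotes ends the current structure
            _StoreRecord($aRecords, $sRecord)
            $sRecord = ""
            $iFields = 0
        EndIf
    Next
    If $iFields > 0 Then _StoreRecord($aRecords, $sRecord) ; last structure
    Return $aRecords
EndFunc

; Grows the array by one slot and stores the finished structure in it.
Func _StoreRecord(ByRef $aRecords, $sRecord)
    $aRecords[0] += 1
    ReDim $aRecords[$aRecords[0] + 1]
    $aRecords[$aRecords[0]] = $sRecord
EndFunc

Local $aRecs = _SplitRecords('"H","USA","SAR" "R","1"," ","Came up gamely,Tired"')
For $i = 1 To $aRecs[0]
    ConsoleWrite($aRecs[$i] & @CRLF)
Next
```

With this made-up sample the blank field and the comma string both stay whole: the two lines written are H|USA|SAR and R|1| |Came up gamely,Tired. Each line can then be split at "|" and dispatched on its first field, the RecordType, as in the Select block above.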

All times are GMT - 6 Hours