| Author |
Message |
bowbowtap Newbie cheater
Reputation: 0
Joined: 27 Apr 2013 Posts: 12 Location: 台灣
|
Posted: Mon Apr 29, 2013 12:12 pm Post subject: Pointer scanner: how about GPU computing? |
|
|
Could the pointer scanner use GPU computing?
The CPU is slow...
|
|
| Back to top |
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Mon Apr 29, 2013 12:17 pm Post subject: |
|
|
When graphics cards can hold more than 6 GB of RAM for the pointertree, I'll look into it.
Also, a big bottleneck is writing the results to disk, so get a 2 TB SSD.
_________________
Do not ask me about online cheats. I don't know any and wont help finding them.
Like my help? Join me on Patreon so i can keep helping |
|
|
 |
bowbowtap Newbie cheater
Reputation: 0
Joined: 27 Apr 2013 Posts: 12 Location: 台灣
|
Posted: Mon Apr 29, 2013 12:21 pm Post subject: |
|
|
| Dark Byte wrote: | When graphics cards can hold more than 6 GB of RAM for the pointertree, I'll look into it.
Also, a big bottleneck is writing the results to disk, so get a 2 TB SSD. |
Can't PC RAM be used for that?
|
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Mon Apr 29, 2013 12:24 pm Post subject: |
|
|
From what I've read, GPU computing cannot access CPU memory. The CPU has to send the data to the GPU first.
|
|
|
 |
bowbowtap Newbie cheater
Reputation: 0
Joined: 27 Apr 2013 Posts: 12 Location: 台灣
|
Posted: Mon Apr 29, 2013 12:27 pm Post subject: |
|
|
| Dark Byte wrote: | | From what I've read, GPU computing cannot access CPU memory. The CPU has to send the data to the GPU first. |
How can it be sped up?
By storing the game and CE on a RAM disk?
|
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Mon Apr 29, 2013 12:30 pm Post subject: |
|
|
If you have more than 500 GB of RAM you can use a RAM disk, but I doubt that.
In the future I might add distributed computing to the pointerscan, so you can have 100 computers working on the same pointerscan.
|
|
|
 |
bowbowtap Newbie cheater
Reputation: 0
Joined: 27 Apr 2013 Posts: 12 Location: 台灣
|
Posted: Mon Apr 29, 2013 12:37 pm Post subject: |
|
|
I thought using the GPU would be feasible, but it turns out it is not.
I saw the program 「RAR GPU Password Recovery」.
Reportedly, cracking a 9-character password takes 43 years on a CPU but 48 days on a GPU.
Last edited by bowbowtap on Mon Apr 29, 2013 12:45 pm; edited 1 time in total |
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Mon Apr 29, 2013 12:42 pm Post subject: |
|
|
It's still a theoretical idea.
But you'd probably have to set the machines up yourself and let 'workers' connect to the 'cloud', where each one fetches jobs and creates new jobs for other workers if needed.
Basically it's like the current pointerscanner, where each thread can give any other thread a job when it can, but with a variable number of threads.
Of course, every worker will need access to the pointertree, which can be a 6 GB+ file (so a slow initialization and high network traffic whenever a new worker is added).
Edit: I just read about AMD's hUMA project, which would be useful here.
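The job-passing idea described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the Cheat Engine implementation: `Job`, `JobQueue`, `worker_step` and `run_workers` are hypothetical names, the workers are simulated round-robin in a single thread instead of over a network, and each job below the maximum depth simply creates a fixed number of child jobs.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024

/* Hypothetical job: "continue scanning at this depth of a pointer path". */
typedef struct { int depth; } Job;

typedef struct {
    Job items[QUEUE_CAPACITY];
    size_t head;  /* index of the next job to hand out */
    size_t tail;  /* index of the next free slot */
} JobQueue;

static void queue_init(JobQueue *q) { q->head = q->tail = 0; }
static int queue_empty(const JobQueue *q) { return q->head == q->tail; }

static void queue_push(JobQueue *q, Job job) {
    assert(q->tail < QUEUE_CAPACITY);  /* sketch only: no wrap-around */
    q->items[q->tail++] = job;
}

static Job queue_pop(JobQueue *q) {
    assert(!queue_empty(q));
    return q->items[q->head++];
}

/* One worker step: fetch a job and, if it is not at the maximum depth,
 * put `fanout` child jobs back on the shared queue for any worker to take.
 * Returns 1 if a job was processed, 0 if the queue was empty. */
static int worker_step(JobQueue *q, int max_depth, int fanout) {
    if (queue_empty(q)) return 0;
    Job job = queue_pop(q);
    if (job.depth < max_depth) {
        for (int i = 0; i < fanout; i++) {
            queue_push(q, (Job){ job.depth + 1 });
        }
    }
    return 1;
}

/* Runs a variable number of simulated workers until the queue is drained.
 * jobs_done[w] receives the number of jobs worker w processed.
 * Returns the total number of jobs processed. */
static int run_workers(int worker_count, int max_depth, int fanout, int *jobs_done) {
    JobQueue q;
    int total = 0;
    queue_init(&q);
    queue_push(&q, (Job){ 0 });  /* the initial job */
    for (int w = 0; w < worker_count; w++) jobs_done[w] = 0;
    while (!queue_empty(&q)) {
        for (int w = 0; w < worker_count; w++) {
            if (worker_step(&q, max_depth, fanout)) {
                jobs_done[w]++;
                total++;
            }
        }
    }
    return total;
}
```

The total amount of work is the same no matter how many workers take part; only the split between them changes, which is why workers can be added or removed freely as long as they can all reach the queue and the pointertree.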
Last edited by Dark Byte on Tue Apr 30, 2013 9:11 am; edited 1 time in total |
|
|
 |
bowbowtap Newbie cheater
Reputation: 0
Joined: 27 Apr 2013 Posts: 12 Location: 台灣
|
Posted: Mon Apr 29, 2013 12:48 pm Post subject: |
|
|
| Dark Byte wrote: | It's still a theoretical idea.
But you'd probably have to set the machines up yourself and let 'workers' connect to the 'cloud', where each one fetches jobs and creates new jobs for other workers if needed.
Basically it's like the current pointerscanner, where each thread can give any other thread a job when it can, but with a variable number of threads.
Of course, every worker will need access to the pointertree, which can be a 6 GB+ file (so a slow initialization and high network traffic whenever a new worker is added). |
TY~XD
|
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Tue Nov 26, 2013 11:11 pm Post subject: |
|
|
Seeing that Titans have 6 GB of RAM, I've decided to give this a test.
Result: not fast enough (source: http://code.google.com/p/cheat-engine/source/browse/trunk/Cheat+Engine/CUDA+pointerscan/ )
Anyhow, the next version has the multiple-worker method implemented, which does provide a great speed improvement.
|
|
|
 |
Gniarf Grandmaster Cheater Supreme
Reputation: 43
Joined: 12 Mar 2012 Posts: 1285
|
Posted: Thu Dec 12, 2013 4:51 pm Post subject: |
|
|
I saw a few CUDA-related commits in the SVN, so before this project goes too far, I'd suggest switching to OpenCL, simply because Radeons do not support CUDA, while all modern GPUs support OpenCL.
Actually, since OpenCL code can also run on some CPUs, in the distant future you could use the same code for both CPU and GPU pointerscanning (I'm NOT talking about merged scanning).
Also, I'm not very competent on the matter, but I've heard that GeForce cards are more FPU-oriented and Radeons are faster at logical operations, which is why they are preferred for bitcoin mining. Considering that pointerscanning is mostly integer operations, I'd expect Radeons to perform significantly faster, in case you have one lying around.
_________________
DO NOT PM me if you want help on making/fixing/using a hack. |
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Thu Dec 12, 2013 5:49 pm Post subject: |
|
|
Right now I've stopped working on this because the performance is too poor.
(A scan took about as long as a single-threaded scan in CE compiled in debug mode, and that was with the CUDA pointerscanner not even writing the results.)
Even allowing for the minor difference between float and int calculations, it's way too slow to be usable.
99% of the time is spent on lookups in a map, and the other 1% on iterating through a linked list.
There's no complex math for the GPU to do, so there's no gain there.
One of the reasons is that GPU threads (on NVIDIA at least) can only execute the same line of code at the same time:
| Code: |
thread 1 executes line 10
thread 2's if statement path didn't lead to executing line 10, so it waits till thread 1 has reached line 45
thread 3's if statement path didn't lead to executing line 10, so it waits till thread 1 has reached line 45
thread 4 executes line 10
...
|
And since the pointerscanner has a lot of loops and iterations that depend on whether a map lookup gave a positive result, this basically reduces the number of actively running threads to 1.
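The effect described above can be illustrated with a small CPU-side model in C. This is a rough sketch, not a measurement of real hardware: it assumes a group of 32 threads (`WARP_SIZE`, the usual NVIDIA warp size) that must advance in lockstep, and the names `WarpStats` and `simulate_warp` are made up for the illustration. Each lane needs a different number of loop iterations, and the whole group stays occupied until the slowest lane is done.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumed group size; lanes advance in lockstep */

typedef struct {
    int cycles;       /* lockstep steps the whole group has to sit through */
    int useful_work;  /* lane-steps in which a lane actually did something */
} WarpStats;

/* iterations[lane] is how many loop iterations that lane needs, for example
 * because its map lookup succeeded and it has a long list to walk.
 * Lanes that are already finished idle until the slowest lane completes. */
static WarpStats simulate_warp(const int iterations[WARP_SIZE]) {
    WarpStats stats = {0, 0};
    for (int step = 0; ; step++) {
        int active = 0;
        for (int lane = 0; lane < WARP_SIZE; lane++) {
            assert(iterations[lane] >= 0);
            if (iterations[lane] > step) active++;
        }
        if (active == 0) break;  /* every lane has finished */
        stats.cycles++;
        stats.useful_work += active;
    }
    return stats;
}
```

If one lane needs 100 iterations and the other 31 need a single one, the group is occupied for 100 steps but performs only 131 lane-steps of work, an average of about 1.3 active lanes out of 32.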
Another big problem is that each thread may not run longer than 2 seconds. But the pointerscanner is designed to let a thread run for as long as it needs, fetching work commands from a queue when it runs out of work and adding to that same queue when possible.
But if you feel like experimenting or changing it to OpenCL, you can give it a shot. (The pointerscanner lookup method is now basically ported to C.)
http://cheatengine.org/temp/test.PTR.scandata is the scandata file I used for testing
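For anyone who wants to experiment, the access pattern described in this post (a map lookup followed by a linked-list walk) looks roughly like the C sketch below. It is not the code from the repository: the map is assumed here to be a sorted array searched with binary search, and `PointerNode`, `MapEntry`, `map_lookup` and `count_pointers_near` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Linked list of addresses that all store the same pointer value. */
typedef struct PointerNode {
    uintptr_t address;
    struct PointerNode *next;
} PointerNode;

/* Map entry: a pointed-to value and the list of addresses that point at it. */
typedef struct {
    uintptr_t value;
    PointerNode *pointers;
} MapEntry;

/* Binary search over entries sorted by value. Returns NULL when the value
 * is not in the map, which is the common, data-dependent "miss" branch. */
static const MapEntry *map_lookup(const MapEntry *entries, size_t count,
                                  uintptr_t value) {
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].value == value) return &entries[mid];
        if (entries[mid].value < value) lo = mid + 1;
        else hi = mid;
    }
    return NULL;
}

/* Counts the addresses holding a pointer into [target - max_offset, target].
 * Every candidate value costs one map lookup; only hits walk a list. */
static size_t count_pointers_near(const MapEntry *entries, size_t count,
                                  uintptr_t target, uintptr_t max_offset) {
    size_t found = 0;
    assert(max_offset <= target);
    for (uintptr_t v = target - max_offset; ; v++) {
        const MapEntry *entry = map_lookup(entries, count, v);
        if (entry != NULL) {
            for (const PointerNode *n = entry->pointers; n != NULL; n = n->next) {
                found++;
            }
        }
        if (v == target) break;
    }
    return found;
}
```

Almost all of the work is the lookup itself, and whether the inner loop runs at all depends on the data, which is the kind of branching that GPU threads handle poorly.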
|
|
|
 |
mgr.inz.Player I post too much
Reputation: 222
Joined: 07 Nov 2008 Posts: 4438 Location: W kraju nad Wisla. UTC+01:00
|
Posted: Fri Dec 13, 2013 9:03 am Post subject: |
|
|
I tested a few implementations of the par2cmdline tool.
(It is a tool that applies the data-recovery concepts of RAID-like systems to multi-part archives.)
So, there is the original project - http://sourceforge.net/projects/parchive/ - par2cmdline-0.4-x86-win32
And other builds:
par2cmdline 0.4 with Intel Threading Building Blocks 2.2 - http://chuchusoft.com/par2_tbb/download.html
par2cmdline 0.4 with Intel Threading Building Blocks 2.2 + CUDA - http://chuchusoft.com/par2_tbb/download.html#gpu_version
The TBB version is significantly faster than the original par2cmdline.
TBB+CUDA is slightly faster than TBB (on my 8800 GS).
Results depend on the source block count, repair block count, and dataset size.
|
|
|
 |
zm0d Master Cheater
Reputation: 7
Joined: 06 Nov 2013 Posts: 423
|
Posted: Fri Dec 13, 2013 9:20 am Post subject: |
|
|
| Dark Byte wrote: | | From what I've read, GPU computing cannot access CPU memory. |
DMA should do the trick, shouldn't it?
http://www.techterms.com/definition/dma
|
|
|
 |
Dark Byte Site Admin
Reputation: 470
Joined: 09 May 2003 Posts: 25807 Location: The netherlands
|
Posted: Fri Dec 13, 2013 9:52 am Post subject: |
|
|
The CUDA implementation of par2cmdline only XORs a block of data, without any conditional checks.
So yeah, GPU computing is most likely not suitable for a pointerscan.
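For contrast, this is the kind of branch-free work that suits a GPU. The C sketch below is not taken from par2cmdline; `xor_block` is a hypothetical helper that only shows the shape of the loop: every element gets exactly the same operation, and nothing depends on the data being processed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XORs src into dst, one byte at a time. There is no data-dependent branch,
 * so every iteration does identical work and could run as its own thread. */
static void xor_block(uint8_t *dst, const uint8_t *src, size_t len) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i] ^= src[i];
    }
}
```

Because each byte is independent of the others, the loop maps directly onto thousands of threads that all execute the same instruction, which is not the case for a scan that branches on the result of every lookup.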
(On a side note, they use atomicXor in a __global__ function. Funny thing is that, for some reason, atomic functions do not work in those functions, only in __device__ functions. You won't get any errors at compile time, but I can guarantee it's not going to be atomic.
I wasted 8 hours on this myself trying to figure out why my data kept getting corrupted.)
|
|
|
 |
|