Tuesday, December 13, 2011

Summary III: GPUs and Chess

GPU as Coprocessor
Because of the communication latency between host and GPU, this is a no-go.
Maybe AMD's and Intel's new APUs are better suited for this, but afaik they do not have many GPU cores on die.

Multi-Thread One Board Idea
I've tested different approaches for an architecture where multiple threads work simultaneously on one chess position, but the performance loss from idle threads was too high.

One Thread One Board Idea
Pro: Has the best occupancy of threads
Con: The limited register memory per thread causes a low Warp/Wavefront occupancy.

Board Representation
My current solution is to use Quad-Bitboards. They use 32 bytes for one chess position plus 8 bytes of additional information like castling rights, so I can run 4*32 threads on one SIMD unit of an NV GTS250.
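To make the byte counts concrete, here is a minimal C++ sketch of such a layout. The names (QuadBitboard, Position, pieceAt, setPiece) and the 4-bit piece code per square are my own assumptions for illustration, not the actual encoding used here; the only facts taken from above are 32 bytes for the board, 8 bytes of extra state and 4*32 threads.

```cpp
#include <cstddef>
#include <cstdint>

// Four 64-bit planes: bit `sq` of plane i holds bit i of the piece code
// on square `sq`. The meaning of each code is a hypothetical choice.
struct QuadBitboard {
    std::uint64_t plane[4];
};

struct Position {
    QuadBitboard board;   // 32 bytes
    std::uint64_t extra;  // 8 bytes: castling rights and similar state
};

static_assert(sizeof(QuadBitboard) == 32, "board must be 32 bytes");
static_assert(sizeof(Position) == 40, "position must be 40 bytes");

// 4 warps of 32 threads, one position per thread.
constexpr std::size_t kThreadsPerUnit = 4 * 32;
constexpr std::size_t kBytesPerUnit = kThreadsPerUnit * sizeof(Position);

// Collect the 4-bit piece code stored on square `sq` (0..63).
inline unsigned pieceAt(const QuadBitboard& b, unsigned sq) {
    unsigned code = 0;
    for (unsigned i = 0; i < 4; ++i)
        code |= static_cast<unsigned>((b.plane[i] >> sq) & 1u) << i;
    return code;
}

// Overwrite the 4-bit piece code on square `sq` (0..63).
inline void setPiece(QuadBitboard& b, unsigned sq, unsigned code) {
    const std::uint64_t bit = std::uint64_t{1} << sq;
    for (unsigned i = 0; i < 4; ++i) {
        b.plane[i] &= ~bit;
        if ((code >> i) & 1u) b.plane[i] |= bit;
    }
}
```

With this layout, 128 positions need 128 * 40 = 5120 bytes in total.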

Move Generation
Quad-Bitboards with a Magic Bitboard move generator are 64-bit based and perform better than a 32-bit 0x88 move generator with nested loops.
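For readers who have not seen the technique, here is a rough C++ sketch of a magic bitboard lookup for rook attacks. It is not the generator used here: all names are hypothetical, and the magic multiplier is found by random trial at startup, where a real engine would normally ship precomputed constants. The point is only that the lookup itself is a mask, a multiply, a shift and one table read, while the reference generator needs nested loops.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using U64 = std::uint64_t;

// Reference generator: walk the four rook rays with nested loops.
U64 rookAttacksSlow(int sq, U64 occ) {
    const int dr[4] = {1, -1, 0, 0};
    const int df[4] = {0, 0, 1, -1};
    U64 attacks = 0;
    for (int d = 0; d < 4; ++d) {
        int r = sq / 8 + dr[d], f = sq % 8 + df[d];
        while (r >= 0 && r < 8 && f >= 0 && f < 8) {
            const U64 bit = U64{1} << (r * 8 + f);
            attacks |= bit;
            if (occ & bit) break;  // the ray stops at the first blocker
            r += dr[d];
            f += df[d];
        }
    }
    return attacks;
}

// Relevant blockers: the ray squares without the outer board edges.
U64 rookMask(int sq) {
    const int r = sq / 8, f = sq % 8;
    U64 mask = 0;
    for (int i = 1; i < 7; ++i) {
        if (i != r) mask |= U64{1} << (i * 8 + f);
        if (i != f) mask |= U64{1} << (r * 8 + i);
    }
    return mask;
}

struct RookMagic {
    U64 mask = 0, magic = 0;
    int shift = 0;
    std::vector<U64> table;
    // The whole lookup: mask, multiply, shift, one table read.
    U64 attacks(U64 occ) const {
        return table[static_cast<std::size_t>(((occ & mask) * magic) >> shift)];
    }
};

// Small xorshift PRNG used only for the trial-and-error magic search.
U64 xorshift64(U64& s) { s ^= s << 13; s ^= s >> 7; s ^= s << 17; return s; }

RookMagic findRookMagic(int sq) {
    RookMagic m;
    m.mask = rookMask(sq);
    std::vector<U64> occs, atts;
    U64 sub = 0;
    do {  // enumerate every subset of the mask (carry-rippler trick)
        occs.push_back(sub);
        atts.push_back(rookAttacksSlow(sq, sub));
        sub = (sub - m.mask) & m.mask;
    } while (sub != 0);
    int bits = 0;
    while ((std::size_t{1} << bits) < occs.size()) ++bits;
    m.shift = 64 - bits;
    U64 seed = 0x9E3779B97F4A7C15ULL;
    for (;;) {
        const U64 a = xorshift64(seed), b = xorshift64(seed), c = xorshift64(seed);
        m.magic = a & b & c;  // sparse candidates tend to succeed more often
        m.table.assign(occs.size(), 0);
        bool ok = true;
        for (std::size_t i = 0; i < occs.size() && ok; ++i) {
            const std::size_t idx =
                static_cast<std::size_t>((occs[i] * m.magic) >> m.shift);
            if (m.table[idx] == 0) m.table[idx] = atts[i];  // rook attack sets are never empty
            else if (m.table[idx] != atts[i]) ok = false;   // destructive collision, retry
        }
        if (ok) return m;
    }
}
```

After `RookMagic m = findRookMagic(27);` a call like `m.attacks(occupancy)` should return the same set as `rookAttacksSlow(27, occupancy)`, without any loop over squares.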

Parallel Search Algorithm
A three-tier solution is needed:

1) for all threads/work-items in one Warp/Wavefront on one SIMD unit,
2) for different Warps/Wavefronts running on one SIMD unit,
3) for distributing work across SIMD units.
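The three tiers can be pictured by splitting a flat work-item index into its unit, warp and lane parts. This is only a small C++ sketch: the names are made up, the warp size of 32 and the 4 warps per SIMD unit follow the 4*32 figure from above, and other hardware will use different numbers.

```cpp
// Assumed sizes, taken from the 4*32 threads per SIMD unit mentioned above.
constexpr unsigned kWarpSize = 32;
constexpr unsigned kWarpsPerUnit = 4;

struct WorkItemId {
    unsigned unit;  // tier 3: which SIMD unit
    unsigned warp;  // tier 2: which warp/wavefront on that unit
    unsigned lane;  // tier 1: which thread inside the warp
};

// Split a flat work-item index into the three tiers.
inline WorkItemId decompose(unsigned globalId) {
    return WorkItemId{globalId / (kWarpSize * kWarpsPerUnit),
                      (globalId / kWarpSize) % kWarpsPerUnit,
                      globalId % kWarpSize};
}
```

For example, work-item 200 lands on unit 1, warp 2, lane 8. A search scheme has to decide how work is shared at each of these three levels.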


Outlook
- Board Representation, solved
- Move Generation, solved
- Brute Force Negamax Algorithm for one SIMD Unit, solved

Still missing is a parallel AlphaBeta pruning solution inside and across SIMD units.
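As a reminder of what brute force negamax means, here is a sequential C++ sketch on a tiny explicit tree. The Node type and the helpers leaf and inner are invented for the example and have nothing to do with the GPU code; the sketch visits every node, which is exactly the work AlphaBeta pruning would cut down.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Node {
    int score;                   // leaf value, seen from the side to move at that leaf
    std::vector<Node> children;  // empty for leaves
};

// Hypothetical helpers to build a small test tree.
inline Node leaf(int score) { return Node{score, {}}; }
inline Node inner(std::vector<Node> children) { return Node{0, std::move(children)}; }

// Plain negamax without pruning: the best value for the side to move is the
// maximum of the negated values of all children.
inline int negamax(const Node& n) {
    if (n.children.empty()) return n.score;
    int best = -1000000;
    for (const Node& c : n.children)
        best = std::max(best, -negamax(c));
    return best;
}
```

For the tree `inner({inner({leaf(3), leaf(5)}), inner({leaf(2), leaf(9)})})` the result is 3: the opponent picks 3 in the first subtree and 2 in the second, and the root takes the better of the two.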

