<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1356846300156443183</id><updated>2012-01-05T01:21:05.906+01:00</updated><category term='SIMD SPMD GPU OpenCL'/><category term='GPU Chess global local memory'/><category term='CUDA CHESS'/><category term='NVIDIA GTX480 TESLA M2050 M2070 ATI 5870'/><category term='GPU Chess'/><category term='8800 GT G92 ATI 5770 Chess OpenCL'/><title type='text'>Zeta OpenCL Chess</title><subtitle type='html'>This blog is about a chess engine written in OpenCL, a programming language suited for GPUs -&amp;gt; GPU Chess.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default?start-index=101&amp;max-results=100'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>104</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4532138036970022574</id><published>2011-12-22T16:53:00.006+01:00</published><updated>2011-12-22T17:30:45.672+01:00</updated><title type='text'>AMD HD 7970 released</title><content type='html'>AMD released its new top model, the HD 7970;&lt;br /&gt;the price will be about 500 Euros.&lt;br /&gt;&lt;br /&gt;The new GCN architecture now works with 16-wide SIMD Units, with 4 SIMD Units per Compute Unit.&lt;br /&gt;&lt;br /&gt;Another notable point is the 384-bit memory interface with a bandwidth of 264 GB/s!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Data&lt;/span&gt;&lt;br /&gt;Cores: 2048&lt;br /&gt;Compute Units: 32&lt;br /&gt;SIMD Units: 32x4&lt;br /&gt;GPU Clock: 925 MHz&lt;br /&gt;GFLOP/s SP: 3790&lt;br /&gt;GFLOP/s DP: 947&lt;br /&gt;Memory Interface: 384 bit&lt;br /&gt;Memory Clock: 1375 MHz (GDDR5)&lt;br /&gt;Memory Bandwidth: 264 GB/s&lt;br /&gt;Private memory/SIMD Unit: 64 KB&lt;br /&gt;Local memory/Compute Unit: 64 KB</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4532138036970022574/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/amd-hd-7970-released.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4532138036970022574'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4532138036970022574'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/amd-hd-7970-released.html' title='AMD HD 7970 released'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3943559756782364147</id><published>2011-12-20T00:10:00.010+01:00</published><updated>2011-12-20T00:30:26.456+01:00</updated><title type='text'>Alternatives to pure AlphaBeta</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Monte Carlo Simulations with UCT&lt;/span&gt;&lt;br /&gt;Monte Carlo simulations with the UCT optimization work well for the game of Go but, as reported, not for chess. 
Maybe, with the computing power of a GPU, there is a chance for it?&lt;br /&gt;A Monte Carlo approach plays thousands of random games and decides from the overall win/loss counters whether a position is weak or strong.&lt;br /&gt;UCT adds a heuristic to the otherwise random selection of moves to play out, balancing the exploration of rarely tried moves against the exploitation of promising ones.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Nagging with SPAM&lt;/span&gt;&lt;br /&gt;Naggers support the master process by searching nodes of the tree with a narrowed alpha-beta window and feeding updated AlphaBeta Values back to the master process.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Pure Negamax or MiniMax&lt;/span&gt;&lt;br /&gt;Has to search on the order of 100x more nodes than a serial AlphaBeta search with good move ordering, and the gap widens with search depth.</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3943559756782364147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/alternatives-to-pure-alphabeta.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3943559756782364147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3943559756782364147'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/alternatives-to-pure-alphabeta.html' title='Alternatives to pure AlphaBeta'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-9202155718270300960</id><published>2011-12-19T17:20:00.003+01:00</published><updated>2011-12-21T00:31:38.569+01:00</updated><title type='text'>Zeta 094x - new stack design</title><content type='html'>I am going to implement a spinlocked, doubly linked list as a stack in the new version.&lt;br /&gt;&lt;br /&gt;I hope this will speed things up, because changing a few pointers should be faster than copying whole boards.&lt;br /&gt;&lt;br /&gt;###edit###&lt;br /&gt;&lt;br /&gt;I will first switch the 0933 design to a pointer-based copy process and see how that performs...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/9202155718270300960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-094x-new-stack-design.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/9202155718270300960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/9202155718270300960'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-094x-new-stack-design.html' title='Zeta 094x - new stack design'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7109608795201466958</id><published>2011-12-19T17:18:00.004+01:00</published><updated>2011-12-20T00:06:33.444+01:00</updated><title type='text'>Zeta 093x - lifo copy kills performance</title><content type='html'>I've tried to implement a shared LIFO stack across SIMD Units in 0933 and 0934, but the copy process kills the performance...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7109608795201466958/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-093x-lifo-copy-kills-performance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7109608795201466958'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7109608795201466958'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-093x-lifo-copy-kills-performance.html' title='Zeta 093x - lifo copy kills performance'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-36599513600133278</id><published>2011-12-17T21:54:00.005+01:00</published><updated>2011-12-17T22:28:10.842+01:00</updated><title type='text'>New AMD GPUs in January?</title><content type='html'>According to some hardware sites, we will see new AMD GPUs in January: the HD 7000 series.&lt;br /&gt;The top model, the 7970, is rumored to have 2048 stream processors on board, spread across 32 compute units. The HD 6970 has 1536 stream cores on 24 compute units.&lt;br /&gt;&lt;br /&gt;According to AMD white papers, each compute unit will consist of 4 SIMD units; each SIMD unit will be able to run 10 Warps/Wavefronts of 64 threads and will have 64 KB of registers, and each compute unit will have 64 KB of shared/local memory.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/36599513600133278/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/new-amd-gpus-in-january.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/36599513600133278'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/36599513600133278'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/new-amd-gpus-in-january.html' title='New AMD GPUs in January?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3134421364170036943</id><published>2011-12-13T20:05:00.005+01:00</published><updated>2011-12-17T21:53:12.393+01:00</updated><title type='text'>Summary III: GPUs and Chess</title><content type='html'>&lt;span style="font-weight: bold;"&gt;GPU as Coprocessor&lt;/span&gt;&lt;br /&gt;A no-go because of the communication latency between Host and GPU;&lt;br /&gt;maybe AMD's and Intel's new APUs are better suited for this, but AFAIK they do not have many GPU cores on the die.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Multi-Thread One Board Idea&lt;br /&gt;&lt;/span&gt;I've tested different approaches for an architecture where multiple threads work simultaneously on one chess position, but the performance loss from idle threads was too high.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;One Thread One Board Idea&lt;/span&gt;&lt;br /&gt;Pro: has the best thread utilization&lt;br /&gt;Con: the limited register memory per thread causes low Warp/Wavefront occupancy.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Board Representation&lt;br /&gt;&lt;/span&gt;My current solution is to use Quad-BitBoards: they use 32 bytes (four 64-bit bitboards) for one chess position plus 8 bytes of additional information such as castle rights, so I can run 4*32 threads on one SIMD Unit of an NV GTS 250.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Move Generation&lt;/span&gt;&lt;br /&gt;QuadBitboards with a Magic BitBoard move generator are 64-bit based and perform better than a 32-bit 0x88 move generator with nested loops.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Parallel Search Algorithm&lt;br /&gt;&lt;/span&gt;A three-tier solution is needed:&lt;br /&gt;&lt;br /&gt;1) for all Threads/work-items in one Warp/Wavefront on one SIMD Unit&lt;br /&gt;2) for different Warps/Wavefronts running on one SIMD 
Unit.&lt;br /&gt;3) for distributing work across SIMD Units.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Outlook&lt;/span&gt;&lt;br /&gt;- Board Representation, solved&lt;br /&gt;- Move Generation, solved&lt;br /&gt;- Brute Force Negamax Algorithm for one SIMD Unit, solved&lt;br /&gt;&lt;br /&gt;Still missing is a parallel &lt;span style="font-weight: bold;"&gt;AlphaBeta Pruning&lt;/span&gt; solution within and across SIMD Units.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3134421364170036943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/summary-iii-gpus-and-chess.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3134421364170036943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3134421364170036943'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/summary-iii-gpus-and-chess.html' title='Summary III: GPUs and Chess'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6580987933123127469</id><published>2011-12-05T15:37:00.005+01:00</published><updated>2011-12-13T15:28:40.792+01:00</updated><title type='text'>32 bit move generator again?</title><content type='html'>I am not satisfied with the single-thread performance; maybe I should switch back from BitBoards to a 0x88 move generator, because GPUs are optimized for 32-bit operations... I am not sure how the nested loops will perform on a SIMD Unit...&lt;br /&gt;&lt;br /&gt;### edit ###&lt;br /&gt;&lt;br /&gt;Ahh, I will keep the QuadBitBoards; they consume only 32 bytes of memory, so I can run multiple warps/wavefronts on one SIMD Unit.&lt;br /&gt;&lt;br /&gt;### edit ###&lt;br /&gt;I tested a 32-bit 0x88 move generator, but it is not as fast as the Magic Bitboard move generator... I guess the nested loops are not very SIMD-friendly.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6580987933123127469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0934-32-bit-move-generator-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6580987933123127469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6580987933123127469'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0934-32-bit-move-generator-again.html' title='32 bit move generator 
again?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8206802549380163983</id><published>2011-12-05T15:33:00.003+01:00</published><updated>2011-12-06T19:52:37.035+01:00</updated><title type='text'>Zeta 0934 - One Lifo Stack One Thread?</title><content type='html'>A shared LIFO stack doesn't perform well with AlphaBeta Pruning, so maybe I can use one LIFO stack per thread and somehow distribute work across threads.&lt;br /&gt;&lt;br /&gt;Zeta 0934, search depth 4, good ab pruning with move ordering, one thread:&lt;br /&gt;&lt;br /&gt;nodes: 1922 ,movecount: 11492, bestmove: 17900673 ,sec: 0.600000&lt;br /&gt;nodes: 3055 ,movecount: 18177, bestmove: 17913158 ,sec: 0.750000&lt;br /&gt;nodes: 11336 ,movecount: 60309, bestmove: 9008591 ,sec: 2.790000&lt;br /&gt;nodes: 7692 ,movecount: 40912, bestmove: 17963090 ,sec: 2.780000&lt;br /&gt;nodes: 5075 ,movecount: 29170, bestmove: 17892385 ,sec: 1.370000&lt;br /&gt;nodes: 14323 ,movecount: 66261, bestmove: 17983893 ,sec: 3.190000&lt;br /&gt;nodes: 4775 ,movecount: 37989, bestmove: 648577536 ,sec: 2.860000&lt;br /&gt;nodes: 3471 ,movecount: 23288, bestmove: 219372902 ,sec: 1.120000&lt;br /&gt;nodes: 9960 ,movecount: 37534, bestmove: 44564488 ,sec: 1.980000&lt;br /&gt;nodes: 19522 ,movecount: 69898, bestmove: 17829968 ,sec: 4.000000</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8206802549380163983/comments/default' title='Post Comments'/><link 
rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0933-one-lifo-stack-one-thread.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8206802549380163983'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8206802549380163983'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0933-one-lifo-stack-one-thread.html' title='Zeta 0934 - One Lifo Stack One Thread?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8721460301656567361</id><published>2011-12-04T17:13:00.004+01:00</published><updated>2011-12-06T19:52:09.128+01:00</updated><title type='text'>Zeta 0933 - Lifo Stack with bad AlphaBeta performance</title><content type='html'>Negamax worked fine, but AlphaBeta Pruning performs badly.&lt;br /&gt;&lt;br /&gt;Because I don't use a linked list as the LIFO Stack, I am not able to delete all Boards pruned by a Beta Cutoff; I have to visit every Board and then do the Cutoff again.&lt;br /&gt;&lt;br /&gt;Zeta 0933, search depth 4, bad ab pruning, 128 threads on one SIMD Unit:&lt;br /&gt;&lt;br /&gt;nodes: 76869 ,movecount: 76869, bestmove: 17900673 ,sec: 0.330000&lt;br /&gt;nodes: 97599 ,movecount: 97599, bestmove: 17913158 ,sec: 0.280000&lt;br /&gt;nodes: 209102 ,movecount: 209102, bestmove: 9008591 ,sec: 0.560000&lt;br /&gt;nodes: 206391 ,movecount: 206391, bestmove: 17963090 ,sec: 0.560000&lt;br /&gt;nodes: 187838 ,movecount: 187838, bestmove: 17892385 ,sec: 0.500000&lt;br /&gt;nodes: 276374 ,movecount: 276374, bestmove: 17983893 
,sec: 0.760000&lt;br /&gt;nodes: 142551 ,movecount: 142551, bestmove: 648577536 ,sec: 0.400000&lt;br /&gt;nodes: 173269 ,movecount: 173269, bestmove: 219372902 ,sec: 0.460000&lt;br /&gt;nodes: 229713 ,movecount: 229713, bestmove: 44564488 ,sec: 0.640000&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8721460301656567361/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0932-lifo-stack-with-bad-alphabeta.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8721460301656567361'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8721460301656567361'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0932-lifo-stack-with-bad-alphabeta.html' title='Zeta 0933 - Lifo Stack with bad AlphaBeta performance'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7026461966788027350</id><published>2011-12-04T02:17:00.003+01:00</published><updated>2011-12-06T19:48:52.124+01:00</updated><title type='text'>Zeta 0932 - Lifo Stack with negamax scores</title><content type='html'>I got LIFO Stack based processing running within one SIMD Unit, with negamax scores. 
The next topic is alpha-beta pruning.&lt;br /&gt;&lt;br /&gt;Zeta 0932, search depth 4, no ab pruning, 128 threads on one SIMD Unit:&lt;br /&gt;&lt;br /&gt;nodes: 206603 ,movecount: 206603, bestmove: 17900673 ,sec: 0.640000&lt;br /&gt;nodes: 286578 ,movecount: 286578, bestmove: 17913158 ,sec: 0.710000&lt;br /&gt;nodes: 554825 ,movecount: 554825, bestmove: 9008591 ,sec: 2.170000&lt;br /&gt;nodes: 575267 ,movecount: 575267, bestmove: 17963090 ,sec: 3.380000&lt;br /&gt;nodes: 632190 ,movecount: 632190, bestmove: 17892385 ,sec: 1.640000&lt;br /&gt;nodes: 620515 ,movecount: 620515, bestmove: 17983893 ,sec: 1.630000&lt;br /&gt;nodes: 636666 ,movecount: 636666, bestmove: 648577536 ,sec: 1.640000&lt;br /&gt;nodes: 912485 ,movecount: 912485, bestmove: 219372902 ,sec: 2.420000&lt;br /&gt;nodes: 673506 ,movecount: 673506, bestmove: 44564488 ,sec: 1.790000&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7026461966788027350/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0932-lifo-stack-with-negamax.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7026461966788027350'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7026461966788027350'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/zeta-0932-lifo-stack-with-negamax.html' title='Zeta 0932 - Lifo Stack with negamax scores'/><author><name>Srdja 
Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5537165508856903073</id><published>2011-12-03T18:47:00.002+01:00</published><updated>2011-12-03T18:53:34.969+01:00</updated><title type='text'>LIFO Stack and AlphaBeta Values</title><content type='html'>I managed to distribute work within a SIMD Unit by using a LIFO Stack;&lt;br /&gt;it looks faster than any previous solution,&lt;br /&gt;&lt;br /&gt;but I have no idea yet how to handle the AlphaBeta Values;&lt;br /&gt;a spinlocked linked list is a no-go because of the SIMD nature of the architecture: the threads of one Warp/Wavefront run in lockstep, so a thread spinning on a lock held by another thread of the same Warp/Wavefront can deadlock the whole group.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5537165508856903073/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/lifo-stack-and-alphabeta-values.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5537165508856903073'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5537165508856903073'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/12/lifo-stack-and-alphabeta-values.html' title='LIFO Stack and AlphaBeta Values'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7785438609394216172</id><published>2011-11-30T12:01:00.003+01:00</published><updated>2011-11-30T12:05:24.135+01:00</updated><title type='text'>LIFO Stack based Parallel Processing?</title><content type='html'>I am experimenting with one LIFO Stack per Thread,&lt;br /&gt;so every thread works on its own Stack, and idle Threads look in other Stacks for work to do (a form of work stealing).&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7785438609394216172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/11/lifo-stack-based-parallel-processing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7785438609394216172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7785438609394216172'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/11/lifo-stack-based-parallel-processing.html' title='LIFO Stack based Parallel Processing?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3260372834242783278</id><published>2011-09-24T13:15:00.003+02:00</published><updated>2011-09-24T18:41:28.388+02:00</updated><title type='text'>Zeta 0.920 - One Thread one Board</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Source:&lt;/span&gt;&lt;br /&gt;Nvidia&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_nvidia_0919"&gt;https://github.com/smatovic/Zeta/tree/zeta_nvidia_0920&lt;/a&gt;&lt;br /&gt;AMD&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_amd_0920"&gt;https://github.com/smatovic/Zeta/tree/zeta_amd_0920&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Changes:&lt;/span&gt;&lt;br /&gt;- one thread handles one board&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Still Missing:&lt;/span&gt;&lt;br /&gt;- castle moves&lt;br /&gt;- en passant moves&lt;br /&gt;- checkmate detection&lt;br /&gt;- Quiscence Search&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Zeta 0.920 uses one thread to compute the tree of one chess position.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Performance:&lt;/span&gt;&lt;br /&gt;This design makes about 10 000 nps per Thread, with currently only one thread running.&lt;br /&gt;&lt;br /&gt;Search depth 5 with AlphaBetaPruning and MoveOrdering:&lt;br /&gt;&lt;br /&gt;nodes: 16875 ,movecount: 73700,sec: 1.820000&lt;br /&gt;nodes: 54307 ,movecount: 260524,sec: 4.400000&lt;br /&gt;nodes: 97125 ,movecount: 624798, sec: 8.410000&lt;br /&gt;nodes: 113395 ,movecount: 1038605, sec: 10.530000&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Outlook:&lt;/span&gt;&lt;br /&gt;Next topic would be to build a load-balancing mechanism.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3260372834242783278?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3260372834242783278/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0920-one-thread-one-board.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3260372834242783278'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3260372834242783278'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0920-one-thread-one-board.html' title='Zeta 0.920 - One Thread one Board'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3169360758592517579</id><published>2011-09-23T14:29:00.003+02:00</published><updated>2011-09-23T14:35:31.311+02:00</updated><title type='text'>Zeta 0.920 - One Thread one Board again</title><content type='html'>The "One SIMD Unit One Board" Idea,  is propably a dead end.&lt;br /&gt;&lt;br /&gt;Too many idle threads, and a lausy Quiscence Search performance.&lt;br /&gt;&lt;br /&gt;So i will try the "One Thread One Board" Idea again.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3169360758592517579?l=zeta-chess.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3169360758592517579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0920-one-thread-one-board-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3169360758592517579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3169360758592517579'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0920-one-thread-one-board-again.html' title='Zeta 0.920 - One Thread one Board again'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2890275004792958966</id><published>2011-09-22T11:57:00.004+02:00</published><updated>2011-09-22T12:14:36.864+02:00</updated><title type='text'>Zeta 0.919 - Developer Release update</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Source:&lt;/span&gt;&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_nvidia_0919"&gt;https://github.com/smatovic/Zeta/tree/zeta_nvidia_0919&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Changes:&lt;/span&gt;&lt;br /&gt;- code cleanup&lt;br /&gt;- root search move generation now runs on the GPU&lt;br /&gt;- added bottom-up heapsort for move ordering&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Still Missing:&lt;/span&gt;&lt;br /&gt;- castle moves&lt;br /&gt;- en passant moves&lt;br /&gt;- checkmate detection&lt;br 
/&gt;- Quiescence Search&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Zeta 0.919 uses one SIMD Unit with 128 threads to compute one chess position in parallel.&lt;br /&gt;It assumes that there are no more than 128 possible moves from one position.&lt;br /&gt;I call this parallel processing "SPPS - a Simple Parallel Processing Scheme".&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Performance:&lt;/span&gt;&lt;br /&gt;SPPS is quite slow... here are some numbers with AlphaBeta pruning, move ordering and search depth 4 on an NVIDIA GTS 250:&lt;br /&gt;&lt;br /&gt;nodes: 10209 ,movecount: 28354,  sec: 0.290000&lt;br /&gt;nodes: 16262 ,movecount: 70892,  sec: 0.250000&lt;br /&gt;nodes: 52921 ,movecount: 241981, sec: 0.670000&lt;br /&gt;nodes: 35852 ,movecount: 379419, sec: 0.560000&lt;br /&gt;nodes: 25582 ,movecount: 489738, sec: 0.450000&lt;br /&gt;nodes: 57894 ,movecount: 721321, sec: 0.850000&lt;br /&gt;nodes: 35494 ,movecount: 795703, sec: 0.350000&lt;br /&gt;nodes: 20580 ,movecount: 874357, sec: 0.330000&lt;br /&gt;nodes: 32805 ,movecount: 1033972,sec: 1.540000&lt;br /&gt;nodes: 61802 ,movecount: 1404020,sec: 3.570000&lt;br /&gt;nodes: 68647 ,movecount: 1800534,sec: 1.210000&lt;br /&gt;nodes: 83326 ,movecount: 2325336,sec: 2.870000&lt;br /&gt;nodes: 21810 ,movecount: 2559152,sec: 0.700000&lt;br /&gt;nodes: 49937 ,movecount: 2730154,sec: 2.020000&lt;br /&gt;nodes: 87208 ,movecount: 3233045,sec: 1.500000&lt;br /&gt;nodes: 38732 ,movecount: 3577365,sec: 1.050000&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Outlook:&lt;/span&gt;&lt;br /&gt;The next topic will be a load-balancing mechanism across SIMD Units.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2890275004792958966?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2890275004792958966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-developer-release-update.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2890275004792958966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2890275004792958966'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-developer-release-update.html' title='Zeta 0.919 - Developer Release update'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5723267596686725300</id><published>2011-09-21T15:51:00.002+02:00</published><updated>2011-09-21T17:13:32.989+02:00</updated><title type='text'>Zeta 0.919 - heapsort</title><content type='html'>Heapsort is faster than bubblesort, but not by much.&lt;br /&gt;I think that, because of the SIMD structure, it is more efficient to sort in parallel with 128 threads than to let every thread sort on its own... will try a parallel sorting mechanism.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;##edit##&lt;br /&gt;Parallel sorting across global memory is a no-go because of unsynchronized reads/writes.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5723267596686725300?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://zeta-chess.blogspot.com/feeds/5723267596686725300/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-heapsort.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5723267596686725300'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5723267596686725300'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-heapsort.html' title='Zeta 0.919 - heapsort'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7830719355727547754</id><published>2011-09-20T19:53:00.004+02:00</published><updated>2011-09-20T19:58:05.209+02:00</updated><title type='text'>Zeta 0.919 - bad bubblesort?</title><content type='html'>Hmm,&lt;br /&gt;&lt;br /&gt;Zeta with move ordering searches fewer nodes, but in some cases it needs more time than Zeta with AlphaBeta pruning only.&lt;br /&gt;Probably bubblesort performs too many global memory reads and writes... will try quicksort.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7830719355727547754?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7830719355727547754/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-bad-bubblesort.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7830719355727547754'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7830719355727547754'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0919-bad-bubblesort.html' title='Zeta 0.919 - bad bubblesort?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-954256659126595925</id><published>2011-09-18T22:18:00.003+02:00</published><updated>2011-09-18T22:22:48.196+02:00</updated><title type='text'>Zeta 0.918 - Developer Release</title><content type='html'>Second Developer Release:&lt;br /&gt;&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_nvidia_0918"&gt;https://github.com/smatovic/Zeta/tree/zeta_nvidia_0918&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Changes:&lt;br /&gt;- code cleanup&lt;br /&gt;- Added AlphaBeta Pruning&lt;br /&gt;- Added Move Ordering&lt;br /&gt;&lt;br /&gt;It still runs on only one SIMD Unit with 128 threads.&lt;br /&gt;It is time for a load-balancing mechanism across SIMD Units.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-954256659126595925?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/954256659126595925/comments/default' title='Post Comments'/><link rel='replies' 
type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0918-developer-release.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/954256659126595925'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/954256659126595925'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0918-developer-release.html' title='Zeta 0.918 - Developer Release'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2947752981571642056</id><published>2011-09-18T22:06:00.003+02:00</published><updated>2011-09-18T22:08:38.524+02:00</updated><title type='text'>"One SIMD Unit One Board" vs. 
"One Thread One Board"</title><content type='html'>have to make a decision which design i want to use....&lt;br /&gt;&lt;br /&gt;i think it is easier to handle tens of SIMD Units than thousands of threads.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2947752981571642056?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2947752981571642056/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/one-simd-unit-one-board-vs-one-thread.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2947752981571642056'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2947752981571642056'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/one-simd-unit-one-board-vs-one-thread.html' title='&quot;One SIMD Unit One Board&quot; vs. 
&quot;One Thread One Board&quot;'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8236081267452794346</id><published>2011-09-18T21:15:00.003+02:00</published><updated>2011-09-18T21:26:39.932+02:00</updated><title type='text'>SPPS with AlphaBeta Pruning</title><content type='html'>I've implemented AlphaBeta Pruning in the SPPS search, which works as "One SIMD Unit One Board".&lt;br /&gt;&lt;br /&gt;I thought I could give each SIMD Unit of the GPU one of the root moves (the moves at the first search depth) to compute. But I had to realize that this is a stupid idea.&lt;br /&gt;AlphaBeta Pruning does not work well this way: when all root moves are searched at once, none of them can use the bound established by searching the first, best-ordered move, so most cutoffs are lost. Parallelization has to be done where good moves are suspected, not at the root.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8236081267452794346?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8236081267452794346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/spps-with-alphabeta-pruning.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8236081267452794346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8236081267452794346'/><link rel='alternate' type='text/html' 
href='http://zeta-chess.blogspot.com/2011/09/spps-with-alphabeta-pruning.html' title='SPPS with AlphaBeta Pruning'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5474743411542182238</id><published>2011-09-17T22:07:00.006+02:00</published><updated>2011-09-17T22:19:23.179+02:00</updated><title type='text'>Zeta 0.917 - AlphaBeta Pruning with Move Ordering</title><content type='html'>AlphaBeta with Move Ordering works fine on the GPU:&lt;br /&gt;&lt;br /&gt;One Thread, search depth 4, without AlphaBeta Pruning:&lt;br /&gt;nodes: 197281, sec: 9.720000&lt;br /&gt;&lt;br /&gt;One Thread, search depth 4, with AlphaBeta Pruning:&lt;br /&gt;nodes: 28464, sec: 3.340000&lt;br /&gt;&lt;br /&gt;One Thread, search depth 4, with AlphaBeta Pruning and Move Ordering:&lt;br /&gt;nodes: 9636 ,sec: 1.350000&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5474743411542182238?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5474743411542182238/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0917-alphabeta-pruning-with-move.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5474743411542182238'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5474743411542182238'/><link rel='alternate' 
type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0917-alphabeta-pruning-with-move.html' title='Zeta 0.917 - AlphaBeta Pruning with Move Ordering'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6512590381090664466</id><published>2011-09-17T20:24:00.004+02:00</published><updated>2011-09-17T20:34:25.660+02:00</updated><title type='text'>SPAM - Scalable Parallel Alpha-Beta Minimax</title><content type='html'>An alternative parallel search algorithm based on "Nagging" by Alberto Maria Segre and others.&lt;br /&gt;&lt;br /&gt;Maybe this is what I was looking for...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6512590381090664466?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6512590381090664466/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/spam-scalable-parallel-alpha-beta.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6512590381090664466'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6512590381090664466'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/spam-scalable-parallel-alpha-beta.html' title='SPAM - Scalable Parallel Alpha-Beta Minimax'/><author><name>Srdja 
Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8435561441043924048</id><published>2011-09-16T17:41:00.004+02:00</published><updated>2011-09-16T21:54:59.189+02:00</updated><title type='text'>Zeta 0.917 - AlphaBeta Pruning is running</title><content type='html'>:-)&lt;br /&gt;&lt;br /&gt;A 5x-10x speedup; next I will look at how MVV-LVA move ordering performs.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8435561441043924048?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8435561441043924048/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0917-alphabeta-pruning-is-running.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8435561441043924048'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8435561441043924048'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0917-alphabeta-pruning-is-running.html' title='Zeta 0.917 - AlphaBeta Pruning is running'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2645291856973630960</id><published>2011-09-15T14:34:00.002+02:00</published><updated>2011-09-15T15:37:54.746+02:00</updated><title type='text'>Zeta 0.916 - redesign looks good</title><content type='html'>Working on a redesign in which every thread handles one board -&gt; "one thread one board".&lt;br /&gt;It looks good.&lt;br /&gt;&lt;br /&gt;Next Topics:&lt;br /&gt;- AlphaBeta Pruning&lt;br /&gt;- Special Moves&lt;br /&gt;- Incremental Eval&lt;br /&gt;- feed the threads with boards more efficiently&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2645291856973630960?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2645291856973630960/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0916-new-approach-looks-good.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2645291856973630960'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2645291856973630960'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-0916-new-approach-looks-good.html' title='Zeta 0.916 - redesign looks good'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-480309505566520161</id><published>2011-09-13T18:45:00.003+02:00</published><updated>2011-09-13T18:58:29.959+02:00</updated><title type='text'>Zeta 0.916 - SPPS over multiple SIMD Units</title><content type='html'>Working on an extension of the SPPS algorithm so that it runs across multiple SIMD Units.&lt;br /&gt;&lt;br /&gt;Currently only one SIMD Unit of the GPU is used, although a GPU has tens of them.&lt;br /&gt;&lt;br /&gt;I have to consider memory usage; it would be nice to implement AlphaBeta Pruning and Quiescence Search directly.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-480309505566520161?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/480309505566520161/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-dva-0916-spps-over-multible-simd.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/480309505566520161'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/480309505566520161'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/09/zeta-dva-0916-spps-over-multible-simd.html' title='Zeta 0.916 - SPPS over multiple SIMD Units'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4051028159470187239</id><published>2011-07-13T21:33:00.006+02:00</published><updated>2011-07-14T14:37:27.873+02:00</updated><title type='text'>Zeta 0.915 - First Developer Release</title><content type='html'>Heyho,&lt;br /&gt;&lt;br /&gt;Today is the Montenegrin Independence Day, and I have published the source code of Zeta under the GPL:&lt;br /&gt;&lt;br /&gt;For NVIDIA devices:&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_nvidia_0915"&gt;https://github.com/smatovic/Zeta/tree/zeta_nvidia_0915&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;For AMD GPUs or Intel/AMD CPUs:&lt;br /&gt;&lt;a href="https://github.com/smatovic/Zeta/tree/zeta_amd_0915"&gt;https://github.com/smatovic/Zeta/tree/zeta_amd_0915&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Zeta is more a prototype than a real chess engine. 
It uses only one SIMD Unit of a GPU with 128 threads.&lt;br /&gt;This is enough to achieve about 100,000 NPS, but it is still far from what a GPU can really compute; it is like using only 1 of 16 cores.&lt;br /&gt;&lt;br /&gt;Zeta is written in OpenCL, so it can run on GPUs, CPUs and APUs.&lt;br /&gt;&lt;br /&gt;Zeta plays weakly and is buggy; many well-known chess techniques are not implemented.&lt;br /&gt;&lt;br /&gt;So, what's inside?&lt;br /&gt;&lt;br /&gt;- SPPS - a Simple Parallel Processing Scheme&lt;br /&gt;- Quad-Bitboard Board Representation (Thanks to Gerd Isenberg, http://chessprogramming.wikispaces.com/Quad-Bitboards)&lt;br /&gt;- Magic Bitboard Move Generator (Thanks to the Stockfish Team)&lt;br /&gt;- Table-based Evaluation (Thanks to Tomasz Michniewski, http://chessprogramming.wikispaces.com/Simplified+evaluation+function)&lt;br /&gt;&lt;br /&gt;What's missing?&lt;br /&gt;&lt;br /&gt;- Castle Moves&lt;br /&gt;- En Passant Moves&lt;br /&gt;- AlphaBeta Pruning&lt;br /&gt;- Qsearch&lt;br /&gt;- SMP over multiple SIMD Units&lt;br /&gt;- All other well-known techniques&lt;br /&gt;&lt;br /&gt;Supported Platforms?&lt;br /&gt;So far only Linux with the AMD OpenCL SDK or the NVIDIA OpenCL SDK. 
No binaries.&lt;br /&gt;CPUs (&amp;gt;=SSE3) will also work with the AMD OpenCL SDK.&lt;br /&gt;&lt;br /&gt;So feel free to contribute to GPU Chess :)&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja Matovic&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-4051028159470187239?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4051028159470187239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0915-first-developer-release.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4051028159470187239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4051028159470187239'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0915-first-developer-release.html' title='Zeta 0.915 - First Developer Release'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3677704642234628523</id><published>2011-07-09T07:42:00.005+02:00</published><updated>2011-07-09T07:46:39.176+02:00</updated><title type='text'>Beaten by GPU in Chess</title><content type='html'>&lt;a href="http://3.bp.blogspot.com/-XJSjDttRyZg/Thfq4azQquI/AAAAAAAAADw/PGqZoNFplys/s1600/badplayervsZeta0910.png"&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 
255px; height: 320px;" src="http://3.bp.blogspot.com/-XJSjDttRyZg/Thfq4azQquI/AAAAAAAAADw/PGqZoNFplys/s320/badplayervsZeta0910.png" alt="" id="BLOGGER_PHOTO_ID_5627224514410621666" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;OK, I am a really bad player, but I got beaten by my GPU at chess...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3677704642234628523?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3677704642234628523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/beaten-by-gpu-in-chess.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3677704642234628523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3677704642234628523'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/beaten-by-gpu-in-chess.html' title='Beaten by GPU in Chess'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-XJSjDttRyZg/Thfq4azQquI/AAAAAAAAADw/PGqZoNFplys/s72-c/badplayervsZeta0910.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3574908792425099805</id><published>2011-07-07T23:29:00.002+02:00</published><updated>2011-07-07T23:31:35.213+02:00</updated><title type='text'>Zeta 0.912 - Fallback to SPPS</title><content 
type='html'>DPPS is fast and scales well, but I was not able to build a score structure into it. It is only good for performance testing... so back to SPPS with its 128 threads per chess position.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3574908792425099805?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3574908792425099805/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0912-fallback-to-spps.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3574908792425099805'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3574908792425099805'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0912-fallback-to-spps.html' title='Zeta 0.912 - Fallback to SPPS'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8918610848689401524</id><published>2011-07-06T14:53:00.003+02:00</published><updated>2011-07-06T15:01:11.284+02:00</updated><title type='text'>Zeta 0.910 - First Release Candidate</title><content type='html'>I think I will finish my work on DPPS, computing on one SIMD Unit only, and release the source.&lt;br /&gt;&lt;br /&gt;Maybe 13 July would be a nice release date; it is the Montenegrin Independence Day :)&lt;br /&gt;&lt;br 
/&gt;To-do list:&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;ul&gt;&lt;li&gt; Castling moves&lt;/li&gt;&lt;li&gt; En passant moves&lt;/li&gt;&lt;li&gt; Alpha-beta pruning&lt;/li&gt;&lt;li&gt; MVV-LVA move ordering&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;br /&gt;After the release I can think about using DPPS on multiple SIMD Units.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8918610848689401524?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8918610848689401524/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0910-first-release-candidate.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8918610848689401524'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8918610848689401524'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0910-first-release-candidate.html' title='Zeta 0.910  - First Release Candidate'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5313985295972006290</id><published>2011-07-05T21:37:00.004+02:00</published><updated>2011-07-05T21:42:13.484+02:00</updated><title type='text'>Coprocessor idea is a no go</title><content type='html'>Ran some tests:&lt;br /&gt;100 GPU calls for move generation take about 1 second, i.e. about 10 ms per call...&lt;br 
/&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5313985295972006290?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5313985295972006290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/coprocessor-idea-is-no-go.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5313985295972006290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5313985295972006290'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/coprocessor-idea-is-no-go.html' title='Coprocessor idea is a no go'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-1353407837967911394</id><published>2011-07-04T22:11:00.007+02:00</published><updated>2011-07-04T22:35:15.746+02:00</updated><title type='text'>Some numbers</title><content type='html'>Here are some numbers; the nps values are not comparable to those of real chess engines...&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt; SPPS, one SIMD Unit, 128 Threads:&lt;/td&gt;      &lt;td&gt; 500 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;     &lt;td&gt; DPPS, one 2 GHz CPU core:&lt;/td&gt;          &lt;td&gt; 5000 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  1 Thread 
&lt;/td&gt;          &lt;td&gt; 100 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  2 Threads &lt;/td&gt;          &lt;td&gt; 200 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  4 Threads &lt;/td&gt;          &lt;td&gt; 400 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  8 Threads &lt;/td&gt;          &lt;td&gt; 800 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  16 Threads &lt;/td&gt;          &lt;td&gt; 1000 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  32 Threads &lt;/td&gt;          &lt;td&gt; 1200 Knps &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;    &lt;td&gt; DPPS, one SIMD Unit  64 Threads &lt;/td&gt;          &lt;td&gt; 1200 Knps &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-1353407837967911394?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/1353407837967911394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/some-numbers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1353407837967911394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1353407837967911394'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/some-numbers.html' title='Some numbers'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5323247883833374093</id><published>2011-07-04T20:34:00.003+02:00</published><updated>2011-07-04T20:42:01.521+02:00</updated><title type='text'>Zeta 0.911 - GPU as a Coprocessor again?</title><content type='html'>I was able to implement a non-recursive, negamax-like search algorithm on one SIMD Unit of the GPU.&lt;br /&gt;&lt;br /&gt;But SIMD Units run autonomously and cannot easily be synchronized with each other, so I am reconsidering the idea of letting the GPU act only as a move generator and evaluation processor, while the search itself is done on the CPU.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5323247883833374093?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5323247883833374093/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0911-gpu-as-coprocessor-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5323247883833374093'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5323247883833374093'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0911-gpu-as-coprocessor-again.html' title='Zeta 0.911 - GPU as a Coprocessor again?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8564043968686905438</id><published>2011-07-04T16:53:00.003+02:00</published><updated>2011-07-04T16:59:54.427+02:00</updated><title type='text'>Zeta 0.910 - DPPS in sync but slow</title><content type='html'>DPPS on one SIMD Unit is fast because it has a built-in sync function (the work-group barrier). Across multiple SIMD Units there is no built-in sync function, so I have to make the other SIMD Units wait for the workers, and this slows the whole procedure down.&lt;br /&gt;&lt;br /&gt;I have to redesign the use of multiple SIMD Units.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8564043968686905438?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8564043968686905438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0910-dpps-in-sync-but-slow.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8564043968686905438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8564043968686905438'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0910-dpps-in-sync-but-slow.html' title='Zeta 0.910 - DPPS in sync but slow'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-1243018361195424539</id><published>2011-07-02T13:31:00.004+02:00</published><updated>2011-07-02T13:36:02.832+02:00</updated><title type='text'>Nvidia 28 nm GPU Kepler in Q4 2011?</title><content type='html'>According to some news sites, Nvidia has just taped out its Kepler series... this seems a bit late for a production date in Q4 2011; maybe Q2 2012 is more appropriate.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-1243018361195424539?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/1243018361195424539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/nvidia-28-nm-gpu-keppler-in-q4-2011.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1243018361195424539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1243018361195424539'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/nvidia-28-nm-gpu-keppler-in-q4-2011.html' title='Nvidia 28 nm GPU Kepler in Q4 2011?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7576971146347029478</id><published>2011-07-01T19:04:00.004+02:00</published><updated>2011-07-02T22:26:24.263+02:00</updated><title type='text'>Zeta 0.905 - DPPS out of sync</title><content type='html'>With the current design, threads in different warps/wavefronts and different SIMD Units are running out of sync.&lt;br /&gt;&lt;br /&gt;SPPS works only inside one work-group (SIMD Unit), and DPPS works only inside one warp/wavefront of a SIMD Unit.&lt;br /&gt;&lt;br /&gt;I need some kind of "wait_until_others_finishes();"&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7576971146347029478?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7576971146347029478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0905-dpps-out-of-sync.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7576971146347029478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7576971146347029478'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/07/zeta-0905-dpps-out-of-sync.html' title='Zeta 0.905 - DPPS out of sync'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3279640904961021557</id><published>2011-06-28T00:37:00.002+02:00</published><updated>2011-06-28T00:40:43.184+02:00</updated><title type='text'>Zeta 0.905 - DPPS with 1 to 32 threads running</title><content type='html'>Got DPPS running with up to 32 threads, a nice speedup, but it is difficult to handle global variables across different warps/wavefronts and different SIMD Units.&lt;br /&gt;&lt;br /&gt;The GTS250 runs 32 threads in one warp, so I have to take care of the counters for the next warp...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3279640904961021557?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3279640904961021557/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0905-dpps-with-1-to-32-threads.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3279640904961021557'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3279640904961021557'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0905-dpps-with-1-to-32-threads.html' title='Zeta 0.905 - DPPS with 1 to 32 threads running'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2124237932847352100</id><published>2011-06-26T17:38:00.003+02:00</published><updated>2011-06-26T17:39:54.099+02:00</updated><title type='text'>Zeta 0.905 - DPPS</title><content type='html'>Working on a Dynamic Parallel Processing Scheme (DPPS). The current SPPS search needs 128 threads for each board position; the coming DPPS will use only as many threads as there are moves... this will take some time.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2124237932847352100?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2124237932847352100/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0905-dpps.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2124237932847352100'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2124237932847352100'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0905-dpps.html' title='Zeta 0.905 - DPPS'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-735470565037828819</id><published>2011-06-24T01:40:00.003+02:00</published><updated>2011-06-24T01:43:55.510+02:00</updated><title type='text'>Zeta 0.903 - Quiescence Search</title><content type='html'>As I expected, the SPPS search with 128 parallel threads has problems when the engine enters the quiescence search (Q-search). Q-search considers only capture moves, and the fewer moves there are, the more computing power I lose with SPPS :(&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-735470565037828819?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/735470565037828819/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0903-quiscence-search.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/735470565037828819'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/735470565037828819'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0903-quiscence-search.html' title='Zeta 0.903 - Quiescence Search'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4669353720929847927</id><published>2011-06-23T11:40:00.004+02:00</published><updated>2011-06-23T12:26:03.196+02:00</updated><title type='text'>Zeta 0.903 - Negamax Scores working</title><content type='html'>Got a parallel mechanism running that behaves like a non-recursive negamax with 128 parallel threads; I call it "SPPS", a Simple Parallel Processing Scheme.&lt;br /&gt;&lt;br /&gt;...in other words, Zeta plays its first games of chess :)&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-4669353720929847927?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4669353720929847927/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0903-negamax-scores-working.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4669353720929847927'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4669353720929847927'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-0903-negamax-scores-working.html' title='Zeta 0.903 - Negamax Scores working'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-253735912806908674</id><published>2011-06-21T13:19:00.000+02:00</published><updated>2011-06-21T13:20:05.562+02:00</updated><title type='text'>Zeta 0.9 - Profiler data</title><content type='html'>&lt;span style="font-weight: bold;"&gt;Occupancy Analysis for kernel spps_gpu on device GeForce GTS 250&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;    Kernel details: Grid size: [16 1 1], Block size: [1 128 1]&lt;br /&gt;&lt;br /&gt;    Register Ratio: 0.9375 ( 7680 / 8192 ) [59 registers per thread]&lt;br /&gt;    Shared Memory Ratio: 0.28125 ( 4608 / 16384 ) [4208 bytes per Block]&lt;br /&gt;&lt;br /&gt;    Active Blocks per SM: 1 (Maximum Active Blocks per SM: 8)&lt;br /&gt;    Active threads per SM: 128 (Maximum Active threads per SM: 768)&lt;br /&gt;&lt;br /&gt;    Potential Occupancy: 0.166667 ( 4 / 24 )&lt;br /&gt;    Achieved occupancy: 0.166667 (on 16 SMs)&lt;br /&gt;&lt;br /&gt;    Occupancy limiting factor: Registers&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-253735912806908674?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/253735912806908674/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-profiler-data_21.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/253735912806908674'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/253735912806908674'/><link rel='alternate' type='text/html' 
href='http://zeta-chess.blogspot.com/2011/06/zeta-09-profiler-data_21.html' title='Zeta 0.9 - Profiler data'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3568802903083649214</id><published>2011-06-21T01:43:00.001+02:00</published><updated>2011-06-21T01:45:26.890+02:00</updated><title type='text'>Zeta 0.9 - and correct Perft results :)</title><content type='html'>It took a while, but now the node counters are correct... the move generator still works without castling and en passant.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3568802903083649214?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3568802903083649214/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-and-correct-perft-results.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3568802903083649214'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3568802903083649214'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-and-correct-perft-results.html' title='Zeta 0.9 - and correct Perft results :)'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8600202282915866353</id><published>2011-06-20T17:48:00.003+02:00</published><updated>2011-06-20T17:50:25.835+02:00</updated><title type='text'>Zeta 0.9 - Good Perft Performance</title><content type='html'>First performance tests for move generation with the parallel mechanism look good.&lt;br /&gt;&lt;br /&gt;The big question is whether alpha-beta pruning with move ordering performs as well on the GPU as on a CPU...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8600202282915866353?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8600202282915866353/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-good-perft-performance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8600202282915866353'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8600202282915866353'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-good-perft-performance.html' title='Zeta 0.9 - Good Perft Performance'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5475157807898640383</id><published>2011-06-19T11:40:00.002+02:00</published><updated>2011-06-19T11:40:56.295+02:00</updated><title type='text'>Debugging the move generator</title><content type='html'>For debugging the move generator, perft (counting all leaf nodes of the legal move tree to a fixed depth and comparing the totals against known results) is very useful:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://chessprogramming.wikispaces.com/Perft+Results"&gt;http://chessprogramming.wikispaces.com/Perft+Results&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5475157807898640383?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5475157807898640383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/debugging-move-generator.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5475157807898640383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5475157807898640383'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/debugging-move-generator.html' title='Debugging the move generator'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7125046004660295425</id><published>2011-06-18T16:45:00.004+02:00</published><updated>2011-06-19T01:48:35.844+02:00</updated><title type='text'>AMDs Graphics Core Next - GCN</title><content type='html'>Information about AMD's next GPU architecture:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute"&gt;http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Changes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;non-VLIW design&lt;/li&gt;&lt;li&gt;16-wide SIMD Units&lt;/li&gt;&lt;li&gt;4 SIMD Units / Compute Unit&lt;/li&gt;&lt;li&gt;10 wavefronts / SIMD Unit&lt;/li&gt;&lt;li&gt;64 KB registers / SIMD Unit&lt;/li&gt;&lt;li&gt;64 KB LDS (local data share) / CU&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;According to a German news site, it is unlikely that GCN will launch in 2011.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7125046004660295425?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7125046004660295425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/amds-graphics-core-next-gcn.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7125046004660295425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7125046004660295425'/><link rel='alternate' type='text/html' 
href='http://zeta-chess.blogspot.com/2011/06/amds-graphics-core-next-gcn.html' title='AMDs Graphics Core Next - GCN'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6496441554015612810</id><published>2011-06-17T13:32:00.000+02:00</published><updated>2011-06-17T13:33:04.288+02:00</updated><title type='text'>New 28 nm GPUs from AMD this year</title><content type='html'>&lt;div class="date-posts"&gt;&lt;div class="post-outer"&gt;&lt;div class="post hentry"&gt;&lt;div class="post-header"&gt;  &lt;/div&gt; &lt;div class="post-body entry-content" id="post-body-523026404776947678"&gt; According to some news sites, AMD is going to launch new 28 nm GPUs this year.&lt;br /&gt;&lt;br /&gt;It is not clear whether only performance will increase or whether a new GPU architecture will be introduced...&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja &lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6496441554015612810?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6496441554015612810/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/new-28-nm-gpus-from-amd-this-year_17.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6496441554015612810'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6496441554015612810'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/new-28-nm-gpus-from-amd-this-year_17.html' title='New 28 nm GPUs from AMD this year'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-820795471154855523</id><published>2011-06-16T16:48:00.005+02:00</published><updated>2011-06-21T21:09:18.453+02:00</updated><title type='text'>Zeta 0.9 - Pieces are moving again</title><content type='html'>Yeeha, pieces are moving again on the GPU :)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Zeta 0.9 approach&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Quad Bitboard Board Representation&lt;/li&gt;&lt;li&gt;Magic Bitboard Move Generation for sliders&lt;/li&gt;&lt;li&gt;Attack Tables for non-sliders&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;I considered a 32-bit move generator for a while, but 1) its board representation uses too much memory and 2) its nested while loops are not very SIMD-friendly.&lt;br /&gt;Quad Bitboards use only 32 bytes per board (four 64-bit bitboards) and can be combined with a fast Magic Bitboard move generator.&lt;br /&gt;&lt;br /&gt;I also tried a Kogge-Stone move generator with 8 threads for the 8 directions in parallel, but Magic Bitboards performed better.&lt;br /&gt;&lt;br /&gt;On the Nvidia GTS250 I am currently able to run 16*128 threads. 
With these 2048 parallel threads I get a GPU occupancy of only 1/6; the GPU can run at most 12288 threads.&lt;br /&gt;&lt;br /&gt;With some work on register usage I could double the number of running threads; I guess more is not possible.&lt;br /&gt;&lt;br /&gt;Other GPUs have a better register/thread ratio. The AMD 69xx series, for example, offers 256 KB of private memory and 32 KB of local memory per SIMD Unit.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;br /&gt;&lt;br /&gt;PS: did I mention that I am looking for sponsors? :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-820795471154855523?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/820795471154855523/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-pieces-are-moving-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/820795471154855523'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/820795471154855523'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/zeta-09-pieces-are-moving-again.html' title='Zeta 0.9 - Pieces are moving again'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5653782262781327937</id><published>2011-06-15T22:20:00.004+02:00</published><updated>2011-06-16T14:47:02.273+02:00</updated><title type='text'>GTS450 with 
2 GB available</title><content type='html'>Just saw that there are entry-level GPUs with 2 GB RAM available: the GTS450 for &lt; 100 Euro.&lt;br /&gt;&lt;br /&gt;With 2 GB of RAM the only limiting factors would be the sizes of private and local memory, which are a bit more relaxed on the Fermi architecture.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;GTS450:&lt;/span&gt;&lt;br /&gt;Shaders: 192&lt;br /&gt;SIMD Units: 4&lt;br /&gt;GPU Clock: 783 MHz&lt;br /&gt;Shader Clock: 1566 MHz&lt;br /&gt;GFLOP/s SP: 601.34 (max. single precision)&lt;br /&gt;GFLOP/s DP: 75 (max. double precision)&lt;br /&gt;Memory Interface: 128 bit&lt;br /&gt;Memory Clock: 3608 MHz (GDDR5)&lt;br /&gt;Memory Bandwidth: 57.73 GB/s&lt;br /&gt;private memory/SIMD Unit: 128 KB&lt;br /&gt;local memory/SIMD Unit: 48 KB&lt;br /&gt;Threads/SIMD Unit: 1536&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5653782262781327937?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5653782262781327937/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/gts450-with-2-gb-available.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5653782262781327937'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5653782262781327937'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/gts450-with-2-gb-available.html' title='GTS450 with 2 GB available'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2954276355880097444</id><published>2011-06-15T18:19:00.004+02:00</published><updated>2011-06-16T14:45:43.658+02:00</updated><title type='text'>Memory limits max amount of threads</title><content type='html'>1) Private and shared memory limit the maximum number of threads, because their size bounds the registers available for kernel computation,&lt;br /&gt;&lt;br /&gt;2) but global memory limits it too :(&lt;br /&gt;&lt;br /&gt;In OpenCL currently only 1/4 of the physical GPU RAM is allocatable in one block, i.e. 128 MB out of 512... that is bad... with my current design I am only able to use 2048 instead of the 12288 possible threads.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;  CL_DEVICE_MAX_MEM_ALLOC_SIZE:  128 MByte&lt;br /&gt;  CL_DEVICE_GLOBAL_MEM_SIZE:  511 MByte&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2954276355880097444?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2954276355880097444/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/memory-limits-max-amount-of-threads.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2954276355880097444'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2954276355880097444'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/memory-limits-max-amount-of-threads.html' title='Memory limits max amount of threads'/><author><name>Srdja 
Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5116821488844744943</id><published>2011-06-14T16:14:00.004+02:00</published><updated>2011-06-16T16:34:32.158+02:00</updated><title type='text'>Registers on GTS250</title><content type='html'>Currently running into problems with the size of private memory on an Nvidia GTS250.&lt;br /&gt;&lt;br /&gt;The GTS250 has 16 SIMD Units, each with 8K 32-bit registers, i.e. 32 KB of private memory, and each can run 24 warps of 32 threads. So I have to run at least 768 threads per SIMD Unit to get 100% occupancy, which leaves only 32768 Bytes / 768 Threads = 42.67 Bytes/Thread for kernel computation :(&lt;br /&gt;&lt;br /&gt;Same with shared memory, which is 16 KB per SIMD Unit: 16384 Bytes / 768 Threads = 21.33 Bytes/Thread.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5116821488844744943?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5116821488844744943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/registers-on-gts250.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5116821488844744943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5116821488844744943'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/registers-on-gts250.html' 
title='Registers on GTS250'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7554252139392664563</id><published>2011-06-13T15:59:00.002+02:00</published><updated>2011-06-13T16:03:24.102+02:00</updated><title type='text'>New and maybe last approach</title><content type='html'>To finish this project cleanly, I will code all my ideas straight in OpenCL,&lt;br /&gt;even if the performance tests show that it will end in disaster ;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7554252139392664563?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7554252139392664563/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/new-and-maybe-last-approach.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7554252139392664563'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7554252139392664563'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/new-and-maybe-last-approach.html' title='New and maybe last approach'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8034217418113676678</id><published>2011-06-13T04:47:00.002+02:00</published><updated>2011-06-13T04:49:23.839+02:00</updated><title type='text'>Always when...</title><content type='html'>I give up, I get the best ideas... will be back soon.&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8034217418113676678?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8034217418113676678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/always-when.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8034217418113676678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8034217418113676678'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/always-when.html' title='Always when...'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6960791200936175354</id><published>2011-06-08T23:34:00.003+02:00</published><updated>2011-06-08T23:59:36.047+02:00</updated><title type='text'>Some reasons against chess on GPUs</title><content type='html'>&lt;span style="font-weight:bold;"&gt;GPUs are "SIMT" devices, Single Instruction 
Multiple Threads&lt;/span&gt;&lt;br /&gt;A GPU consists of tens of SIMD Units; inside each SIMD Unit every thread executes the same code, so branches and while loops are not welcome. Therefore a move generator has to be designed SIMD-friendly, because every thread has to wait for the others to finish.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Memory&lt;/span&gt;&lt;br /&gt;There is too little fast private and local memory, and global memory (RAM) is slow. A GPU chess program has to be designed so that every thread working on a board can hold the board representation in private or local memory.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Not that fast&lt;/span&gt;&lt;br /&gt;GPUs have more "raw" computation power than CPUs, but also a lot of restrictions.&lt;br /&gt;Today's high-end GPUs reach about 600 GFLOP/s in DP, a 4-core CPU maybe 50 GFLOP/s.&lt;br /&gt;On CPUs, alpha-beta pruning speeds up the standard MinMax algorithm by an order of magnitude, and good move ordering speeds up alpha-beta by another order of magnitude.&lt;br /&gt;So without alpha-beta pruning even the fastest GPU will lose the game against a CPU.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;No Recursion, no communication, no sync&lt;/span&gt;&lt;br /&gt;How to build a master-slave relationship between threads in a parallel alpha-beta algorithm for distributing work under these limitations?&lt;br /&gt;&lt;br /&gt;--&lt;br /&gt;Srdja&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6960791200936175354?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6960791200936175354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/some-reasons-against-chess-on-gpus.html#comment-form' title='0 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6960791200936175354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6960791200936175354'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/06/some-reasons-against-chess-on-gpus.html' title='Some reasons against chess on GPUs'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8011858304064313054</id><published>2011-02-03T20:39:00.004+01:00</published><updated>2011-02-03T21:05:17.824+01:00</updated><title type='text'>Summary II: GPUs and Chess</title><content type='html'>After almost one year of GPU chess it is time for a second summary:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;GPU as Coprocessor&lt;/span&gt;&lt;br /&gt;A no-go because of the communication latency between host and GPU;&lt;br /&gt;maybe AMD's and Intel's new APUs are better suited for this, but afaik they do not have many stream cores on die.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;One Thread one Board Idea&lt;br /&gt;&lt;/span&gt; Is a showstopper: over the last generations of GPUs the local/private memory per thread stayed almost constant, where "memory per thread" refers to a configuration with enough threads to use the full computation power of a GPU. 
In numbers: 32 to 512 Bytes per thread.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Multi-Thread one Board&lt;/span&gt;&lt;br /&gt;Because of local memory limits it is necessary to couple threads together and use their combined memory for computation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Board Representation&lt;/span&gt;&lt;br /&gt;Because of the GPU architecture, a board representation optimized for 16*4*32 bit ("Quarter BitBoards") would perform best.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Move Generation&lt;/span&gt;&lt;br /&gt;It is questionable whether the whole chess move generation, including legality checks, performs well on such a SIMD device.&lt;br /&gt;My tests with dummy data showed that a clean BitBoard design with precalculated hash tables ("Magic BitBoards") performs better than nested loops. The question is whether the hash tables fit in fast local/private memory or have to be fetched from slow global memory....&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Parallel Search Algorithm&lt;/span&gt;&lt;br /&gt;The most efficient parallel game tree search algorithms need communication between threads/tasks. 
But OpenCL doesn't offer such built-in communication; it has to be implemented manually via slow global memory.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Outlook&lt;/span&gt;&lt;br /&gt;I will give the "Quarter BitBoard" idea a try and will see....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8011858304064313054?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8011858304064313054/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/02/summary-ii-gpus-and-chess.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8011858304064313054'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8011858304064313054'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/02/summary-ii-gpus-and-chess.html' title='Summary II: GPUs and Chess'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-1846517925445218590</id><published>2011-01-29T22:30:00.008+01:00</published><updated>2011-06-14T19:46:42.765+02:00</updated><title type='text'>Nvidia GTS250 / G92b</title><content type='html'>Comparison of entry-level Nvidia GPUs, nice cards for getting started with GPU programming.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;GTS250:&lt;/span&gt;&lt;br /&gt;Shaders: 128&lt;br /&gt;SIMD Units: 16&lt;br /&gt;GPU Clock: 
738 MHz&lt;br /&gt;Shader Clock: 1836 MHz&lt;br /&gt;GFLOP/s SP: 705 (max. single precision)&lt;br /&gt;GFLOP/s DP: none (G92 has no double precision support)&lt;br /&gt;Memory Interface: 256 bit&lt;br /&gt;Memory Clock: 2200 MHz (GDDR3)&lt;br /&gt;Memory Bandwidth: 70.4 GB/s&lt;br /&gt;constant memory size: 64 KB&lt;br /&gt;private memory/SIMD Unit: 32 KB&lt;br /&gt;local memory/SIMD Unit: 16 KB&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;8800 GT:&lt;/span&gt;&lt;br /&gt;Shaders: 112&lt;br /&gt;SIMD Units: 14&lt;br /&gt;GPU Clock: 600 MHz&lt;br /&gt;Shader Clock: 1500 MHz&lt;br /&gt;GFLOP/s SP: 504 (max. single precision)&lt;br /&gt;GFLOP/s DP: none (G92 has no double precision support)&lt;br /&gt;Memory Interface: 256 bit&lt;br /&gt;Memory Clock: 1800 MHz (GDDR3)&lt;br /&gt;Memory Bandwidth: 57.6 GB/s&lt;br /&gt;constant memory size: 64 KB&lt;br /&gt;private memory/SIMD Unit: 32 KB&lt;br /&gt;local memory/SIMD Unit: 16 KB&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-1846517925445218590?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/1846517925445218590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/nvidia-gts250-g92b.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1846517925445218590'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1846517925445218590'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/nvidia-gts250-g92b.html' title='Nvidia GTS250 / G92b'/><author><name>Srdja 
Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5942621883653165263</id><published>2011-01-29T11:01:00.002+01:00</published><updated>2011-01-29T11:06:24.101+01:00</updated><title type='text'>One thread one board</title><content type='html'>The "one thread one board" idea will probably never fit on GPUs. Because of the number of threads needed to use the full computation power of the device, the memory per thread is very limited.&lt;br /&gt;Therefore it may be necessary to couple several threads to work on one board....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5942621883653165263?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5942621883653165263/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/one-thread-one-board.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5942621883653165263'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5942621883653165263'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/one-thread-one-board.html' title='One thread one board'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2203103625831382719</id><published>2011-01-19T08:10:00.014+01:00</published><updated>2011-06-09T00:17:26.531+02:00</updated><title type='text'>Design thoughts for AMDs Cayman</title><content type='html'>The 64-bit performance on Cayman is 1/4 of the 32-bit performance, and memory reads/writes are organized in 128-bit units of 4 * 32 bit. So a 32-bit board representation should perform better.&lt;br /&gt;&lt;br /&gt;A useful OpenCL setup could be:&lt;br /&gt;8 simultaneous work-groups * 16-wide SIMD * 4 Wavefronts * 24 SIMD Units =&gt; 12288 total work-items&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Possible Board Representations&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Quad-BitBoards, 256 bit&lt;/span&gt;&lt;br /&gt;Quad-BitBoards with a 4*64 bit board representation and 64 work-items, one work-group, working simultaneously on one board. Shared memory could hold precalculated attack and Kindergarten-Bitboard tables for move generation.&lt;br /&gt;Pro: 64 work-items working on one board -&gt; more registers available.&lt;br /&gt;Con: 64 work-items, one per square, leave a lot of computing power unused.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Classic BitBoards, 2*6*64 bit&lt;/span&gt;&lt;br /&gt;Classic BitBoards with 2*6*64 bit and 6 work-items working on the same board.&lt;br /&gt;Con: 6 coupled work-items don't have enough private memory for deep calculations.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;32 bit&lt;/span&gt;&lt;br /&gt;I don't have a clue how to organize the board efficiently in 32-bit parts while needing less than 3/4 more arithmetic operations for move generation than bitboards do. 
&lt;br /&gt;Maybe a 0x88 design with some modifications in move generation?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2203103625831382719?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2203103625831382719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/design-thoughts-for-amds-cayman.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2203103625831382719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2203103625831382719'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/design-thoughts-for-amds-cayman.html' title='Design thoughts for AMDs Cayman'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3899099507620720439</id><published>2011-01-19T07:24:00.005+01:00</published><updated>2011-01-24T01:02:40.159+01:00</updated><title type='text'>Memory Latency in AMDs Cayman series</title><content type='html'>&lt;span style="font-weight:bold;"&gt;private memory&lt;/span&gt;&lt;br /&gt;The ALU needs 4 cycles to fetch two 32-bit operands from the register file (256 KB per SIMD).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;shared memory&lt;/span&gt;&lt;br /&gt;The LDS (Local Data Share, i.e. shared memory) has 32 KB per SIMD Unit. 
The LDS can be shared by an entire work-group; it needs one ALU operation (4 cycles) to read two 32-bit values, and one ALU operation plus one memory operation for writes.&lt;br /&gt;&lt;br /&gt;Compared to the non-OpenCL-specific GDS (Global Data Share), with 24 cycles of latency and a total of 64 KB, the LDS is fast.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3899099507620720439?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3899099507620720439/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/memory-latency-in-amds-cayman-series.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3899099507620720439'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3899099507620720439'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/memory-latency-in-amds-cayman-series.html' title='Memory Latency in AMDs Cayman series'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-1861751112292778071</id><published>2011-01-13T15:41:00.007+01:00</published><updated>2011-01-15T01:00:51.012+01:00</updated><title type='text'>Another programmer giving GPU Chess a try?</title><content type='html'>Hey &lt;a href="http://chessprogramming.wikispaces.com/Vincent+Diepeveen"&gt;Vincent&lt;/a&gt;, welcome to the party? 
:)&lt;br /&gt;&lt;br /&gt;&lt;a href="http://forums.amd.com/devforum/messageview.cfm?catid=328&amp;threadid=144735&amp;highlight_key=y"&gt;forums.amd.com&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-1861751112292778071?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/1861751112292778071/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/another-programmer-giving-gpu-chess-try.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1861751112292778071'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1861751112292778071'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/another-programmer-giving-gpu-chess-try.html' title='Another programmer giving GPU Chess a try?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4468586018624303204</id><published>2011-01-12T19:54:00.014+01:00</published><updated>2011-01-29T22:57:35.138+01:00</updated><title type='text'>AMD HD 69xx</title><content type='html'>New Cayman cards from AMD:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;AMD 6970&lt;/span&gt;&lt;br /&gt;Shaders: 1536&lt;br /&gt;SIMD Units: 24&lt;br /&gt;GPU Clock: 880 MHz&lt;br /&gt;Shader Clock: 880 MHz&lt;br /&gt;GFLOP/s SP: 2700 (max. 
single precision)&lt;br /&gt;GFLOP/s DP: 675 (max. double precision)&lt;br /&gt;Memory Interface: 256 bit&lt;br /&gt;Memory Clock: 1375 MHz (GDDR5)&lt;br /&gt;Memory Bandwidth: 176 GB/s&lt;br /&gt;Registers/SIMD: 256KB (private memory)&lt;br /&gt;GDS: 64KB (global data share)&lt;br /&gt;LDS: 32KB (shared memory)&lt;br /&gt;VLIW: 4 (the previous Cypress model has 5)&lt;br /&gt;&lt;br /&gt;As far as I can see, there is no increase in register size compared to the &lt;a href="http://www.realworldtech.com/page.cfm?ArticleID=RWT121410213827&amp;p=6"&gt;Cypress-Architecture (58xx)&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-4468586018624303204?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4468586018624303204/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/amd-hd-69xx.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4468586018624303204'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4468586018624303204'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/amd-hd-69xx.html' title='AMD HD 69xx'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5402629143608428182</id><published>2011-01-11T18:13:00.003+01:00</published><updated>2011-01-11T18:17:32.716+01:00</updated><title 
type='text'>Zeta Dva</title><content type='html'>I lost my CPU Zeta code, so I will start coding another version:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://zeta-dva.blogspot.com/"&gt;http://zeta-dva.blogspot.com/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5402629143608428182?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5402629143608428182/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/zeta-dva.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5402629143608428182'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5402629143608428182'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2011/01/zeta-dva.html' title='Zeta Dva'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6627183323294633573</id><published>2010-08-04T15:40:00.000+02:00</published><updated>2010-08-04T15:41:04.592+02:00</updated><title type='text'>Project canceled</title><content type='html'>Choose life and not computer chess ;)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6627183323294633573?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://zeta-chess.blogspot.com/feeds/6627183323294633573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/08/project-canceled.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6627183323294633573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6627183323294633573'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/08/project-canceled.html' title='Project canceled'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8140762133051142420</id><published>2010-07-05T20:29:00.002+02:00</published><updated>2010-07-05T20:40:27.072+02:00</updated><title type='text'>Memory Latency</title><content type='html'>Not only global memory has latency that has to be considered: L1 (private), L2 (local/shared) and constant memory have latency too.&lt;br /&gt;&lt;br /&gt;NV papers say 24 clocks for registers (which should correspond to L1 memory), and ATI papers say 24 clocks for the Global Data Share (which should correspond to constant memory).&lt;br /&gt;&lt;br /&gt;So I have to reconsider the decision to use hash tables and attack tables that reside in constant memory. 
It could be faster just to compute the moves...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8140762133051142420?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8140762133051142420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/07/memory-latency.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8140762133051142420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8140762133051142420'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/07/memory-latency.html' title='Memory Latency'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2549995249638057986</id><published>2010-07-01T12:50:00.003+02:00</published><updated>2010-07-01T13:12:16.147+02:00</updated><title type='text'>The future is fusion?</title><content type='html'>According to some &lt;a href="http://news.google.com/news/search?aq=0&amp;pz=1&amp;cf=all&amp;ned=us&amp;hl=us&amp;q=amd+fusion"&gt;news sites&lt;/a&gt;, AMD is going to launch its first APU (Accelerated Processing Unit), called Fusion, in 2010/2011. 
There will be a quad-core version, but how many stream cores will fit on the die?&lt;br /&gt;&lt;br /&gt;With the GPU cores on the same die, the latency between CPU and GPU is low, so a chess engine could use the CPU in the classic way and the GPU cores as a coprocessor for special tasks like parallel move generation or evaluation.&lt;br /&gt;&lt;br /&gt;We will see whether AMD's APU, Fusion, prevails against the common CPU-GPU concept...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2549995249638057986?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2549995249638057986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/07/future-is-fusion.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2549995249638057986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2549995249638057986'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/07/future-is-fusion.html' title='The future is fusion?'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4342219443730209930</id><published>2010-06-30T17:39:00.010+02:00</published><updated>2010-06-30T18:28:25.439+02:00</updated><title type='text'>Zeta 0.8.3.x Quad BitBoard design</title><content type='html'>The new design of the move generator will use &lt;a 
href="http://chessprogramming.wikispaces.com/Quad-Bitboards"&gt;Quad-BitBoards&lt;/a&gt;, 4*64 Bit = 32 Bytes. I am going to mix precalculated attack tables for Pawns, Knights and King with a &lt;a href="http://chessprogramming.wikispaces.com/Kindergarten+Bitboards"&gt;Kindergarten BitBoard&lt;/a&gt; approach for Rooks, Bishops and Queens. Magic BitBoards are a no-go because their tables (40 KB + 800 KB) don't fit in constant memory (64 KB). Kindergarten BitBoards use 2*4 KB tables, as far as I know. It is still a "one thread, one board" architecture.&lt;br /&gt;&lt;br /&gt;I will assume a minimum of 256 Bytes of local memory per thread, so I can reach a fixed search depth of at most 28 per thread without using slow global memory.&lt;br /&gt;&lt;br /&gt;1 * 32 Bytes Quad-BitBoard&lt;br /&gt;28 * 4 Bytes move&lt;br /&gt;&lt;br /&gt;Because the move generator is interleaved with the upcoming search algorithm, I also need some data for the scores:&lt;br /&gt;&lt;br /&gt;28 * 2 Bytes alpha score&lt;br /&gt;28 * 2 Bytes beta score&lt;br /&gt;&lt;br /&gt;For each depth, the state of the move generator is stored in a 4-Byte move. 
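&lt;br /&gt;&lt;br /&gt;As a quick check of this budget and of the OpenCL work sizes given for it, here is a small Python sketch. The helper names are hypothetical. It assumes 16 KB of local memory per SIMD Unit on the 8800 GT and 32 KB on the ATI 5770 (the L2 figures from the "Close to the metal" post), with each work-item owning its own slice of that memory.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical helpers; the byte counts are the ones listed in this post.
QUAD_BITBOARD_BYTES = 4 * 8   # four 64-bit bitboards
MOVE_BYTES = 4                # one packed move, also the move-generator state
SCORE_BYTES = 2               # one 16-bit score (alpha or beta)


def bytes_per_thread(max_depth):
    """Local memory one thread needs for a fixed-depth search."""
    per_ply = MOVE_BYTES + 2 * SCORE_BYTES   # move + alpha + beta
    return QUAD_BITBOARD_BYTES + max_depth * per_ply


def total_work_items(local_mem_per_unit, simd_units, bytes_per_item):
    """Work-items that fit when each one owns a slice of local memory."""
    return (local_mem_per_unit // bytes_per_item) * simd_units


need = bytes_per_thread(28)
print(need)                                    # 256
print(total_work_items(16 * 1024, 14, need))   # 896  (8800 GT, 14 SIMD Units)
print(total_work_items(32 * 1024, 10, need))   # 1280 (ATI 5770, 10 SIMD Units)
```
&lt;/pre&gt;On the 5770 this means two work-groups of 64 work-items fit on one SIMD Unit, while on the 8800 GT exactly one fits.&lt;br /&gt;&lt;br /&gt;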
The move generator iterates through all from-pieces and all to-squares, and the stored move marks where it continues.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;If the fixed search depth of 28 is reached, I plan a fallback to global memory: the previous move and score data are packed into global memory and restored when the thread moves back up the search tree.&lt;br /&gt;&lt;br /&gt;With 256 Bytes of local memory per thread as a requirement, I could use the following OpenCL setup:&lt;br /&gt;&lt;br /&gt;NV 8800 GT:&lt;br /&gt;total work size: 896&lt;br /&gt;local work size:  64&lt;br /&gt;&lt;br /&gt;ATI 5770:&lt;br /&gt;total work size: 1280&lt;br /&gt;local work size:   64&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;...this will take some time...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-4342219443730209930?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4342219443730209930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-083x-quad-bitboard-design.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4342219443730209930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4342219443730209930'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-083x-quad-bitboard-design.html' title='Zeta 0.8.3.x Quad BitBoard design'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3423440797928514559</id><published>2010-06-29T22:14:00.006+02:00</published><updated>2010-06-29T22:40:41.189+02:00</updated><title type='text'>Threads again</title><content type='html'>Thanks to Philippe from &lt;a href="http://blog.cudachess.org/"&gt;Cuda Chess&lt;/a&gt;, I now have a clue how GPU threading works.&lt;br /&gt;&lt;br /&gt;The numbers mentioned in an earlier post are the maximum the device can handle. But every application differs in memory use and compute cycles, so other thread configurations can perform better.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3423440797928514559?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3423440797928514559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/threads-again.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3423440797928514559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3423440797928514559'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/threads-again.html' title='Threads again'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3159912691119389436</id><published>2010-06-27T22:00:00.004+02:00</published><updated>2010-06-27T22:03:43.429+02:00</updated><title type='text'>Redesign of the move generator</title><content type='html'>I thought of using one thread per board, but with the new numbers in mind I get only about 20 Bytes of memory per thread... An alternative would be to let several threads work on the same board...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3159912691119389436?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3159912691119389436/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/redesign-of-move-generator.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3159912691119389436'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3159912691119389436'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/redesign-of-move-generator.html' title='Redesign of the move generator'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2562369788345247598</id><published>2010-06-27T21:40:00.005+02:00</published><updated>2010-06-27T21:46:07.155+02:00</updated><title type='text'>Concurrent threads on GPUs</title><content type='html'>Many sources, many different numbers; here are some new ones:&lt;br /&gt;&lt;br /&gt;ATI 5870: 31744&lt;br /&gt;&lt;a href="http://developer.amd.com/gpu_assets/Heterogeneous_Computing_OpenCL_and_the_ATI_Radeon_HD_5870_Architecture_201003.pdf"&gt;5870&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;8800 GT: 10752&lt;br /&gt;&lt;a href="http://www.google.de/url?sa=t&amp;source=web&amp;cd=1&amp;ved=0CBoQFjAA&amp;url=http%3A%2F%2Fwww.nvidia.com%2Fdocs%2FIO%2F55972%2F220401_Reprint.pdf&amp;rct=j&amp;q=nvidia+8800+concurrent+threads&amp;ei=q6gnTITcEcyVOJaEnaoC&amp;usg=AFQjCNHZ1bLxFlwHw92PtDJZrEpVPADqLg"&gt;8800 GT&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2562369788345247598?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2562369788345247598/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/concurrent-threads-on-gpus.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2562369788345247598'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2562369788345247598'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/concurrent-threads-on-gpus.html' title='Concurrent threads on GPUs'/><author><name>Srdja 
Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7553661660956246159</id><published>2010-06-22T16:18:00.017+02:00</published><updated>2011-06-13T02:27:39.434+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='8800 GT G92 ATI 5770 Chess OpenCL'/><title type='text'>Close to the metal</title><content type='html'>A comparison of the computing capabilities of the Nvidia 8800 GT and the ATI 5770:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;8800 GT:&lt;/span&gt;&lt;br /&gt;Shaders: 112&lt;br /&gt;SIMD Units: 14&lt;br /&gt;GPU Clock: 600 MHz&lt;br /&gt;Shader Clock: 1500 MHz&lt;br /&gt;GFLOP/s SP: 504 (max. single precision)&lt;br /&gt;GFLOP/s DP: -/- (no double precision support on G92)&lt;br /&gt;Memory Interface: 256 bit&lt;br /&gt;Memory Clock: 1800 MHz (GDDR3)&lt;br /&gt;Memory Bandwidth: 57.6 GB/s&lt;br /&gt;L1 Memory: ??? 8 KB per SIMD Unit&lt;br /&gt;L2 Memory: ??? 16KB per SIMD Unit&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;&lt;br /&gt;5770:&lt;/span&gt;&lt;br /&gt;Shaders: 800&lt;br /&gt;SIMD Units: 10&lt;br /&gt;GPU Clock: 850 MHz&lt;br /&gt;Shader Clock: 850 MHz&lt;br /&gt;GFLOP/s SP: 1360 (max. single precision)&lt;br /&gt;GFLOP/s DP: -/- (max. double precision)&lt;br /&gt;Memory Interface: 128 bit&lt;br /&gt;Memory Clock: 1200 MHz (GDDR5)&lt;br /&gt;Memory Bandwidth: 76.8 GB/s&lt;br /&gt;L1 Memory: 8KB per SIMD Unit&lt;br /&gt;L2 Memory: 32KB per SIMD Unit&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The 8800 GT is a bit outdated (not sure if it is still available) but still a good choice for GPU computing, while the ATI 5770 is more of a midrange card. So this is not a comparison of equivalent GPUs or of NV and ATI. 
But it will show some differences in GPU computing that have to be considered.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;ATI 5770&lt;/span&gt;&lt;br /&gt;The 5770 needs 2 cycles to compute a warp/wavefront with 16 threads. This means we could run our OpenCL application effectively with 2560 total work-items and a work-group size of 256, so we get 128 Bytes of local memory per work-item. &lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Nvidia 8800 GT&lt;/span&gt;&lt;br /&gt;The 8800 GT needs 4 cycles to compute a warp/wavefront with 32 threads. This means we could run our OpenCL application effectively with 448 total work-items and a work-group size of 32, so we get 512 Bytes of local memory per work-item.&lt;br /&gt;&lt;br /&gt;*** Edit 20100625 ***&lt;br /&gt;ATI: 10240 total work-items, work-group size 256, 32 Bytes of local memory per work-item&lt;br /&gt;NV: 3584 total work-items, work-group size 256, 64 Bytes of local memory per work-item&lt;br /&gt;should be a theoretically possible configuration&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Anyone who wants to design a chess engine that runs efficiently on different GPU architectures has to consider that:&lt;br /&gt;&lt;br /&gt;1) the global work size can differ&lt;br /&gt;2) the local work size can differ&lt;br /&gt;3) the memory per work-item can differ&lt;br /&gt;4) the memory bandwidth per work-item can differ&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7553661660956246159?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7553661660956246159/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/close-to-metal.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7553661660956246159'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7553661660956246159'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/close-to-metal.html' title='Close to the metal'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3548653864547701930</id><published>2010-06-22T15:25:00.007+02:00</published><updated>2010-06-22T16:17:22.153+02:00</updated><title type='text'>Zeta 0.8.2.2, 0x88 -&gt; fast move generator</title><content type='html'>With the new knowledge from Canada I was able to get closer to the maximum possible number of moves generated on my GPU. The generator is now fast enough to feed a search algorithm with moves; I "just" have to take care of the memory limits...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3548653864547701930?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3548653864547701930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0822-0x88-fast-move-generator.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3548653864547701930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3548653864547701930'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0822-0x88-fast-move-generator.html' 
title='Zeta 0.8.2.2, 0x88 -&gt; fast move generator'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3669878059729328311</id><published>2010-06-22T13:32:00.006+02:00</published><updated>2010-06-22T16:15:38.459+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA CHESS'/><title type='text'>Merci beaucoup Monsieur Cuda Chess</title><content type='html'>I am in contact with my French CUDA colleague from Canada and have learned a lot about GPU specs.&lt;br /&gt;&lt;br /&gt;From the OpenCL point of view there are just kernels, work-items and work-groups. But the GPU architecture is more sophisticated: warps/wavefronts, different cycle counts for MULs and MADs, and memory bandwidth/latency limitations.&lt;br /&gt;&lt;br /&gt;Thanks!&lt;br /&gt;&lt;br /&gt;Visit his blog: &lt;a href="http://blog.cudachess.org/"&gt;Cuda Chess&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-3669878059729328311?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3669878059729328311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/merci-monsieur-cuda-chess.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3669878059729328311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3669878059729328311'/><link rel='alternate' type='text/html' 
href='http://zeta-chess.blogspot.com/2010/06/merci-monsieur-cuda-chess.html' title='Merci beaucoup Monsieur Cuda Chess'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6405668246088957260</id><published>2010-06-17T15:28:00.005+02:00</published><updated>2011-06-13T02:30:13.325+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GPU Chess global local memory'/><title type='text'>Global Memory usage</title><content type='html'>I wondered why my move generation is slow... here is the answer:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://blog.cudachess.org/2010/06/memory-limits-and-new-developer-generation/"&gt;memory-limits-and-new-developer-generation&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;My CUDA colleague has a better eye for hardware limitations.&lt;br /&gt;&lt;br /&gt;My whole computation ran in "slow" global memory because the data just didn't fit into private/local memory!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6405668246088957260?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6405668246088957260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/global-memory-usage.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6405668246088957260'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6405668246088957260'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/global-memory-usage.html' title='Global Memory usage'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8012932300011845479</id><published>2010-06-15T16:04:00.000+02:00</published><updated>2010-06-15T18:28:35.086+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='GPU Chess'/><title type='text'>Summary I: GPUs and Chess</title><content type='html'>At first I thought of using the GPU as a coprocessor to generate moves in parallel or to do fast evaluation. But the host-GPU latency is too high for this approach.&lt;br /&gt;&lt;br /&gt;So the next idea was to move the whole search algorithm onto the GPU. This is a huge step and consists mainly of these parts:&lt;br /&gt;&lt;br /&gt;1) Move generation on GPU&lt;br /&gt;Has to be slim and fast to fit the "little" Processing Elements of the GPU.&lt;br /&gt;&lt;br /&gt;2) Evaluation on GPU&lt;br /&gt;Same as above.&lt;br /&gt;&lt;br /&gt;3) Non-recursive search algorithm&lt;br /&gt;Something like a stack and a while loop is needed to replace recursion.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;But to use the whole power of the GPU, the search algorithm also has to be parallelized.&lt;br /&gt;This will cause some problems, because in OpenCL the work-items of one work-group can be synchronized, but work-items across different work-groups cannot. So an effective load-balancing mechanism is needed to give work to idle processors. 
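&lt;br /&gt;&lt;br /&gt;Point 3 can be illustrated with a minimal Python sketch (Python for readability, not OpenCL C, and not Zeta's code). It is a plain negamax without alpha-beta pruning over a toy game tree, where an explicit stack of frames and a while loop take the place of recursive calls. The tree format (nested lists, with integer leaf scores from the view of the side to move) and the name negamax_iterative are hypothetical.&lt;br /&gt;&lt;pre&gt;
```python
def negamax_iterative(root):
    """Plain negamax (no pruning) using an explicit stack instead of recursion.

    A leaf is an int scored from the view of the side to move at that leaf;
    an inner node is a non-empty list of child positions.
    """
    if isinstance(root, int):
        return root
    stack = [[root, 0, None]]  # frame: [node, next child index, best score so far]
    result = None              # score handed up by the child that just finished
    while stack:
        frame = stack[-1]
        node, i, best = frame
        if result is not None:
            # A child just finished: fold its negated score into this node.
            score = -result
            best = score if best is None else max(best, score)
            frame[2] = best
            result = None
        if i == len(node):
            # All children searched: pop the frame and hand the score to the parent.
            stack.pop()
            result = best
            continue
        frame[1] = i + 1
        child = node[i]
        if isinstance(child, int):
            result = child                  # leaf: evaluate immediately
        else:
            stack.append([child, 0, None])  # descend one ply
    return result


print(negamax_iterative([[3, 5], [2, 9]]))  # 3
```
&lt;/pre&gt;In an OpenCL kernel the stack would presumably become a fixed-size per-work-item array indexed by ply, so its depth is bounded by the available local memory.&lt;br /&gt;&lt;br /&gt;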
Again, some stack mechanism and loops are needed.&lt;br /&gt;&lt;br /&gt;Outlook:&lt;br /&gt;Zeta's move generation is still slow compared to CPU engines, but fast enough to take the next step and work on a parallelized search algorithm like PVS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8012932300011845479?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8012932300011845479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/summary-i-gpus-and-chess.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8012932300011845479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8012932300011845479'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/summary-i-gpus-and-chess.html' title='Summary I: GPUs and Chess'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6301267270843902158</id><published>2010-06-15T15:37:00.005+02:00</published><updated>2011-06-09T00:24:50.849+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SIMD SPMD GPU OpenCL'/><title type='text'>SIMD, SPMD, GPUs and OpenCL</title><content type='html'>Here are some technical notes:&lt;br /&gt;&lt;br /&gt;1) GPU architecture&lt;br /&gt;GPUs consist of many simple Processing Elements. 
These little processors are grouped into Stream Cores, and the Stream Cores are grouped into SIMD Units.&lt;br /&gt;The ATI 5870, for example, has 1600 Processing Elements: one Stream Core has 5 Processing Elements (4 for normal computation, 1 for special functions), 16 Stream Cores form one SIMD Unit, and there are 20 SIMD Units in total (20 * 16 * 5 = 1600).&lt;br /&gt;&lt;br /&gt;2) SIMD and SPMD on GPUs&lt;br /&gt;Within a SIMD Unit the Stream Cores execute the same instruction in lockstep, so the GPU works as a SIMD device there; but every SIMD Unit works autonomously, so the SIMD Units together work like SPMD.&lt;br /&gt;&lt;br /&gt;3) OpenCL&lt;br /&gt;OpenCL uses the terms work-item and work-group. Work-items are threads that run on a SIMD Unit, and a work-group is a set of work-items that runs together on one SIMD Unit. In OpenCL terms a SIMD Unit is a Compute Unit, and the whole GPU is a Compute Device.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6301267270843902158?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6301267270843902158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/simd-spmd-gpus-and-opencl.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6301267270843902158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6301267270843902158'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/simd-spmd-gpus-and-opencl.html' title='SIMD, SPMD, GPUs and OpenCL'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7876418882282768233</id><published>2010-06-13T18:46:00.000+02:00</published><updated>2010-06-14T21:04:38.035+02:00</updated><title type='text'>Zeta 0.8.2.1.3, 0x88 movegen speedup</title><content type='html'>10000 moves now take about 1 second to compute on one SIMD processor. The GPU I use has ten of those SIMD processors. This is still very slow...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7876418882282768233?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7876418882282768233/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-08213-0x88-movegen-speedup.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7876418882282768233'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7876418882282768233'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-08213-0x88-movegen-speedup.html' title='Zeta 0.8.2.1.3, 0x88 movegen speedup'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5462542259474873103</id><published>2010-06-13T18:37:00.001+02:00</published><updated>2010-06-13T18:38:40.484+02:00</updated><title type='text'>Cant get 
BitBoards working on GPU</title><content type='html'>The magic hash tables I copied from Stockfish don't work on the GPU...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5462542259474873103?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5462542259474873103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/cant-get-bitboards-working-on-gpu.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5462542259474873103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5462542259474873103'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/cant-get-bitboards-working-on-gpu.html' title='Cant get BitBoards working on GPU'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-182463504523469301</id><published>2010-06-09T13:04:00.000+02:00</published><updated>2010-06-09T13:07:11.157+02:00</updated><title type='text'>0x88 meets BitBoards</title><content type='html'>Tested the GPU move generation with some dummy data to emulate the behaviour of a 0x88/BitBoard approach... looks good.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-182463504523469301?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/182463504523469301/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/0x88-meets-bitboards.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/182463504523469301'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/182463504523469301'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/0x88-meets-bitboards.html' title='0x88 meets BitBoards'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7513715286601692954</id><published>2010-06-07T16:08:00.000+02:00</published><updated>2010-06-07T16:16:32.075+02:00</updated><title type='text'>Zeta 0.8.2.1, 0x88, movegen still slow</title><content type='html'>One processor of the GPU needs about 100 milliseconds to generate 20 moves, while the same code on the CPU needs less than 5 milliseconds. So my move generation code is not yet well suited to the SIMD architecture. 
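&lt;br /&gt;&lt;br /&gt;One plausible contributor is branch divergence. This is an assumption, not something measured here. The lanes of one SIMD Unit share a single instruction stream, so when they disagree on an if/else, the unit steps through both bodies with the inactive lanes masked off. The small Python model below only counts cycles under that masking assumption; it is a sketch, not a GPU simulator. The helper name simd_branch_cost and the width of 16 lanes are hypothetical.&lt;pre&gt;
```python
# Cycle-count model of one if/else on a SIMD Unit.
# Assumption: all lanes share one instruction stream, and a branch body is
# executed for the whole unit (inactive lanes masked off) whenever at least
# one lane needs it.


def simd_branch_cost(lane_conditions, cost_then, cost_else):
    """Return the cycles one if/else costs a SIMD Unit under this model.

    lane_conditions holds the branch condition of every lane (work-item).
    """
    cost = 0
    if any(lane_conditions):       # some lane takes the then-branch
        cost += cost_then
    if not all(lane_conditions):   # some lane takes the else-branch
        cost += cost_else
    return cost


# 16 lanes, an illustrative width
uniform = [True] * 16
divergent = [lane % 2 == 0 for lane in range(16)]
print(simd_branch_cost(uniform, 10, 10))    # 10
print(simd_branch_cost(divergent, 10, 10))  # 20
```
&lt;/pre&gt;Under this model, with equal branch costs, a 50/50 split of the lanes makes the if/else take twice as long as a uniform branch. Keeping neighbouring work-items on the same path is therefore one way to avoid paying for both bodies.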
&lt;br /&gt;&lt;br /&gt;...maybe I should combine the 0x88 board representation with precalculated bitboard attack tables...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-7513715286601692954?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7513715286601692954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0821-0x88-movegen-still-slow.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7513715286601692954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7513715286601692954'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0821-0x88-movegen-still-slow.html' title='Zeta 0.8.2.1, 0x88, movegen still slow'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2541396959025987334</id><published>2010-06-04T22:01:00.001+02:00</published><updated>2010-06-04T22:06:13.713+02:00</updated><title type='text'>Zeta 0.8.2.1, 0x88 with new design</title><content type='html'>Redesigned the 0x88 move generation to fit the SIMD architecture... this is the first time I really believe that the project will reach its goals :)&lt;br /&gt;&lt;br /&gt;The next step is to implement a load-balancing mechanism like PVS.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2541396959025987334?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2541396959025987334/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0821-0x88-with-new-design.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2541396959025987334'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2541396959025987334'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0821-0x88-with-new-design.html' title='Zeta 0.8.2.1, 0x88 with new design'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6556239365186893737</id><published>2010-06-03T00:50:00.000+02:00</published><updated>2010-06-03T20:51:45.762+02:00</updated><title type='text'>SPMD/SIMD on GPUs</title><content type='html'>I had to realize that every if-else condition doubled the runtime on the GPU.&lt;br /&gt;&lt;br /&gt;I know that this is true for SIMD architectures, but GPUs are supposed to also act as SPMD devices, where every Processing Element has its own instruction counter.&lt;br /&gt;&lt;br /&gt;So I hope it is only my "old" hardware that doesn't support SPMD....&lt;br /&gt;&lt;br /&gt;##edit 20100603###&lt;br /&gt;&lt;br /&gt;It's not my hardware... GPUs are more like "SIMD within SPMD"....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' 
height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6556239365186893737?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6556239365186893737/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/spmdsimd-on-gpus.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6556239365186893737'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6556239365186893737'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/spmdsimd-on-gpus.html' title='SPMD/SIMD on GPUs'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-779508164728066616</id><published>2010-06-02T18:23:00.000+02:00</published><updated>2010-06-02T18:32:32.882+02:00</updated><title type='text'>Zeta 0.8.2.0, 0x88 -&gt; simple moves on GPU</title><content type='html'>I ported the 0x88 move generation (without castling moves) from MicroMax by H.G. Mueller to the GPU. 
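&lt;br /&gt;&lt;br /&gt;For readers new to 0x88, here is a hedged sketch of the core idea. It is written in Python for readability and is neither the OpenCL kernel nor MicroMax's code. A square index is 16 * rank + file, so the AND of any real square with 0x88 is zero. A single test therefore rejects every step that leaves the board, including steps to negative indices. The knight offsets and the helper names on_board, square and knight_targets are illustrative assumptions.&lt;pre&gt;
```python
import operator

# 0x88 layout (assumed): index = 16 * rank + file, rank and file in 0..7,
# so every real square has bits 3 and 7 clear.
KNIGHT_OFFSETS = (33, 31, 18, 14, -14, -18, -31, -33)


def on_board(sq):
    """True if sq is a real square; operator.and_(a, b) is bitwise AND."""
    return operator.and_(sq, 0x88) == 0


def square(name):
    """Convert algebraic notation such as 'g1' to a 0x88 index."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return 16 * rank + file


def knight_targets(sq):
    """Target squares of a knight on sq, ignoring all other pieces."""
    return [sq + step for step in KNIGHT_OFFSETS if on_board(sq + step)]


print(knight_targets(square("g1")))  # [39, 37, 20], i.e. h3, f3, e2
```
&lt;/pre&gt;Sliding pieces would use the same test, repeating one step until on_board fails or a piece blocks the ray.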
&lt;br /&gt;&lt;br /&gt;The MicroMax code is compact and fast, but I still don't know how to port the castling moves in an efficient manner...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-779508164728066616?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/779508164728066616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0820-0x88-simple-moves-on-gpu.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/779508164728066616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/779508164728066616'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/06/zeta-0820-0x88-simple-moves-on-gpu.html' title='Zeta 0.8.2.0, 0x88 -&gt; simple moves on GPU'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-1846981237294507001</id><published>2010-05-30T14:03:00.000+02:00</published><updated>2010-05-30T14:08:17.987+02:00</updated><title type='text'>Zeta 0.8.2.0,  working on 0x88 on GPU</title><content type='html'>After bitboards and the 12x10 board representation, I will give the 0x88 array structure a try on the GPU.&lt;br /&gt;&lt;br /&gt;... 
I redesigned the 12x10 array move generation, but it's still too slow on the GPU.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-1846981237294507001?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/1846981237294507001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0820-working-on-0x88-on-gpu.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1846981237294507001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/1846981237294507001'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0820-working-on-0x88-on-gpu.html' title='Zeta 0.8.2.0,  working on 0x88 on GPU'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5753191472479030419</id><published>2010-05-27T00:55:00.001+02:00</published><updated>2010-05-30T14:16:27.999+02:00</updated><title type='text'>Zeta 0.8.1.0,  12x10 on GPU</title><content type='html'>I implemented a 12x10 array move generation on the GPU, as I thought this approach would be much faster than bitboards because 1) fewer memory operations are used and 2) GPUs have more throughput for 32-bit (single-precision) operations than for the 64-bit operations that bitboards need.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5753191472479030419?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5753191472479030419/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0810-12x10-on-gpu.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5753191472479030419'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5753191472479030419'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0810-12x10-on-gpu.html' title='Zeta 0.8.1.0,  12x10 on GPU'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6271074857230400239</id><published>2010-05-26T20:37:00.000+02:00</published><updated>2010-05-30T14:15:38.521+02:00</updated><title type='text'>Zeta, 0.8.0.6, poor bitboard performance</title><content type='html'>I implemented the whole bitboard move generation on the GPU. One process needs about 2 seconds just to generate the first 20 moves. 
There must be some global memory issues.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-6271074857230400239?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/6271074857230400239/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0806-poor-performance.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6271074857230400239'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6271074857230400239'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-0806-poor-performance.html' title='Zeta, 0.8.0.6, poor bitboard performance'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-380950098095305943</id><published>2010-05-25T22:28:00.000+02:00</published><updated>2010-05-25T22:40:40.795+02:00</updated><title type='text'>Zeta 0.8.0.6, Pawns are movin :)</title><content type='html'>Yeah,&lt;br /&gt;&lt;br /&gt;pawn moves are now generated in OpenCL on the GPU. 
&lt;br /&gt;&lt;br /&gt;Communication latency between host and GPU is high (about 50 ms), but I hope the GPU's calculation speedup will balance this out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-380950098095305943?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/380950098095305943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-gpu-version-0806-pawns-are-movin.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/380950098095305943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/380950098095305943'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/zeta-gpu-version-0806-pawns-are-movin.html' title='Zeta 0.8.0.6, Pawns are movin :)'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8326544124524444184</id><published>2010-05-24T16:26:00.000+02:00</published><updated>2010-05-24T16:50:24.750+02:00</updated><title type='text'>Implementing PV-Split</title><content type='html'>Currently I am working on an implementation of the Principal Variation Splitting algorithm in OpenCL.&lt;br /&gt;&lt;br /&gt;PVS:&lt;br /&gt;The root process goes down the leftmost branch of the tree to the maximum search depth. Then it backs up one ply and hands the remaining boards of that ply to other processing elements for calculation. 
When all processors have finished their calculation, the root process backs up another ply and again hands out work to the processors. This procedure is repeated until depth 0, the root, is reached.&lt;br /&gt;&lt;br /&gt;Pro:&lt;br /&gt;Easy to implement.&lt;br /&gt;&lt;br /&gt;Contra:&lt;br /&gt;1) There are on average about 40 boards per ply, so with more than 40 processing elements not all of them will be busy.&lt;br /&gt;&lt;br /&gt;2) When one processing element finishes its calculation, it goes idle, waiting for the others to finish.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Maybe these drawbacks can be reduced with some extensions to the original PVS...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8326544124524444184?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8326544124524444184/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/implementing-pv-split.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8326544124524444184'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8326544124524444184'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/implementing-pv-split.html' title='Implementing PV-Split'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4659202868581061047</id><published>2010-05-20T14:12:00.000+02:00</published><updated>2010-05-20T14:19:25.129+02:00</updated><title type='text'>No multi-GPU support</title><content type='html'>To keep things simple, I decided to perform the chess calculations on only one GPU/OpenCL device at first.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-4659202868581061047?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/4659202868581061047/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/no-multi-gpu-support.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4659202868581061047'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4659202868581061047'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/no-multi-gpu-support.html' title='No multi-GPU support'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2181503772231893711</id><published>2010-05-20T13:58:00.001+02:00</published><updated>2010-05-20T14:07:28.707+02:00</updated><title type='text'>PVS and DTS</title><content type='html'>I had some time to read about PVS and DTS parallel search 
algorithms.&lt;br /&gt;&lt;br /&gt;DTS (Dynamic Tree Splitting) has better scalability but is difficult to implement.&lt;br /&gt;PVS (Principal Variation Splitting) scalability depends on the number of moves; I read about a maximum speedup of 5x in chess.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2181503772231893711?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2181503772231893711/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/pvs-and-dts.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2181503772231893711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2181503772231893711'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/05/pvs-and-dts.html' title='PVS and DTS'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-283272584551314088</id><published>2010-03-27T11:45:00.001+01:00</published><updated>2010-06-22T16:20:10.766+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='NVIDIA GTX480 TESLA M2050 M2070 ATI 5870'/><title type='text'>Nvidia Fermi, GTX480 specs out</title><content type='html'>Specifications of the new Fermi cards by Nvidia are out.&lt;br /&gt;&lt;br /&gt;*** Edit 20100521 ***&lt;br /&gt;Just read that the Nvidia GeForce GTX cards run DP at only 1/8 of the SP rate, 
while the Nvidia Tesla series runs it at 1/2.&lt;br /&gt;&lt;br /&gt;GTX480:&lt;br /&gt;Shader Units: 480&lt;br /&gt;Shader Clock: 1401 MHz&lt;br /&gt;GFlops SP: 1344.96&lt;br /&gt;GFlops DP: 168.12&lt;br /&gt;Max Power: &gt;250 W&lt;br /&gt;&lt;br /&gt;Tesla M2050/M2070:&lt;br /&gt;Shader Units: 448&lt;br /&gt;Shader Clock: 1150 MHz&lt;br /&gt;GFlops SP: 1030&lt;br /&gt;GFlops DP: 515&lt;br /&gt;Max Power: 247 W (TDP)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For comparison:&lt;br /&gt;ATI 5870:&lt;br /&gt;Shader Units: 1600&lt;br /&gt;Shader Clock: 850 MHz&lt;br /&gt;GFlops SP: 2720&lt;br /&gt;GFlops DP: 544&lt;br /&gt;Max Power: 188 W&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-283272584551314088?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/283272584551314088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/nvidia-fermi-gtx480-specs-out.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/283272584551314088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/283272584551314088'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/nvidia-fermi-gtx480-specs-out.html' title='Nvidia Fermi, GTX480 specs out'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8703777546786843578</id><published>2010-03-23T20:27:00.000+01:00</published><updated>2010-03-23T20:29:01.395+01:00</updated><title type='text'>Summer pause</title><content type='html'>The project pauses until mid-July.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8703777546786843578?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8703777546786843578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/summer-pause.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8703777546786843578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8703777546786843578'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/summer-pause.html' title='Summer pause'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-8505785332557879225</id><published>2010-03-23T19:25:00.001+01:00</published><updated>2010-03-23T19:28:59.410+01:00</updated><title type='text'>Non-Recursive AlphaBeta</title><content type='html'>Figured out how to put Alpha-Beta in a while loop. 
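&lt;br /&gt;&lt;br /&gt;The post does not show the loop itself. The following is a hedged illustration only, not Zeta's actual code. One common way to remove the recursion is to keep an explicit stack of frames (node, alpha, beta, best score, next child) and to hand each finished child's score back to its parent through a variable. The Python sketch below searches a toy tree given as nested lists whose leaves hold scores from the point of view of the side to move. The tree format and the name negamax_iterative are assumptions.&lt;pre&gt;
```python
import math

INF = math.inf


def negamax_iterative(root):
    """Fail-soft negamax alpha-beta without recursion (explicit stack).

    A node is either a leaf score (from the view of the side to move there)
    or a list of child nodes.
    """
    if not isinstance(root, list):
        return root
    # Each frame: [node, alpha, beta, best score so far, index of next child]
    stack = [[root, -INF, INF, -INF, 0]]
    result = None  # score handed back by the child that just finished
    while stack:
        frame = stack[-1]
        node, alpha, beta, best, i = frame
        if result is not None:
            value = -result  # negamax: the child's score seen from this side
            result = None
            best = max(best, value)
            alpha = max(alpha, value)
            frame[1], frame[3] = alpha, best
            if alpha >= beta:  # cutoff: skip the remaining children
                i = frame[4] = len(node)
        if i == len(node):
            stack.pop()  # node finished: "return" best to the parent frame
            result = best
            continue
        frame[4] = i + 1
        child = node[i]
        if isinstance(child, list):
            # same window a recursive call would get: (-beta, -alpha)
            stack.append([child, -beta, -alpha, -INF, 0])
        else:
            result = child  # leaf: its score goes straight back to this frame
    return result


print(negamax_iterative([[3, 5], [2, 9]]))  # 3 (the leaf 9 is never visited)
```
&lt;/pre&gt;Because each pushed frame gets the same window as the corresponding recursive call, the root score matches a plain recursive negamax.&lt;br /&gt;&lt;br /&gt;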
The next step will be to port all the C functions, such as move generation and evaluation, to OpenCL and to implement an effective load-balancing mechanism...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-8505785332557879225?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/8505785332557879225/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/non-recursive-alphabeta.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8505785332557879225'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/8505785332557879225'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/non-recursive-alphabeta.html' title='Non-Recursive AlphaBeta'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-5358429022031881971</id><published>2010-03-18T20:02:00.000+01:00</published><updated>2010-03-18T20:08:40.025+01:00</updated><title type='text'>NegaMax on SIMD</title><content type='html'>Found a paper by Holger Hopp and Peter Sanders describing a YBWC (Young Brothers Wait Concept) alpha-beta implementation on SIMD devices. 
They built a stack mechanism for non-recursive alpha-beta, but they also use communication between threads for their master/slave concept.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-5358429022031881971?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/5358429022031881971/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/negamax-on-simd.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5358429022031881971'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/5358429022031881971'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/negamax-on-simd.html' title='NegaMax on SIMD'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-54114143740032448</id><published>2010-03-13T22:37:00.000+01:00</published><updated>2010-03-13T22:51:37.213+01:00</updated><title type='text'>OpenCL Negamax search</title><content type='html'>It is hard to implement.&lt;br /&gt;In OpenCL, work-groups cannot be synchronized with other work-groups while a kernel is running; only the work-items within one work-group can be synchronized. 
So if one work-group computes the moves of one board, there is no way to tell unused work-groups on the fly to take these generated moves and compute further.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-54114143740032448?l=zeta-chess.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/54114143740032448/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/opencl-negamax-search.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/54114143740032448'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/54114143740032448'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/opencl-negamax-search.html' title='OpenCL Negamax search'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2046363078451895147</id><published>2010-03-12T17:48:00.001+01:00</published><updated>2010-03-12T17:50:42.214+01:00</updated><title type='text'>Zeta, GPU version 0.8.0.2</title><content type='html'>Thinking about parallelizing the threads...&lt;br /&gt;maybe one board per compute unit of the OpenCL device and 6x64 threads for move generation?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1356846300156443183-2046363078451895147?l=zeta-chess.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2046363078451895147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-gpu-version-0802.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2046363078451895147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2046363078451895147'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-gpu-version-0802.html' title='Zeta, GPU version 0.8.0.2'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2703937902341084213</id><published>2010-03-09T16:14:00.000+01:00</published><updated>2010-03-13T22:48:40.942+01:00</updated><title type='text'>Move Generation on GPU</title><content type='html'>I managed to generate some moves on the GPU, but the communication delay between HOST and GPU makes this aproach too slow for an Chess Engine. 
The point is to put the whole search algorithm on the GPU, not just individual parts such as move generation.</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2703937902341084213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/move-generation-on-gpu.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2703937902341084213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2703937902341084213'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/move-generation-on-gpu.html' title='Move Generation on GPU'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-6171716198747971367</id><published>2010-03-07T21:26:00.001+01:00</published><updated>2010-03-07T21:27:35.915+01:00</updated><title type='text'>Zeta, GPU version 0.8.0.1</title><content type='html'>The CPU version is running, so I am starting to code a GPU version in OpenCL...</content><link rel='replies' type='application/atom+xml' 
href='http://zeta-chess.blogspot.com/feeds/6171716198747971367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-gpu-version-0801.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6171716198747971367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/6171716198747971367'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-gpu-version-0801.html' title='Zeta, GPU version 0.8.0.1'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-7362153259529022420</id><published>2010-03-06T16:27:00.000+01:00</published><updated>2010-03-09T16:55:11.544+01:00</updated><title type='text'>Zeta, CPU version 0.7.0.4</title><content type='html'>Move generation is complete. It looks a lot faster than version 0.6.0.6. 
But move ordering is still buggy.</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/7362153259529022420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-cpu-version-0704.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7362153259529022420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/7362153259529022420'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-cpu-version-0704.html' title='Zeta, CPU version 0.7.0.4'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-4933967719030143609</id><published>2010-03-01T16:58:00.000+01:00</published><updated>2010-03-01T17:00:26.376+01:00</updated><title type='text'>Zeta, CPU version 0.7.0.3</title><content type='html'>General move generation and checkmate functions are ready.&lt;br /&gt;Castling, en passant and pawn promotion are still missing...</content><link rel='replies' type='application/atom+xml' 
href='http://zeta-chess.blogspot.com/feeds/4933967719030143609/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-cpu-version-0703.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4933967719030143609'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/4933967719030143609'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/03/zeta-cpu-version-0703.html' title='Zeta, CPU version 0.7.0.3'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2047827204976231446</id><published>2010-02-27T21:10:00.000+01:00</published><updated>2010-02-27T21:13:12.981+01:00</updated><title type='text'>Zeta, CPU version 0.7.0.2</title><content type='html'>Sliders are on their way (thanks to Stockfish for the hash tables!). Now I am thinking about an effective method for detecting check and checkmate... ahhh, the 12x10 array was kindergarten...</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2047827204976231446/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-cpu-version-0702.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2047827204976231446'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2047827204976231446'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-cpu-version-0702.html' title='Zeta, CPU version 0.7.0.2'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-2220205716343360977</id><published>2010-02-27T09:32:00.001+01:00</published><updated>2010-03-09T16:54:05.432+01:00</updated><title type='text'>Zeta, CPU version 0.7.0.1</title><content type='html'>The CPU BitBoard Edition of Zeta is now able to generate pawn, knight and king moves; the more advanced Magic BitBoard technique for sliding pieces will follow...</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/2220205716343360977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-cpu-version-0701.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2220205716343360977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/2220205716343360977'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-cpu-version-0701.html' 
title='Zeta, CPU version 0.7.0.1'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1356846300156443183.post-3463309255631791085</id><published>2010-02-25T18:06:00.000+01:00</published><updated>2010-03-09T16:53:43.086+01:00</updated><title type='text'>Zeta CPU version 0.7.0.1</title><content type='html'>Because of speed issues I am going to change the board representation from a 12x10 array to Magic BitBoards.&lt;br /&gt;&lt;br /&gt;I hope to finish the CPU version soon, so I can focus on developing a search algorithm for GPUs.</content><link rel='replies' type='application/atom+xml' href='http://zeta-chess.blogspot.com/feeds/3463309255631791085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-version-0701-new-attempt.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3463309255631791085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1356846300156443183/posts/default/3463309255631791085'/><link rel='alternate' type='text/html' href='http://zeta-chess.blogspot.com/2010/02/zeta-version-0701-new-attempt.html' title='Zeta CPU version 0.7.0.1'/><author><name>Srdja Matovic</name><uri>http://www.blogger.com/profile/05576787940760478344</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
