DR.GEEK

Optimization through L3 Cache (SHA256D)

( 19th September 2019 )

INTRODUCTION

SHA-2 is the family of Secure Hash Algorithms required in U.S. Government applications, alongside other cryptographic algorithms and protocols, for protecting sensitive unclassified information. SHA-2 uses a one-way compression function built on the Davies-Meyer structure, and it is therefore used in secure network communications and blockchain applications. In March 2012 the standard was updated in FIPS PUB 180-4: a restriction on padding the input data prior to hash calculation was removed, allowing the hash to be computed simultaneously with content generation. This update benefits real-time video and audio feed applications. The SHA-2 hash functions are implemented in widely used security applications and protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec.

Performance numbers labeled 'x86' were obtained with 32-bit code on 64-bit processors, whereas the 'x86-64' numbers are native 64-bit code. Although SHA-256 is designed for 32-bit arithmetic, it still benefits from code optimized for 64-bit processors on the x86 architecture; 32-bit implementations of SHA-512, by contrast, are significantly slower than their 64-bit counterparts. The MiB/s figures are extrapolated from the CPU clock speed of a single core; real-world performance will vary due to a variety of factors. Cryptocurrency mining hardware must deliver an adequate hash rate for the difficulty level of each cryptocurrency, and as the mining market grows, ever higher hash rates are required.
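The "D" in SHA256D denotes double SHA-256, i.e., hashing the digest a second time, as used in Bitcoin-style proof of work. A minimal sketch in Python using only the standard hashlib module:

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256 (SHA256D): hash the data, then hash the digest again."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Single SHA-256 of the classic test vector "abc"
print(hashlib.sha256(b"abc").hexdigest())
# SHA256D of the same input
print(sha256d(b"abc").hex())
```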

Therefore, the processing unit has shifted from CPU to GPU, FPGA, and ASIC. Hash-rate calculation demands substantial computational power from the processors, and it is a highly electricity-consuming task. This work proposes an L3 cache in the mining computer system using the SHA-2 algorithm. The L3 cache was proposed by Dr. Hironao Takahashi for the Autonomous Decentralized System in 2008, and it was also implemented in the Autonomous Decentralized Multi-Layered System for low-latency applications. The concept of the L3 cache is the physical arrangement of the operands referenced by a program's instructions. Hash calculation is fundamentally a CPU-intensive workload, but once it runs as multiple thread-level-parallel processes, it needs effective memory management for each process. We evaluate pool mining and present results obtained with the L3 cache configuration.
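A pool miner exploits thread-level parallelism by giving each worker a disjoint slice of the nonce range to scan. The following is an illustrative sketch only (the actual Pooler CPU miner is written in C); the header bytes, difficulty prefix, and range sizes are made up for the example:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the SHA256D proof-of-work function."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def scan_range(header: bytes, start: int, end: int, prefix: str):
    """Scan nonces in [start, end); return the first nonce whose
    SHA256D digest (hex) begins with the required prefix, else None."""
    for nonce in range(start, end):
        digest = sha256d(header + nonce.to_bytes(4, "little"))
        if digest.hex().startswith(prefix):
            return nonce
    return None

def mine(header: bytes, threads: int = 4, chunk: int = 50_000, prefix: str = "000"):
    # Each worker thread scans its own disjoint nonce sub-range.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(scan_range, header, i * chunk, (i + 1) * chunk, prefix)
                   for i in range(threads)]
        results = [f.result() for f in futures]
    return next((n for n in results if n is not None), None)

nonce = mine(b"example-block-header")
```

Each worker touches its own memory region, which is exactly the access pattern the L3 cache's operand placement is meant to serve.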

  • IMPLEMENTATION OF L3 CACHE INTO MINING SOFTWARE

We investigated the source code of the Pooler CPU miner to optimize thread-level parallelism and modified the code, adding a calling sequence for the L3 cache layer as shown in Fig-3.1. The original code gets a hash block from the cryptocurrency network and spawns multiple processes for the available processors. We optimized it to run in the L3 cache memory space. The L3 cache is a device driver, i.e., a kernel program; we modified this driver for the latest distribution of Ubuntu Linux.


Fig-3.1: Implementation of L3 Cache into Pooler CPU miner program on Linux

The gain due to cache memory is obtained from Amdahl's Law as follows:

Gi = 1 / ((1 − Ci) + (Ci · Πj Li,j) / Xi)

Where,

Gi = Gain due to cache at level i

Ci = Cache size ratio (hit rate) at level i

Li,j = Overhead factor j at cache level i

Xi = Cache speed ratio of lower level storage media to higher-level storage media at level i
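Under these definitions the gain takes the familiar Amdahl form: the fraction Ci of accesses is served by the faster level (sped up by Xi, discounted by the overhead factors Li,j), while the remaining (1 − Ci) still pays the full cost of the lower level. A minimal sketch in Python, assuming (our assumption) that the overhead factors combine multiplicatively:

```python
from math import prod

def cache_gain(hit_rate: float, speed_ratio: float, overheads=()) -> float:
    """Amdahl's-law gain G_i of adding a cache at level i.

    hit_rate    -- C_i, fraction of accesses served by the cache
    speed_ratio -- X_i, speed of lower-level storage relative to the cache
    overheads   -- L_{i,j} factors, assumed here to combine multiplicatively
    """
    overhead = prod(overheads) if overheads else 1.0
    # Access time with the cache, relative to time without it:
    # misses still cost 1, hits cost overhead / X_i of the original time.
    relative_time = (1.0 - hit_rate) + hit_rate * overhead / speed_ratio
    return 1.0 / relative_time

print(cache_gain(0.9, 10.0))  # 90% hit rate, cache 10x faster than lower level
print(cache_gain(0.0, 10.0))  # no hits: the cache provides no gain
```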

The processor has registers and L1/L2 caches. A GPU has its own local memory on the GPU daughter card. In a multiple-GPU configuration, each GPU snoops the others' local memories via messages, which generates processing stalls. Therefore, memory management remains a key factor in performance degradation. The search algorithm of the L3 cache utilizes a hash-based quick-search model to reduce the latency of the processing unit. The L3 cache consists of blocks divided into groups, where each group contains an equal number of blocks. The jumping algorithm first points to the group to which the required sector corresponds and then searches for the required block of data within that group. For sequential read requests, the jumping algorithm achieves about 30% faster search than the standard sequential search technique; if the read requests are randomly distributed, an even larger advantage over the standard search is expected. The number of columns for block I/O data still needs to be investigated: in the current paper the value is 32, but if CPU performance is higher, 64 or 128 may be selected as a trade-off depending on the CPU specifications.
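The group-then-scan idea can be sketched as a toy model, assuming (our assumption) that the group is selected by a simple modulo hash of the block ID, so a lookup scans only one group instead of the whole cache:

```python
class JumpingCache:
    """Toy model of the L3 cache's jumping search: blocks are divided
    into equal-sized groups; a lookup jumps straight to the group a
    block ID maps to, then sequentially scans only that group."""

    def __init__(self, num_groups: int):
        self.num_groups = num_groups
        self.groups = [[] for _ in range(num_groups)]

    def _group_of(self, block_id: int) -> int:
        # Hash-based group selection (assumed: simple modulo hash).
        return block_id % self.num_groups

    def insert(self, block_id: int, data: bytes):
        self.groups[self._group_of(block_id)].append((block_id, data))

    def lookup(self, block_id: int):
        # Jump to the right group, then scan only its blocks.
        for bid, data in self.groups[self._group_of(block_id)]:
            if bid == block_id:
                return data
        return None

cache = JumpingCache(num_groups=8)
for i in range(64):
    cache.insert(i, f"block-{i}".encode())
```

With 64 blocks in 8 equal groups, a miss costs at most one 8-block scan rather than a 64-block scan, which is the source of the reported speedup.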
