Tuesday, 11 February 2020

Cache Memory Coherence

Role of Cache Memory Coherence in Shared Memory Architecture
By
Rana Sohail
MSCS (Networking), MIT



Abstract— In a multiprocessor, communication is established through a shared memory system that uses a shared address space. As a result, memory traffic and memory latency increase. To raise speed and reduce memory traffic, a cache memory (buffer) is used, which enhances system performance. Caches are helpful, but they must remain coherent with one another when working in a multiprocessor. This paper presents the cache coherence problem, surveys the available solutions, and recommends measures for real-world scenarios.

Keywords— Multiprocessor, shared memory, cache memory, cache coherence


I.      INTRODUCTION

A cache can be defined as a unit used to store data. Data that is frequently accessed, and likely to be required in the near future, is kept at the cache level so that the main memory need not be accessed, saving the user time and effort. When the cache is consulted and the data is available, the access is called a 'cache hit'; when it is not, the access is a 'cache miss' and the data is fetched from the main memory. A system's speed and performance are considered good when the cache memory serves the maximum share of requests, as in [1].
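To make the hit/miss decision concrete, here is a minimal sketch of a direct-mapped cache lookup in C. It is illustrative only: the line count, the stand-in main_memory array, and the function name are assumptions of this sketch, not something fixed by the sources.

```c
#include <stdint.h>

#define NUM_LINES 256                    /* assumed cache size: 256 lines */
#define MEM_WORDS (1u << 16)             /* stand-in main-memory size */

static uint32_t main_memory[MEM_WORDS];  /* the slow backing store */

struct cache_line {
    int      valid;                      /* has this line ever been filled? */
    uint32_t tag;                        /* which address the line holds */
    uint32_t data;                       /* the cached word itself */
};

static struct cache_line cache[NUM_LINES];

/* Return the word at addr, touching main memory only on a miss. */
uint32_t cache_read(uint32_t addr)
{
    uint32_t index = addr % NUM_LINES;   /* direct-mapped placement */
    uint32_t tag   = addr / NUM_LINES;
    struct cache_line *line = &cache[index];

    if (line->valid && line->tag == tag)
        return line->data;               /* cache hit: the fast path */

    /* Cache miss: fetch from main memory, then fill the line. */
    line->data  = main_memory[addr % MEM_WORDS];
    line->tag   = tag;
    line->valid = 1;
    return line->data;
}
```

The more requests the first branch serves, the higher the hit rate, which is exactly the performance criterion stated above.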
Caches are further divided into two groups: initiative caches and passive caches. An initiative cache must serve every request, either from its own store when the data is available or by fetching the data from main memory and sending it to the user. A passive cache serves a request only when it already holds the data; otherwise it does nothing, as in [2].
The paper is organized as follows. Section II defines the cache and its classifications. Section III describes shared memory architecture. Section IV highlights cache coherence and some common issues. Section V reviews related work on the subject. Section VI concludes the paper.



II.    CACHE MEMORY – CLASSIFICATIONS



Cache memory has a number of uses, and it is classified according to them. These classifications serve two activities: reading data and computing data. The purpose of cache memory is to save the time of users and of the operating system alike, as in [2]. The classes of cache memory are explained in the following paragraphs:

A.    Local Cache
It is very small and resides in the process's own memory. When the same resource is requested by multiple users, it plays its role by absorbing such repeated requests. Generally it is a hash table inside the application program's code. Its function can be explained with an example: if users' names must be displayed, but only their IDs are known, a local cache mapping IDs to names is the best solution.
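A minimal sketch of such an in-process hash table follows, keyed by user ID. The table layout and the fetch_name_from_db stub, standing in for a slow database query, are hypothetical choices made for this illustration.

```c
#include <stdio.h>

#define TABLE_SIZE 64                    /* small, per-process table */

struct entry {
    int  used;
    int  user_id;                        /* the key: a user ID */
    char name[32];                       /* the cached value: a name */
};

static struct entry table[TABLE_SIZE];

/* Hypothetical slow lookup, standing in for a database query. */
static void fetch_name_from_db(int user_id, char *out, size_t n)
{
    snprintf(out, n, "user-%d", user_id);     /* placeholder result */
}

/* Return the display name for user_id, querying the DB only once. */
const char *get_user_name(int user_id)
{
    struct entry *e = &table[(unsigned)user_id % TABLE_SIZE];

    if (!e->used || e->user_id != user_id) {  /* miss: fill this slot */
        fetch_name_from_db(user_id, e->name, sizeof e->name);
        e->user_id = user_id;
        e->used = 1;
    }
    return e->name;                           /* hit on repeated requests */
}
```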

B.    Local Shared-Memory Cache
It is a medium-sized, locally shared memory applicable to semi-static and small data storage. It performs best when the user has to reach the data very fast.
C.    Distributed Cache Memory
It is a large cache memory that can be extended when required. Data is created in the cache only the first time it is requested, so even when the data is cached in different places, the problem of data inconsistency does not arise.
D.   Disk Cache
Since the disk is a relatively slow component, the disk cache is mainly appropriate for constant, rarely changing objects. It is also useful for caching failed lookups, such as HTTP error 404 responses, so they are not repeated.

III.      SHARED MEMORY ARCHITECTURE


Once a memory region is accessed by multiple programs, it is shared by all of them. Such programs intend either to communicate with each other or to avoid redundant copies. Shared memory architecture makes inter-program data transfer very easy. It has two aspects, the hardware and the software perspective, as explained in [3]:
A. Hardware Perspective
In a multiprocessor system, the shared memory is a large block of RAM (random access memory) accessed by multiple CPUs (central processing units). Since all programs share a single view of the data, a shared memory system is very easy to program. But because many CPUs try to access the shared memory, two main issues arise:
1) CPU-to-Memory Connection Bottleneck: A large number of processors fetch data from the shared memory, and the connection between the CPUs and the shared memory has limited capacity, so a bottleneck is inevitable.
2) Cache Coherence: A number of cache memories are accessed by multiple processors. When one of them is updated and the updated data has to be used by other processors, the change must be reflected in the other processors' caches as well; otherwise those processors would be working with incoherent data.
B. Software Perspective   
The software perspective on shared memory can be explained as follows:
1) Inter-Process Communication (IPC): This is the exchange of data between processes running in parallel. If one process creates an area in RAM, the others are at liberty to access that area; a minimal sketch follows this list.
2) Conserving Memory Space: Here shared memory is used as a method of preserving a single memory space for data rather than duplicating it.
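As a minimal sketch of item 1, the following C program uses the standard POSIX shm_open/mmap interface; the segment name /demo_region and the 4 KB size are arbitrary choices of this sketch. Every process that maps the same name sees the same bytes.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *name = "/demo_region";  /* arbitrary segment name */
    const size_t size = 4096;

    /* One process creates the region; others open the same name. */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) < 0) { perror("ftruncate"); return 1; }

    /* Map the region: every process mapping it shares these bytes. */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(p, "hello from one process");  /* visible to the others */
    printf("%s\n", p);

    munmap(p, size);
    close(fd);
    shm_unlink(name);                     /* remove the name when done */
    return 0;
}
```

(On some systems the program must be linked with -lrt.)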
C. Centralized Shared Memory Architecture
This kind of architecture has a few processor chips with small processor counts, and these processors share a single centralized memory. Each processor has a large cache linked to a memory bus, and the memory bus joins the processors to the main memory, as in [4]. Figure 1 shows the centralized shared memory architecture.

D.    Distributed Shared Memory Architecture

This kind of architecture has multiprocessors among which the memory is physically distributed. As the memory requirements of these processors grow, the distributed shared memory approach becomes more appropriate, as in [4]; it is shown in figure 2.



IV.           CACHE COHERENCE AND COMMON ISSUES

A.    Definition
Cache coherence concerns data stored in the local caches of a shared resource. The problem that can hinder it is data inconsistency: if one client holds a copy of a memory block that is then updated by another client, and the updated copy is not propagated to the others, data inconsistency occurs. The solution is for the local caches of all clients to be kept inter-related, and that is the job of a coherency mechanism. Figure 3 shows multiple caches of a shared resource.
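The following toy simulation, with entirely illustrative names, reproduces that inconsistency: two private caches copy one memory word, one client updates it, and the other keeps reading a stale value until its copy is invalidated, which is the step a coherency mechanism automates.

```c
#include <stdio.h>

static int main_memory = 10;        /* one shared memory word */

struct cache { int valid; int value; };   /* one client's private copy */

static int cached_read(struct cache *c)
{
    if (!c->valid) {                /* first access: copy the block */
        c->value = main_memory;
        c->valid = 1;
    }
    return c->value;                /* later accesses are served locally */
}

int main(void)
{
    struct cache a = {0, 0}, b = {0, 0};

    cached_read(&a);                /* both clients cache the block */
    cached_read(&b);

    a.value = 42;                   /* client A updates its copy... */
    main_memory = 42;               /* ...and writes it through */

    /* No coherence action was taken, so B still holds the old copy. */
    printf("A reads %d, B reads %d\n", cached_read(&a), cached_read(&b));

    b.valid = 0;                    /* the coherency mechanism's job */
    printf("after invalidation, B reads %d\n", cached_read(&b));
    return 0;
}
```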


B. Importance of Cache Coherence
The importance of cache coherence, as explained in [5], can be seen in the following:
1) Consistency is the most important factor to be maintained.
2) Multiprocessors perform their tasks over a shared bus system, and the bus is always full of data traffic; the local, private caches work alongside it, and the bus workload is tremendously reduced.
3) The shared bus system is continuously monitored by the cache controller, which keeps an eye on all transactions and takes action as per the protocol.
4) Every cache coherence protocol must specify the state of each block in the local cache for future requests.
C. Achieving Cache Coherence
The process is completed through four actions, which are listed below; a sketch of how a protocol realizes them follows the list:
1) Read Hit
2) Read Miss
3) Write Hit
4) Write Miss
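The sketch below shows how a simple invalidation-based protocol can realize these four actions. The MSI-style states are an assumption of this sketch, one common design among several, since the section does not fix a particular protocol.

```c
/* Per-block state kept by one cache under a simple MSI-style protocol. */
enum state { INVALID, SHARED, MODIFIED };

/* Bus actions a cache broadcasts so the other caches can react. */
enum bus_msg { BUS_NONE, BUS_READ, BUS_READ_EXCLUSIVE };

/* Processor-side read: a hit if the block is present in any valid state. */
enum bus_msg on_read(enum state *s)
{
    if (*s != INVALID)
        return BUS_NONE;             /* 1) read hit: no bus traffic */
    *s = SHARED;                     /* 2) read miss: fetch a shared copy */
    return BUS_READ;
}

/* Processor-side write: the block must be held exclusively first. */
enum bus_msg on_write(enum state *s)
{
    if (*s == MODIFIED)
        return BUS_NONE;             /* 3) write hit on an exclusive copy */
    *s = MODIFIED;                   /* 4) write miss (or a shared hit): */
    return BUS_READ_EXCLUSIVE;       /*    all other copies must go */
}

/* Snoop side: another cache's bus action changes our block state. */
void on_snoop(enum state *s, enum bus_msg m)
{
    if (m == BUS_READ_EXCLUSIVE)
        *s = INVALID;                /* our copy is now stale: drop it */
    else if (m == BUS_READ && *s == MODIFIED)
        *s = SHARED;                 /* supply the data, fall to shared */
}
```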
D. Common Issues
There are a number of problem areas where cache coherence needs to be addressed and improved. The following are identified as common issues:
1) Performance: The performance of the computer depends on the multiprocessors and the running programs; every program wants priority, and together they overload the bus system.
2) Processor Stalls: The cache receives data from an input device and the output device reads data out of it, so both the I/O devices and the processor observe the same data. A problem occurs when the processor stalls because of a structural, data, or control dependency.
3) Stale Data Problem: When I/O devices deal directly with the main memory, no problem is observed; but when they deal with the cache memory, the stale data problem arises.
E. Recommendations
Certain solutions are available that address these issues comfortably. The following policies, elaborated in [4], deal with them in detail:
1) Write Back: Also known as write-behind, in this policy writing is initially done to the cache only. Writing to the backing store is postponed until the cache block is about to be replaced or modified by new contents. Its implementation is more complex than the others: it must keep track of which locations have been written over, marking them 'dirty' for later writing to the backing store, and when data is evicted from the cache, that data must first be written to the backing store. It typically uses 'write allocate', counting on subsequent writes to hit the same location. A sketch of both write policies follows this list.
2) Write Through: Here the write is carried out to the cache and to the backing store at once. It typically uses 'no-write allocate', since no subsequent write to the cached location is expected.
3) Directory-Based Protocol: In this protocol a directory holds the sharing state of the data in all the processor caches. The directory behaves like a lookup table that every processor consults before updating data. It keeps the record as pointers together with a dirty bit specifying the permissions. It is further categorized into full-map, limited, and chained directories, as in [6]; a full-map sketch follows this list.
4) Snooping-Based Protocol: Every cache monitors the address lines of a shared bus for all the memory accesses made by the processors. It has two categories, known as 'write invalidate' and 'write update', as in [6].
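As a side-by-side sketch of policies 1 and 2 above, the following C fragment caches a single word; all names are illustrative. The only difference between the two policies is when the backing store is written.

```c
#include <stdio.h>

static int backing_store;           /* stands in for main memory */

struct line {
    int value;
    int dirty;                      /* write-back only: flush needed */
};

/* Write-through: update the cache and the backing store together. */
void write_through(struct line *l, int v)
{
    l->value = v;
    backing_store = v;              /* every write reaches memory */
}

/* Write-back: update only the cache and mark the line dirty. */
void write_back(struct line *l, int v)
{
    l->value = v;
    l->dirty = 1;                   /* memory is stale, on purpose */
}

/* On eviction or replacement, a dirty line must be flushed first. */
void evict(struct line *l)
{
    if (l->dirty) {
        backing_store = l->value;   /* the deferred write happens now */
        l->dirty = 0;
    }
}

int main(void)
{
    struct line l = {0, 0};
    write_back(&l, 7);              /* memory still holds the old value */
    printf("before evict: memory = %d\n", backing_store);
    evict(&l);                      /* the flush brings memory up to date */
    printf("after evict:  memory = %d\n", backing_store);
    return 0;
}
```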
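And as a sketch of the lookup-table role in policy 3, the fragment below keeps one directory entry per memory block: a presence bit per processor cache plus a dirty bit. The full-map organization is one of the variants named in [6]; the sizes and function names are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_CPUS   8
#define NUM_BLOCKS 1024

/* Full-map directory entry: bit i of presence is set when cache i
   holds a copy; dirty says some cache holds a modified copy. */
struct dir_entry {
    uint8_t presence;
    uint8_t dirty;
};

static struct dir_entry directory[NUM_BLOCKS];

/* Hypothetical point-to-point message to one cache (stubbed here). */
static void send_invalidate(int cpu, int block)
{
    printf("invalidate block %d in cache %d\n", block, cpu);
}

/* Before cpu may write block, every other copy must be invalidated. */
void on_write_request(int cpu, int block)
{
    struct dir_entry *e = &directory[block];

    for (int i = 0; i < NUM_CPUS; i++)
        if (i != cpu && (e->presence & (1u << i)))
            send_invalidate(i, block);      /* targeted, not broadcast */

    e->presence = (uint8_t)(1u << cpu);     /* the sole remaining copy */
    e->dirty = 1;
}
```

Unlike snooping, which watches a broadcast bus, the directory sends invalidations only to the caches whose presence bits are set.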

V.       RELATED WORK

A number of research works related to what this paper highlights have been carried out in the past. They are summarized below:
A.    Write Back
Lei Li et al. investigated memory-friendly write-back and pre-fetch policies through an experiment that showed improved overall performance without a last-level cache, as in [7].
Eager write-back reduces write-induced interference: it writes back the dirty cache blocks in the least recently used (LRU) position, and does so while the bus is idle, as in [8].
B.    Write Through
Inderpreet et al. studied graphics processing units (GPUs), where the write-through protocol performed better than write-back; write-back had the drawback of increased traffic, as in [9].
P. Bharathi et al. worked on a way-tagged cache architecture, which showed an improvement in the energy efficiency of write-through caches, as in [10].
C.    Directory Based Protocol
The Stanford DASH used a directory-based system where every directory node was a cluster of processors containing a portion of the complete memory, as in [11].
Scott et al. researched large-scale multiprocessors in which state and message overhead are reduced, as in [12].
The hierarchical DDM was designed over a hierarchy of directories, every level using a bus with a snoop operation, as in [13].
The Compaq Piranha implemented hierarchical directory coherence with an on-chip crossbar, as in [14].
D.   Snooping Based Protocol
Barroso et al. worked on a greedily ordered protocol and compared the directory-based ring and split-transaction bus protocols, as explained in [15, 16].
IBM's POWER4 and POWER5 combine the snoop responses on a ring; on a coherence conflict the node retries, as in [17, 18, 19].
Strauss et al. worked on flexible bus-based snooping, where the snoop is performed first and the request is then forwarded to the next node in the ring, saving time, as in [20].

VI.    CONCLUSION
       Cache memory is a link between the main memory and the processor, and its main aim is to save the time of the users and of the system. In a multiprocessor, each processor has its own cache, which it updates locally; once that data is updated, the main memory has to be updated as well, and the role of cache coherence is vital in this regard. The paper has highlighted the working and importance of centralized and distributed shared memory architectures and how cache coherence can be achieved.



REFERENCES
[1] Definition Cache website. [Online]. Available: http://en.wikipedia.org/wiki/Cache_(computing)
[2] Zheng Ying, “Research on the Role of Cache and Its Control Policies in Software Application Level Optimization”, Inner Mongolia University for Nationalities, China, 2012, pp. 18-20.
[3] Shared Memory website [Online] Available: http://en.wikipedia.org/wiki/Shared_memory
[4] Sujit Deshpande, Priya Ravale et al. “Cache Coherence in Centralized Shared Memory and Distributed Shared Memory Architectures”, Solapur University, India, 2010, pp. 40.
[5] James Archibald and Jean-Loup Baer, “Cache Coherence Protocols:  Evaluation Using a Multiprocessor Simulation Model”, University of Washington, USA, 1986, pp.274-282.
[6] Samaher, S. Soomro, et al. “Snoopy and Directory Based Cache Coherence Protocols: A Critical Analysis”, Journal of Information & Communication Technology, Vol. 4, No. 1, Saudi Arabia, 2010.
[7] Lei Li, Wei Zhang et al. “Cache Performance Optimization for SoC Video Applications”, Journal of Multimedia, Vol. 9, No. 7, 2014, China, pp. 926-933.
[8] Lee, Tyson, et al “Eager writeback - a technique for improving bandwidth utilization” Proceedings-33rd annual ACM/IEEE intl. symposium on Microarchitecture, USA, 2000, pp. 11–21.
[9] Inderpreet, Arrvindh et al. “Cache Coherence for GPU Architectures” 2013.
[10] P. Bharathi and Praveen, “Way Tagged L2 Cache Architecture under Write-Through Policy”, IJECEAR, Vol. 2, SP-1, USA, 2014, pp. 86-89.
[11] D. Lenoski, J. Laudon, et al.  “The Stanford DASH Multiprocessor” IEEE Computer, 25(3):63–79, Mar. 1992.
[12] S. L. Scott and J. R. Goodman, “Performance of Pruning-Cache Directories for Large-Scale Multiprocessors” IEEE Transactions on Parallel and Distributed Systems, 4(5):520–534, May 1993.
[13] E. Hagersten, A. Landin, et al. “DDM–A Cache-Only Memory Architecture” IEEE Computer, 25(9):44–54, Sept. 1992.
[14] L. A. Barroso, K. Gharachorloo, et al. “Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing”, In Proceedings of 27th Annu. Intl. Symposium on Computer Architecture, 2000, pp. 282–293.
[15] L. A. Barroso and M. Dubois, “Cache Coherence on a Slotted Ring” In Proceedings of Intl. Conf. on Parallel Processing, 1991, pp. 230–237.
[16] L. A. Barroso and M. Dubois, “The Performance of Cache-Coherent Ring-based Multiprocessors”, In Proceedings of 20th Annu. Intl. Symposium on Computer Architecture, 1993, pp. 268–277.
[17] J. M. Tendler, S. Dodson, et al. “POWER4 System Microarchitecture” IBM Server Group Whitepaper, Oct. 2001.
[18] B. Sinharoy, R. Kalla, et al. “Power5 System Microarchitecture” IBM Journal of Research and Development, 49(4), 2005.
[19] S. Kunkel. “IBM Future Processor Performance” Server Group. Personal Communication, 2006
[20] K. Strauss, X. Shen, et al. “Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors”, In Proceedings of 33rd Annu. Intl. Symposium on Computer Architecture, Jun. 2006.
