Role of Cache Memory Coherence in Shared Memory Architecture
By
Rana Sohail
MSCS (Networking), MIT
Abstract— Communication among multi-processors is established through a shared memory system that uses a shared address space. As a result, memory traffic and memory latency increase. To increase speed and reduce memory traffic, a cache memory (buffer) is used, which enhances system performance. Caches are helpful, but they require coherence with each other when used in multi-processor systems. This paper presents the cache coherence problems and the available solutions, and recommends measures for real world scenarios.
Keywords— Multi-processor, shared memory, cache memory, cache coherence
I. INTRODUCTION
A cache can be defined as a unit used to store data. Data that is frequently accessed, or is likely to be required in the near future, is kept available at the cache level so that the main memory does not need to be accessed. This saves time and effort for the user. When the cache is queried for data and the data is available, it is called a 'cache hit'; if the data is not there, it is called a 'cache miss'. Data that is not available in the cache is then fetched from the main memory. Speed and performance are considered good when the maximum number of requests can be served by the cache memory, as in [1].
Caches are further divided into two groups, namely initiative caches and passive caches. An initiative cache must serve every request, either from its own store when the data is available or by fetching the data from main memory and sending it to the user. A passive cache serves a request only when the data is already available with it; otherwise it does nothing, as in [2].
The paper is organized into sections. Section II defines the cache and its classifications. Section III describes the shared memory architecture. Section IV highlights cache coherence and some common issues. Section V presents related work on the subject. Section VI concludes the paper.
II. CACHE MEMORY – CLASSIFICATIONS
Cache memory has a number of uses and is therefore classified according to those uses. These classifications are helpful for two operations, namely reading data and computing on data. The purpose of cache memory is to save the time of both the users and the operating system, as in [2]. The classifications of cache memory are explained in the following paragraphs:
A. Local Cache
A local cache is very small in size and resides in memory. When the same resource is requested by multiple users, the local cache steps in to avoid repeating such requests. It is generally a hash table held in the code of the application program. Its function can be explained by an example: if the names of users are to be displayed while only their IDs are known, a local cache is a good solution.
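As a small illustration of this idea, the sketch below shows an application-level local cache in C: a fixed-size hash table mapping user IDs to names. The table size, the hash function, and the lookup_name_from_database stand-in are hypothetical choices made only for this example, not part of the paper.

#include <stdio.h>

#define CACHE_SLOTS 64                /* hypothetical fixed table size */

struct entry {
    int  id;                          /* user ID used as the key       */
    char name[32];                    /* cached user name              */
    int  valid;                       /* 1 if this slot holds data     */
};

static struct entry cache[CACHE_SLOTS];

/* Stand-in for the expensive backing lookup (e.g. a database query). */
static void lookup_name_from_database(int id, char *out, size_t len)
{
    snprintf(out, len, "user-%d", id);
}

/* Return the name for an ID, using the local cache when possible. */
const char *get_user_name(int id)
{
    struct entry *e = &cache[(unsigned)id % CACHE_SLOTS];

    if (e->valid && e->id == id)      /* cache hit: backing store is not touched */
        return e->name;

    /* cache miss: fetch from the backing store and remember the result */
    lookup_name_from_database(id, e->name, sizeof e->name);
    e->id = id;
    e->valid = 1;
    return e->name;
}

int main(void)
{
    printf("%s\n", get_user_name(7));   /* miss: fetched and cached   */
    printf("%s\n", get_user_name(7));   /* hit: served from the cache */
    return 0;
}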
B. Local Shared – Memory Cache
It is a medium-sized, locally shared memory which is applicable to semi-static and small data storage. It performs best when the user has to access the data very fast.
C. Distributed Cache Memory
It is a large cache memory which can be extended when required. The data in the cache is created only once, and when the data is cached at different places, the problem of data inconsistency does not arise there.
D. Disk Cache
Since the disk is a relatively slow component, the disk cache is mainly appropriate for constant objects. It is also useful for handling lost cache entries through the error 404 procedure.
III. SHARED MEMORY ARCHITECTURE
A memory that is accessed by multiple programs is shared by all of them. Such programs intend either to communicate with each other or to avoid redundant copies. Inter-program data transfer is very easy with a shared memory architecture. It has two aspects, the hardware perspective and the software perspective, as explained in [3], which are described below:
A. Hardware Perspective
The shared memory in a multiple-processor system is a large block of RAM (random access memory) that is accessed by multiple CPUs (central processing units). Since all programs share a single view of the data, a shared memory system is very easy to program. Because many CPUs try to access the shared memory, two main issues arise:
1) CPU to Memory Connection Bottleneck: Since a large number of processors request data from the shared memory, and the connection between the CPUs and the shared memory has limited capacity, a bottleneck situation is obvious.
2) Cache Coherence: A number of cache memories are accessed by multiple processors. When any of them is updated and that data has to be used by other processors, the change should be reflected to the other processors as well; otherwise the other processors would be working with incoherent data.
B. Software Perspective
The software perspective of shared memory can be explained as follows:
1) Inter Process Communication (IPC): This means the exchange of data between processes that are running in parallel. RAM is the place where, if one process creates an area, the others are at liberty to access that area.
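A minimal sketch of this mechanism, assuming a POSIX system, is shown below: one process creates a named shared memory region with shm_open and mmap and writes data into it; any other process that maps the same name can read that data. The region name "/demo_region" and the message are illustrative choices only.

/* writer: create a shared region and place data in it (POSIX) */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *name = "/demo_region";           /* hypothetical region name */
    size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, size);                         /* set the region size */

    char *area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(area, "hello from the writer");       /* any process mapping the same
                                                    name can now read this data */
    printf("wrote: %s\n", area);

    munmap(area, size);
    close(fd);
    /* shm_unlink(name) would remove the region when it is no longer needed */
    return 0;
}

On some systems this needs to be linked with -lrt; a second process that calls shm_open and mmap with the same name sees the same bytes.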
2) Conserving Memory Space: Here the shared memory is used as a method of conserving memory space, since the data is kept in a single shared area rather than duplicated for each process.
C. Centralized Shared Memory Architecture
This kind of architecture has a few processor chips with small processor counts, and these processors share a single centralized memory. Each processor has a large cache which is linked to the memory bus. The memory bus connects the processors with the main memory, as in [4]. Figure 1 shows the centralized shared memory architecture.
D. Distributed Shared Memory Architecture
This kind of architecture has multiprocessors among which the memory is distributed. As the memory requirement of these processors increases, a distributed shared memory approach becomes more appropriate, as in [4]; it is shown in Figure 2.
IV. CACHE COHERENCE AND COMMON ISSUES
A. Definition
Cache coherence concerns the data stored in the local caches of a shared resource. The main problem that can arise here is data inconsistency. If one client holds a copy of a memory block that is updated by another client, and this newly updated copy of the memory block is not propagated to the others, data inconsistency occurs. The solution to this problem is that the local cache of every client should be kept consistent with the others, and that is made possible by a coherency mechanism. Figure 3 shows multiple caches of a shared resource.
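To make the inconsistency concrete, the following small C simulation (an illustrative model, not actual hardware behavior) shows two private cache copies of the same block: a write that updates only one copy leaves the other stale, which is exactly what a coherency action such as write invalidate must prevent.

#include <stdio.h>

/* A simulated private cache holding one copy of a shared block. */
struct cache_copy {
    int value;      /* the cached data            */
    int valid;      /* 1 while the copy is usable */
};

int main(void)
{
    int memory_block = 10;                       /* the shared block in main memory */
    struct cache_copy c0 = { memory_block, 1 };  /* CPU 0 reads the block           */
    struct cache_copy c1 = { memory_block, 1 };  /* CPU 1 reads the same block      */

    /* CPU 0 writes a new value, but only its own copy is updated. */
    c0.value = 99;

    /* Without coherence, CPU 1 still sees the old value: inconsistency. */
    printf("CPU0 sees %d, CPU1 sees %d (stale)\n", c0.value, c1.value);

    /* A write-invalidate coherency action marks CPU 1's copy invalid,
       forcing it to re-read the up-to-date value on its next access.   */
    c1.valid = 0;
    if (!c1.valid) {
        memory_block = c0.value;   /* updated value supplied / written back */
        c1.value = memory_block;   /* CPU 1 re-fetches the current value    */
        c1.valid = 1;
    }
    printf("after invalidate and refetch, CPU1 sees %d\n", c1.value);
    return 0;
}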
B. Importance of Cache Coherence
The importance of cache coherence, as explained in [5], can be seen through the following:
1) Consistency is the most important factor to be maintained.
2) Multiprocessors perform their tasks over the shared bus system, which always carries heavy data traffic. The local and private caches work on it simultaneously, and the workload on the bus is tremendously reduced.
3) The shared bus system is always monitored by the cache controller, which keeps an eye on all transactions and takes action as per the protocol's instructions.
4) All cache coherence protocols are bound to specify the state of a block in the local cache for future requests.
C. Achieving Cache Coherence
The completeness of the process is achieved through four actions, which are listed below and sketched in the example that follows this list:
1) Read Hit
2) Read Miss
3) Write Hit
4) Write Miss
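A minimal sketch of how these four actions might be distinguished is given below, assuming a single-line, write-through cache chosen purely to keep the example short; it is an illustration, not the paper's implementation.

#include <stdio.h>

#define MEM_WORDS 16

static int memory[MEM_WORDS];       /* backing store (main memory)    */
static struct {
    int addr;                       /* address currently cached       */
    int data;                       /* cached value                   */
    int valid;                      /* 1 if the line holds a block    */
} line;                             /* single-line cache, for brevity */

int cache_read(int addr)
{
    if (line.valid && line.addr == addr)   /* 1) read hit              */
        return line.data;
    line.addr = addr;                      /* 2) read miss: fetch the  */
    line.data = memory[addr];              /*    block from memory     */
    line.valid = 1;
    return line.data;
}

void cache_write(int addr, int value)
{
    memory[addr] = value;                  /* write-through: memory is always updated */
    if (line.valid && line.addr == addr)   /* 3) write hit: also update the cached copy */
        line.data = value;
    /* 4) write miss: with no-write-allocate, only memory is written
       and the cache line is left untouched                            */
}

int main(void)
{
    cache_write(3, 42);                    /* write miss              */
    printf("%d\n", cache_read(3));         /* read miss, then cached  */
    printf("%d\n", cache_read(3));         /* read hit                */
    return 0;
}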
D. Common Issues
There are a number of problem areas where cache coherence needs to be improved and addressed. The following areas are identified as common issues:
1) Performance: The performance of the computer is closely related to the multiprocessors and the running programs. All programs want to get priority and end up overloading the bus system.
2) Processor Stalls: The cache receives data from the input device and the output device reads data from it, so both the I/O devices and the processor observe the same data. Problems occur when the processor stalls due to a structural, data, or control dependency.
3) Stale Data Problem: When the I/O devices deal with the main memory, no problem is observed, but when the I/O devices deal with the cache memory, the stale data problem arises.
E. Recommendations
There are certain solutions available which address these issues comfortably. The following policies explain the solutions in detail, as elaborated in [4]:
1) Write Back: It is also known as write behind; initially, writing is done to the cache only. When the data is to be replaced or modified by new contents, the cache block is amended and the write to the backing store is also carried out. Its implementation is more complex compared to the others: it keeps track of those locations that will be overwritten in the near future and marks them as 'dirty' for later writing to the backing store. When data in the cache is evicted, the same data must be written to the backing store. It uses 'write allocate' on the expectation that subsequent writes will go to the same location. A minimal sketch of this policy follows.
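The write-back behavior described above can be sketched as follows, again assuming a single cache line with a dirty flag as an illustrative simplification: writes go to the cache only and mark the line dirty, and the backing store is updated only when the line is evicted.

#include <stdio.h>

#define MEM_WORDS 16

static int memory[MEM_WORDS];       /* backing store                         */
static struct {
    int addr, data;
    int valid;
    int dirty;                      /* 1 => cached copy is newer than memory */
} line;                             /* single-line write-back cache          */

/* Evict the current block, writing it back only if it is dirty. */
static void evict(void)
{
    if (line.valid && line.dirty)
        memory[line.addr] = line.data;
    line.valid = 0;
    line.dirty = 0;
}

void wb_write(int addr, int value)
{
    if (!(line.valid && line.addr == addr)) {  /* write miss: write allocate */
        evict();
        line.addr = addr;
        line.valid = 1;
    }
    line.data = value;              /* the write goes to the cache only ...  */
    line.dirty = 1;                 /* ... and the block is marked dirty     */
}

int wb_read(int addr)
{
    if (!(line.valid && line.addr == addr)) {  /* read miss */
        evict();
        line.addr = addr;
        line.data = memory[addr];
        line.valid = 1;
    }
    return line.data;
}

int main(void)
{
    wb_write(3, 42);
    printf("memory[3]=%d (not yet written back)\n", memory[3]);  /* still 0   */
    wb_read(5);                                 /* eviction triggers write back */
    printf("memory[3]=%d (written back)\n", memory[3]);          /* now 42    */
    return 0;
}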
2) Write Through: Here the write is carried out to the cache and the backing store at the same time. It uses 'no write allocate', as there is no need to allocate a cache block for subsequent writes.
3) Directory Based Protocol: In this protocol there is a directory which holds the sharing information for the data in all the processor caches. The directory behaves like a lookup table where all processors look for data updates. The directory keeps the record as a pointer together with a dirty bit specifying the permissions. It is further categorized into full map, limited, and chained directories, as in [6].
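As an illustration of the full-map variant, the sketch below models one directory entry with a presence bit per processor and a dirty bit; the processor count and the console messages are assumptions made only for this example.

#include <stdint.h>
#include <stdio.h>

#define NUM_PROCESSORS 8            /* assumed processor count */

/* One full-map directory entry per memory block: a presence bit for each
   processor's cache plus a dirty bit.                                    */
struct dir_entry {
    uint8_t presence;               /* bit i set => processor i has a copy        */
    uint8_t dirty;                  /* 1 => a single, modified copy exists        */
};

/* Record that processor p has read the block (it gains a shared copy). */
void dir_record_read(struct dir_entry *e, int p)
{
    e->presence |= (uint8_t)(1u << p);
}

/* Processor p writes the block: every other copy must be invalidated,
   which the directory finds by scanning the presence bits.             */
void dir_record_write(struct dir_entry *e, int p)
{
    for (int i = 0; i < NUM_PROCESSORS; i++)
        if (i != p && (e->presence & (1u << i)))
            printf("send invalidate to processor %d\n", i);
    e->presence = (uint8_t)(1u << p);   /* only the writer keeps a copy */
    e->dirty = 1;
}

int main(void)
{
    struct dir_entry e = { 0, 0 };
    dir_record_read(&e, 0);
    dir_record_read(&e, 3);
    dir_record_write(&e, 0);            /* prints: send invalidate to processor 3 */
    return 0;
}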
4) Snooping Based Protocol: Each cache monitors the address lines of the shared bus for all memory accesses made by the processors. It has two categories, known as 'write invalidate' and 'write update', as in [6].
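A write-invalidate snooping step can be sketched as below: every cache watches writes broadcast on the shared bus and invalidates its own copy when the address matches. The fixed number of caches and the single block per cache are simplifications made only for illustration.

#include <stdio.h>

#define NUM_CACHES 4

/* Each cache holds one block; on every bus write, all caches snoop the
   address and invalidate their own copy (write-invalidate policy).      */
struct snoop_cache {
    int addr;
    int data;
    int valid;
};

static struct snoop_cache caches[NUM_CACHES];

/* Broadcast a write on the shared bus; every other cache snoops it. */
void bus_write(int writer, int addr, int value)
{
    caches[writer].addr = addr;
    caches[writer].data = value;
    caches[writer].valid = 1;

    for (int i = 0; i < NUM_CACHES; i++) {
        if (i != writer && caches[i].valid && caches[i].addr == addr) {
            caches[i].valid = 0;                 /* write invalidate */
            printf("cache %d invalidated its copy of block %d\n", i, addr);
        }
    }
}

int main(void)
{
    caches[1] = (struct snoop_cache){ .addr = 100, .data = 5, .valid = 1 };
    caches[2] = (struct snoop_cache){ .addr = 100, .data = 5, .valid = 1 };
    bus_write(0, 100, 7);    /* caches 1 and 2 invalidate their stale copies */
    return 0;
}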
V. RELATED WORK
A number of research works related to what is highlighted in this paper have already been done in the past. These are as under:
A. Write Back
Lei Li et al. researched memory-friendly write back and pre-fetch policies through an experiment which showed improved overall performance without a last level cache, as in [7].
Eager write back reduces write-induced interference. It writes back the dirty cache blocks in the least recently used (LRU) position, and this is done while the bus is idle, as in [8].
B. Write Through
Inderpreet et al. researched graphics processing units (GPUs), where the 'write through' protocol performed better than the 'write back' protocol; write back had the drawback of increased traffic, as in [9].
P. Bharathi et al. worked on a way-tagged cache architecture, which showed an improvement in the energy efficiency of write-through caches, as in [10].
C. Directory Based Protocol
The Stanford DASH worked on a directory-based system where every directory node was a cluster of processors containing a portion of the complete memory, as in [11].
Scott et al. researched large-scale multiprocessors where state and message overhead are reduced, as in [12].
The hierarchical DDM design was built over a hierarchy of directories, where every level used a bus with a snoop operation, as in [13].
Compaq's Piranha implemented hierarchical directory coherence with an on-chip crossbar, as in [14].
D. Snooping Based Protocol
Barroso et al. worked on a greedily ordered protocol and carried out a comparison of directory-based ring and split-transaction bus protocols, as explained in [15, 16].
IBM's POWER4 and POWER5 also used a combined snooping response on the ring; on a coherence conflict the node retries, as in [17, 18, 19].
Strauss et al. worked on flexible snooping based on a bus. Here the snoop is performed first and then the request is forwarded to the next node in the ring, which saves time, as in [20].
VI. CONCLUSION
Cache memory is a link between the main memory and the processor, and its main aim is to save the time of the users and the system. Multiprocessors have their own local memories which are updated by the processors, and once the data is updated, the main memory has to be updated as well. The role of cache coherence is very important in this regard. This paper has been organized to highlight the working and importance of centralized and distributed shared memory architectures and how cache coherence can be achieved.
REFERENCES
[1] Cache (computing). [Online]. Available: http://en.wikipedia.org/wiki/Cache_(computing)
[2] Zheng Ying, "Research on the Role of Cache and Its Control Policies in Software Application Level Optimization", Inner Mongolia University for Nationalities, China, 2012, pp. 18-20.
[3] Shared memory. [Online]. Available: http://en.wikipedia.org/wiki/Shared_memory
[4] Sujit Deshpande, Priya Ravale, et al., "Cache Coherence in Centralized Shared Memory and Distributed Shared Memory Architectures", Solapur University, India, 2010, p. 40.
[5] James Archibald and Jean-Loup Baer, "Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model", University of Washington, USA, 1986, pp. 274-282.
[6] Samaher, S. Soomro, et al., "Snoopy and Directory Based Cache Coherence Protocols: A Critical Analysis", Journal of Information & Communication Technology, Vol. 4, No. 1, Saudi Arabia, 2010.
[7] Lei Li, Wei Zhang, et al., "Cache Performance Optimization for SoC Video Applications", Journal of Multimedia, Vol. 9, No. 7, China, 2014, pp. 926-933.
[8] Lee, Tyson, et al., "Eager Writeback - A Technique for Improving Bandwidth Utilization", Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, USA, 2000, pp. 11-21.
[9] Inderpreet, Arrvindh, et al., "Cache Coherence for GPU Architectures", 2013.
[10] P. Bharathi and Praveen, "Way Tagged L2 Cache Architecture Under Write Through Policy", IJECEAR, Vol. 2, SP-1, USA, 2014, pp. 86-89.
[11] D. Lenoski, J. Laudon, et al., "The Stanford DASH Multiprocessor", IEEE Computer, 25(3):63-79, Mar. 1992.
[12] S. L. Scott and J. R. Goodman, "Performance of Pruning-Cache Directories for Large-Scale Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, 4(5):520-534, May 1993.
[13] E. Hagersten, A. Landin, et al., "DDM - A Cache-Only Memory Architecture", IEEE Computer, 25(9):44-54, Sept. 1992.
[14] L. A. Barroso, K. Gharachorloo, et al., "Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing", Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000, pp. 282-293.
[15] L. A. Barroso and M. Dubois, "Cache Coherence on a Slotted Ring", Proceedings of the International Conference on Parallel Processing, 1991, pp. 230-237.
[16] L. A. Barroso and M. Dubois, "The Performance of Cache-Coherent Ring-based Multiprocessors", Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993, pp. 268-277.
[17] J. M. Tendler, S. Dodson, et al., "POWER4 System Microarchitecture", IBM Server Group Whitepaper, Oct. 2001.
[18] B. Sinharoy, R. Kalla, et al., "POWER5 System Microarchitecture", IBM Journal of Research and Development, 49(4), 2005.
[19] S. Kunkel, "IBM Future Processor Performance", Server Group, Personal Communication, 2006.
[20] K. Strauss, X. Shen, et al., "Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors", Proceedings of the 33rd Annual International Symposium on Computer Architecture, Jun. 2006.