[1]Thornton J. Considerations in Computer Design-Leading up to the Control Data 6600[Z]. 1963.
[2]Radin G. The 801 minicomputer[C]. ASPLOS-I: Proceedings of the first international symposium on Architectural support for programming languages and operating systems, 1982: 39-47.
[3]Weaver D L, Germond T. The SPARC Architecture Manual, v9[Z]. SPARC International, Inc.
[4]MIPS32 Architecture. Imagination Technologies.
[5]R Kessler. The Alpha 21264 Microprocessor[J]. IEEE Micro, 1999, 19(2): 24-36.
[6]Gronowski P E, Bowhill W J, Donchin D R, Blake-Campos R P, Carlson D A, Equi E R, Loughlin B J, Mehta S, Mueller R O, Olesin A, Noorlag D J W, Preston R P. A 433-MHz 64-b quad-issue RISC microprocessor[J]. IEEE Journal of Solid-State Circuits, 1996, 31(11): 1687-1696.
[7]ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition[Z]. ARM Limited.
[8]May Cathy, et al. The PowerPC Architecture: A Specification for A New Family of RISC Processors[M]. 2nd ed. Morgan Kaufmann Publishers, 1994.
[9]Schlansker, Rau. EPIC: An Architecture for Instruction-Level Parallel Processors[R]. HP Laboratories Palo Alto, HPL-1999-111, 2000-2.
[10]AMBA specification v1.0[S].
[11]HyperTransport I/O Link Specification Revision 3.10[S]. 2010.
[12]PCI Local Bus Specification Revision 2.3[S]. 2002.
[13]PCI Express 2.0 Base Specification Revision 1.0[S]. 2006.
[15]D Lenoski, J Laudon, K Gharachorloo, A Gupta, J Hennessy. The Directory-based Cache Coherence Protocol for the DASH Multiprocessor[C]. Proceedings of the 17th Annual International Symposium on Computer Architecture(ISCA), 1990: 148-159.
[16]D Chaiken, C Fields, K Kurihara, A Agarwal. Directory-based cache coherence in large-scale multiprocessors[J]. Computer, 1990, 23(6): 49-58.
[17]L Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocessor Programs[J]. IEEE Transactions on Computers, 1979, C-28(9): 690-691.
[18]K Li. IVY: A Shared Virtual Memory System for Parallel Computing[C]. Proceedings of the 1988 International conference on Parallel Processing, 1988, 2: 94-101.
[19]R Alverson, D Callahan, D Cummings, B Koblenz, A Porterfield, B Smith. The Tera Computer System[C]. Proceedings of the 4th International Conference on Supercomputing(ICS), 1990: 1-6.
[20]A Agarwal, R Bianchini, D Chaiken, K Johnson, D Kranz, J Kubiatowicz, B Lim, K Mackenzie, D Yeung. The MIT Alewife machine: architecture and performance[C]. Proceedings of the 22nd Annual International Symposium on Computer Architecture(ISCA), 1995: 2-13.
[21]T Anderson. The performance of spin lock alternatives for shared-memory multiprocessors[J]. IEEE Transactions on Parallel and Distributed Systems, 1990, 1(1): 6-16.
[22]G Graunke, S Thakkar. Synchronization algorithms for shared-memory multiprocessors[J]. Computer, 1990, 23(6): 60-69.
[23]J M Mellor-Crummey, M L Scott. Algorithms for Scalable Synchronization on Shared-memory Multiprocessors[J]. ACM Trans. Comput. Syst., 1991, 9(1): 21-65.
[24]P C Yew, N F Tzeng, D H Lawrie. Distributing Hot-Spot Addressing in Large-Scale Multiprocessors[J]. IEEE Transactions on Computers, 1987, 36(4): 388-395.
[25]William James Dally, Brian Patrick Towles. Principles and practices of interconnection networks[M]. Elsevier, 2004.
[26]Chen Guoliang. Parallel Computing: Architecture, Algorithms, Programming (in Chinese)[M]. Beijing: Higher Education Press, 2011.
[29]Weiwu Hu, Fuxin Zhang, Zusong Li. Microarchitecture of the Godson-2 Processor[J]. Journal of Computer Science and Technology, 2005, 20(2): 243-249.
[30]Weiwu Hu, Jian Wang, Xiang Gao, Yunji Chen, Qi Liu, Guojie Li. Godson-3: A Scalable Multicore RISC Processor With X86 Emulation[J]. IEEE Micro, 2009, 29(2): 17-29.
[31]Weiwu Hu, Ru Wang, Yunji Chen, et al. Godson-3B: A 1GHz 40W 8-Core 128GFlops Processor in 65nm CMOS[C]. Proceedings of the IEEE International Solid-State Circuit Conference(ISSCC), 2011: 76-77.
[32]Weiwu Hu, Yifu Zhang, Liang Yang, et al. Godson3B1500: A 32nm 1.35GHz 40W 172.8GFlops 8-core Processor[C]. Proceedings of the IEEE International Solid-State Circuit Conference(ISSCC), 2013: 15-17.
[33]Weiwu Hu, Liang Yang, Baoxia Fan, Huandong Wang, Yunji Chen. An 8-Core MIPS-Compatible Processor in 32/28 nm Bulk CMOS[J]. IEEE Journal of Solid-State Circuits(JSSC), 2014, 49(1): 41-49.
[35]Efraim Rotem, Alon Naveh, Doron Rajwan, Avinash Ananthakrishnan, Eliezer Weissmann. Power-management Architecture of the Intel Microarchitecture Code-name Sandy Bridge[J]. IEEE Micro, 2012, 32(2): 20-27.
[36]NVIDIA's Next Generation CUDA Compute Architecture: Fermi, whitepaper[Z].
[37]The List[EB/OL]. http://www.top500.org.
[38]Desikan R, Burger D, Keckler S W, Austin T. Sim-alpha: a validated execution driven alpha 21264 simulator[R]. 2001.
[39]C Bienia, S Kumar, J P Singh, K Li. The PARSEC benchmark suite: Characterization and architectural implications[C]. Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008.
[40]S Bird, A Phansalkar, K Lizy, A Mericas, R Indukuru. Performance characterization of SPEC CPU benchmarks on Intel's core microarchitecture based processor[R]. SPEC Benchmark Workshop, 2007-1-21.
[41]P Bose, T Conte, T Austin. Challenges in Processor Modeling and Validation[J]. IEEE Micro, 1999, 19(3).
[42]J Lilja. Simulation of computer architectures: Simulators, benchmarks, methodologies, and recommendations[J]. IEEE Transactions on Computers, 2006, 55.
[43]J M Anderson, et al. Continuous profiling: where have all the cycles gone[J]. ACM Trans. Comput. Syst., 1997, 15(4): 357-390.
[44]M Srinivas, B Sinharoy, et al. IBM POWER7 performance modeling, verification, and evaluation[J]. IBM Journal of Research and Development, 2011, 55(3).
[45]M Moudgill, J Wellman, J Moreno. Environment for PowerPC Microarchitecture Exploration[J]. IEEE Micro, 1999, 19(3): 15-25.
[46]R Giladi, N Ahituv. SPEC as a Performance Evaluation Measure[J]. Computer, 1995, 28(8): 33-44.
[47]N Binkert, B Beckmann, G Black. The gem5 simulator[J]. ACM SIGARCH Computer Architecture News, 2011.
[48]Intel Corporation. Intel 64 and IA-32 Architectures Software Developer's Manual[Z]. 2016.