dc.contributor.author | Sideris, I | en
dc.contributor.author | Pekmestzi, K | en
dc.contributor.author | Economakos, G | en
dc.date.accessioned | 2014-03-01T01:27:47Z |
dc.date.available | 2014-03-01T01:27:47Z |
dc.date.issued | 2008 | en
dc.identifier.issn | 1383-7621 | en
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/18572 |
dc.subject | ILP | en
dc.subject | Java processor | en
dc.subject | Predecoded cache | en
dc.subject | Stack folding | en
dc.subject.classification | Computer Science, Hardware & Architecture | en
dc.subject.other | Computer programming languages | en
dc.subject.other | Decoding | en
dc.subject.other | Mathematical transformations | en
dc.subject.other | Program processors | en
dc.subject.other | Reduced instruction set computing | en
dc.subject.other | Throughput | en
dc.subject.other | Elsevier (CO) | en
dc.subject.other | Execution performance | en
dc.subject.other | hardware accelerations | en
dc.subject.other | Instruction-level parallelism (ILP) | en
dc.subject.other | JAVA applications | en
dc.subject.other | Java bytecodes | en
dc.subject.other | Java processors | en
dc.subject.other | JAVA programs | en
dc.subject.other | Java Virtual Machine (JVM) | en
dc.subject.other | Out of order | en
dc.subject.other | Out-of-order execution | en
dc.subject.other | Predecoded cache | en
dc.subject.other | Stack folding algorithms | en
dc.subject.other | super scalar | en
dc.subject.other | Java programming language | en
dc.title | A predecoding technique for ILP exploitation in Java processors | en
heal.type | journalArticle | en
heal.identifier.primary | 10.1016/j.sysarc.2008.01.008 | en
heal.identifier.secondary | http://dx.doi.org/10.1016/j.sysarc.2008.01.008 | en
heal.language | English | en
heal.publicationDate | 2008 | en
heal.abstract | Java processors have been introduced to offer hardware acceleration for Java applications by executing Java bytecodes directly in hardware. However, the stack nature of the Java Virtual Machine instruction set limits the achievable execution performance. To exploit instruction-level parallelism and allow out-of-order execution, the stack must be removed completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes into RISC-like instructions; the decoding throughput they obtain, however, is limited. In this paper, we explore microarchitectural techniques to improve the decoding throughput of Java processors. Our techniques are based on a predecoded cache that stores the folding results so that they can be reused. The ultimate goal is to exploit all available instruction-level parallelism in Java programs by feeding a superscalar out-of-order core in the backend at a sustainable rate. With a predecoded cache of 2 x 2048 entries and a 4-way superscalar core, we obtain 4.8 to 18.3 times better performance than an architecture employing pattern-based folding. (c) 2008 Elsevier B.V. All rights reserved. | en
heal.publisher | ELSEVIER SCIENCE BV | en
heal.journalName | Journal of Systems Architecture | en
dc.identifier.doi | 10.1016/j.sysarc.2008.01.008 | en
dc.identifier.isi | ISI:000259160500007 | en
dc.identifier.volume | 54 | en
dc.identifier.issue | 7 | en
dc.identifier.spage | 707 | en
dc.identifier.epage | 728 | en
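
The abstract's central idea can be illustrated with a minimal sketch: folding a short run of stack-oriented bytecodes (two loads, an arithmetic operation, a store) into a single RISC-like three-operand instruction that an out-of-order backend can schedule. The class and record names below, and the single load-load-add-store pattern, are hypothetical simplifications for illustration only; this is not the paper's OPEX algorithm or its predecoded-cache organization.

```java
import java.util.List;

// Illustrative sketch of stack folding (assumed names, not the paper's design).
public class FoldingSketch {

    // Simplified bytecode token: opcode name plus a local-variable index.
    record Bytecode(String op, int operand) {}

    // RISC-like folded instruction: dest <- src1 op src2.
    record FoldedOp(String op, int dest, int src1, int src2) {
        @Override
        public String toString() {
            return String.format("%s r%d, r%d, r%d", op, dest, src1, src2);
        }
    }

    // Fold the canonical load-load-add-store pattern into one instruction.
    // Returns null when the four-bytecode window does not match the pattern.
    static FoldedOp fold(List<Bytecode> window) {
        if (window.size() != 4) return null;
        Bytecode l1 = window.get(0), l2 = window.get(1),
                 op = window.get(2), st = window.get(3);
        if (l1.op().equals("iload") && l2.op().equals("iload")
                && op.op().equals("iadd") && st.op().equals("istore")) {
            return new FoldedOp("add", st.operand(), l1.operand(), l2.operand());
        }
        return null;
    }

    public static void main(String[] args) {
        // Bytecode for "c = a + b" with a, b, c in local slots 1, 2, 3.
        List<Bytecode> window = List.of(
                new Bytecode("iload", 1),
                new Bytecode("iload", 2),
                new Bytecode("iadd", 0),   // 0 is a dummy operand
                new Bytecode("istore", 3));
        System.out.println(fold(window)); // prints: add r3, r1, r2
    }
}
```

In the architecture the abstract describes, such folded results would be stored in the predecoded cache and reused on later fetches of the same bytecode region, avoiding repeated folding and keeping the superscalar out-of-order core fed at a sustainable rate.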