dc.contributor.author | Sideris, I | en
dc.contributor.author | Pekmestzi, K | en
dc.contributor.author | Economakos, G | en
dc.date.accessioned | 2014-03-01T01:27:47Z |
dc.date.available | 2014-03-01T01:27:47Z |
dc.date.issued | 2008 | en
dc.identifier.issn | 1383-7621 | en
dc.identifier.uri | https://dspace.lib.ntua.gr/xmlui/handle/123456789/18572 |
dc.subject | ILP | en
dc.subject | Java processor | en
dc.subject | Predecoded cache | en
dc.subject | Stack folding | en
dc.subject.classification | Computer Science, Hardware & Architecture | en
dc.subject.other | Computer programming languages | en
dc.subject.other | Decoding | en
dc.subject.other | Mathematical transformations | en
dc.subject.other | Program processors | en
dc.subject.other | Reduced instruction set computing | en
dc.subject.other | Throughput | en
dc.subject.other | Elsevier (CO) | en
dc.subject.other | Execution performance | en
dc.subject.other | hardware accelerations | en
dc.subject.other | Instruction-level parallelism (ILP) | en
dc.subject.other | JAVA applications | en
dc.subject.other | Java bytecodes | en
dc.subject.other | Java processors | en
dc.subject.other | JAVA programs | en
dc.subject.other | Java Virtual Machine (JVM) | en
dc.subject.other | Out of order | en
dc.subject.other | Out-of-order execution | en
dc.subject.other | Predecoded cache | en
dc.subject.other | Stack folding algorithms | en
dc.subject.other | super scalar | en
dc.subject.other | Java programming language | en
dc.title | A predecoding technique for ILP exploitation in Java processors | en
heal.type | journalArticle | en
heal.identifier.primary | 10.1016/j.sysarc.2008.01.008 | en
heal.identifier.secondary | http://dx.doi.org/10.1016/j.sysarc.2008.01.008 | en
heal.language | English | en
heal.publicationDate | 2008 | en
heal.abstract | Java processors have been introduced to offer hardware acceleration for Java applications by executing Java bytecodes directly in hardware. However, the stack nature of the Java Virtual Machine instruction set limits the achievable execution performance. To exploit instruction-level parallelism and allow out-of-order execution, the stack must be removed completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes into RISC-like instructions; the decoding throughput they obtain, however, is limited. In this paper, we explore microarchitectural techniques to improve the decoding throughput of Java processors. Our techniques are based on a predecoded cache that stores the folding results so that they can be reused. The ultimate goal is to exploit all available instruction-level parallelism in Java programs by feeding a superscalar out-of-order core in the backend at a sustainable rate. With a predecoded cache of 2 x 2048 entries and a 4-way superscalar core, we obtain 4.8 to 18.3 times better performance than an architecture employing pattern-based folding. (c) 2008 Elsevier B.V. All rights reserved. | en
heal.publisher | ELSEVIER SCIENCE BV | en
heal.journalName | Journal of Systems Architecture | en
dc.identifier.doi | 10.1016/j.sysarc.2008.01.008 | en
dc.identifier.isi | ISI:000259160500007 | en
dc.identifier.volume | 54 | en
dc.identifier.issue | 7 | en
dc.identifier.spage | 707 | en
dc.identifier.epage | 728 | en
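
The abstract's central idea can be illustrated with a minimal sketch: folding a short run of stack-oriented bytecodes (two loads, an arithmetic operation, a store) into a single RISC-like three-operand instruction that an out-of-order backend can schedule. The class and record names below, and the single load-load-add-store pattern, are hypothetical simplifications for illustration only; this is not the paper's OPEX algorithm or its predecoded-cache organization.

```java
import java.util.List;

// Illustrative sketch of stack folding (assumed names, not the paper's design).
public class FoldingSketch {

    // Simplified bytecode token: opcode name plus a local-variable index.
    record Bytecode(String op, int operand) {}

    // RISC-like folded instruction: dest <- src1 op src2.
    record FoldedOp(String op, int dest, int src1, int src2) {
        @Override
        public String toString() {
            return String.format("%s r%d, r%d, r%d", op, dest, src1, src2);
        }
    }

    // Fold the canonical load-load-add-store pattern into one instruction.
    // Returns null when the four-bytecode window does not match the pattern.
    static FoldedOp fold(List<Bytecode> window) {
        if (window.size() != 4) return null;
        Bytecode l1 = window.get(0), l2 = window.get(1),
                 op = window.get(2), st = window.get(3);
        if (l1.op().equals("iload") && l2.op().equals("iload")
                && op.op().equals("iadd") && st.op().equals("istore")) {
            return new FoldedOp("add", st.operand(), l1.operand(), l2.operand());
        }
        return null;
    }

    public static void main(String[] args) {
        // Bytecode for "c = a + b" with a, b, c in local slots 1, 2, 3.
        List<Bytecode> window = List.of(
                new Bytecode("iload", 1),
                new Bytecode("iload", 2),
                new Bytecode("iadd", 0),   // 0 is a dummy operand
                new Bytecode("istore", 3));
        System.out.println(fold(window)); // prints: add r3, r1, r2
    }
}
```

In the architecture the abstract describes, such folded results would be stored in the predecoded cache and reused on later fetches of the same bytecode region, avoiding repeated folding and keeping the superscalar out-of-order core fed at a sustainable rate.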