コンパイラ開発者のためのJSR133クックブック

"The JSR-133 Cookbook for Compiler Writers"

Original website: http://g.oswego.edu/dl/jmm/cookbook.html (by Doug Lea, with help from members of the JMM mailing list).

dl@cs.oswego.edu.

The Japanese edition was translated by T. Murayama and the Java Reading Group.

Passages marked (*) are slated for translator's notes.

Table of Contents(目次)

Reorderings (順序変更)

Volatiles and Monitors (volatile変数とモニタ)
Final Fields(Finalフィールド)

Memory Barriers(メモリアクセスに対するバリア同期)

Categories(区分)
Data Dependency and Barriers(データ依存とバリア)
Interactions with Atomic Instructions(アトミック命令間の相互作用)

Multiprocessors(マルチプロセッサ)

Notes(メモ)

Recipes(レシピ)

Uniprocessors(単一プロセッサ)
Inserting Barriers(バリアの挿入)
Removing Barriers(バリアの除去)
Miscellany(その他)

Acknowledgments(謝辞)
Acknowledgments for the Japanese Edition(和訳版謝辞)

This is an unofficial guide to implementing the new Java Memory Model (JMM) specified by JSR-133. It provides at most brief backgrounds about why various rules exist, instead concentrating on their consequences for compilers and JVMs with respect to instruction reorderings, multiprocessor barrier instructions, and atomic operations. It includes a set of recommended recipes for complying with JSR-133. This guide is "unofficial" because it includes interpretations of particular processor properties and specifications. We cannot guarantee that the interpretations are correct. Also, processor specifications and implementations may change over time.

これは JSR-133 で定義された新しい Java Memory Model (JMM) についての非公式の手引き書である．多くのルールが設けられている背景については簡潔に説明するに留め、命令の順序変更、マルチプロセッサのバリア命令、およびアトミック演算に関してコンパイラとJVMに与える影響について重点的に解説する。それはJSR133に準拠する上で推奨されるレシピを含んでいる．この手引きは，特定のプロセッサの特徴と仕様に関する解釈を含んでいるので「非公式」である．我々は，この解釈が正しいとは保証できない．また時が経てばプロセッサの仕様や実装も変更されるかもしれない．

Reorderings (順序変更)

For a compiler writer, the JMM mainly consists of rules disallowing reorderings of certain instructions that access fields (where "fields" include array elements) as well as monitors (locks).

コンパイラ作成者にとってJMMは，主にモニター(ロック)だけでなくフィールド(ここでいう「フィールド」には配列の要素も含む)にアクセスする特定の命令列に関して，順序変更 (reorderings) を禁止するルールより構成されている．

Volatiles and Monitors (volatile変数とモニタ)

The main JMM rules for volatiles and monitors can be viewed as a matrix with cells indicating that you cannot reorder instructions associated with particular sequences of bytecodes. This table is not itself the JMM specification; it is just a useful way of viewing its main consequences for compilers and runtime systems.

volatile変数とモニタに関するJMMの主要なルールは，各要素が特定のバイトコード命令の並びと関連した順序変更不可能な命令群を意味する行列とみなすことができる．この表はそれ自体はJMMの仕様ではない；これは単にコンパイラやランタイムシステムに関するJMMの主な結論を，便利な形で図示したものだ．

Can Reorder	2nd operation
1st operation	Normal Load / Normal Store	Volatile Load / MonitorEnter	Volatile Store / MonitorExit
Normal Load / Normal Store			No
Volatile Load / MonitorEnter	No	No	No
Volatile Store / MonitorExit		No	No

Where:

Normal Loads are getfield, getstatic, array load of non-volatile fields
Normal Stores are putfield, putstatic, array store of non-volatile fields
Volatile Loads are getfield, getstatic of volatile fields that are accessible by multiple threads
Volatile Stores are putfield, putstatic of volatile fields that are accessible by multiple threads
MonitorEnters (including entry to synchronized methods) are for lock objects accessible by multiple threads.
MonitorExits (including exit from synchronized methods) are for lock objects accessible by multiple threads.

この時

通常ロード(Normal Load)とは，非volatile変数からのgetfield, getstatic(*), 配列からのロードのいずれかのことを指す．
通常ストア(Normal Store)とは，非volatile変数へのputfield, putstatic, 配列へのストアのいずれかのことを指す．
volatileロード(Volatile Load)とは，複数スレッドよりアクセス可能な volatile 変数への getfield, getstatic のいずれかのことである．
volatileストア(Volatile Store)とは，複数スレッドよりアクセス可能な volatile 変数への putfield, putstatic のいずれかのことである．
MonitorEnter (synchronizedメソッドの開始を含む)とは，複数スレッドよりアクセス可能なロックオブジェクト(*)に関するものである．
MonitorExit (synchronizedメソッドの終了を含む)とは，複数スレッドよりアクセス可能なロックオブジェクト(*)に関するものである．

The cells for Normal Loads are the same as for Normal Stores, those for Volatile Loads are the same as for MonitorEnter, and those for Volatile Stores are the same as for MonitorExit, so they are collapsed together here (but are expanded out as needed in subsequent tables).

Normal Loadのマス目とNormal Storeのマス目は完全に一致する．同様にVolatile LoadとMonitorEnter，Volatile StoreとMonitorExitも完全に一致する．よってこの表では一つにまとめてある．（以下で出てくる表においては，必要に応じて分割する．）

Any number of other operations might be present between the indicated 1st and 2nd operations in the table. So, for example, the "No" in cell [Normal Store, Volatile Store] says that a non-volatile store cannot be reordered with ANY subsequent volatile store; at least any that can make a difference in multithreaded program semantics.

1stと2ndで指定される命令の間には，任意の個数のその他の演算が出現することが許される．だから，例えば， [Normal StoreとVolatile Store]の組に対するマス目に書かれている「No」とは，「非volatileストアは，それより後にある(少なくともマルチスレッドプログラム上のセマンティクス的に違いを生じる可能性のある)全ての Volatile Storeと，順序変更してはならない．」ということを意味する．
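The [Normal Store, Volatile Store] and [Volatile Load, Normal Load] cells can be illustrated with a small publication idiom. This is only a sketch: the class, field, and method names are hypothetical and do not come from the JSR-133 text. It assumes one writer thread and one reader thread.

```java
// Illustrative sketch; class, field, and method names are hypothetical.
class VolatilePublication {
    static int data;                // normal (non-volatile) field
    static volatile boolean ready;  // volatile field

    // Normal Store followed by Volatile Store: per the
    // [Normal Store, Volatile Store] cell, the store to data
    // may not be moved below the store to ready.
    static void writer() {
        data = 42;
        ready = true;
    }

    // Volatile Load followed by Normal Load: per the
    // [Volatile Load, Normal Load] cell, the load of data
    // may not be moved above the load of ready.
    static int reader() {
        while (!ready) {
            Thread.yield();
        }
        return data;
    }

    // Runs one reader thread against the writer and returns what the reader saw.
    static int demo() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread t = new Thread(() -> seen[0] = reader());
        t.start();
        writer();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If either of the two "No" cells were ignored, the reader could observe ready as true and still read the stale value of data.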

The JSR-133 specification is worded such that the rules for both volatiles and monitors apply only to those that may be accessed by multiple threads. If a compiler can somehow (usually only with great effort) prove that a lock is only accessible from a single thread, it may be eliminated. Similarly, a volatile field provably accessible from only a single thread acts as a normal field. More fine-grained analyses and optimizations are also possible, for example, those relying on provable inaccessibility from multiple threads only during certain intervals.

JSR-133は複数スレッドよりアクセスされる可能性のある， volatile変数とモニタに関してのみ適用されるルールについて述べている．もしコンパイラが（通常は多大なる努力を伴った）何らかの方法により，あるロックが単一スレッドからのみアクセス可能であることが証明できたならば，それは消去可能である．同様に，単一スレッドよりアクセスされることが証明できる volatileフィールドは， Normalフィールドのように振る舞う．さらなる細粒度の分析と最適化も可能である．例えば，特定の間隔で発生するであろう，複数スレッドからのアクセス不可能性に依存した，分析と最適化のようなものが考えられる．

Blank cells in the table mean that the reordering is allowed if the accesses aren't otherwise dependent with respect to basic Java semantics (as specified in the JLS). For example even though the table doesn't say so, you can't reorder a load with a subsequent store to the same location. But you can reorder a load and store to two distinct locations, and may wish to do so in the course of various compiler transformations and optimizations. This includes cases that aren't usually thought of as reorderings; for example reusing a computed value based on a loaded field rather than reloading and recomputing the value acts as a reordering. However, the JMM spec permits transformations that eliminate avoidable dependencies, and in turn allow reorderings.

表中の空白部分は，それらのアクセスが他の理由により(JLSで定義された)Java のセマンティクス上の依存関係がない限り，順序変更可能であることを意味している．例えば，表中に何も言及されていなくても，同一箇所に対するロードとそれに続くストアとは順序変更不可能である．しかし異なる箇所に対するロードとそれに続くストアについては順序変更可能であり，あなたはコンパイラによる様々な変更と最適化の中で，実際にそうなることを期待するかもしれない．これには，通常は順序変更と見なされないケースも含んでいる．例えば，フィールドを再ロードして再計算するのではなく，ロード済みの値を元に計算した値の再利用は，順序変更のように振る舞う．しかしながら，JMM仕様は無視しても良い依存性を消去するような変換を許しており，そうすると順序変更も認められるようになる．

In all cases, permitted reorderings must maintain minimal Java safety properties even when accesses are incorrectly synchronized by programmers: All observed field values must be either the default zero/null "pre-construction" values, or those written by some thread. This usually entails zeroing all heap memory holding objects before it is used in constructors and never reordering other loads with the zeroing stores. A good way to do this is to zero out reclaimed memory within the garbage collector. See the JSR-133 spec for rules dealing with other corner cases surrounding safety guarantees.

全ての例において，たとえプログラマによってアクセスが正しく同期されていなかったとしても，許された順序変更はJavaの安全性に関する最低限の特性を維持しなければならない．全ての観測されたフィールド値は，デフォルトのゼロ/nullの「構築前 (pre-construction) 」の値か，あるスレッドによって書き込まれた値のいずれかでなければならない．これは通常は，コンストラクタ内で使用される前にオブジェクトを保持しているヒープメモリが0クリアされ，その0ストアに対してその他のロードが順序変更されないということを意味する．これを実現する良い方法は，ガベージコレクタ内でメモリを再生 (reclaim) する時に全て0にすることだ(*)．安全性の保証に関する，他の困難な事例については JSR-133仕様を参照のこと．

The rules and properties described here are for accesses to Java-level fields. In practice, these will additionally interact with accesses to internal bookkeeping fields and data, for example object headers, GC tables, and dynamically generated code.

ここで描かれた全てのルールと特性は，Javaレベルのフィールドへのアクセスに関するものである．実際には，これに加えてオブジェクトのヘッダ，GCのテーブル，動的生成されるコードなどの，内部的に保持／利用されるフィールドやデータとの相互作用も追加される．

Final Fields(Finalフィールド)

Loads and Stores of final fields act as "normal" accesses with respect to locks and volatiles, but impose two additional reordering rules:

finalフィールドに対するロードとストアは，ロックとvolatileに関しては「通常」のアクセスと同様に振る舞うが，順序変更に関する二つのルールが追加される．

1. A store of a final field (inside a constructor) and, if the field is a reference, any store that this final can reference, cannot be reordered with a subsequent store (outside that constructor) of the reference to the object holding that field into a variable accessible to other threads. For example, you cannot reorder
    x.finalField = v; ... ; sharedRef = x;
   This comes into play for example when inlining constructors, where "..." spans the logical end of the constructor. You cannot move stores of finals within constructors down below a store outside of the constructor that might make the object visible to other threads. (As seen below, this may also require issuing a barrier). Similarly, you cannot reorder either of the first two with the third assignment in:
    v.afield = 1; x.finalField = v; ... ; sharedRef = x;
2. The initial load (i.e., the very first encounter by a thread) of a final field cannot be reordered with the initial load of the reference to the object containing the final field. This comes into play in:
    x = sharedRef; ... ; i = x.finalField;
   A compiler would never reorder these since they are dependent, but there can be consequences of this rule on some processors.

1. (コンストラクタ中での)finalフィールドへのストアと，finalフィールドが参照型の時にこのfinalが参照可能な全てのストアは，その後に続く(コンストラクタ外の)，そのfinalフィールドを保持するオブジェクトの参照の，他のスレッドよりアクセス可能な変数へのストアと，順序変更してはならない．例えば，以下の例では順序変更は許されない．
    x.finalField = v; ... ; sharedRef = x;
   例えば"..."が，コンストラクタの論理的な終端にまで及ぶコンストラクタのインライン化の際に，これが関係してくる．コンストラクタ中のfinalへのストアは，そのオブジェクトを他のスレッドから見える様にするかもしれないコンストラクタ外にあるストアの後に，移動してはならない．(以下で見られるように，これはバリアの発行が必要になるかもしれない．)同様に以下の例において，三つ目の割り当ては，前の二つのいずれかと順序変更することは許されない．
    v.afield = 1; x.finalField = v; ... ; sharedRef = x;
2. finalフィールドの最初のロード(即ち，スレッドがそのフィールドに初めて遭遇した時のロード)は，そのfinalフィールドを保持するオブジェクトへの参照の最初のロードと順序変更してはならない．これは次の例で意味を持つ．
    x = sharedRef; ... ; i = x.finalField;
   これらには依存関係があるため，コンパイラがこれらを順序変更することは決してない．しかし一部のプロセッサでは，このルールが影響を及ぼすことがある．

These rules imply that reliable use of final fields by Java programmers requires that the load of a shared reference to an object with a final field itself be synchronized, volatile, or final, or derived from such a load, thus ultimately ordering the initializing stores in constructors with subsequent uses outside constructors.

これらのルールは以下のことを示している． Javaプログラマが finalフィールドを信頼して使用するには， finalフィールドを持つオブジェクトへの参照型が共有される時に，その参照型のロードは，それ自体が同期化されているか， volatile又は finalであるか，又はそのようなロードに由来するかのいずれかでなければならない．そして，それゆえに究極的にはコンストラクタ中の初期化ストアと，それの後に続くコンストラクタ外の参照の利用が順序づけられる．
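The store and load patterns from the two final-field rules can be sketched in Java as follows. The Holder class, the sharedRef field, and the demo method are hypothetical names chosen for illustration. The shared reference is declared volatile here because the paragraph above asks that the load of the shared reference itself be synchronized, volatile, or final.

```java
// Illustrative sketch; Holder, sharedRef, and demo are hypothetical names.
class FinalFieldPublication {
    static final class Holder {
        final int[] finalField;       // final field holding a reference

        Holder(int first) {
            int[] v = new int[1];
            v[0] = first;             // corresponds to "v.afield = 1"
            this.finalField = v;      // corresponds to "x.finalField = v"
        }
    }

    // The load of this reference is volatile, as the text recommends.
    static volatile Holder sharedRef;

    // Publishes one Holder and returns the value a second thread read from it.
    static int demo() {
        sharedRef = null;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Holder x;
            while ((x = sharedRef) == null) {   // "x = sharedRef"
                Thread.yield();
            }
            seen[0] = x.finalField[0];          // "i = x.finalField"
        });
        reader.start();
        sharedRef = new Holder(7);              // constructor stores, then "sharedRef = x"
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

The reader can only see the fully initialized array because neither constructor store may be moved below the store to sharedRef.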

Memory Barriers(メモリアクセスに対するバリア同期)

Compilers and processors must both obey reordering rules. No particular effort is required to ensure that uniprocessors maintain proper ordering, since they all guarantee "as-if-sequential" consistency. But on multiprocessors, guaranteeing conformance often requires emitting barrier instructions. Even if a compiler optimizes away a field access (for example because a loaded value is not used), barriers must still be generated as if the access were still present. (Although see below about independently optimizing away barriers.)

コンパイラとプロセッサは共に順序変更のルールに従わなければならない．単一プロセッサでは，「まるで逐次であるかのような (as-if sequential *)」一貫性がいつも保証されるので，命令の特定の順序を維持することを保証するのに，特殊な努力は必要ない．しかしマルチプロセッサ上で準拠していることを保証するには，多くはバリア命令(*)の発行が必要になる．たとえコンパイラが最適化の結果，フィールドアクセス自体を無くしてしまったとしても，(たとえば，ロードされた値が使用されない場合，)それでもアクセスがある時と同じ様にバリアを生成しなければならない．（しかし，独立した最適化によるバリアの除去については，以下を参照すること．）

Memory barriers are only indirectly related to higher-level notions described in memory models such as "acquire" and "release". And memory barriers are not themselves "synchronization barriers". And memory barriers are unrelated to the kinds of "write barriers" used in some garbage collectors. Memory barrier instructions directly control only the interaction of a CPU with its cache, with its write-buffer that holds stores waiting to be flushed to memory, and/or its buffer of waiting loads or speculatively executed instructions. These effects may lead to further interaction among caches, main memory and other processors. But there is nothing in the JMM that mandates any particular form of communication across processors so long as stores eventually become globally performed; i.e., visible across all processors, and that loads retrieve them when they are visible.

メモリ=バリアは，「獲得」や「解放」のようなメモリモデルで描写される，上位の概念とは間接的にしか関連していない．メモリ=バリアはそれ自体は「同期バリア(*)」ではない．そして，メモリ=バリアはある種のGCで使われる「ライト= バリア(*)」とも関係がない．メモリ=バリア命令はCPUとそのキャッシュ－もう少し具体的言えば，メモリへフラッシュされるストアデータを保持している書き込みバッファと，ロードデータと投機的実行される命令(*)の(読み込み)バッファ－間の相互作用を直接的に制御する．これらの効果は，キャッシュ，メインメモリ，他のプロセッサとの間に，さらなる相互作用をもたらすかも知れない．しかし最終的にストアが大域的に実行される－即ち，全てのプロセッサに見えるようになり，そしてそれが見えるようになった時に，そのロードはそれらを取得する．(?)－限り，JMMはプロセッサ間通信の形式についてはなんら規定していない．

Categories(区分)

Nearly all processors support at least a coarse-grained barrier instruction, often just called a Fence, that guarantees that all loads and stores initiated before the fence will be strictly ordered before any load or store initiated after the fence. This is usually among the most time-consuming instructions on any given processor (often nearly as, or even more expensive than atomic instructions). Most processors additionally support more fine-grained barriers.

ほとんど全てのプロセッサは，少なくとも粗粒度のバリア命令をサポートしている．それはFenceとも呼ばれ，fenceの前に開始された全てのロードとストアが，fenceの後に開始された全てのロードとストアの前に，強く順序づけされることを保証する．これは通常は既知のプロセッサにおいて最も時間のかかる命令の一つである．（多くは，アトミック命令と同程度か，或いはそれ以上に高く付く．）多くのプロセッサは細粒度のバリアを付加的にサポートしている．

A property of memory barriers that takes some getting used to is that they apply BETWEEN memory accesses. Despite the names given for barrier instructions on some processors, the right/best barrier to use depends on the kinds of accesses it separates. Here's a common categorization of barrier types that maps pretty well to specific instructions (sometimes no-ops) on existing processors:

慣れるのに少々手間がかかる(?)メモリバリアの特性は，それがメモリアクセス間に適用されるということだ．幾つかのプロセッサでのバリア命令の名前にも関わらず，使用するのに正しい／最適なバリアは個々のアクセスの種類に依存する．特定の命令に上手く(時にはno-opsに)マッピングされる，バリアの一般的な分類を，ここに示す．

LoadLoad Barriers: The sequence: Load1; LoadLoad; Load2
ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.

Load1; LoadLoad; Load2において
Load2がデータにアクセスしたり，その後の全てのロード命令がロードするより先に，Load1のデータがロードされることが保証される．一般論として，投機的なロード，及び待機中のロード命令が待機中のストア命令をバイパスできる out-of-order 実行(*)機能を有するプロセッサには，明示的なLoadLoad バリアが必要である．ロードの順序が保存されるプロセッサ上では，このバリアはno-ops(*)に等しい．
StoreStore Barriers: The sequence: Store1; StoreStore; Store2
ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.

Store1; StoreStore; Store2という命令列において
Store2とその後の全てのストア命令に関連したデータが他のプロセッサに見えるようになる前に，Store1 データが見えるようになることを保証する(即ち，メモリへフラッシュする)．一般論として，他のプロセッサや主メモリに対する，ライトバッファやキャッシュからのフラッシュの強い順序付けが，それなしには保証できないプロセッサに，StoreStore バリアは必要である．
LoadStore Barriers: The sequence: Load1; LoadStore; Store2
ensures that Load1's data are loaded before all data associated with Store2 and subsequent store instructions are flushed. LoadStore barriers are needed only on those out-of-order processors in which waiting store instructions can bypass loads.

Load1; LoadStore; Store2という命令列において
Store2とその後のストア命令に関連した全てのデータがフラッシュされるより先に，Load1のデータが先にロードされることが保証される．LoadStore バリアは待機しているストア命令がロード命令をバイパスできる out-of-order 有りのプロセッサにのみ必要である．
StoreLoad Barriers: The sequence: Store1; StoreLoad; Load2
ensures that Store1's data are made visible to other processors (i.e., flushed to main memory) before data accessed by Load2 and all subsequent load instructions are loaded. StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor. Because of this, on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier. StoreLoad barriers are needed on nearly all recent multiprocessors, and are usually the most expensive kind. Part of the reason they are expensive is that they must disable mechanisms that ordinarily bypass cache to satisfy loads from write-buffers. This might be implemented by letting the buffer fully flush, among other possible stalls.

Store1; StoreLoad; Load2という命令列において
Load2とその後の全てのロード命令のデータがロードされる前に，Store1 のデータが他のプロセッサから見えるようになる(即ちメインメモリにフラッシュされる)ことが保証される． StoreLoad バリアは，後続のロードが，別のプロセッサが同じ場所に対して行ったより新しいストアの値ではなく，Store1 のデータの値を誤って使用してしまうことを防ぐ．このため，以下で議論するプロセッサにおいて StoreLoad が厳密に必要となるのは，バリア以前にストアした場所と同じ場所に対する後続のロードを，そのストアから分離する場合だけである．StoreLoad バリアは最近のマルチプロセッサのほとんど全てで必要であり，通常は最も高価な種類のバリアである．高価になる理由の一部は，キャッシュを経由せずにライトバッファからロードを満たす通常のメカニズムを無効にしなければならないためである(*)．これは，他のストール(失速)と並んで，バッファを完全にフラッシュさせることで実装されるかもしれない．

On all processors discussed below, it turns out that instructions that perform StoreLoad also obtain the other three barrier effects, so StoreLoad can serve as a general-purpose (but usually expensive) Fence. (This is an empirical fact, not a necessity.) The opposite doesn't hold though. It is NOT usually the case that issuing any combination of other barriers gives the equivalent of a StoreLoad.

以下で議論されている全てのプロセッサにおいて，StoreLoadを実行する命令は，その他の三つのバリアの効果も持つことが分かっており，それゆえ汎用の(しかし通常は高価な)FenceとしてStoreLoadを提供できる．(これはただの経験的な事実であり，必然ではない．)しかし，その逆は成立しない．その他の三つのバリアのいかなる組み合わせにおいても，StoreLoadと等価になることは，通常はありえない．
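For readers who want to experiment with these four categories from Java code, later JDKs (9 and up) expose static fence methods on java.lang.invoke.VarHandle. That API postdates this guide and is not mentioned in it, so the mapping below is an assumption made for illustration, and the BarrierKind enum is a hypothetical name. There is no dedicated LoadStore fence, so the sketch uses acquireFence, which covers LoadLoad plus LoadStore and is therefore stronger than strictly required.

```java
import java.lang.invoke.VarHandle;

// Illustrative mapping; the enum and its method names are hypothetical.
enum BarrierKind {
    LOAD_LOAD("loadLoadFence"),
    STORE_STORE("storeStoreFence"),
    LOAD_STORE("acquireFence"),   // LoadLoad + LoadStore: stronger than needed
    STORE_LOAD("fullFence");      // orders everything, like a general Fence

    private final String fenceName;

    BarrierKind(String fenceName) {
        this.fenceName = fenceName;
    }

    // Name of the VarHandle method this sketch uses for the category.
    String fenceName() {
        return fenceName;
    }

    // Issues a fence at least as strong as this barrier category.
    void issue() {
        switch (this) {
            case LOAD_LOAD:
                VarHandle.loadLoadFence();
                break;
            case STORE_STORE:
                VarHandle.storeStoreFence();
                break;
            case LOAD_STORE:
                VarHandle.acquireFence();
                break;
            case STORE_LOAD:
                VarHandle.fullFence();
                break;
        }
    }
}
```

Choosing fullFence for StoreLoad mirrors the observation above that an instruction providing StoreLoad also provides the other three effects on the processors discussed here.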

The following table shows how these barriers correspond to JSR-133 ordering rules.

次に続く表は，これらのバリアがJSR-133の順序に関するルールと如何に関連するかを示している．

Required barriers	2nd operation
1st operation	Normal Load	Normal Store	Volatile Load MonitorEnter	Volatile Store MonitorExit
Normal Load				LoadStore
Normal Store				StoreStore
Volatile Load MonitorEnter	LoadLoad	LoadStore	LoadLoad	LoadStore
Volatile Store MonitorExit			StoreLoad	StoreStore

Plus the special final-field rule requiring a StoreStore barrier in
x.finalField = v; StoreStore; sharedRef = x;

Here's an example showing placements.

加えて，finalフィールドに関する特別なルールにより，以下の箇所にStoreStoreバリアが必要になる．
x.finalField = v; StoreStore; sharedRef = x;

バリアの配置を示す例を以下に挙げる．

Java	Instructions
`class X { int a, b; volatile int v, u; void f() { int i, j; i = a; j = b; i = v; j = u; a = i; b = j; v = i; u = j; i = u; j = b; a = i; } }`	`load a load b load v LoadLoad load u LoadStore store a store b StoreStore store v StoreStore store u StoreLoad load u LoadLoad LoadStore load b store a`
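The required-barriers table above can be read as a simple lookup from a pair of operations to a barrier. The sketch below encodes it that way. BarrierTable, Op, and required are hypothetical names, and MonitorEnter and MonitorExit are folded into VOLATILE_LOAD and VOLATILE_STORE just as the table does. The special final-field StoreStore rule is not covered.

```java
// Illustrative encoding of the required-barriers table; all names are hypothetical.
class BarrierTable {
    // MonitorEnter is treated as VOLATILE_LOAD and MonitorExit as VOLATILE_STORE.
    enum Op { NORMAL_LOAD, NORMAL_STORE, VOLATILE_LOAD, VOLATILE_STORE }

    // Rows are the 1st operation, columns the 2nd; "" means no barrier is required.
    private static final String[][] REQUIRED = {
        /* NORMAL_LOAD    */ { "", "", "", "LoadStore" },
        /* NORMAL_STORE   */ { "", "", "", "StoreStore" },
        /* VOLATILE_LOAD  */ { "LoadLoad", "LoadStore", "LoadLoad", "LoadStore" },
        /* VOLATILE_STORE */ { "", "", "StoreLoad", "StoreStore" },
    };

    // Returns the barrier needed somewhere between first and second, or "".
    static String required(Op first, Op second) {
        return REQUIRED[first.ordinal()][second.ordinal()];
    }
}
```

For the final `i = u; j = b; a = i;` sequence in the example, the lookup gives LoadLoad for (volatile load, normal load) and LoadStore for (volatile load, normal store), which is why both barriers follow the last `load u`.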

Data Dependency and Barriers(データ依存とバリア)

The need for LoadLoad and LoadStore barriers on some processors interacts with their ordering guarantees for dependent instructions. On some (most) processors, a load or store that is dependent on the value of a previous load is ordered by the processor without need for an explicit barrier. This commonly arises in two kinds of cases, indirection:
Load x; Load x.field
and control
Load x; if (predicate(x)) Load or Store y;

幾つかのプロセッサにおけるLoadLoadバリア，及びLoadStoreバリアの必要性は，依存関係のある命令に対する順序の保証と相互に影響し合う．幾つかの(或いはほとんどの)プロセッサでは，先行するロードの値に依存するロード又はストアは，明示的なバリアを必要とせずにプロセッサにより順序づけられる．これは一般に二種類のケースで生じる．一つは間接参照(indirection)：
Load x; Load x.field
もう一つは制御依存(control)である：
Load x; if (predicate(x)) Load or Store y;

Processors that do NOT respect indirection ordering in particular require barriers for final field access for references initially obtained through shared references:
x = sharedRef; ... ; LoadLoad; i = x.finalField;

間接参照の順序付けを守らないプロセッサでは，特に，共有参照を通じて最初に取得した参照に対する finalフィールドへのアクセスにバリアが必要になる：
x = sharedRef; ... ; LoadLoad; i = x.finalField;

Conversely, as discussed below, processors that DO respect data dependencies provide several opportunities to optimize away LoadLoad and LoadStore barrier instructions that would otherwise need to be issued. (However, dependency does NOT automatically remove the need for StoreLoad barriers on any processor.)

逆に，以下で議論するように，データ依存性を尊重するプロセッサでは，本来なら発行する必要のある LoadLoadと LoadStoreのバリア命令を最適化で除去できる機会が幾つかある．(ただし依存性があっても，どのプロセッサにおいても StoreLoadバリアが自動的に不要になるわけではない．)

Interactions with Atomic Instructions(アトミック命令間の相互作用)

The kinds of barriers needed on different processors further interact with implementation of MonitorEnter and MonitorExit. Locking and/or unlocking usually entail the use of atomic conditional update operations CompareAndSwap (CAS) or LoadLinked/StoreConditional (LL/SC) that have the semantics of performing a volatile load followed by a volatile store. While CAS or LL/SC minimally suffice, some processors also support other atomic instructions (for example, an unconditional exchange) that can sometimes be used instead of or in conjunction with atomic conditional updates.

必要とされるバリアの種類はプロセッサによって異なり，それは MonitorEnterと MonitorExit(*)の実装ともさらに相互に影響し合う．ロックやアンロックは通常，アトミックな条件付き更新操作である CompareAndSwap(CAS)や LoadLinked/StoreConditional (LL/SC)の使用を伴う．これらはセマンティクス的には volatileロードに続けて volatileストアを実行するのと同じである．CASや LL/SCがあれば最低限は足りるが，幾つかのプロセッサは他のアトミック命令(例えば無条件交換 "unconditional exchange")もサポートしており，アトミックな条件付き更新の代わりに，あるいはそれと組み合わせて使えることがある．
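To make the link between locking and atomic conditional updates concrete, here is a minimal spin lock built on compareAndSet from java.util.concurrent.atomic. It is a teaching sketch under stated assumptions, not how any particular JVM implements MonitorEnter and MonitorExit. CasSpinLock and demo are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a lock built from CAS; not a real JVM monitor implementation.
class CasSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Acquire: retry the CAS until it flips the flag from false to true.
    void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    // Release: a volatile-style store that makes the lock available again.
    void unlock() {
        held.set(false);
    }

    // Increments a plain counter under the lock from several threads.
    static int demo(int threads, int perThread) {
        final CasSpinLock lock = new CasSpinLock();
        final int[] counter = new int[1];   // plain data protected by the lock
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```

The successful CAS in lock() plays the role of the volatile load followed by a volatile store described above, and the store in unlock() plays the role of the exit.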

On all processors, atomic operations protect against read-after-write problems for the locations being read/updated. (Otherwise standard loop-until-success constructions wouldn't work in the desired way.) But processors differ in whether atomic instructions provide more general barrier properties than the implicit StoreLoad for their target locations. On some processors these instructions also intrinsically perform barriers that would otherwise be needed for MonitorEnter/Exit; on others some or all of these barriers must be specifically issued.

全てのプロセッサにおいて，アトミック操作は，読み出し／更新の対象となる場所に対する read-after-write問題を防いでいる．（さもなくば，標準的な loop-until-success 構造は望んだようには機能しない．）しかし，ターゲットの場所に対する暗黙的な StoreLoadよりも一般的なバリア特性をアトミック命令が提供するかどうかは，プロセッサによって異なる．幾つかのプロセッサでは，これらの命令は，本来なら MonitorEnter/Exitのために別途必要となるバリアも本質的に実行する．その他のプロセッサでは，これらのバリアの一部又は全てを明示的に発行しなければならない．
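The "loop-until-success" construction mentioned above looks like this when written against java.util.concurrent.atomic.AtomicInteger. The addCapped operation and the CasLoop class are hypothetical examples chosen only to show the retry shape: read the current value, compute the new one, and retry if another thread changed the location in between.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a CAS retry loop; the operation itself is hypothetical.
class CasLoop {
    // Atomically adds delta to value without exceeding cap; returns the new value.
    static int addCapped(AtomicInteger value, int delta, int cap) {
        while (true) {
            int current = value.get();                    // read
            int next = Math.min(current + delta, cap);    // compute
            if (value.compareAndSet(current, next)) {     // publish only if unchanged
                return next;
            }
            // Another thread updated the location first: loop and retry.
        }
    }

    // Several threads add 1 repeatedly; the result must stop exactly at cap.
    static int demo(int threads, int perThread, int cap) {
        final AtomicInteger value = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    addCapped(value, 1, cap);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value.get();
    }
}
```

The loop is only correct because the CAS observes the latest value of its own target location, which is the read-after-write protection the paragraph describes.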

Volatiles and Monitors have to be separated to disentangle these effects, giving:

これらの効果を切り分けるには， volatile変数とモニタを分けて扱う必要があり，次の表が得られる．

Required Barriers	2nd operation
1st operation	Normal Load	Normal Store	Volatile Load	Volatile Store	MonitorEnter	MonitorExit
Normal Load				LoadStore		LoadExit
Normal Store				StoreStore		StoreExit
Volatile Load	LoadLoad	LoadStore	LoadLoad	LoadStore	LoadEnter	LoadExit
Volatile Store			StoreLoad	StoreStore	StoreEnter	StoreExit
MonitorEnter	EnterLoad	EnterStore	EnterLoad	EnterStore	EnterEnter	EnterExit
MonitorExit			ExitLoad	ExitStore	ExitEnter	ExitExit

Plus the special final-field rule requiring a StoreStore barrier in:
x.finalField = v; StoreStore; sharedRef = x;

加えて，finalフィールドに関する特別なルールにより，以下の箇所にStoreStoreバリアが必要になる．
x.finalField = v; StoreStore; sharedRef = x;

In this table, "Enter" is the same as "Load" and "Exit" is the same as "Store", unless overridden by the use and nature of atomic instructions. In particular:

この表において，アトミック命令の使用と本質によって上書きされない限り(?)，"Enter"は"Load"に等しく，"Exit"は"Store"に等しい．特に；

EnterLoad is needed on entry to any synchronized block/method that performs a load. It is the same as LoadLoad unless an atomic instruction is used in MonitorEnter and itself provides a barrier with at least the properties of LoadLoad, in which case it is a no-op.
EnterLoadは，内部でロードが実行される全てのsynchronizedブロック／メソッドに入る時に必要とされる． MonitorEnter内でアトミック命令が使われ，それ自体が最低でも LoadLoadの特性のバリアを提供しない限り，それ(EnterLoad)は LoadLoad と同じである．それ(LoadLoad級のバリア)が提供された場合はno-opになる．

StoreExit is needed on exit of any synchronized block/method that performs a store. It is the same as StoreStore unless an atomic instruction is used in MonitorExit and itself provides a barrier with at least the properties of StoreStore, in which case it is a no-op.
StoreExitは，その内部でストアを行う全ての synchronizedブロックやメソッドの出口で必要になる． MonitorExitでアトミック命令が使われ，それ自身が最低でも StoreStoreと同じ性質のバリアを提供しない限り，それ(StoreExit)は StoreStoreと同じである．それ(StoreStore級のバリア)が提供される場合はno-opとなる．

ExitEnter is the same as StoreLoad unless atomic instructions are used in MonitorExit and/or MonitorEnter and at least one of these provides a barrier with at least the properties of StoreLoad, in which case it is a no-op.
MonitorExitや MonitorEnter内でアトミック命令が使われており，そして少なくともその中の一つが，最低でも StoreLoadと同じ性質のバリアを提供する場合を除き， ExitEnterは StoreLoadと同じである．それ(StoreLoad級のバリア)が提供される場合はno-opとなる．
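The naming rule for the monitor-related barriers ("Enter" reads as "Load", "Exit" reads as "Store", unless an atomic instruction already supplies the barrier) can be written down as a tiny helper. MonitorBarrierNames and both method names are hypothetical. The boolean parameter stands for the caller's own knowledge of whether the atomic instruction used in MonitorEnter or MonitorExit already provides a barrier of at least that strength.

```java
// Illustrative helper; class and method names are hypothetical.
class MonitorBarrierNames {
    // Default reading: "Enter" is the same as "Load", "Exit" the same as "Store".
    static String defaultEquivalent(String barrier) {
        return barrier.replace("Enter", "Load").replace("Exit", "Store");
    }

    // If the atomic instruction already provides the barrier, nothing is emitted.
    static String resolve(String barrier, boolean atomicProvidesIt) {
        return atomicProvidesIt ? "no-op" : defaultEquivalent(barrier);
    }
}
```

For example, resolve("ExitEnter", false) yields "StoreLoad", while resolve("ExitEnter", true) yields "no-op".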

The other types are specializations that are unlikely to play a role in compilation (see below) and/or reduce to no-ops on current processors. For example, EnterEnter is needed to separate nested MonitorEnters when there are no intervening loads or stores. Here's an example showing placements of most types:

その他のタイプは，コンパイルにおいて役割を果たしそうにない(下記参照)か，現在のプロセッサ上では no-opになる特殊化である．例えば，間にロードやストアが挟まらない場合に，ネストした MonitorEnter同士を分離するには EnterEnterが必要になる．ほとんどのタイプの配置を示す例を以下に挙げる．

Java	Instructions
`class X { int a; volatile int v; void f() { int i; synchronized(this) { i = a; a = i; } synchronized(this) { synchronized(this) { } } i = v; synchronized(this) { } v = i; synchronized(this) { } } }`	`enter EnterLoad EnterStore load a store a LoadExit StoreExit exit ExitEnter enter EnterEnter enter EnterExit exit ExitExit exit ExitEnter ExitLoad load v LoadEnter enter EnterExit exit ExitEnter ExitStore store v StoreEnter enter EnterExit exit`

Java-level access to atomic conditional update operations will be available in JDK1.5 via JSR-166 (concurrency utilities) so compilers will need to issue associated code, using a variant of the above table that collapses MonitorEnter and MonitorExit -- semantically, and sometimes in practice, these Java-level atomic updates act as if they are surrounded by locks.

Javaレベルでアクセスできるアトミックな条件付き更新操作はJDK1.5の JSR-166(concurrency utilities) で利用可能になる．そのため，コンパイラは関連するコードを出力する必要があり， MonitorEnterと MonitorExitを押し潰す上の表のバリエーションを使って，－セマンティクス的に，時には実践的に，これらの Javaレベルのアトミックな更新は，まるでロックで取り囲まれたかのように動作する．

Multiprocessors(マルチプロセッサ)

Here's a listing of processors that are commonly used in MPs, along with links to documents providing information about them. (Some require some clicking around from the linked site and/or free registration to access manuals). This isn't an exhaustive list, but it includes processors used in all current and near-future multiprocessor Java implementations I know of. The list and the properties of processors described below are not definitive. In some cases I'm just reporting what I read, and could have misread. Several reference manuals are not very clear about some properties relevant to the JMM. Please help make it definitive.

ここに，マルチプロセッサで良く使われるプロセッサ一覧を，ドキュメントへのリンクも付けて示す．（うち幾つかは，マニュアルにアクセスするために，リンクされたサイトでの何度かのクリックと無料登録を要求する．）これは網羅的なリストではないとは言え，私が知る限り現在又は近い将来においてJavaが実装されるのに使われる全てのプロセッサを含んでいる．このリストとそこで記述されているプロセッサの特徴は決定版とは言えない．幾つかの例では，私が読んだ物をただ報告しているだけなので，誤解しているかもしれない．幾つかのリファレンスマニュアルではJMM関係の特徴について分かりやすくはなっていない．読者の方は，これを決定版にするのに力を貸して欲しい．

Good sources of hardware-specific information about barriers and related properties of machines not listed here are Hans Boehm's atomic_ops library, the Linux Kernel Source, and Linux Scalability Effort. Barriers needed in the linux kernel correspond in straightforward ways to those discussed here, and have been ported to most processors. For descriptions of the underlying models supported on different processors, see Sarita Adve et al, Recent Advances in Memory Consistency Models for Hardware Shared-Memory Systems and Sarita Adve and Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial.

ここに記載されていないマシンの，バリアとそれに関係する特徴のハードウエア固有の情報についての良い資料には， Hans Boehm's atomic_ops library (Hans Boehm氏による atomic_opsライブラリ)と Linux Kernel Source (Linuxカーネルのソースコード)， Linux Scalability Effort がある． linuxカーネル内で必要となるバリアは，直接的にここで議論されたものと合致するし，多くのプロセッサ上へ移植されている．異なるプロセッサ上でサポートされている基礎を成すモデルの記述は， Sarita Adve et al, Recent Advances in Memory Consistency Models for Hardware Shared-Memory Systems (ハードウエア共有メモリシステムのためのメモリ一貫性モデルの最近の進歩)と Sarita Adve and Kourosh Gharachorloo, Shared Memory Consistency Models: A Tutorial (共有メモリ一貫性モデル：チュートリアル)を参照すること．

sparc-TSO: Ultrasparc 1, 2, 3 (sparcv9) in TSO (Total Store Order) mode. Ultra3s only support TSO mode. (RMO mode in Ultra1/2 is never used so can be ignored.) See UltraSPARC III Cu User's Manual and The SPARC Architecture Manual, Version 9 .
x86-PO: Intel 486, Pentium, P2, P3, P4, P4 with hyperthreading, Xeon, AMD Athlon and Opteron and others. Intel calls consistency properties for these "Processor Ordering" (PO). See The IA-32 Intel Architecture Software Developers Manual, Volume 3: System Programming Guide and AMD x86-64 Architecture Programmer's Manual Volume 2: System Programming.
x86-SPO: Proposed but unimplemented x86 rules that Intel calls "Speculative Processor Ordering". As of this writing, no existing x86 or x86-64 processors are known to be SPO. All are PO.
ia64: Itanium. See Intel Itanium Architecture Software Developer's Manual, Volume 2: System Architecture
ppc: All versions (6xx, 7xx, 7xxx (G3/G4), 64bit POWER4, "Book-E" enhanced powerpc, PowerPC-440, Motorola-e500 G5) have the same basic memory model, but differ (as discussed below) in the availability and definition of some memory barrier instructions. See MPC603e RISC Microprocessor Users Manual, MPC7410/MPC7400 RISC Microprocessor Users Manual , Book II of PowerPC Architecture Book, PowerPC Microprocessor Family: Software reference manual, Book E- Enhanced PowerPC Architecture, EREF: A Reference for Motorola Book E and the e500 Core. For discussion of barriers see IBM article on power4 barriers, and IBM article on powerpc barriers.
alpha: 21264x and I think all others. See Alpha Architecture Handbook
pa-risc: HP pa-risc implementations. See the pa-risc 2.0 Architecture manual.

Here's how these processors support barriers and atomics:

ここにこれらのプロセッサがバリアとアトミック命令をどのようにサポートするか示す．

Processor	LoadStore	LoadLoad	StoreStore	StoreLoad	Data dependency orders?	Atomic Conditional	Other Atomics	Atomics provide barrier?
sparc-TSO	no-op	no-op	no-op	membar (StoreLoad)	yes	CAS: casa	swap, ldstub	full
x86-PO	no-op	no-op	no-op	mfence or cpuid or locked insn	yes	CAS: cmpxchg	xchg, locked insn	full
x86-SPO	no-op	lfence	no-op	mfence	yes	CAS: cmpxchg	xchg, locked insn	full
ia64	combine with st.rel or ld.acq	ld.acq	st.rel	mf	yes	CAS: cmpxchg	xchg, fetchadd	target + acq/rel
ppc	dependency or isync	dependency plus isync	mbar eieio lwsync	msync sync	yes	LL/SC: ldarx/stwcx		target only
alpha	mb	mb	wmb	mb	no	LL/SC: ldx_l/stx_c		target only
pa-risc	no-op	no-op	no-op	no-op	yes	build from ldcw	ldcw	(NA)
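A code generator might consult the four barrier columns of the table above through a lookup like the following. The strings are copied from the table as given here and carry the same caveats the guide states about it. ProcessorBarriers and its methods are hypothetical names, and the atomics columns are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lookup over the barrier columns of the table; names are hypothetical.
class ProcessorBarriers {
    private static final String[] COLUMNS = { "LoadStore", "LoadLoad", "StoreStore", "StoreLoad" };
    private static final Map<String, String[]> ROWS = new HashMap<>();

    static {
        ROWS.put("sparc-TSO", new String[] { "no-op", "no-op", "no-op", "membar (StoreLoad)" });
        ROWS.put("x86-PO", new String[] { "no-op", "no-op", "no-op", "mfence or cpuid or locked insn" });
        ROWS.put("x86-SPO", new String[] { "no-op", "lfence", "no-op", "mfence" });
        ROWS.put("ia64", new String[] { "combine with st.rel or ld.acq", "ld.acq", "st.rel", "mf" });
        ROWS.put("ppc", new String[] { "dependency or isync", "dependency plus isync", "mbar eieio lwsync", "msync sync" });
        ROWS.put("alpha", new String[] { "mb", "mb", "wmb", "mb" });
        ROWS.put("pa-risc", new String[] { "no-op", "no-op", "no-op", "no-op" });
    }

    // Returns the table entry for the given processor and barrier category.
    static String instruction(String processor, String barrier) {
        String[] row = ROWS.get(processor);
        if (row == null) {
            throw new IllegalArgumentException("unknown processor: " + processor);
        }
        for (int i = 0; i < COLUMNS.length; i++) {
            if (COLUMNS[i].equals(barrier)) {
                return row[i];
            }
        }
        throw new IllegalArgumentException("unknown barrier: " + barrier);
    }

    // True when the table lists the barrier as a no-op on that processor.
    static boolean isNoOp(String processor, String barrier) {
        return "no-op".equals(instruction(processor, barrier));
    }
}
```

With this table, only StoreLoad needs an instruction on sparc-TSO and x86-PO, while alpha needs one for every category.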

Notes(メモ)

Some of the listed barrier instructions have stronger properties than actually needed in the indicated cells, but seem to be the cheapest way to get desired effects.
このリストに挙げられたバリア命令の幾つかは，本来そのマス目で必要とされている特徴より強いものもある．しかし，求めている効果を得るには最も安いと思われる．

The listed barrier instructions are those designed for use with normal program memory, but not necessarily other special forms/modes of caching and memory used for IO and system tasks. For example, on x86-SPO, StoreStore barriers ("sfence") are needed with WriteCombining (WC) caching mode, which is designed for use in system-level bulk transfers etc. OSes use Writeback mode for programs and data, which doesn't require StoreStore barriers.
リストに挙げられたバリア命令は，通常のプログラムメモリと共に使用されるよう設計されているが，その他の特別な形式/モードのキャッシュやI/Oやシステムタスクに使われるメモリと使用するようには必ずしもなっていない．例えば x86-SPOでは，システムレベルの bulk-transferなどに用いられる WriteCombining(WC)キャッシュモードでは StoreStoreバリア ("sfence")が必要になる．OSはプログラムやデータのために Writebackモードを使うが，これは StoreStoreバリアを要求しない．

On x86 (both PO and SPO), any lock-prefixed instruction can be used as a StoreLoad barrier. (The form used in linux kernels is the no-op lock; addl $0,0(%%esp).) Versions supporting the "SSE2" extensions (Pentium4 and later) support the mfence instruction which seems preferable unless a lock-prefixed instruction like CAS is needed anyway. The cpuid instruction also works but is slower.
x86(POとSPOの両方とも）では，全ての lock-prefixed(lockという接頭辞を持つ(?))命令が StoreLoadバリアとして利用できる．(linuxカーネルでは no-op lock; addl $0,0(%%esp)の形式が使用されている．) SSE2拡張をサポートしているバージョン (Pentium4以後)では mfence命令もサポートされており，CASのような lock-prefixed命令を使用しない限りは好ましいように思える． cpuid命令もバリアとして機能するが，より低速である．

On ia64, LoadStore, LoadLoad and StoreStore barriers are folded into special forms of load and store instructions -- there aren't separate instructions. ld.acq acts as (load; LoadLoad+LoadStore) and st.rel acts as (LoadStore+StoreStore; store). Neither of these provides a StoreLoad barrier -- you need a separate mf barrier instruction for that.
ia64では，LoadStore， LoadLoad， StoreStoreの各バリアは， load命令と store命令の特殊な形式に折り畳まれており，独立した命令としては存在しない． ld.acq は(load; LoadLoad+ LoadStore)のように振る舞い，st.relは (LoadStore+ StoreStore; store) のように振る舞う．これらのいずれも StoreLoadバリアを提供しない．そのためには独立したmfバリア命令を使う必要がある．

The "Book-E" ppcs support mbar and msync instructions that map well to the barrier categorizations here. Power4 uses lwsync instead of mbar. The mbar instruction is the same opcode as the eieio instruction. The original ppcs supported only a single heavy "sync" instruction.
"Book-E"ppcは mbar命令，及び msync命令をサポートしており，ここで分類しているバリアに上手くマップされる． Power4では mbarの代わりに lwsyncが使われる． mbar命令は eieio命令と同じオペコードを持つ．オリジナルのppcは，重い "sync"命令のみサポートしている．

The sparc membar instruction supports all four barrier modes, as well as combinations of modes. But only the StoreLoad mode is ever needed in TSO. On some UltraSparcs, any membar instruction produces the effects of a StoreLoad, regardless of mode.
sparcのmembar命令は4種類全てのバリアモードをサポートしているだけでなく，モードの組み合わせもサポートしている．しかしTSOで必要になるのは StoreLoadモードだけである．幾つかの UltraSparcでは，モードに関係なくどのmembar命令も StoreLoadの効果を生み出す．

The x86 documents do not explicitly say that they obey data dependency orderings, but all current implementations do so, and OSes and other low-level software widely assume that they do.
x86のドキュメントは，データの依存性に基づく順序づけに従うとは明確には述べていない．しかし現在の実装はそうなっているし，OSやその他の低レベルソフトウエアは広範囲にわたって，そうなることを仮定している．

The x86-PO processors supporting "streaming SIMD" SSE2 extensions require LoadLoad "lfence" only in connection with these streaming instructions.
"streaming SIMD(*)" SSE2拡張をサポートしている X86-POプロセッサは，この拡張命令と関連する時のみ LoadLoad "lfence"命令を必要とする．

The recommended technique for implementing LoadStore barriers on ppcs is to introduce an artificial dependency rather than use a memory barrier instruction. As in:
Load x; if (x == x) Store y;
ppcで推奨されるLoadStoreバリアの実装法はメモリバリア命令を使うことではなく，以下に示すように人工的な依存性を導入することである．
Load x; if (x == x) Store y;

The recommended technique for implementing LoadLoad barriers on ppcs is to introduce an artificial dependency (as in the above case) if one is not already present, in addition to an isync instruction. An isync alone does not suffice.
ppcで推奨されるLoadLoadバリアの実装法は，依存性がまだ存在しない場合に(上の例のように)人工的な依存性を導入し，それに加えてisync命令を発行することである．isync単独では十分ではない．

Although the pa-risc specification does not mandate it, all HP pa-risc implementations are sequentially consistent, so have no memory barrier instructions.
PA-RISC仕様では必須とはしていないが，HPのPA-RISCは全て順序整合(*) (sequentially consistent)で実装されている．よってPA-RISCにメモリバリア命令はない．

The only atomic primitive on pa-risc is ldcw, a form of test-and-set, from which you would need to build up atomic conditional updates using techniques such as those in the HP white paper on spinlocks.
PA-RISCにあるアトミック処理のプリミティブは， test-and-set形式の一種である ldcwのみである．このため HP white paper on spinlocks のようなテクニックを使って，アトミックな条件付き更新を構築する必要があるだろう．
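
As a rough illustration of the note above, the following Java sketch builds a compare-and-swap out of nothing but a test-and-set style primitive. AtomicBoolean.getAndSet stands in for the hardware test-and-set; the class TestAndSetCasSketch and its methods are hypothetical names chosen for this example. This is not the technique from the HP white paper, only one simple spinlock-based way such an update could be assembled.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a conditional update built on a test-and-set spinlock.
final class TestAndSetCasSketch {
    // Stand-in for the single lock word that a test-and-set instruction operates on.
    private final AtomicBoolean held = new AtomicBoolean(false);
    private int value; // guarded by `held`

    TestAndSetCasSketch(int initial) {
        value = initial;
    }

    private void lock() {
        // getAndSet(true) returns the old flag: spin until we observe "was free".
        while (held.getAndSet(true)) {
            Thread.yield();
        }
    }

    private void unlock() {
        held.set(false);
    }

    // Atomically replaces the value with `update` if it currently equals `expected`.
    boolean compareAndSwap(int expected, int update) {
        lock();
        try {
            if (value != expected) {
                return false;
            }
            value = update;
            return true;
        } finally {
            unlock();
        }
    }

    int get() {
        lock();
        try {
            return value;
        } finally {
            unlock();
        }
    }

    public static void main(String[] args) {
        TestAndSetCasSketch cell = new TestAndSetCasSketch(1);
        System.out.println(cell.compareAndSwap(1, 2));
        System.out.println(cell.compareAndSwap(1, 3));
        System.out.println(cell.get());
    }
}
```

Every access goes through the lock, so the update is atomic with respect to other users of the same object, at the cost of spinning under contention.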

CAS and LL/SC take multiple forms on different processors, differing only with respect to field width, minimally including 4 and 8 byte versions.
CASとLL/SCはプロセッサごとに複数の形式を取るが，違いはフィールドの幅だけであり，最低でも4バイトと8バイトのバージョンがある．

On sparc and x86, CAS has implicit preceding and trailing full StoreLoad barriers. The sparcv9 architecture manual says CAS need not have post-StoreLoad barrier property, but the chip manuals indicate that it does on ultrasparcs.
sparcとx86では，CASは暗黙的に完全な StoreLoadバリアを前後に持つ． SPARCv9アーキテクチャマニュアルでは「CASは事後の StoreLoadバリア特性を持つ必要はない」としているが，チップマニュアルは ultrasparcではその特性を持つことを示している．

On ppc and alpha, LL/SC have implicit barriers only with respect to the locations being loaded/stored, but don't have more general barrier properties.
ppcとalphaでは，LL/SCはロード或いはストアされた場所についてのみ暗黙的なバリアを持つが，より一般的なバリアの特性は持っていない．

The ia64 cmpxchg instruction also has implicit barriers with respect to the locations being loaded/stored, but additionally takes an optional .acq (post-LoadLoad+LoadStore) or .rel (pre-StoreStore+LoadStore) modifier. The form cmpxchg.acq can be used for MonitorEnter, and cmpxchg.rel for MonitorExit. In those cases where exits and enters are not guaranteed to be matched, an ExitEnter (StoreLoad) barrier may also be needed.
IA64の cmpxchg命令もまた， load/storeされる場所について暗黙的にバリアを持つが，それに加えてオプションで .acq (事後 LoadLoad + LoadStore) 又は .rel (事前 StoreStore + LoadStore) 修飾子を取る． cmpxchg.acq形式は MonitorEnterに使用でき， cmpxchg.relは MonitorExitに利用できる． exitとenterが対応していることが保証されない場合は， ExitEnter (StoreLoad) バリアも必要になるかもしれない．

Sparc, x86 and ia64 support unconditional-exchange (swap, xchg). Sparc ldstub is a one-byte test-and-set. ia64 fetchadd returns previous value and adds to it. On x86, several instructions (for example add-to-memory) can be lock-prefixed, causing them to act atomically.
sparc， x86， IA64は無条件交換命令 (swap， xchg) をサポートしている． sparcの ldstubは一バイトの test-and-set 命令である． IA64の fetchadd は以前の値を返し，その値に加算する命令である． x86では幾つかの命令(例えば add-to-memory) に lockプレフィックスを付けることができ，その場合はアトミックに動作する．
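
For readers who know these operations from the Java side, the sketch below shows the three shapes of atomic primitive discussed in the notes above (conditional update, unconditional exchange, fetch-and-add) using java.util.concurrent.atomic.AtomicInteger. The class AtomicPrimitivesSketch and its method names are hypothetical. Which machine instruction a particular JVM emits for each call is implementation-specific and is not something this example asserts.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the three kinds of atomic primitive, at the Java API level.
final class AtomicPrimitivesSketch {
    // Conditional update (CAS): stores `update` only if the cell holds `expected`.
    static boolean conditionalUpdate(AtomicInteger cell, int expected, int update) {
        return cell.compareAndSet(expected, update);
    }

    // Unconditional exchange (swap/xchg style): stores `newValue`, returns the old value.
    static int exchange(AtomicInteger cell, int newValue) {
        return cell.getAndSet(newValue);
    }

    // Fetch-and-add: returns the previous value and adds `delta` to the cell.
    static int fetchAdd(AtomicInteger cell, int delta) {
        return cell.getAndAdd(delta);
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(5);
        System.out.println(conditionalUpdate(cell, 5, 7)); // succeeds: cell was 5
        System.out.println(exchange(cell, 10));            // returns the old value 7
        System.out.println(fetchAdd(cell, 3));             // returns 10, cell becomes 13
        System.out.println(cell.get());
    }
}
```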

Recipes(レシピ)

Uniprocessors(単一プロセッサ)

If you are generating code that is guaranteed to only run on a uniprocessor, then you can probably skip the rest of this section. Because uniprocessors preserve apparent sequential consistency, you never need to issue barriers unless object memory is somehow shared with asynchronously accessible IO memory. This might occur with specially mapped java.nio buffers, but probably only in ways that affect internal JVM support code, not Java code. Also, it is conceivable that some special barriers would be needed if context switching doesn't entail sufficient synchronization.

もし，あなたが生成するコードが単一プロセッサ上でのみ実行されることが保証されるなら，おそらくこの章の残りはスキップしても構わない．何故なら単一プロセッサは見かけ上の順序整合性(*) (sequential consistency) を保つので，非同期でアクセス可能なIOメモリとオブジェクトメモリとが何らかの形で共有されない限り，バリアを発行する必要がないからだ．これは特別にマップされた java.nio のバッファで起こる可能性があるが，おそらく影響を受けるのは JVM内部のサポートコードだけで， Javaコードには影響しない．またコンテキストスイッチが十分な同期を伴わない場合は，特別なバリアが必要になることも考えられる．(*)

Inserting Barriers(バリアの挿入)

Barrier instructions apply between different kinds of accesses as they occur during execution of a program. Finding an "optimal" placement that minimizes the total number of executed barriers is all but impossible. Compilers often cannot tell if a given load or store will be preceded or followed by another that requires a barrier; for example, when a volatile store is followed by a return. The easiest conservative strategy is to assume that the kind of access requiring the "heaviest" kind of barrier will occur when generating code for any given load, store, lock, or unlock:

バリア命令はプログラムの実行中に発生する，異なる種類のアクセスの間に適用される．実行されるバリアの総数を最小化する「最適」な配置を見つけることは，ほとんど不可能だ．コンパイラには，ある load/storeの前後に，バリアを必要とする別のアクセスが来るかどうか分からないことが多い．例えば volatileストアの後にリターンが続く場合がそうだ．最も簡単で保守的な戦略は，任意の load， store， lock， unlockのコードを生成する際に，「最も重い」種類のバリアを必要とするアクセスが発生すると仮定することだ．

Issue a StoreStore barrier before each volatile store.
(On ia64 you must instead fold this and most barriers into corresponding load or store instructions.)

volatile変数への各ストアの前に StoreStoreバリアを発行する．
(IA64では代わりに，これを含むほとんどのバリアを，対応する load又は store命令に折り畳まなければならない．)

Issue a StoreStore barrier after all stores but before return from any constructor for any class with a final field.

finalフィールドを持つクラスのコンストラクタよりリターンする前で、且つ全てのストアより後に，StoreStoreバリアを発行する．

Issue a StoreLoad barrier after each volatile store.
Note that you could instead issue one before each volatile load, but this would be slower for typical programs using volatiles in which reads greatly outnumber writes. Alternatively, if available, you can implement volatile store as an atomic instruction (for example XCHG on x86) and omit the barrier. This may be more efficient if atomic instructions are cheaper than StoreLoad barriers.

各volatileストアの後に StoreLoadバリアを発行する．
事後に発行する代わりに，それを各 volatileロードの直前に発行することもできるが，読み出しの方が書き込みよりも遙かに多く volatileを使う典型的なプログラムでは遅くなる点に注意すること．代わりに，もし可能であれば， volatileストアをアトミック命令(例えばx86のXCHG) を使って実装し，バリアを除去することもできる．これは，アトミック命令が StoreLoadバリアよりも安ければ、より効率的かもしれない．

Issue LoadLoad and LoadStore barriers after each volatile load.
On processors that preserve data dependent ordering, you need not issue a barrier if the next access instruction is dependent on the value of the load. In particular, you do not need a barrier after a load of a volatile reference if the subsequent instruction is a null-check or load of a field of that reference.

各volatileロードの後に LoadLoadバリアと LoadStoreバリアを発行する．
データ依存の順序を保存するプロセッサにおいて，次にアクセスする命令がロードされた値に依存するものである場合は，バリアを発行する必要はない．特に volatileな参照のロード命令の後続の命令が， null-check，又はその参照のフィールドのロード命令である場合は，最初のロード命令の後にバリアは必要ない．

Issue an ExitEnter barrier either before each MonitorEnter or after each MonitorExit.
(As discussed above, ExitEnter is a no-op if either MonitorExit or MonitorEnter uses an atomic instruction that supplies the equivalent of a StoreLoad barrier. Similarly for others involving Enter and Exit in the remaining steps.)

各MonitorEnterの前，或いは各MonitorExitの後にExitEnterバリアを発行する．
(上で議論したように， MonitorExit又は MonitorEnterで StoreLoadバリアと等価な効果をもたらすアトミック命令が使われている時は， ExitEnterバリアは no-opになる．残りのステップに出てくる， Enterや Exitを含む他のバリアについても同様である．)

Issue EnterLoad and EnterStore barriers after each MonitorEnter.

各MonitorEnterの後に EnterLoadバリアと EnterStoreバリアを発行する．

Issue StoreExit and LoadExit barriers before each MonitorExit.

各MonitorExitの前に StoreExitバリアと LoadExitバリアを発行する．

If on a processor that does not intrinsically provide ordering on indirect loads, issue a LoadLoad barrier before each load of a final field. (Some alternative strategies are discussed in this JMM list posting, and this description of linux data dependent barriers.)

間接ロードに関して本質的に順序性を提供しないプロセッサ上では，各 finalフィールドのロードの前に LoadLoadバリアを発行する．(その他の代案については this JMM list posting(JMM MLの過去ログ)や this description of linux data dependent barriersで議論されている．)

Many of these barriers usually reduce to no-ops. In fact, most of them reduce to no-ops, but in different ways under different processors and locking schemes. For the simplest examples, basic conformance to JSR-133 on x86-PO or sparc-TSO using CAS for locking amounts only to placing a StoreLoad barrier after volatile stores.

これらのバリアの多くは通常 no-opになる．実際，ほとんどは no-opになるが，どのように no-opになるかはプロセッサやロック方式によって異なる．最も単純な例では， x86-PO又は sparc-TSO上でロックにCASを用いる場合， JSR-133への基本的な準拠は， volatileストアの後に StoreLoadバリアを設置するだけで済む．
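
The conservative recipe above can be sketched as a small pass over a straight-line sequence of accesses. This is only an illustration under stated assumptions: the class ConservativeBarrierSketch, the Op enum, and the plan method are hypothetical names, barriers are represented as plain strings, the ExitEnter barrier is placed before each MonitorEnter (the recipe equally allows placing it after each MonitorExit), and the final-field rules are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: emit the conservative barrier set around each access.
final class ConservativeBarrierSketch {
    enum Op { LOAD, STORE, VOLATILE_LOAD, VOLATILE_STORE, MONITOR_ENTER, MONITOR_EXIT }

    // Returns the access sequence with barrier names interleaved.
    static List<String> plan(List<Op> ops) {
        List<String> out = new ArrayList<String>();
        for (Op op : ops) {
            switch (op) {
                case VOLATILE_STORE:
                    out.add("StoreStore");   // before each volatile store
                    out.add(op.name());
                    out.add("StoreLoad");    // after each volatile store
                    break;
                case VOLATILE_LOAD:
                    out.add(op.name());
                    out.add("LoadLoad");     // after each volatile load
                    out.add("LoadStore");
                    break;
                case MONITOR_ENTER:
                    out.add("ExitEnter");    // assumption: placed before MonitorEnter
                    out.add(op.name());
                    out.add("EnterLoad");    // after each MonitorEnter
                    out.add("EnterStore");
                    break;
                case MONITOR_EXIT:
                    out.add("StoreExit");    // before each MonitorExit
                    out.add("LoadExit");
                    out.add(op.name());
                    break;
                default:
                    out.add(op.name());      // normal loads and stores need nothing
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(plan(Arrays.asList(Op.VOLATILE_STORE, Op.LOAD, Op.VOLATILE_LOAD)));
    }
}
```

A real code generator would then map each barrier name to the instruction (or no-op) given for the target processor in the table.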

Removing Barriers(バリアの除去)

The conservative strategy above is likely to perform acceptably for many programs. The main performance issues surrounding volatiles occur for the StoreLoad barriers associated with stores. These ought to be relatively rare -- the main reason for using volatiles in concurrent programs is to avoid the need to use locks around reads, which is only an issue when reads greatly overwhelm writes. But this strategy can be improved in at least the following ways:

上記の保守的な戦略は，多くのプログラムにおいて許容できる性能を示すだろう． volatile変数を巡る主なパフォーマンス問題は，ストアに関連する StoreLoadバリアにおいて発生する．これは比較的少ないハズだ．なぜなら，並列プログラムで volatile変数を使う主な理由は，読み込みの周りでロックを使用する必要性を避けるためであり，それが問題になるのは読み込みが書き込みを大きく上回る場合だけだからだ．しかしこの戦略は，少なくとも以下のようなやり方で改善しうる．

Removing redundant barriers. The above tables indicate that barriers can be eliminated as follows:
冗長なバリアの除去．バリアは以下のような形で消去できることを，上の表は示唆している．

Original			=>	Transformed
1st	ops	2nd	=>	1st	ops	2nd
LoadLoad	[no loads]	LoadLoad	=>		[no loads]	LoadLoad
LoadLoad	[no loads]	StoreLoad	=>		[no loads]	StoreLoad
StoreStore	[no stores]	StoreStore	=>		[no stores]	StoreStore
StoreStore	[no stores]	StoreLoad	=>		[no stores]	StoreLoad
StoreLoad	[no loads]	LoadLoad	=>	StoreLoad	[no loads]
StoreLoad	[no stores]	StoreStore	=>	StoreLoad	[no stores]
StoreLoad	[no volatile loads]	StoreLoad	=>		[no volatile loads]	StoreLoad

Similar eliminations can be used for interactions with locks, but depend on how locks are implemented. Doing all this in the presence of loops, calls, and branches is left as an exercise for the reader. :-)
ロックとの相互作用についても同様な消去が可能だが，ロックの実装に依存する．ループ，呼び出し，分岐が存在する状況でこれら全てを行うことは，読者への練習課題として残しておく．:-)

Rearranging code (within the allowed constraints) to further enable removing LoadLoad and LoadStore barriers that are not needed because of data dependencies on processors that preserve such orderings.
そのような順序を保存するプロセッサ上のデータ依存性によって不要となった LoadLoadバリアと LoadStoreバリアの除去をさらに可能にする，(認められた束縛条件の範囲内の)コードの再配置．

Moving the point in the instruction stream that the barriers are issued, to improve scheduling, so long as they still occur somewhere in the interval they are required.
スケジューリングを改良するために行われる，命令列中のバリアが発行されるポイントの移動．ただしこの場合でも，バリアはそれが要求されている期間中のどこかで発生するのは変わらない．(?)

Removing barriers that aren't needed because there is no possibility that multiple threads could rely on them; for example volatiles that are provably visible only from a single thread. Also, removing some barriers when it can be proven that threads can only store or only load certain fields. All this usually requires a fair amount of analysis.
複数スレッドがそれに依存する可能性がないことで，不要となったバリアの除去：例えば，単一スレッドからのみ見えることが保証される volatile変数．或いは，複数のスレッドが特定のフィールドをストアのみ，或いはロードのみすることが証明できた時の，幾つかのバリアの除去．これらは通常，かなりの量の解析を必要とする．
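
The redundant-barrier table above can be applied mechanically to straight-line code, as in the following sketch. Everything here is a hypothetical illustration: the class RedundantBarrierSketch, the string tokens ("Load", "Store", "VolatileLoad", "VolatileStore" for accesses, and the four barrier names), and in particular the "pinned" flag. The pinning is an extra safeguard of this sketch, not something the table states: a barrier that has been used to justify removing another is never removed itself, so that chained applications of the rules cannot delete both. Loops, calls, and branches are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: apply the elimination table to a straight-line token list.
final class RedundantBarrierSketch {
    static final List<String> BARRIERS =
        Arrays.asList("LoadLoad", "LoadStore", "StoreStore", "StoreLoad");

    static boolean isBarrier(String token) {
        return BARRIERS.contains(token);
    }

    // Suffix of the access kind that must NOT occur between the two barriers
    // for a table row to apply, or null if no row matches the pair.
    static String forbiddenBetween(String first, String second) {
        if (first.equals("LoadLoad")
                && (second.equals("LoadLoad") || second.equals("StoreLoad"))) return "Load";
        if (first.equals("StoreStore")
                && (second.equals("StoreStore") || second.equals("StoreLoad"))) return "Store";
        if (first.equals("StoreLoad")) {
            if (second.equals("LoadLoad")) return "Load";
            if (second.equals("StoreStore")) return "Store";
            if (second.equals("StoreLoad")) return "VolatileLoad";
        }
        return null;
    }

    static boolean hasAccessBetween(List<String> code, int i, int j, String suffix) {
        for (int k = i + 1; k < j; k++) {
            String t = code.get(k);
            if (!isBarrier(t) && t.endsWith(suffix)) return true;
        }
        return false;
    }

    static List<String> eliminate(List<String> code) {
        int n = code.size();
        boolean[] removed = new boolean[n];
        boolean[] pinned = new boolean[n]; // sketch-specific: keepers stay
        for (int i = 0; i < n; i++) {
            if (!isBarrier(code.get(i))) continue;
            for (int j = i + 1; j < n && !removed[i]; j++) {
                if (!isBarrier(code.get(j)) || removed[j]) continue;
                String a = code.get(i), b = code.get(j);
                String forbidden = forbiddenBetween(a, b);
                if (forbidden == null || hasAccessBetween(code, i, j, forbidden)) continue;
                // Rows starting with StoreLoad drop the second barrier,
                // except StoreLoad/StoreLoad, which drops the first.
                boolean dropFirst = !a.equals("StoreLoad") || b.equals("StoreLoad");
                int victim = dropFirst ? i : j, keeper = dropFirst ? j : i;
                if (pinned[victim]) continue;
                removed[victim] = true;
                pinned[keeper] = true;
            }
        }
        List<String> out = new ArrayList<String>();
        for (int k = 0; k < n; k++) {
            if (!removed[k]) out.add(code.get(k));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(eliminate(Arrays.asList(
            "StoreStore", "VolatileStore", "StoreLoad",
            "StoreStore", "VolatileStore", "StoreLoad")));
    }
}
```

For two back-to-back volatile stores, the pass removes the second StoreStore (covered by the preceding StoreLoad) and then keeps that StoreLoad because it is pinned, so ordering between the two stores is preserved.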

Miscellany(その他)

JSR-133 also addresses a few other issues that may entail barriers in more specialized cases:

またJSR-133は，より特殊な場合にバリアを引き起こすかもしれない，その他の幾つかの問題についても述べている．

Thread.start() requires barriers ensuring that the started thread sees all stores visible to the caller at the call point. Conversely, Thread.join() requires barriers ensuring that the caller sees all stores by the terminating thread. These are normally generated by the synchronization entailed in implementations of these constructs.
Thread.start()は呼び出し側スレッドの呼び出し時点での全てのストアされたデータの可視性を，開始されたスレッドに対して保証するためにバリアを要求する．逆に join()は終了したスレッドのストアが，呼び出し側に見えることを保証するためにバリアを要求する．これらは通常は，これらの構築物の実装に含まれる同期によって生成される．

Static final initialization requires StoreStore barriers that are normally entailed in mechanics needed to obey Java class loading and initialization rules.
static finalの初期化は StoreStoreバリアを要求する．通常これは， Javaのクラスローディングと初期化ルールに従うのに必要とされるメカニズムに含まれている．

Ensuring default zero/null initial field values normally entails barriers, synchronization, and/or low-level cache control within garbage collectors.
各フィールドの初期値のデフォルトのゼロ／nullの保証には，通常はGC内の低レベルのキャッシュコントロールと同期，バリアが含まれる．

JVM-private routines that "magically" set System.in, System.out, and System.err outside of constructors or static initializers need special attention since they are special legacy exceptions to JMM rules for final fields.
コンストラクタやstatic初期化子の外で System.in， System.out， System.errを「不思議な力」でセットするJVMプライベートなルーチンは， finalフィールドに関するJMMルールの特別にレガシーな例外であるため，特別な配慮を必要とする．

Similarly, internal JVM deserialization code that sets final fields normally requires a StoreStore barrier.
同様にデシリアライズ時にfinalフィールドを設定するJVMの内部的なコードは，通常 StoreStoreバリアを要求する．

Finalization support may require barriers (within garbage collectors) to ensure that Object.finalize code sees all stores to all fields prior to the objects becoming unreferenced. This is usually ensured via the synchronization used to add and remove references in reference queues.
ファイナライザのサポートには，オブジェクトが参照されなくなる以前にストアされた全てのフィールドの全てのストアを， Object.finalizeのコードから見えることを保証するために，（ガベージコレクタ内で）バリアを要求するかもしれない．これは通常は，参照を追加／削除する際に参照キュー内で使われている同期を経由して保証される．

Calls to and returns from JNI routines may require barriers, although this seems to be a quality of implementation issue.
これは実装品質の問題に思われるが，JNIルーチンの呼び出しとリターンはバリアを要求するかもしれない．

Most processors have other synchronizing instructions designed primarily for use with IO and OS actions. These don't impact JMM issues directly, but may be involved in IO, class loading, and dynamic code generation.
多くのプロセッサは，この他にもI/OやOS機能に使われることを前提として設計された同期命令を持つ．これらはJMM問題に直接的には影響を与えないが，I/O，クラスローディングや動的コード生成には関係するかもしれない．
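
From the Java programmer's side, the Thread.start() and Thread.join() rules listed above look like the following minimal sketch. The class StartJoinVisibilitySketch and its fields are hypothetical; the point is only that plain (non-volatile) fields are safely handed over at start() and join() without any explicit synchronization in user code.

```java
// Hypothetical sketch: visibility guarantees at Thread.start() and Thread.join().
final class StartJoinVisibilitySketch {
    static int writtenBeforeStart; // plain field, written by the caller before start()
    static int writtenByChild;     // plain field, written by the started thread

    // Returns { value the child saw, value the caller saw after join() }.
    static int[] run() {
        writtenBeforeStart = 41;
        final int[] seenByChild = new int[1];
        Thread child = new Thread(() -> {
            // start() guarantees the child sees stores made before the call.
            seenByChild[0] = writtenBeforeStart;
            writtenByChild = seenByChild[0] + 1;
        });
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // join() guarantees the caller sees all stores made by the child.
        return new int[] { seenByChild[0], writtenByChild };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " " + result[1]); // prints "41 42"
    }
}
```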

Acknowledgments(謝辞)

Thanks to Bill Pugh, Dave Dice, Jeremy Manson, Kourosh Gharachorloo, Tim Harris, Cliff Click, Allan Kielstra, Yue Yang, Hans Boehm, Kevin Normoyle, Juergen Kreileder, Alexander Terekhov, Tom Deneau, Clark Verbrugge, and Peter Kessler for corrections and suggestions.

和訳版謝辞

記載予定

Toru Takahashi,Yasuhiro Endoh,

Doug Lea

Last modified: Thu Jul 13 12:55:03 EDT 2006