Journal of Behavioral Data Science, 2023, 3 (1), 34–58.
DOI: https://doi.org/10.35566/jbds/v3n1/ogasawara

On Some Known Derivations and New Ones for the Wishart Distribution: A Didactic

Haruhiko Ogasawara\(^{[0000-0002-5029-2086]}\)
Otaru University of Commerce, Otaru, Japan
emt-hogasa@emt.otaru-uc.ac.jp
Abstract. The proofs of the probability density function (pdf) of the Wishart distribution tend to be complicated, relying on geometric viewpoints, tedious Jacobians and algebra that is not self-contained. In this paper, some known proofs and simple new ones for the uncorrelated and correlated cases are provided with didactic explanations. For the new derivation of the uncorrelated case, an elementary direct derivation of the distribution of the Bartlett-decomposed matrix is provided. In the derivation of the correlated case from the uncorrelated one, simple methods including a new one are shown.

Keywords: Jacobian · Multivariate normality · Probability density function (pdf) · Triangular matrix · Bartlett decomposition.

1 Introduction

The Wishart distribution has often been used for the matrix of the squares and cross products of random vectors. In multivariate analysis or, more specifically, structural equation modeling (SEM), a modified log-likelihood of this distribution (see e.g., Ogasawara, 2016, Equation (2.8)) has been used, probably as a gold-standard discrepancy function, for estimation even under non-normality, though the distribution is derived under multivariate normality. In SEM, variations of the distribution are also used as priors for covariance matrices (Liu, Qu, Zhang, & Wu, 2022; Zhang, 2021). The distribution has various extensions, e.g., the inverted distribution (Anderson, 2003, Section 7.7), singular cases (Bodnar & Okhrin, 2008; Mathai & Provost, 2022; Srivastava, 2003), complex-valued ones (Srivastava & Khatri, 1979, Section 3.7; Mathai, Provost, & Haubold, 2022, Section 5.5), those with two different degrees of freedom (df’s) (Ogasawara, 2023b), the joint distributions of the Wishart matrix and normal vectors (Yonenaga, 2022) and cases under arbitrary distributions (Hsu, 1940; Srivastava & Khatri, 1979, Lemma 3.2.3; Olkin, 2002, Section 2).

Asymptotic results associated with the Wishart distribution are also of practical use. In SEM, the asymptotic standard errors of the Wishart maximum likelihood estimators of structural parameters are often used under normality or non-normality. In this situation, a large df is assumed. When the number of variables is also large under some conditions, as in high-dimensional data (see e.g., Yao, Zheng, & Bai, 2015), the limiting distribution of the eigenvalues of the Wishart matrix is given by the Marčenko and Pastur (1967, M-P) distribution (the author is indebted to an anonymous reviewer for this point). The M-P distribution gives a tool for the problems of the numbers of factors or components in SEM (Chen & Weng, 2023).

The probability density functions (pdf’s) of the Wishart distribution were given by Fisher (1915, p. 510) and Wishart (1928) for the bivariate and general multivariate cases, respectively. The derivations tend to involve geometric viewpoints (see e.g., Anderson, 2003, Section 7.2) or algebra that is not self-contained, as criticized by Ghosh and Sinha (2002) (for references on derivations, see Srivastava & Khatri, 1979, p. 73 and Anderson, 2003, pp. 256-257). Khatri (1963) showed a brief derivation using an integral of the unity over the constant quadratic forms having the chi-square density. Ghosh and Sinha (2002) gave a self-contained concise proof of the Wishart density, though it is an indirect method. In spite of the frequent use of the Wishart density and its variations in SEM, the derivation of the pdf seems to be often intractable for beginning students/researchers. Many of them probably use the Wishart pdf as if consulting a cookbook, without understanding the derivation, which is an undesirable situation. A relatively concise derivation uses the characteristic function and its inversion (Wishart & Bartlett, 1933; Wilks, 1962, Section 18.2). However, this method requires the Fourier integral theorem or Lévy’s inversion formula, which may be unfamiliar to beginners. In this paper, almost self-contained known proofs and new ones for the uncorrelated and correlated multivariate cases are shown with didactic explanations.

2 Proofs of the Wishart Distributions

2.1 The distribution of a lower-triangular matrix for the Wishart density

Suppose that in the random matrix \({\bf {X}} = \{ {X_{ij}}\} \,\,(i = 1,...,p;j = 1,...,n;p \le n)\), each column is multivariate normally distributed as \({{\rm {N}}_p}({\bf {0}},{{\bf {I}}_p})\) independently of the other columns, with population mean vector \(\bf {0}\) and covariance matrix \({\bf {I}}_p\), the \(p \times p\) identity matrix. That is, all the elements of \(\bf {X}\) are mutually independently distributed as standard normal.

Let \({\bf {S}} \equiv {\bf {X}}{{\bf {X}}^{\rm {T}}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\) be Bartlett-decomposed such that \(\bf {T}\) is a \(p \times p\) lower-triangular matrix whose diagonal elements are positive. Define \({\bf {s}} = {({s_{11}},\,{s_{21}},{s_{22}},...,{s_{p1}},...,{s_{pp}})^{\rm {T}}}\) and \({\bf {t}} = {({t_{11}},\,{t_{21}},{t_{22}},...,{t_{p1}},...,{t_{pp}})^{\rm {T}}}\), where \(\bf {s}\) and \(\bf {t}\) are the \(\{ ({p^2} + p)/2\} \times {\kern 1pt} 1\) vectors of the non-duplicated elements of \(\bf {S}\) and the random elements of \(\bf {T}\), respectively. Let \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + }\) (Srivastava & Khatri, 1979, p. 28) be the absolute value of the determinant of the Jacobian matrix for the transformation \({\bf {S}} \to {\bf {T}}\): \[\frac {{\partial {\bf {s}}}}{{\partial {{\bf {t}}^{\rm {T}}}}} = \left \{ {\frac {{\partial {s_{ij}}}}{{\partial {t_{kl}}}}} \right \}(p \ge i \ge j \ge 1;p \ge k \ge l \ge 1)\] using the double subscript notation for the rows of the elements of \(\bf {S}\) and columns for those of \({\bf {t}}^{\rm {T}}\) in \(\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}\). Then, the Jacobian of the transformation is given by \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + }\). For the proof of the Wishart distribution, the following lemmas are used.
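Although the derivations below are purely analytic, the setup can be made concrete numerically. The following minimal sketch (our own illustration, assuming NumPy; not part of the derivation) generates \(\bf {X}\), forms \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}\) and recovers the Bartlett (Cholesky) factor \(\bf {T}\).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 10
X = rng.standard_normal((p, n))  # all elements i.i.d. standard normal
S = X @ X.T                      # matrix of squares and cross products
T = np.linalg.cholesky(S)        # lower-triangular with positive diagonal
print(np.allclose(S, T @ T.T))   # True: S = T T^T
```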

Lemma 1 Suppose that each of \(2m\) variables \(X_{ik}\) and \(X_{jk}\) \((i \ne j;k = 1,...,m;\,\,m = 1,2,...)\) independently follows \({\rm {N}}(0,1) \equiv {{\rm {N}}_1}(0,1)\). Then, the distribution of \(\sum \nolimits _{k = 1}^m {{X_{ik}}{X_{jk}}}\) is the same as that of \({X_{il}}\sqrt {\sum \nolimits _{k = 1}^m {X_{jk}^2} } \,\,(i \ne j;\,\,l = 1,...,m)\).

Proof. When \(m = 1\), the equal distribution of \({X_{i1}}{X_{j1}}\) and \({X_{i1}}\sqrt {X_{j1}^2} \, = {X_{i1}}|{X_{j1}}|\) is given by the symmetric distribution of \({X_{i1}}{X_{j1}}\) about zero. For general cases, consider the moment generating functions (mgf’s). By definition, the mgf of \(\sum \nolimits _{k = 1}^m {{X_{ik}}{X_{jk}}}\) is \[\begin {array}{l} {\rm {E}}\left \{ {\exp \left ( {t\sum \nolimits _{k = 1}^m {{X_{ik}}{X_{jk}}} } \right )} \right \} = \prod \nolimits _{k = 1}^m {{\rm {E}}\left \{ {\exp (t{X_{ik}}{X_{jk}})} \right \}} \\ = \prod \nolimits _{k = 1}^m {\frac {1}{{2\pi }}} \int _{ - \infty }^\infty {\int _{ - \infty }^\infty {\exp \left ( {t{x_{ik}}{x_{jk}} - \frac {{x_{jk}^2}}{2}} \right )} } {\rm {d}}{x_{jk}}\exp \left ( { - \frac {{x_{ik}^2}}{2}} \right ){\rm {d}}{x_{ik}}\\ = \prod \nolimits _{k = 1}^m {\frac {1}{{\sqrt {2\pi } }}} \int _{ - \infty }^\infty {\int _{ - \infty }^\infty {\frac {1}{{\sqrt {2\pi } }}\exp \left \{ { - \frac {{{{({x_{jk}} - t{x_{ik}})}^2}}}{2}} \right \}} } {\rm {d}}{x_{jk}}\exp \left \{ { - \frac {{(1 - {t^2})x_{ik}^2}}{2}} \right \}{\rm {d}}{x_{ik}}\\ = \prod \nolimits _{k = 1}^m {\frac {1}{{\sqrt {2\pi } }}} \int _{ - \infty }^\infty {\exp \left \{ { - \frac {{(1 - {t^2})x_{ik}^2}}{2}} \right \}{\rm {d}}{x_{ik}}} \\ = {(1 - {t^2})^{ - m/2}}\,\,(|t|\,\, < 1). \end {array}\] On the other hand, the mgf of \({X_{il}}\sqrt {\sum \nolimits _{k = 1}^m {X_{jk}^2} } \,\) is \[\begin {array}{l} {\rm {E}}\exp \left ( {t{X_{il}}\sqrt {\sum \nolimits _{k = 1}^m {X_{jk}^2} } } \right )\,\\ = \frac {1}{{{{(2\pi )}^{(m + 1)/2}}}}\int _{ - \infty }^\infty { \cdots \int _{ - \infty }^\infty {\exp \left ( {t{x_{il}}\sqrt {\sum \nolimits _{k = 1}^m {x_{jk}^2} } - \frac {{x_{il}^2}}{2} - \frac {{\sum \nolimits _{k = 1}^m {x_{jk}^2} }}{2}} \right )} }\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times {\rm {d}}{x_{il}}{\rm {d}}{x_{j1}} \cdots {\rm {d}}{x_{jm}}\\ = \frac {1}{{{{(2\pi )}^{m/2}}}}\int _{ - \infty }^\infty { \cdots \int _{ - \infty }^\infty {\frac {1}{{{{(2\pi )}^{1/2}}}}\exp \left \{ { - {{\left ( {{x_{il}} - t\sqrt {\sum \nolimits _{k = 1}^m {x_{jk}^2} } } \right )}^2}/2} \right \}} } \,{\rm {d}}{x_{il}}\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times {\rm {exp}}\left \{ { - (1 - {t^2})\sum \nolimits _{k = 1}^m {x_{jk}^2} /2} \right \}{\rm {d}}{x_{j1}} \cdots {\rm {d}}{x_{jm}}\\ = \frac {1}{{{{(2\pi )}^{m/2}}}}\int _{ - \infty }^\infty { \cdots \int _{ - \infty }^\infty {{\rm {exp}}\left \{ { - (1 - {t^2})\sum \nolimits _{k = 1}^m {x_{jk}^2} /2} \right \}{\rm {d}}{x_{j1}} \cdots {\rm {d}}{x_{jm}}} } \\ = {(1 - {t^2})^{ - m/2}}\,\,(|t|\,\, < 1). \end {array}\] It is found that the above two mgf’s are the same, which shows the same distribution of \(\sum \nolimits _{k = 1}^m {{X_{ik}}{X_{jk}}}\) and \({X_{il}}\sqrt {\sum \nolimits _{k = 1}^m {X_{jk}^2} } \,\,(i \ne j;\,\,l = 1,...,m)\). \(\sqcap \)\(\sqcup \)

The second proof using the pdf of the chi-distribution is given in the supplement to this paper (Ogasawara, 2023a).
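The equality of the two mgf’s in the proof of Lemma 1 can also be seen by simulation. A minimal Monte Carlo sketch (our own illustration, not part of the proofs) compares the even empirical moments of the two statistics, which should agree; the odd moments vanish by symmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
m, reps = 5, 200_000
Xi = rng.standard_normal((reps, m))
Xj = rng.standard_normal((reps, m))
u = (Xi * Xj).sum(axis=1)                                       # sum_k X_ik X_jk
v = rng.standard_normal(reps) * np.sqrt((Xj ** 2).sum(axis=1))  # X_il times chi_m
for moment in (2, 4):
    print(moment, (u ** moment).mean(), (v ** moment).mean())   # approx m and 9m + 3m(m - 1)
```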

Lemma 2 (Deemer & Olkin, 1951, Theorem 4.1; Srivastava & Khatri, 1979, Exercise 1.28 (i); Muirhead, 1982, Theorem 2.1.9; Anderson, 2003, p. 255). The Jacobian of the transformation \({\bf {S}} \to {\bf {T}}\) is \[|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + } = {2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}}. \]

Proof. Deemer and Olkin (1951) derived the result as a special case of another general theorem. Muirhead (1982) used the exterior product, while an essentially standard proof was given by Anderson (2003). The derivation is given here by induction. When \(p = 1\), \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + } = {\rm {d}}{s_{11}}/{\rm {d}}{t_{11}}\) \(= {\rm {d}}t_{11}^2/{\rm {d}}{t_{11}} = 2{t_{11}} > 0\) showing that the above result holds. Assume that the result holds when \(p = {p^*}\), i.e., \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + }\) \( = {2^{{p^*}}}\prod \nolimits _{i = 1}^{{p^*}} {t_{ii}^{{p^*} - i + 1}} ({p^*} \ge 1)\). When \(p = {p^*} + 1\), the elements \({s_{p* + 1,1}},{s_{p* + 1,2}},...,{s_{p* + 1,p* + 1}}\) are added to \(\bf {s}\) at its end. Similarly, \({t_{p* + 1,1}},{t_{p* + 1,2}},...,{t_{p* + 1,p* + 1}}\) are added to \({\bf {t}}^{\rm {T}}\). Noting that \({s_{ij}} = \sum \limits _{k = 1}^j {{t_{ik}}{t_{jk}}} \,\,(p \ge i \ge j \ge 1)\), we find that \(\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}\) is a lower-triangular matrix. Consequently, the factor added to \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + }\) in going from \(p = {p^*}\) to \(p = {p^*} + 1\) is given by the product of the added diagonal elements: \[\frac {{\partial {s_{p* + 1,1}}}}{{\partial {t_{p* + 1,1}}}}\frac {{\partial {s_{p* + 1,2}}}}{{\partial {t_{p* + 1,2}}}} \cdots \frac {{\partial {s_{p* + 1,p*}}}}{{\partial {t_{p* + 1,p*}}}}\frac {{\partial {s_{p* + 1,p* + 1}}}}{{\partial {t_{p* + 1,p* + 1}}}} = {t_{11}}{t_{22}} \cdots {t_{p*p*}}2{t_{p* + 1,p* + 1}}.\] That is, \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + }\) becomes \[2^{{p^*}}\left ( {\prod \nolimits _{i = 1}^{{p^*}} {t_{ii}^{{p^*} - i + 1}} } \right ){t_{11}}{t_{22}} \cdots {t_{p*p*}}2{t_{p* + 1,p* + 1}} = {2^{{p^*} + 1}}\prod \nolimits _{i = 1}^{{p^*} + 1} {t_{ii}^{{p^*} + 1 - i + 1}}, \] which shows that the formula \(|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + } = {2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}}\) holds when \(p = {p^*} + 1\), indicating the required result. \(\sqcap \)\(\sqcup \)
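Lemma 2 can also be checked numerically by finite differences. The sketch below (our own illustration; the vectorization order follows the definition of \(\bf {s}\) and \(\bf {t}\) above) compares the absolute determinant of a numerically differentiated Jacobian matrix with \({2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}}\).

```python
import numpy as np

p = 3
d = p * (p + 1) // 2
tril = np.tril_indices(p)  # lexicographic order (1,1), (2,1), (2,2), ...

def t_to_s(tvec):
    T = np.zeros((p, p)); T[tril] = tvec
    return (T @ T.T)[tril]

rng = np.random.default_rng(2)
t0 = np.abs(rng.standard_normal(d)) + 0.5  # keep the diagonal elements positive
eps = 1e-6
J = np.column_stack([(t_to_s(t0 + eps * np.eye(d)[k])
                      - t_to_s(t0 - eps * np.eye(d)[k])) / (2 * eps)
                     for k in range(d)])
T0 = np.zeros((p, p)); T0[tril] = t0
formula = 2 ** p * np.prod(np.diag(T0) ** (p - np.arange(p)))
print(abs(np.linalg.det(J)), formula)      # agree to about six digits
```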

In the following theorem for a known Wishart density, we use \({\Gamma _p}(n/2)\) \(\equiv {\pi ^{p(p - 1)/4}}\prod \nolimits _{i = 1}^p {\Gamma \{ (n - i + 1)/2\} } \), i.e., the \(p\)-variate gamma function (Anderson, 2003, Definition 7.2.1; Subsection 7.2, Equation (18); see also DLMF, 2021, Section 35.3, https://dlmf.nist.gov/35.3), where \(\Gamma (k) = \int _0^\infty {{v^{k - 1}}\exp ( - v){\rm {d}}v} \,\,(k > 0)\) is the usual gamma function.

Theorem 1 Under the condition that the n columns of \(\bf {X}\) independently follow \({{\rm {N}}_p}({\bf {0}},{{\bf {I}}_p})\), the pdf of the Wishart distributed \(\bf {S}\) is given by \[{w_p}({\bf {S}}|{{\bf {I}}_p},n) = \frac {{\exp \{ - {\rm {tr}}({\bf {S}})/2\} |{\bf {S}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\,\,(n \ge p).\]

Proof. Consider the case of \({t_{ij}} = {X_{ij}}\) and \({t_{ii}} = \sqrt {\sum \nolimits _{k = i}^n {X_{ik}^2} } \,(i = 1,...,p;j = 1,...,i - 1)\). Since \({X_{ij}}(i = 1,...,p;j = 1,...,n)\) are mutually independent, \({t_{ij}}\,\,(i = 1,...,p;j = 1,...,i)\) are independent. Note that \({({\bf {T}}{{\bf {T}}^{\rm {T}}})_{ii}} = \sum \nolimits _{j = 1}^i {t_{ij}^2} = {({\bf {X}}{{\bf {X}}^{\rm {T}}})_{ii}}\) \((i = 1,...,p)\) are independently chi-square distributed with \(n\) df, where \(( \cdot )_{ij}\) is the \((i, j)\)-th element of a matrix; and \(t_{ii}\) is chi-distributed with \(n - i + 1\) df. Further, Lemma 1 shows that the distributions of the off-diagonal elements \({({\bf {T}}{{\bf {T}}^{\rm {T}}})_{ij}} = \sum \nolimits _{k = 1}^j {{t_{ik}}{t_{jk}}}\) and \({({\bf {X}}{{\bf {X}}^{\rm {T}}})_{ij}}(p \ge i > j \ge 1)\) using \(t_{jj}\) and \({t_{ij}}\,(i = 1,...,p;j = 1,...,i - 1)\) are the same. That is, the distribution of \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}\) and \({\bf {T}}{{\bf {T}}^{\rm {T}}}\) are the same when \({t_{ij}}\,\,(i = 1,...,p;j = 1,...,i)\) are distributed as above. The pdf of the constructed \(t_{ij}\)’s \((p \ge i \ge j \ge 1)\) denoted by \({f_p}({\bf {T}})\) becomes \[\begin {array}{l} {f_p}({\bf {T}}) = \left [ {\prod \limits _{i = 1}^p {\cfrac {{t_{ii}^{n - i}\exp ( - t_{ii}^2/2)}}{{{2^{\{ (n - i + 1)/2\} - 1}}\Gamma \{ (n - i + 1)/2\} }}} } \right ] \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times \cfrac {1}{{{{(\sqrt {2\pi } )}^{({p^2} - p)/2}}}}\left \{ {\prod \limits _{p \ge i > j \ge 1} {\exp \left ( { - t_{ij}^2/2} \right )} } \right \}\\ = \cfrac {{\left \{ {\prod \limits _{i = 1}^p {t_{ii}^{n - i}\exp ( - t_{ii}^2/2)} } \right \}\left \{ {\prod \limits _{p \ge i > j \ge 1} {\exp \left ( { - t_{ij}^2/2} \right )} } \right \}}}{{{2^{\frac {{(n + 1)p}}{2} - \frac {{p(p + 1)}}{4} - p}} \times {2^{\frac {{p(p - 1)}}{4}}}{\pi ^{\frac {{p(p - 1)}}{4}}}\prod \limits _{i = 1}^p {\Gamma \{ (n - i + 1)/2\} } }} \\ = \cfrac {{\left ( {\prod \limits _{i = 1}^p {t_{ii}^{n - i}} } \right )\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} }}{{{2^{\frac {{np}}{2} - p}}{\Gamma _p}(n/2)}}. \end {array}\] In the above expression, the pdf of the chi-distributed \(t_{ii}\) with \(k\) df denoted by \({f_\chi }({t_{ii}}|k)\) is given by that of the chi-square distributed \(u = t_{ii}^2\) with \(k\) df i.e., \({f_{{\chi ^2}}}(u|k) = \cfrac {{{u^{(k/2) - 1}}}}{{{2^{k/2}}\Gamma (k/2)}}\exp ( - u/2)\) with the Jacobian \({\rm {d}}u/{\rm {d}}{t_{ii}} = 2{t_{ii}}\), yielding \[{f_\chi }({t_{ii}}|k) = \frac {{{u^{(k/2) - 1}}}}{{{2^{k/2}}\Gamma (k/2)}}\exp ( - u/2)\frac {{{\rm {d}}u}}{{{\rm {d}}{t_{ii}}}} = \frac {{t_{ii}^{(n - i + 1) - 2 + 1}\exp ( - t_{ii}^2/2)}}{{{2^{(n - i + 1)/2 - 1}}\Gamma \{ (n - i + 1)/2\} }}\] as shown earlier, when \(u = t_{ii}^2\) and \(k = n - i + 1\).

Consider the transformation \({\bf {T}} \to {\bf {S}}\) in \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\). The Jacobian \(J({\bf {T}} \to {\bf {S}})\) of this transformation is given by the reciprocal of \(J({\bf {S}} \to {\bf {T}})\) obtained in Lemma 2 as \(J({\bf {T}} \to {\bf {S}}) = 1/|\partial {\bf {s}}/\partial {{\bf {t}}^{\rm {T}}}{|_ + } = {\left ( {{2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}} } \right )^{ - 1}}\). Consequently, using \(|{\bf {S}}{|^{1/2}}\, = \,|{\bf {T}}|\,\, = {t_{11}} \cdots {t_{pp}}\) the pdf of \(\bf {S}\) becomes \[\begin {array}{l} {w_p}({\bf {S}}|{{\bf {I}}_p},n) = {f_p}({\bf {T}})J({\bf {T}} \to {\bf {S}})\\ = \cfrac {{\left ( {\prod \nolimits _{i = 1}^p {t_{ii}^{n - i}} } \right )\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} }}{{{2^{\frac {{np}}{2} - p}}{\Gamma _p}(n/2){2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}} }} = \cfrac {{\exp \{ - {\rm {tr}}({\bf {S}})/2\} |{\bf {S}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}. \end {array}\] \(\sqcap \)\(\sqcup \)
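As a didactic check (not part of the proof), the density in Theorem 1 can be evaluated directly and compared with an established implementation; the sketch below assumes NumPy and SciPy, where scipy.special.multigammaln computes \(\log {\Gamma _p}( \cdot )\).

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart

def wishart_pdf_identity(S, n):
    """Theorem 1's density w_p(S | I_p, n), computed on the log scale."""
    p = S.shape[0]
    logpdf = (-np.trace(S) / 2
              + (n - p - 1) / 2 * np.linalg.slogdet(S)[1]
              - n * p / 2 * np.log(2) - multigammaln(n / 2, p))
    return np.exp(logpdf)

rng = np.random.default_rng(3)
p, n = 3, 7
X = rng.standard_normal((p, n))
S = X @ X.T
print(wishart_pdf_identity(S, n))
print(wishart.pdf(S, df=n, scale=np.eye(p)))  # the same value
```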

Remark 1 The pdf of \(t_{ij}\)’s \((p \ge i \ge j \ge 1)\), i.e., \({f_p}({\bf {T}})\) given above using Lemma 1, is algebraically equal to those of Anderson (2003, Equation (6), p. 253; Corollary 7.2.1), Wijsman (1957, Equation (12)) and Kshirsagar (1959, Remarks). However, a typical derivation, e.g., Anderson’s, is an indirect one using orthogonalization and the conditional normal density. Since Anderson’s derivation, though almost self-contained, may give a complicated impression to beginning students/researchers, a didactic explanation of his derivation is given below. Anderson (2003, Equation (2), p. 252) defined the \(n\)-dimensional independent random vectors \({{\bf {v}}_i} \sim {{\rm {N}}_n}({\bf {0}},{{\bf {I}}_n})\,\,(i = 1,...,p)\) with \[{\bf {X = }}\left ( \begin {array}{l} {\bf {v}}_1^{\rm {T}}\\ \, \vdots \\ {\bf {v}}_p^{\rm {T}} \end {array} \right ).\] Then, the Gram-Schmidt sequential orthogonalization is employed (Anderson, 2003, Equation (3), p. 253) as \[{{\bf {w}}_i} = {{\bf {v}}_i} - \sum \limits _{j = 1}^{i - 1} {{{\bf {w}}_j}\frac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}\,} (i = 2,...,p)\,\,\, \rm {and}\,\,\, {{\bf {w}}_1} = {{\bf {v}}_1}, \] where he used the expression \({\bf {v}}_j^{\rm {T}}{{\bf {w}}_j}\) for the denominator \({\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\). Though \({\bf {v}}_j^{\rm {T}}{{\bf {w}}_j} = {\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\) \((j = 1,...,i)\), as will become apparent, \({\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\) may be more natural and appropriate. While he included a short derivation of the orthogonality among the \({\bf {w}}_i\)’s by induction, it is repeated here with some added explanations. When \(i = 2\), we have \[{\bf {w}}_2^{\rm {T}}{{\bf {w}}_1} = {\{ {{\bf {v}}_2} - {{\bf {w}}_1}{({\bf {w}}_1^{\rm {T}}{{\bf {w}}_1})^{ - 1}}{\bf {w}}_1^{\rm {T}}{{\bf {v}}_2}\} ^{\rm {T}}}{{\bf {w}}_1} = {{\bf {v}}_2}^{\rm {T}}{{\bf {w}}_1} - {{\bf {v}}_2}^{\rm {T}}{{\bf {w}}_1}{({\bf {w}}_1^{\rm {T}}{{\bf {w}}_1})^{ - 1}}{\bf {w}}_1^{\rm {T}}{{\bf {w}}_1} = 0,\] showing the orthogonality. Suppose that \[{\bf {w}}_j^{\rm {T}}{{\bf {w}}_k}\,\, = 0\,\,(j,k = 1,...,i - 1;\,\,j \ne k)\] hold. Then, we have \[\begin {array}{l} {\bf {w}}_k^{\rm {T}}{{\bf {w}}_i} = {\bf {w}}_k^{\rm {T}}\left ( {{{\bf {v}}_i} - \sum \limits _{j = 1}^{i - 1} {{{\bf {w}}_j}\cfrac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}\,} } \right ) = {\bf {w}}_k^{\rm {T}}{{\bf {v}}_i} - \sum \limits _{j = 1}^{i - 1} {{\bf {w}}_k^{\rm {T}}{{\bf {w}}_j}\cfrac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}\,} \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, = {\bf {w}}_k^{\rm {T}}{{\bf {v}}_i} - {\bf {w}}_k^{\rm {T}}{{\bf {w}}_k}\cfrac {{{\bf {w}}_k^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_k^{\rm {T}}{{\bf {w}}_k}}} = 0\,\,(i = 2,...,p;\,\,k = 1,...,i - 1), \end {array}\] due to the assumption \({\bf {w}}_j^{\rm {T}}{{\bf {w}}_k}\,\, = 0\,\,(j,k = 1,...,i - 1;\,\,j \ne k)\), showing the required result \({\bf {w}}_j^{\rm {T}}{{\bf {w}}_k}\,\, = 0\,\,(j,k = 1,...,i;\,\,j \ne k)\). Recall that \({\bf {v}}_j^{\rm {T}}{{\bf {w}}_j} = {\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\,\,(j = 1,...,i)\) mentioned earlier, which is obtained from \({\bf {w}}_j^{\rm {T}}{{\bf {w}}_k}\,\, = 0\,\,(j,k = 1,...,i;\,\,j \ne k)\) and \({{\bf {w}}_i} = {{\bf {v}}_i} - \sum \limits _{j = 1}^{i - 1} {{{\bf {w}}_j}\cfrac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}\,}\) \((i = 2,...,p)\).

The orthogonalization procedure is re-expressed by \[\begin {array}{l} {{\bf {w}}_i} = {{\bf {v}}_i} - \sum \limits _{j = 1}^{i - 1} {{{\bf {w}}_j}\cfrac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}\,} \\ \,\,\,\,\,\,\, = {{\bf {v}}_i} - ({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}){\rm {diag}}\{ {({\bf {w}}_1^{\rm {T}}{{\bf {w}}_1})^{ - 1}},...,{({\bf {w}}_{i - 1}^{\rm {T}}{{\bf {w}}_{i - 1}})^{ - 1}}\} {({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}})^{\rm {T}}}{{\bf {v}}_i}\\ \,\,\,\,\,\,\, \equiv {{\bf {v}}_i} - {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i} = ({{\bf {I}}_n} - {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}){{\bf {v}}_i} \equiv {{\bf {Q}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i}\,\,(i = 2,...,p), \end {array}\] where \({{\bf {P}}_{{{\bf {W}}_{i - 1}}}} \equiv {{\bf {W}}_{i - 1}}{({\bf {W}}_{i - 1}^{\rm {T}}{{\bf {W}}_{i - 1}})^{ - 1}}{\bf {W}}_{i - 1}^{\rm {T}}\) is the idempotent (i.e., \({\bf {P}}_{{{\bf {W}}_{i - 1}}}^2 = {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}\)) and symmetric projection matrix transforming or projecting \({\bf {v}}_i\) onto the space spanned by the columns of \({{\bf {W}}_{i - 1}} \equiv ({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}})\) of full column rank by assumption; and \({{\bf {Q}}_{{{\bf {W}}_{i - 1}}}} = {{\bf {I}}_n} - {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}\) is also an idempotent and symmetric projection matrix yielding the residual vector \({{\bf {v}}_i} - {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i}\) or the projected vector on the space orthogonal to the column space of \({\bf {W}}_{i - 1}\) with \({{\bf {v}}_i} = {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i} + {{\bf {Q}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i}\). Anderson (2003, p. 252) stated that “\({\bf {w}}_i\) is the vector from \({\bf {v}}_i\) to the projection on \({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}\)” with his Figure 7.1. He repeatedly stressed the equivalence of the column space of \({\bf {W}}_{i - 1}\) and that of \({{\bf {v}}_1},...,{{\bf {v}}_{i - 1}}\) in our expression.

Using \({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}\) constructed by the Gram-Schmidt orthogonalization or projection, Anderson (2003, p. 252) defined \[{t_{ii}} = \,||{{\bf {w}}_i}||\, = \,\sqrt {{\bf {w}}_i^{\rm {T}}{{\bf {w}}_i}} \,\,(i = 1,...,p)\] and \[{t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\,\,\,(i = 2,...,p;\,j = 1,...,i - 1),\] which may be uniformly expressed by \({t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\,\,(i = 1,...,p;\,j = 1,...,i)\) due to \({\bf {v}}_j^{\rm {T}}{{\bf {w}}_j} = {\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\,(\,j = 1,...,i)\) mentioned earlier. Then, noting that \({{\bf {w}}_i} = {{\bf {v}}_i} - {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i}\), we have \[{{\bf {v}}_i} = {{\bf {w}}_i} + {{\bf {P}}_{{{\bf {W}}_{i - 1}}}}{{\bf {v}}_i} = {{\bf {w}}_i} + \sum \limits _{j = 1}^{i - 1} {\frac {{{{\bf {w}}_j}{\bf {w}}_j^{\rm {T}}}}{{{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}}}} {{\bf {v}}_i} = \sum \limits _{j = 1}^i {\frac {{{\bf {w}}_j^{\rm {T}}{{\bf {v}}_i}}}{{||{{\bf {w}}_j}|{|^2}}}{{\bf {w}}_j}} = \sum \limits _{j = 1}^i {\frac {{{t_{ij}}}}{{||{{\bf {w}}_j}||}}} {{\bf {w}}_j}\] and \[\begin {array}{l} {({\bf {X}}{{\bf {X}}^{\rm {T}}})_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {v}}_j} = \left \{ {\sum \limits _{k = 1}^i {\cfrac {{{t_{ik}}}}{{||{{\bf {w}}_k}||}}{\bf {w}}_k^{\rm {T}}} } \right \}\sum \limits _{k = 1}^j {\cfrac {{{t_{jk}}}}{{||{{\bf {w}}_k}||}}{{\bf {w}}_k}} \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, = \sum \limits _{k = 1}^j {\cfrac {{{t_{ik}}}}{{||{{\bf {w}}_k}||}}{\bf {w}}_k^{\rm {T}}} {{\bf {w}}_k}\cfrac {{{t_{jk}}}}{{||{{\bf {w}}_k}||}} = \sum \limits _{k = 1}^j {{t_{ik}}{t_{jk}}\,\,} (p \ge i \ge j \ge 1) \end {array}\] (Anderson, 2003, p. 252). In \({{\bf {v}}_i} = \sum \limits _{j = 1}^i {\cfrac {{{t_{ij}}}}{{||{{\bf {w}}_j}||}}} {{\bf {w}}_j}\,(i = 1,...,p)\), \({{\bf {w}}_j}/\,||{{\bf {w}}_j}||\,\,(j = 1,...,i - 1)\) is seen as the unit-norm vector representing the direction of the \(j\)-th coordinate in the \(i - 1\) coordinates given by \({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}\). He stated that “\({t_{ij}},j = 1,...,i - 1\) are the first \(i - 1\) coordinates in the coordinate system with \({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}\) as the first coordinates axes” (p. 252). We also find that \(t_{ij}\) is \(||{{\bf {w}}_j}||\) times the regression coefficient \(b_{ij}\) for \({\bf {v}}_i\) on \({\bf {w}}_j\) since \[\begin {array}{l} {t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\, = ({\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}\,/{\bf {w}}_j^{\rm {T}}{{\bf {w}}_j}\,)\,||{{\bf {w}}_j}||\,\\ \,\,\,\,\,\,\,\, = {b_{ij}}||{{\bf {w}}_j}||\,\,(i = 2,...,p;\,j = 1,...,i - 1). \end {array}\]

The normality of \({t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\,\,\,(i = 2,...,p;\,j = 1,...,i - 1)\) and their mutual independence, shown by Anderson, follow from the normality of the conditional distribution of the multivariate normal given \({{\bf {w}}_j}\,(j = 1,...,i - 1)\) and the orthogonal transformation in \({t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\). That is, the standard normally-distributed variables \({t_{ij}} = {\bf {v}}_i^{\rm {T}}{{\bf {w}}_j}/\,||{{\bf {w}}_j}||\) do not depend on \({{\bf {w}}_1},...,{{\bf {w}}_{i - 1}}\), indicating independence, with \({({{\bf {w}}_j}/\,||{{\bf {w}}_j}||)^{\rm {T}}}{{\bf {w}}_k}/\,||{{\bf {w}}_k}||\) \( = {\delta _{jk}}\,\,\,(j,k = 1,...,i - 1)\), where \(\delta _{jk}\) is the Kronecker delta with \({\delta _{jj}} = 1\) and \({\delta _{jk}} = 0\,\,(j \ne k)\) (Anderson, 2003, Theorem 3.3.1). The independence of the \(t_{ii}\)’s is given by \({t_{ii}} = {\left \{ {{{({\bf {X}}{{\bf {X}}^{\rm {T}}}{\rm {)}}}_{ii}} - \sum \nolimits _{j = 1}^{i - 1} {t_{ij}^2} } \right \}^{1/2}}\). Although the same result as derived above from the didactic explanation of Anderson’s derivation is directly given by Lemma 1, the two methods may be insightful, with compensatory properties. [end of Remark 1]
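Anderson’s construction can be traced numerically. The following sketch (our own illustration) performs the Gram-Schmidt steps on the rows of \(\bf {X}\), forms \(t_{ij}\) as defined above, and confirms that the resulting \(\bf {T}\) coincides with the Cholesky factor of \({\bf {X}}{{\bf {X}}^{\rm {T}}}\).

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 8
X = rng.standard_normal((p, n))  # rows are v_1^T, ..., v_p^T
W = np.zeros((p, n))
T = np.zeros((p, p))
for i in range(p):
    # Gram-Schmidt: w_i = v_i - sum_j (w_j^T v_i / w_j^T w_j) w_j
    W[i] = X[i] - sum((W[j] @ X[i]) / (W[j] @ W[j]) * W[j] for j in range(i))
    for j in range(i):
        T[i, j] = X[i] @ W[j] / np.linalg.norm(W[j])  # t_ij = v_i^T w_j / ||w_j||
    T[i, i] = np.linalg.norm(W[i])                    # t_ii = ||w_i||
print(np.allclose(T @ T.T, X @ X.T))                  # True
print(np.allclose(T, np.linalg.cholesky(X @ X.T)))    # True
```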

2.2 The Wishart density for general correlated cases

For the correlated cases, four lemmas are provided. Lemma 3 is for three Jacobians in the product of two lower-triangular matrices, where the first Jacobian was used by Anderson (2003, Theorem 7.2.2) to derive the Wishart density for general correlated cases while the remaining two are given for generality with didactic purposes. Lemmas 4 and 5 are provided for the Jacobians in two alternative derivations of the general Wishart density. The proof of Lemma 6 associated with sufficient statistics is based on Ghosh and Sinha (2002).

Lemma 3 Suppose that \({\bf {A}} = {\bf {BC}}\), where \({\bf {A}},\,\,{\bf {B}}\) and \(\bf {C}\) are \(p \times p\) lower-triangular matrices. Consider the variable transformation from the non-zero elements of \(\bf {C}\) or \(\bf {B}\) to those of \(\bf {A}\). Then, the Jacobians \(J({\bf {C}} \to {\bf {A}})\) and \(J({\bf {B}} \to {\bf {A}})\) are \(\left | {\prod \nolimits _{i = 1}^p {b_{ii}^i} } \right |^{ - 1}\) and \(\left | {\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} } \right |^{ - 1}\), respectively. When \({\bf {B}} = {\bf {C}}\), \(J({\bf {B}} \to {\bf {A}}) = {\left | {\prod \nolimits _{i = 1}^p {\prod \nolimits _{j = 1}^i {({b_{ii}} + {b_{jj}})} } } \right |^{ - 1}}\).

Proof. Note that Anderson (2003, p. 254) gave \(J({\bf {C}} \to {\bf {A}})\). Since \({a_{ij}} = \sum \nolimits _{k = j}^i {{b_{ik}}{c_{kj}}} \,\,(p \ge i \ge j \ge 1)\), we have

\[\left [ \begin {array}{l} {a_{11}}\\ {a_{21}}\\ {a_{22}}\\ \, \vdots \\ {a_{p1}}\\ \, \vdots \\ {a_{pp}} \end {array} \right ] = \left [ \begin {array}{l} {b_{11}}\,\,0\,\,\,\,\,0\,\,\,\,\, \cdots \,\,0\,\,\,\,\, \cdots \,\,\,\,0\\ {*}\,\,\,\,\,{b_{22}}\,\,0\,\,\,\,\, \cdots \,\,0\,\,\,\,\, \cdots \,\,\,\,0\\ {*}\,\,\,\,\,{*}\,\,\,\,\,{b_{22}}\,\, \cdots \,\,0\,\,\,\,\,\, \cdots \,\,\,0\\ \, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ {*}\,\,\,\,\,{*}\,\,\,\,\,{*}\,\,\,\, \cdots \,\,\,\,{b_{pp}}\,\, \cdots \,\,\,0\\ \, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ {*}\,\,\,\,\,{*}\,\,\,\,\,{*}\,\,\,\, \cdots \,\,\,\,{*}\,\,\,\,\,\, \cdots \,\,\,{b_{pp}} \end {array} \right ]\left [ \begin {array}{l} {c_{11}}\\ {c_{21}}\\ {c_{22}}\\ \, \vdots \\ {c_{p1}}\\ \, \vdots \\ {c_{pp}} \end {array} \right ],\] where the diagonal element of the lower-triangular matrix corresponding to the row for \(a_{ij}\) and the column for \(c_{ij}\) is \({b_{ii}}\,\,(p \ge i \ge j \ge 1)\); the asterisks indicate zero or non-zero elements; and \[\left [ \begin {array}{l} {a_{11}}\\ {a_{21}}\\ {a_{22}}\\ \, \vdots \\ {a_{p1}}\\ \, \vdots \\ {a_{pp}} \end {array} \right ] = \left [ \begin {array}{l} {c_{11}}\,\,0\,\,\,\,\,0\,\,\,\,\, \cdots \,\,0\,\,\,\,\, \cdots \,\,\,\,0\\ {*}\,\,\,\,\,{c_{11}}\,\,0\,\,\,\,\, \cdots \,\,0\,\,\,\,\, \cdots \,\,\,\,0\\ {*}\,\,\,\,\,{*}\,\,\,\,\,{c_{22}}\,\, \cdots \,\,0\,\,\,\,\,\, \cdots \,\,\,0\\ \, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ {*}\,\,\,\,\,{*}\,\,\,\,\,{*}\,\,\,\, \cdots \,\,\,\,{c_{11}}\, \cdots \,\,\,\,0\\ \, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ {*}\,\,\,\,\,{*}\,\,\,\,\,{*}\,\,\,\, \cdots \,\,\,\,{*}\,\,\,\,\,\, \cdots \,\,\,{c_{pp}} \end {array} \right ]\left [ \begin {array}{l} {b_{11}}\\ {b_{21}}\\ {b_{22}}\\ \, \vdots \\ {b_{p1}}\\ \, \vdots \\ {b_{pp}} \end {array} \right ],\] where the corresponding diagonal element for \(a_{ij}\) and \(b_{ij}\) is \({c_{jj}}\,\,(p \ge i \ge j \ge 1)\). Since the inverses of the Jacobian matrices for \(J({\bf {C}} \to {\bf {A}})\) and \(J({\bf {B}} \to {\bf {A}})\) on the right-hand sides of the above equations are lower-triangular, the Jacobians become the reciprocals of the absolute values of the determinants, i.e., \(\prod \nolimits _{i = 1}^p {b_{ii}^i}\) and \(\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}}\), respectively. The result when \({\bf {B}} = {\bf {C}}\) is obtained by the reciprocal of the determinant of the sum of the two lower-triangular matrices. \(\sqcap \)\(\sqcup \)
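Since \({\bf {A}} = {\bf {BC}}\) is linear in \(\bf {C}\), the Jacobian matrix can be formed exactly, column by column. A minimal numerical sketch of the first result of Lemma 3 (our own illustration):

```python
import numpy as np

p = 3
d = p * (p + 1) // 2
tril = np.tril_indices(p)
rng = np.random.default_rng(5)
B = np.tril(rng.standard_normal((p, p)))

def c_to_a(cvec):
    C = np.zeros((p, p)); C[tril] = cvec
    return (B @ C)[tril]

# The map c -> a is linear, so its images of unit vectors give the Jacobian matrix.
J = np.column_stack([c_to_a(np.eye(d)[k]) for k in range(d)])
print(abs(np.linalg.det(J)))                            # |prod_i b_ii^i|
print(abs(np.prod(np.diag(B) ** np.arange(1, p + 1))))  # the same value
```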

Lemma 4 Suppose that \({\bf {A}} = {\bf {BC}}{{\bf {B}}^{\rm {T}}}\), where \(\bf {A}\) and \(\bf {C}\) are \(p \times p\) symmetric matrices; and \(\bf {B}\) is a lower-triangular matrix. Consider the variable transformation from the non-duplicated elements of \(\bf {C}\) to those of \(\bf {A}\). Then, the Jacobian \(J({\bf {C}} \to {\bf {A}})\) is \(|{\bf {B}}|_ + ^{ - (p + 1)}\).

Proof. Since the non-duplicated elements of \(\bf {A}\) using its diagonal and infra-diagonal elements are \({a_{ij}} = \sum \nolimits _{k = 1}^i {\sum \nolimits _{l = 1}^j {{b_{ik}}{c_{kl}}{b_{jl}}} } \,\,(p \ge i \ge j \ge 1)\), we have \[\frac {{\partial {a_{ij}}}}{{\partial {c_{kl}}}} = {b_{ik}}{b_{jl}}\,\,(p \ge i \ge j \ge 1;\,k = 1,...,i;\,l = 1,...,j),\] which gives \[\left [ \begin {array}{l} {a_{11}}\\ {a_{21}}\\ {a_{22}}\\ \, \vdots \\ {a_{p1}}\\ \, \vdots \\ {a_{pp}} \end {array} \right ] = \left [ \begin {array}{l} {b_{11}}{b_{11}}\,\,\,\,0\,\,\,\,\,\,\,0\,\,\,\,\,\,\,\, \cdots \,\,\,\,0\,\,\,\,\,\,\,\, \cdots \,\,\,\,\,0\\ \,\,{*}\,\,\,\,\,\,\,{b_{22}}{b_{11}}\,\,0\,\,\,\,\,\,\,\, \cdots \,\,\,\,0\,\,\,\,\,\,\,\, \cdots \,\,\,\,\,0\\ \,\,{*}\,\,\,\,\,\,\,\,\,\,{*}\,\,\,\,\,{b_{22}}{b_{22}}\,\, \cdots \,\,\,\,0\,\,\,\,\,\,\,\, \cdots \,\,\,\,\,0\\ \,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ \,\,{*}\,\,\,\,\,\,\,\,\,\,{*}\,\,\,\,\,\,\,\,{*}\,\,\,\,\,\,\,\, \cdots \,\,{b_{pp}}{b_{11}}\, \cdots \,\,\,\,\,0\\ \,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \vdots \\ \,\,{*}\,\,\,\,\,\,\,\,\,\,{*}\,\,\,\,\,\,\,\,{*}\,\,\,\,\,\,\,\,\, \cdots \,\,\,\,\,\,{*}\,\,\,\,\,\,\,\, \cdots \,\,{b_{pp}}{b_{pp}} \end {array} \right ]\left [ \begin {array}{l} {c_{11}}\\ {c_{21}}\\ {c_{22}}\\ \, \vdots \\ {c_{p1}}\\ \, \vdots \\ {c_{pp}} \end {array} \right ],\] where the diagonal element of the lower-triangular matrix for \(a_{ij}\) and \(c_{ij}\) is \(\partial {a_{ij}}/\partial {c_{ij}} = {b_{ii}}{b_{jj}}\,\,(p \ge i \ge j \ge 1)\). Since \(J({\bf {C}} \to {\bf {A}})\) is the reciprocal of the absolute value of the determinant of the above lower-triangular matrix, we obtain \(J({\bf {C}} \to {\bf {A}}) = 1/\left | {\prod \nolimits _{i = 1}^p {b_{ii}^{p + 1}} } \right | = \,|{\bf {B}}|_ + ^{ - (p + 1)}\). \(\sqcap \)\(\sqcup \)
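Lemma 4 admits the same style of numerical check (our own illustration): the map from the non-duplicated elements of \(\bf {C}\) to those of \(\bf {A}\) is linear, and its determinant should equal \(|{\bf {B}}|^{p + 1}\) in absolute value.

```python
import numpy as np

p = 3
d = p * (p + 1) // 2
tril = np.tril_indices(p)
rng = np.random.default_rng(6)
B = np.tril(rng.standard_normal((p, p)))

def c_to_a(cvec):
    C = np.zeros((p, p)); C[tril] = cvec
    C = C + C.T - np.diag(np.diag(C))  # rebuild the full symmetric C
    return (B @ C @ B.T)[tril]

J = np.column_stack([c_to_a(np.eye(d)[k]) for k in range(d)])
print(abs(np.linalg.det(J)), abs(np.linalg.det(B)) ** (p + 1))  # equal
```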

Lemma 5 Suppose that \({\bf {A}} = {\bf {BC}}{{\bf {C}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\), where \(\bf {A}\) is a \(p \times p\) symmetric matrix; and \(\bf {B}\) and \(\bf {C}\) are lower-triangular matrices. Consider the variable transformation from the non-zero elements of \(\bf {C}\) to the non-duplicated elements of \(\bf {A}\). Then, the Jacobian \(J({\bf {C}} \to {\bf {A}})\) is \(|{\bf {B}}|_ + ^{ - (p + 1)}/\left | {{2^p}\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} } \right |\).

Proof 1 The diagonal and infra-diagonal elements of \(\bf {A}\) are employed for its non-duplicated ones without loss of generality. Then, define \({\bf {a}} = \\ {({a_{11}},\,{a_{21}},{a_{22}},...,{a_{p1}},...,{a_{pp}})^{\rm {T}}}\) and \({\bf {c}} = {({c_{11}},\,{c_{21}},{c_{22}},...,{c_{p1}},...,{c_{pp}})^{\rm {T}}}\) with the elements lexicographically ordered. Since \(\bf {B}\), \(\bf {C}\) and \(\bf {BC}\) are lower-triangular, the Jacobian matrix \(\partial {\bf {a}}/\partial {{\bf {c}}^{\rm {T}}} = \left \{ {\partial {a_{ij}}/\partial {c_{kl}}} \right \}\) \((p \ge i \ge j \ge 1;p \ge k \ge l \ge 1)\) becomes lower-triangular. This can be shown by \[\begin {array}{l} \frac {{\partial {a_{ij}}}}{{\partial {c_{kl}}}} = {\{ {\bf {B}}({{\bf {E}}_{kl}}{{\bf {C}}^{\rm {T}}} + {\bf {C}}{{\bf {E}}_{lk}}){{\bf {B}}^{\rm {T}}}\} _{ij}} = {({\bf {B}}{{\bf {E}}_{kl}}{{\bf {C}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}})_{ij}} + {({\bf {BC}}{{\bf {E}}_{lk}}{{\bf {B}}^{\rm {T}}})_{ij}}\\ \,\,\,\,\,\,\,\,\,\,\,\,\, = {b_{ik}}{({\bf {BC}})_{jl}}\, + {({\bf {BC}})_{il}}{b_{jk}}\,(p \ge i \ge j \ge 1;\,\,p \ge k \ge l \ge 1), \end {array}\] where \({\bf {E}}_{ij}\) is the matrix of an appropriate size, whose \((i, j)\)th element is 1 with the remaining ones being 0. The right-hand side of the last equation in the above expression vanishes when \(i < k\) or \(\{ i = k\} \cap \{ j < l\}\). This condition indicates the lower-triangular form of \(\partial {\bf {a}}/\partial {{\bf {c}}^{\rm {T}}} = \left \{ {\partial {a_{ij}}/\partial {c_{kl}}} \right \}\). Then, the diagonal elements are \[\frac {{\partial {a_{ij}}}}{{\partial {c_{ij}}}} = {\{ {\bf {B}}({{\bf {E}}_{ij}}{{\bf {C}}^{\rm {T}}} + {\bf {C}}{{\bf {E}}_{ji}}){{\bf {B}}^{\rm {T}}}\} _{ij}} = {({\bf {B}}{{\bf {E}}_{ij}}{{\bf {C}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}})_{ij}} = {b_{ii}}{c_{jj}}{b_{jj}}\,\,(p \ge i > j \ge 1)\] and \[\frac {{\partial {a_{ii}}}}{{\partial {c_{ii}}}} = {\{ {\bf {B}}({{\bf {E}}_{ii}}{{\bf {C}}^{\rm {T}}} + {\bf {C}}{{\bf {E}}_{ii}}){{\bf {B}}^{\rm {T}}}\} _{ii}} = 2b_{ii}^2{c_{ii}}\,\,(i = 1,...,p).\] Since the determinant of the Jacobian matrix for \(J({\bf {A}} \to {\bf {C}})\) is \[\begin {array}{l} \prod \nolimits _{i = 1}^p {\prod \nolimits _{j = 1}^i {\frac {{\partial {a_{ij}}}}{{\partial {c_{ij}}}}} } = \left ( {\prod \nolimits _{i = 1}^p {\prod \nolimits _{j = 1}^{i - 1} {\frac {{\partial {a_{ij}}}}{{\partial {c_{ij}}}}} } } \right )\prod \nolimits _{i = 1}^p {\frac {{\partial {a_{ii}}}}{{\partial {c_{ii}}}}} = {2^p}\prod \nolimits _{i = 1}^p {\prod \nolimits _{j = 1}^i {{b_{ii}}{c_{jj}}{b_{jj}}} } \\ = {2^p}\left ( {\prod \nolimits _{i = 1}^p {b_{ii}^i} } \right )\prod \nolimits _{j = 1}^p {c_{jj}^{p - j + 1}b_{jj}^{p - j + 1}} = {2^p}\prod \nolimits _{i = 1}^p {b_{ii}^{p + 1}c_{ii}^{p - i + 1}} \\ = {2^p}|{\bf {B}}{|^{p + 1}}\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} , \end {array}\] the Jacobian \(J({\bf {C}} \to {\bf {A}})\) is the reciprocal of the absolute value of the above quantity: \[J({\bf {C}} \to {\bf {A}}) = \,|{\bf {B}}|_ + ^{ - (p + 1)}/\left | {{2^p}\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} } \right |,\] which is the required result. \(\sqcap \)\(\sqcup \)

Proof 2 The transformation \({\bf {A}} = {\bf {BC}}{{\bf {C}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\) is seen in two steps. In the first step, the transformation \({\bf {C}} \to {\bf {C}}{{\bf {C}}^{\rm {T}}}\) is considered, whose Jacobian is given by Lemma 2 as \(J({\bf {C}} \to {\bf {C}}{{\bf {C}}^{\rm {T}}}) = \,1/\left | {{2^p}\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} } \right |\). The second step is for the transformation \({\bf {C}}{{\bf {C}}^{\rm {T}}} \to {\bf {A}} = {\bf {BC}}{{\bf {C}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\) with the Jacobian \(J({\bf {C}}{{\bf {C}}^{\rm {T}}} \to {\bf {A}}) = \,|{\bf {B}}|_ + ^{ - (p + 1)}\), which is given by Lemma 4. Then, the Jacobian \(J({\bf {C}} \to {\bf {A}})\) is the product of the two Jacobians due to the chain rule, which gives the required result. \(\sqcap \)\(\sqcup \)
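The result of Lemma 5 can likewise be checked by finite differences, since the map is now quadratic in \(\bf {C}\) (our own illustration):

```python
import numpy as np

p = 3
d = p * (p + 1) // 2
tril = np.tril_indices(p)
rng = np.random.default_rng(10)
B = np.tril(rng.standard_normal((p, p)))

def c_to_a(cvec):
    C = np.zeros((p, p)); C[tril] = cvec
    return (B @ C @ C.T @ B.T)[tril]

c0 = rng.standard_normal(d)
eps = 1e-6
J = np.column_stack([(c_to_a(c0 + eps * np.eye(d)[k])
                      - c_to_a(c0 - eps * np.eye(d)[k])) / (2 * eps)
                     for k in range(d)])
C0 = np.zeros((p, p)); C0[tril] = c0
formula = (2 ** p * abs(np.linalg.det(B)) ** (p + 1)
           * abs(np.prod(np.diag(C0) ** (p - np.arange(p)))))
print(abs(np.linalg.det(J)), formula)  # agree to about six digits
```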

Suppose that each column of a \(p \times n\) matrix \(\bf {Y}\) follows \({{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\) with positive definite \(\bf {\Sigma }\), independently of the other columns. Recall \(\bf {X}\) in Theorem 1. Let \({\bf {\Sigma }} = {\bf {B}}{{\bf {B}}^{\rm {T}}}\) be the Cholesky decomposition, where \(\bf {B}\) is a fixed lower-triangular matrix whose diagonal elements are positive for identification and convenience. Then, each column of \({\bf {Y}} = {\bf {BX}}\) independently follows \({{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\). Define \({{\bf {S}}_{\bf {\Sigma }}} \equiv {\bf {Y}}{{\bf {Y}}^{\rm {T}}} = {\bf {BX}}{{\bf {X}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}} = {\bf {BS}}{{\bf {B}}^{\rm {T}}}\), where \({\bf {S}} = {{\bf {S}}_{{{\bf {I}}_p}}} = {\bf {X}}{{\bf {X}}^{\rm {T}}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\), and the \(\{ p(p + 1)/2\} \times 1\) vector \({{\bf {s}}_{\bf {\Sigma }}} \equiv {({s_{{\bf {\Sigma }}11}},\,{s_{{\bf {\Sigma }}21}},{s_{{\bf {\Sigma }}22}},...,{s_{{\bf {\Sigma }}p1}},...,{s_{{\bf {\Sigma }}pp}})^{\rm {T}}}\) with \({{\bf {S}}_{\bf {\Sigma }}} = \{ {s_{{\bf {\Sigma }}ij}}\} \,\,(i,j = 1,...,p)\).
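This construction also underlies the standard Bartlett-type sampler for the Wishart distribution: generate \(\bf {T}\) with chi-distributed diagonal and standard normal infra-diagonal elements and form \({{\bf {S}}_{\bf {\Sigma }}} = ({\bf {BT}}){({\bf {BT}})^{\rm {T}}}\). A minimal sketch follows (our own code, assuming NumPy and SciPy; the function name is ours), checked against the well-known moment \({\rm {E}}({{\bf {S}}_{\bf {\Sigma }}}) = n{\bf {\Sigma }}\).

```python
import numpy as np
from scipy.stats import chi

def sample_wishart_bartlett(Sigma, n, rng):
    """Draw one W_p(Sigma, n) matrix via S_Sigma = (BT)(BT)^T."""
    p = Sigma.shape[0]
    B = np.linalg.cholesky(Sigma)                    # Sigma = B B^T
    T = np.tril(rng.standard_normal((p, p)), k=-1)   # N(0,1) below the diagonal
    T[np.diag_indices(p)] = chi.rvs(df=n - np.arange(p), random_state=rng)
    BT = B @ T
    return BT @ BT.T

rng = np.random.default_rng(7)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
draws = [sample_wishart_bartlett(Sigma, 10, rng) for _ in range(20_000)]
print(np.mean(draws, axis=0) / 10)  # approx Sigma, since E(S_Sigma) = n Sigma
```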

Lemma 6 Define positive definite \({{\bf {\Sigma }}_i} = {{\bf {B}}_i}{\bf {B}}_i^{\rm {T}}\) and \({{\bf {S}}_{{{\bf {\Sigma }}_i}}} = {{\bf {B}}_i}{\bf {SB}}_i^{\rm {T}}\) \((i = 1,2)\), where \(\bf {S}\) is as before. Denote the pdf’s of \({\bf {S}}_{{{\bf {\Sigma }}_i}}\) at \({\bf {S}}_{\bf {\Sigma }}\) by \({g_{{\bf {\Sigma }} = {{\bf {\Sigma }}_i}}}({{\bf {S}}_{\bf {\Sigma }}})\,\,(i = 1,2)\). Then, \[\frac {{{g_{{\bf {\Sigma }} = {{\bf {\Sigma }}_1}}}({{\bf {S}}_{\bf {\Sigma }}})}}{{{g_{{\bf {\Sigma }} = {{\bf {\Sigma }}_2}}}({{\bf {S}}_{\bf {\Sigma }}})}} = \frac {{{\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{{\bf {\Sigma }}_1})}}{{{\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{{\bf {\Sigma }}_2})}},\] where \({\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{{\bf {\Sigma }}_i}) = \prod \nolimits _{j = 1}^n {{\phi _p}\{ {{({\bf {Y}})}_{ \cdot \,j}}|{\bf {0}},\,\,{{\bf {\Sigma }}_i}\} }\); \(({\bf {Y}})_{ \cdot \,j}\) is the \(j\)-th column of \(\bf {Y}\); and \[{\phi _p}\{ {({\bf {Y}})_{ \cdot \,j}}|{\bf {0}},\,\,{{\bf {\Sigma }}_i}\} = \frac {{\exp \{ - ({\bf {Y}})_{\, \cdot j}^{\rm {T}}{\bf {\Sigma }}_i^{ - 1}{{({\bf {Y}})}_{ \cdot \,j}}/2\} }}{{{{(2\pi )}^{p/2}}|{{\bf {\Sigma }}_i}{|^{1/2}}}}\,\,\,(\,i = 1,2;\,j = 1,...,n).\]

Proof. The derivation is given by the factorization theorem for the sufficient statistic corresponding to \({\bf {S}}_{\bf {\Sigma }}\) for \(\bf {\Sigma }\) as used by Ghosh and Sinha (2002, Equation (8)): \[{\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{{\bf {\Sigma }}_i}) = {g_{{\bf {\Sigma }} = {{\bf {\Sigma }}_i}}}({{\bf {S}}_{\bf {\Sigma }}})h({\bf {Y}})\,\,(i = 1,2),\] which gives the required result. \(\sqcap \)\(\sqcup \)

The Wishart density for general correlated cases (see e.g., Srivastava & Khatri, 1979, Theorem 3.2.1; Anderson, 2003, Theorem 7.2.2) is derived in different ways.

Theorem 2 Let each column of a \(p \times n\) matrix \(\bf {Y}\) follow \({{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\) with positive definite \(\bf {\Sigma }\), independently of the other columns. Then, the pdf of \({{\bf {S}}_{\bf {\Sigma }}} = {\bf {Y}}{{\bf {Y}}^{\rm {T}}}\) is \[{w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = \frac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}.\]

Proof 1 Consider the transformation \({\bf {T}} \to {{\bf {S}}_{\bf {\Sigma }}} = {\bf {BT}}{{\bf {T}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\). The Jacobian is given by Lemma 5, when \({\bf {A}} = {{\bf {S}}_{\bf {\Sigma }}},\,\,{\bf {B}} = {\bf {B}}\) and \({\bf {C}} = {\bf {T}}\) with added restrictions \({b_{ii}} > 0\) and \({t_{ii}} > 0\,\,(i = 1,...,p)\) as \[J({\bf {T}} \to {{\bf {S}}_{\bf {\Sigma }}}) = \,|{\bf {B}}{|^{ - (p + 1)}}/\left ( {{2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}} } \right ) = \,|{\bf {\Sigma }}{|^{ - (p + 1)/2}}/\left ( {{2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}} } \right ).\] The pdf of \(\bf {T}\) denoted by \({f_p}({\bf {T}})\) was given by Theorem 1. Then, we have \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {f_p}({\bf {T}})J({\bf {T}} \to {{\bf {S}}_{\bf {\Sigma }}})\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} \prod \limits _{i = 1}^p {t_{ii}^{n - i}} }}{{{2^{(np/2) - p}}{\Gamma _p}(n/2)}}\,\cfrac {{|{\bf {\Sigma }}{|^{ - (p + 1)/2}}}}{{{2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{p - i + 1}} }} \\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} |{\bf {\Sigma }}{|^{ - (p + 1)/2}}\prod \limits _{i = 1}^p {t_{ii}^{n - p - 1}} }}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {B}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {B}}^{{\rm {T}} - 1}})/2\} |{\bf {\Sigma }}{|^{ - (p + 1)/2}}|{{\bf {B}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {B}}^{{\rm {T}} - 1}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}, \end {array}\] where \({\rm {tr}}({{\bf {B}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {B}}^{{\rm {T}} - 1}}) = {\rm {tr}}({{\bf {B}}^{{\rm {T}} - 1}}{{\bf {B}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}) = {\rm {tr}}\{ {({\bf {B}}{{\bf {B}}^{\rm {T}}})^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}\} = {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})\) and \(|{{\bf {B}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {B}}^{{\rm {T}} - 1}}|\, = \,|{{\bf {S}}_{\bf {\Sigma }}}||{\bf {\Sigma }}{|^{ - 1}}\) are used. The last expression gives the required result. \(\sqcap \)\(\sqcup \)

Proof 2 Employ the two-step transformation \({\bf {T}} \to {\bf {S}} = {\bf {T}}{{\bf {T}}^{\rm {T}}} \to {{\bf {S}}_{\bf {\Sigma }}} = {\bf {BS}}{{\bf {B}}^{\rm {T}}}\). The first step was used by Theorem 1. The Jacobian \(J({\bf {T}} \to {\bf {S}} = {\bf {T}}{{\bf {T}}^{\rm {T}}})\) in the first step is given by Lemma 2 by taking the reciprocal of the last result of the lemma while \(J({\bf {S}} \to {{\bf {S}}_{\bf {\Sigma }}} = {\bf {BS}}{{\bf {B}}^{\rm {T}}})\) is obtained by Lemma 4. That is, \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {f_p}({\bf {T}})J({\bf {T}} \to {\bf {S}})J({\bf {S}} \to {{\bf {S}}_{\bf {\Sigma }}})\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {S}})/2\} |{\bf {S}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}J({\bf {S}} \to {{\bf {S}}_{\bf {\Sigma }}}) \\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {S}})/2\} |{\bf {S}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}|{\bf {B}}{|^{ - (p + 1)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}|{\bf {\Sigma }}{|^{ - (p + 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}. \end {array}\] \(\sqcap \)\(\sqcup \)

Proof 3 (Anderson, 2003, Theorem 7.2.2) Anderson used an alternative two-step transformation \({\bf {T}} \to {{\bf {T}}^*} = {\bf {BT}} \to {{\bf {S}}_{\bf {\Sigma }}} = {{\bf {T}}^*}{{\bf {T}}^{*T}}\). The Jacobian \(J({\bf {T}} \to {{\bf {T}}^*})\) is given by the first result of Lemma 3 while \(J({{\bf {T}}^*} \to {{\bf {S}}_{\bf {\Sigma }}})\) is given by the reciprocal of the last result in Lemma 2 when \({\bf {T}} = {{\bf {T}}^*}\). That is, \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {f_p}({\bf {T}})J({\bf {T}} \to {{\bf {T}}^*})J({{\bf {T}}^*} \to {{\bf {S}}_{\bf {\Sigma }}})\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} \prod \limits _{i = 1}^p {t_{ii}^{n - i}} }}{{{2^{(np/2) - p}}{\Gamma _p}(n/2)}}{\left ( {\prod \nolimits _{i = 1}^p {b_{ii}^i} } \right )^{ - 1}}{\left ( {{2^p}\prod \nolimits _{i = 1}^p {t_{ii}^{*p - i + 1}} } \right )^{ - 1}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} \prod \limits _{i = 1}^p {{{(t_{ii}^*/{b_{ii}})}^{n - i}}} }}{{{2^{np/2}}{\Gamma _p}(n/2)\left ( {\prod \nolimits _{i = 1}^p {b_{ii}^i} } \right )\prod \nolimits _{i = 1}^p {t_{ii}^{*p - i + 1}} }} = \cfrac {{\exp \{ - {\rm {tr}}({\bf {T}}{{\bf {T}}^{\rm {T}}})/2\} \prod \limits _{i = 1}^p {t_{ii}^{*n - p - 1}} }}{{{2^{np/2}}\left ( {\prod \nolimits _{i = 1}^p {b_{ii}^n} } \right ){\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}. \end {array}\] \(\sqcap \)\(\sqcup \)

Proof 4 Use Theorem 1 and Lemma 6 when \({{\bf {\Sigma }}_1} = {{\bf {I}}_p}\) and \({{\bf {\Sigma }}_2} = {{\bf {B}}_2}{\bf {B}}_2^{\rm {T}} = {{\bf {\Sigma }}^{1/2}}{({{\bf {\Sigma }}^{1/2}})^{\rm {T}}} = {\bf {\Sigma }}\). Then, we have
\[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{{\bf {I}}_p},n)\cfrac {{{\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{\bf {\Sigma }})}}{{{\phi _{p,n}}({\bf {Y}}|{\bf {0}},\,\,{{\bf {I}}_p})}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\cfrac {{\exp \{ - {\rm {tr}}({\bf {Y}}{{\bf {Y}}^{\rm {T}}}{{\bf {\Sigma }}^{ - 1}})/2\} /\{ {{(2\pi )}^{pn/2}}|{\bf {\Sigma }}{|^{n/2}}\} }}{{\exp \{ - {\rm {tr}}({\bf {Y}}{{\bf {Y}}^{\rm {T}}})/2\} /{{(2\pi )}^{pn/2}}}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}. \end {array}\] \(\sqcap \)\(\sqcup \)
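Having established the general density, it can be compared numerically with an established implementation, as was done for Theorem 1 (our own illustration, assuming SciPy):

```python
import numpy as np
from scipy.special import multigammaln
from scipy.stats import wishart

def wishart_pdf(S, Sigma, n):
    """Theorem 2's density w_p(S_Sigma | Sigma, n), computed on the log scale."""
    p = S.shape[0]
    logpdf = (-np.trace(np.linalg.solve(Sigma, S)) / 2
              + (n - p - 1) / 2 * np.linalg.slogdet(S)[1]
              - n * p / 2 * np.log(2)
              - n / 2 * np.linalg.slogdet(Sigma)[1] - multigammaln(n / 2, p))
    return np.exp(logpdf)

rng = np.random.default_rng(8)
p, n = 2, 6
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.linalg.cholesky(Sigma) @ rng.standard_normal((p, n))
S = Y @ Y.T
print(wishart_pdf(S, Sigma, n))
print(wishart.pdf(S, df=n, scale=Sigma))  # the same value
```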

3 Remarks and Conclusion

For the general correlated cases, four proofs are shown in Theorem 2. The one-step first proof uses \({f_p}({\bf {T}})\) with \(J({\bf {T}} \to {{\bf {S}}_{\bf {\Sigma }}})\) given by Lemma 5, where \({{\bf {S}}_{\bf {\Sigma }}} = {\bf {BT}}{{\bf {T}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\) with lower-triangular \(\bf {B}\) and \(\bf {T}\) is seen as a two-fold Bartlett (Cholesky) decomposition or a usual Bartlett (1933) decomposition \({{\bf {S}}_{\bf {\Sigma }}} = {\bf {BT}}{({\bf {BT}})^{\rm {T}}}\) in terms of the lower-triangular \(\bf {BT}\). The two-step second proof uses \({f_p}({\bf {T}})\) with \(J({\bf {T}} \to {\bf {S}} = {\bf {T}}{{\bf {T}}^{\rm {T}}})\) and \(J({\bf {S}} \to {{\bf {S}}_{\bf {\Sigma }}} = {\bf {BS}}{{\bf {B}}^{\rm {T}}})\) obtained by Lemmas 2 and 4, respectively. Anderson’s (2003) two-step third proof uses \({f_p}({\bf {T}})\) with \(J({\bf {T}} \to {{\bf {T}}^*} = {\bf {BT}})\) and \(J({{\bf {T}}^*} \to {{\bf {S}}_{\bf {\Sigma }}})\) given by Lemmas 3 and 2, respectively. Among the four proofs, the first and fourth ones are relatively simple, while the two two-step proofs seem to be comparable. Note that Lemma 2 is used in the first step of Proof 2 and in the second step of Proof 3 to derive the final Jacobian. When only the pdf of \({\bf {S}}( = {{\bf {S}}_{{\bf {\Sigma }} = {{\bf {I}}_p}}})\) is of interest, Proof 2 may be the simplest, though the same result is immediately obtained from the pdf of \({\bf {S}}_{\bf {\Sigma }}\) by substituting \({\bf {\Sigma }} = {{\bf {I}}_p}\).

In each of the four proofs, \({f_p}({\bf {T}})\) is used. Two derivations of \({f_p}({\bf {T}})\) were shown. The first method using Lemma 1 is much simpler than that used by Anderson (2003), as detailed in Remark 1. The author believes that this simplification will reduce the difficulties frequently encountered when beginning students/researchers master the Wishart density. Note that when the Wishart density \({w_p}({\bf {S}}|{{\bf {I}}_p},n)\) is given, \({f_p}({\bf {T}})\) is obtained using \(J({\bf {S}} \to {\bf {T}})\) in Lemma 2 as easily as with the transformation \(J({\bf {T}} \to {\bf {S}})\), which is the reverse problem (see Bartlett, 1933; Muirhead, 1982, Theorem 3.2.14).

Remark 2 Lemma 1 gave the justification of \({\bf {X}}{{\bf {X}}^{\rm {T}}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\) with mutually independent normal \({t_{ij}}\,\,(p \ge i > j \ge 1)\) and chi-distributed \({t_{ii}}\,(i = 1,...,p)\). While the chi-square distribution of \(({\bf {T}}{{\bf {T}}^{\rm {T}}})_{ii}\) is obvious, the distribution of \({({\bf {T}}{{\bf {T}}^{\rm {T}}})_{ij}}\,\,(i > j)\) is that of the product sum of \(n\) pairs of independent normals (the product-sum normal for short). The pdf and mgf of the product-sum normal in the case of a possibly correlated single pair were given by Craig (1936) (see also Ogasawara, 2023a, Remarks S1-S4). For recent developments on this issue, see e.g., Seijas-Macías and Oliveira (2012), Seijas-Macías, Oliveira, Oliveira, and Leiva (2020), and Gaunt (2022).

Remark 3 As addressed earlier, the complicated property found in many of the proofs of the Wishart density seems to be due partially to the associated Jacobians in e.g., Srivastava and Khatri (1979, Section 3.2) and Anderson (2003, Section 7.2). The proof of the Wishart density in Theorem 1 is similar to that in Srivastava and Khatri (1979, Section 3.2). Though the Jacobian in Lemma 2 was also used by Srivastava and Khatri, we did not use the Jacobian of \({\bf {X}} \to \{ {\bf {T}},\,\,{{\bf {V}}^*}\}\) in \({\bf {X}} = {\bf {T}}{{\bf {V}}^*}\), where \({\bf {V}}^*\) is a \(p \times n\) semi-orthonormal matrix with \({{\bf {V}}^*}{{\bf {V}}^{{\rm {*T}}}} = {{\bf {I}}_p}\) (see Srivastava & Khatri, 1979, Exercise 1.33). Instead, we used the marginal chi and normal distributions for \(\bf {T}\) as in Anderson (2003).

As shown earlier, in the three proofs of the Wishart density \({w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n)\), the Bartlett-like Cholesky decomposition \({\bf {\Sigma }} = {\bf {B}}{{\bf {B}}^{\rm {T}}}\) is used for non-stochastic \(\bf {\Sigma }\). Though this factorization gives simple results, other factorizations can also be used with \({\bf {\Sigma }} = {\bf {BG}}{{\bf {G}}^{\rm {T}}}{{\bf {B}}^{\rm {T}}}\) \(= {\bf {BG}}{({\bf {BG}})^{\rm {T}}} = {\bf {D}}{{\bf {D}}^{\rm {T}}}\), where \({\bf {G}}{{\bf {G}}^{\rm {T}}} = {{\bf {G}}^{\rm {T}}}{\bf {G}} = {{\bf {I}}_p}\) and \({\bf {D}} = {\bf {BG}}\). For illustration, Proof 5 using \({\bf {D}} = {{\bf {\Sigma }}^{1/2}}\) with \({({{\bf {\Sigma }}^{1/2}})^2} = {\bf {\Sigma }}\) will be shown in Appendix A for didactic purposes with associated remarks. The concise derivation of Khatri (1963) will be explained in Appendix B. The Bartlett decomposition \({\bf {S}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\) can also be replaced by other ones with the same number of random variables. The case called the exchanged Bartlett decomposition will be shown in Appendix C.

Conclusion Among Proofs 1 to 4 of the Wishart distribution given earlier and Proofs 5 to 7 to be shown in the appendices for expository purposes, Proof 4 is the simplest: it uses our Lemma 1 for the equivalence of the distributions of the product-sum normal and the product of a chi and a standard normal variable, together with Lemma 6 for the factorization theorem given by Ghosh and Sinha (2002). Since Proof 4 uses elementary and self-contained methods, it may be understood by beginning students/researchers without much difficulty.

References

   Anderson, T. W. (2003). An introduction to multivariate statistical analysis (3rd ed.). New York: Wiley.

   Bartlett, M. S.  (1933). On the theory of statistical regression. Proceedings of the Royal Society of Edinburgh, 53, 260–283. doi: https://doi.org/10.1017/s0370164600015637

   Bodnar, T., & Okhrin, Y. (2008). Properties of the singular, inverse and generalized inverse partitioned Wishart distributions. Journal of Multivariate Analysis, 99(10), 2389–2405. doi: https://doi.org/10.1016/j.jmva.2008.02.024

   Chen, Y.-L., & Weng, L.-J. (2023). On Horn’s approximation to the sampling distribution of eigenvalues from random correlation matrices in parallel analysis. Current Psychology (online published). doi: https://doi.org/10.1007/s12144-023-04635-9

   Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics, 7(1), 1–15. doi: https://doi.org/10.1214/aoms/1177732541

   Deemer, W. L., & Olkin, I. (1951). The Jacobians of certain matrix transformations useful in multivariate analysis: Based on lectures of P. L. Hsu at the University of North Carolina, 1947. Biometrika, 38(3/4), 345–367. doi: https://doi.org/10.2307/2332581

   DLMF. (2021). NIST Digital Library of Mathematical Functions. National Institute of Standards and Technology, U. S. Department of Commerce. Release 1.1.3 of 2021-09-15. Retrieved from http://dlmf.nist.gov/ (F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, M. A. McClain (Eds.))

   Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521. doi: https://doi.org/10.2307/2331838

   Gaunt, R. E. (2022). The basic distributional theory for the product of zero mean correlated normal random variables. Statistica Neerlandica, 76(4), 450–470. doi: https://doi.org/10.1111/stan.12267

   Ghosh, M., & Sinha, B. K. (2002). A simple derivation of the Wishart distribution. The American Statistician, 56(2), 100–101. doi: https://doi.org/10.1198/000313002317572754

   Hsu, P. L. (1940). An algebraic derivation of the distribution of rectangular coordinates. Proceedings of the Edinburgh Mathematical Society, 6(3), 185–189. doi: https://doi.org/10.1007/978-1-4684-9324-5_14

   Khatri, C. G. (1963). (no title). Journal of the Indian Statistical Association, 1(Queries section), 52–54.

   Kshirsagar, A. M.  (1959). Bartlett decomposition and Wishart distribution. The Annals of Mathematical Statistics, 30(1), 239–241. doi: https://doi.org/10.1214/aoms/1177706379

   Kshirsagar, A. M. (1972). Multivariate analysis. New York: Marcel Dekker. doi: https://doi.org/10.2307/1267507

   Liu, H., Qu, W., Zhang, Z., & Wu, H. (2022). A new Bayesian structural equation modeling approach with priors on the covariance matrix parameter. Journal of Behavioral Data Science, 2(2), 23–46. doi: https://doi.org/10.35566/jbds/v2n2/p2

   Magnus, J. R., & Neudecker, H.  (1986). Symmetry, 0-1 matrices and Jacobians: A review. Econometric Theory, 2(2), 157–190. doi: https://doi.org/10.1017/s0266466600011476

   Magnus, J. R., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics (Rev. ed.). New York: Wiley. doi: https://doi.org/10.1002/9781119541219

   Marčenko, V. A., & Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik, 1(4), 457–483. doi: https://doi.org/10.1070/sm1967v001n04abeh001994

   Mathai, A. M., & Provost, S. B. (2022). On the singular gamma, Wishart, and beta matrix-variate density functions. Canadian Journal of Statistics, 50(4), 1143–1165. doi: https://doi.org/10.1002/cjs.11710

   Mathai, A. M., Provost, S. B., & Haubold, H. J. (2022). Multivariate statistical analysis in the real and complex domains. Cham, Switzerland: Springer Nature. doi: https://doi.org/10.1007/978-3-030-95864-0

   Muirhead, R. J. (1982). Aspects of multivariate statistical theory. New York: Wiley. doi: https://doi.org/10.2307/2288369

   Ogasawara, H. (2007). Asymptotic expansions of the distributions of estimators in canonical correlation analysis under nonnormality. Journal of Multivariate Analysis, 98(9), 1726–1750. doi: https://doi.org/10.1016/j.jmva.2006.12.001

   Ogasawara, H. (2016). Bias correction of the Akaike information criterion in factor analysis. Journal of Multivariate Analysis, 149, 144–159. doi: https://doi.org/10.1016/j.jmva.2016.04.003

   Ogasawara, H. (2022). A stochastic derivation of the surface area of the (n-1)-sphere. Preprint at ResearchGate. doi: https://doi.org/10.13140/RG.2.2.28827.95528

Ogasawara, H. (2023a). Supplement to the paper “On some known derivations and new ones for the Wishart distribution: A didactic”. To appear in Economic Review (Otaru University of Commerce), 74(2 & 3). https://www.otaru-uc.ac.jp/~emt-hogasa/, https://barrel.repo.nii.ac.jp/

Ogasawara, H. (2023b). The Wishart distribution with two different degrees of freedom. Statistics and Probability Letters, 200, 109866 (online published). doi: https://doi.org/10.1016/j.spl.2023.109866

   Olkin, I. (2002). The 70th anniversary of the distribution of random matrices: A survey. Linear Algebra and Its Applications, 354, 231–243. doi: https://doi.org/10.1016/s0024-3795(01)00314-7

Schuberth, F. (2021). The Henseler-Ogasawara specification of composites in structural equation modeling: A tutorial. Psychological Methods (online published). doi: https://doi.org/10.1037/met0000432

   Seijas-Macías, A., & Oliveira, A. (2012). An approach to distribution of the product of two normal variables. Discussiones Mathematicae Probability and Statistics, 32(1-2), 87–99. doi: https://doi.org/10.7151/dmps.1146

   Seijas-Macías, A., Oliveira, A., Oliveira, T. A., & Leiva, V. (2020). Approximating the distribution of the product of two normally distributed random variables. Symmetry, 12(8), 1201. doi: https://doi.org/10.3390/sym12081201

   Srivastava, M. S.  (2003). Singular Wishart and multivariate beta distributions. The Annals of Statistics, 31(5), 1537–1560. doi: https://doi.org/10.1214/aos/1065705118

Srivastava, M. S., & Khatri, C. G. (1979). An introduction to multivariate statistics. Amsterdam: Elsevier.

   Wijsman, R. A.  (1957). Random orthogonal transformations and their use in some classical distribution problems in multivariate analysis. The Annals of Mathematical Statistics, 28(2), 415–423. doi: https://doi.org/10.1214/aoms/1177706969

   Wilks, S. S. (1962). Mathematical statistics. New York: Wiley. doi: https://doi.org/10.2307/2311277

   Wishart, J. (1928). The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20A, 32–52. doi: https://doi.org/10.2307/2331939

   Wishart, J., & Bartlett, M. S. (1933). The generalised product moment distribution in a normal system. Mathematical Proceedings of the Cambridge Philosophical Society, 29(2), 260–270. doi: https://doi.org/10.1017/s0305004100011063

   Yao, J., Zheng, S., & Bai, Z. D. (2015). Sample covariance matrices and high-dimensional data analysis. New York: Cambridge University Press.

   Yonenaga, K. (2022). Functionals of a Wishart matrix and a normal vector and its application to linear discriminant analysis. Doctoral dissertation, Graduate School of Economics and Business, Hokkaido University, Japan.

Yu, X., Schuberth, F., & Henseler, J. (2023). Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification. Statistical Analysis and Data Mining: The ASA Data Science Journal (online published). doi: https://doi.org/10.1002/sam.11608

   Zhang, Z. (2021). A note on Wishart and inverse Wishart priors for covariance matrix. Journal of Behavioral Data Science, 1(2), 119–126. doi: https://doi.org/10.35566/jbds/v1n2/p2

A An alternative proof of the Wishart density for correlated cases

Let \({\bf {\Sigma }}^{1/2}\) be a symmetric matrix-square-root of \(\bf {\Sigma }\) satisfying \({({{\bf {\Sigma }}^{1/2}})^2} = {\bf {\Sigma }}\). Then, we have \({({\bf {Y}})_{ \cdot j}} = {({{\bf {\Sigma }}^{1/2}}{\bf {X}})_{ \cdot j}} \sim {{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\) just as \({({\bf {BX}})_{ \cdot j}} \sim {{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\,\,(j = 1,...,n)\), which gives \({{\bf {S}}_{\bf {\Sigma }}} \equiv {\bf {Y}}{{\bf {Y}}^{\rm {T}}} = {{\bf {\Sigma }}^{1/2}}{\bf {X}}{{\bf {X}}^{\rm {T}}}{{\bf {\Sigma }}^{1/2}}\) \( = {{\bf {\Sigma }}^{1/2}}{\bf {S}}{{\bf {\Sigma }}^{1/2}}\), where \({\bf {S}}_{\bf {\Sigma }}\) is redefined using \({\bf {\Sigma }}^{1/2}\). Let \({{\bf {s}}_{\bf {\Sigma }}} = {({s_{{\bf {\Sigma }}11}},\,{s_{{\bf {\Sigma }}21}},{s_{{\bf {\Sigma }}22}},...,{s_{{\bf {\Sigma }}p1}},...,{s_{{\bf {\Sigma }}pp}})^{\rm {T}}}\) using the redefined \({{\bf {S}}_{\bf {\Sigma }}} = \{ {s_{{\bf {\Sigma }}ij}}\} \,\,(i,j = 1,...,p)\). Then, \({{\bf {D}}_p}{{\bf {s}}_{\bf {\Sigma }}} = {\rm {vec}}({{\bf {S}}_{\bf {\Sigma }}})\) follows, where \({\bf {D}}_p\) of full column rank is the \({p^2} \times \{ p(p + 1)/2\}\) duplication matrix consisting of 0’s and 1’s (Magnus & Neudecker, 1999, Chapter 3, Section 8); and \({\rm {vec}}( \cdot )\) is the vectorizing operator stacking the columns of a matrix in parentheses sequentially with the first column on the top. Using the formula \({\rm {vec}}({\bf {ABC}}) = ({{\bf {C}}^{\rm {T}}} \otimes {\bf {A}}){\rm {vec}}({\bf {B}})\) (see Magnus & Neudecker, 1999, Chapter 2, Theorem 2), where \(\otimes \) denotes the direct or Kronecker product, we obtain \[{{\bf {D}}_p}{{\bf {s}}_{\bf {\Sigma }}} = {\rm {vec}}({{\bf {S}}_{\bf {\Sigma }}}) = {\rm {vec}}({{\bf {\Sigma }}^{1/2}}{\bf {S}}{{\bf {\Sigma }}^{1/2}}) = ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){\rm {vec}}({\bf {S}}) = ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){{\bf {D}}_p}{\bf {s}}.\] Pre-multiplying the above equation by \({({\bf {D}}_p^{\rm {T}}{{\bf {D}}_p})^{ - 1}}{\bf {D}}_p^{\rm {T}} \equiv {\bf {D}}_p^ +\), which is the left- or Moore-Penrose generalized inverse of \({\bf {D}}_p\) with \({\bf {D}}_p^ + {{\bf {D}}_p} = {{\bf {I}}_{p(p + 1)/2}}\) (see Magnus & Neudecker, 1999, Chapter 3, Section 8), we have \[{{\bf {s}}_{\bf {\Sigma }}} = {\bf {D}}_p^ + ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){{\bf {D}}_p}{\bf {s}}.\] The Jacobian of the transformation \({{\bf {S}}_{\bf {\Sigma }}} \to {\bf {S}}\) or equivalently \({{\bf {s}}_{\bf {\Sigma }}} \to {\bf {s}}\) is given by \(|{\bf {D}}_p^ + ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){{\bf {D}}_p}{|_ + } = \,\,|{\bf {\Sigma }}{|^{(p + 1)/2}}\), which is derived using the following lemma.
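Before turning to the lemma, the duplication-matrix relations above are easy to check numerically. The following sketch (ours, not part of the derivation; it assumes Python with NumPy/SciPy, and the helper name dup_matrix is ours) verifies \({{\bf {D}}_p}{{\bf {s}}_{\bf {\Sigma }}} = ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){{\bf {D}}_p}{\bf {s}}\) and the recovery of \({{\bf {s}}_{\bf {\Sigma }}}\) via \({\bf {D}}_p^ +\).

```python
import numpy as np
from scipy.linalg import sqrtm

def dup_matrix(p):
    # Duplication matrix D_p with columns ordered (1,1),(2,1),(2,2),...,(p,p),
    # so that D_p s = vec(S) for symmetric S, matching the text's ordering of s.
    D = np.zeros((p * p, p * (p + 1) // 2))
    for k, (i, j) in enumerate(zip(*np.tril_indices(p))):
        D[i + p * j, k] = D[j + p * i, k] = 1.0
    return D

rng = np.random.default_rng(0)
p, n = 3, 10
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)          # a positive definite Sigma
Sig_half = np.real(sqrtm(Sigma))         # symmetric square root Sigma^{1/2}
X = rng.standard_normal((p, n))
S = X @ X.T
S_Sigma = Sig_half @ S @ Sig_half

D = dup_matrix(p)
Dplus = np.linalg.pinv(D)                # equals (D^T D)^{-1} D^T for full column rank D
vech = lambda M: M[np.tril_indices(p)]   # s and s_Sigma as in the text

print(np.allclose(D @ vech(S_Sigma),
                  np.kron(Sig_half, Sig_half) @ D @ vech(S)))          # True
print(np.allclose(vech(S_Sigma),
                  Dplus @ np.kron(Sig_half, Sig_half) @ D @ vech(S)))  # True
```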

Lemma 7 (Magnus & Neudecker, 1986, Equation (7.11)). Let \(\bf {A}\) be a \(p \times p\) positive definite matrix with distinct eigenvalues. Then, \(|{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}|\,\, = \,\,|{\bf {A}}{|^{p + 1}}\).

Proof. While Magnus and Neudecker (1986) used Schur’s theorem for the existence of a non-singular matrix \(\bf {V}\) satisfying \({{\bf {V}}^{ - 1}}{\bf {AV}} = {\bf {M}}\), where \(\bf {M}\) is an upper-triangular matrix for a general square matrix \(\bf {A}\), we use a familiar special case of the theorem, namely \({{\bf {L}}^{\rm {T}}}{\bf {AL}} = {\bf {\Lambda }}\) when \({\bf {A}} = {\bf {L\Lambda }}{{\bf {L}}^{\rm {T}}}\) with \({\bf {L}}{{\bf {L}}^{\rm {T}}} = {{\bf {L}}^{\rm {T}}}{\bf {L}} = {{\bf {I}}_p}\) and \({\bf {\Lambda }} = {\rm {diag}}({\lambda _1},...,{\lambda _p})\,\,({\lambda _1} > ... > {\lambda _p} > 0)\), where the columns of \(\bf {L}\) and \({\lambda _i}\,\,(i = 1,...,p)\) are the eigenvectors and eigenvalues of \(\bf {A}\), respectively. Note that \[\begin {array}{l} {\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}}){{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p}\\ = {\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}})({\bf {A}} \otimes {\bf {A}})({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p}\\ = {\bf {D}}_p^ + \{ ({{\bf {L}}^{\rm {T}}}{\bf {AL}}) \otimes ({{\bf {L}}^{\rm {T}}}{\bf {AL}})\} {{\bf {D}}_p} = {\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}, \end {array}\] where \({{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}) = ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}{\bf {D}}_p^ +\) and \({{\bf {D}}_p}{\bf {D}}_p^ + {{\bf {D}}_p} = {{\bf {D}}_p}\) (Magnus & Neudecker, 1999, Chapter 3, Theorem 13) are used, followed by the mixed-product rule \(({\bf {A}} \otimes {\bf {B}})({\bf {C}} \otimes {\bf {D}}) = ({\bf {AC}}) \otimes ({\bf {BD}})\), which holds whenever the multiplications are defined.

Note that \({\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}}){{\bf {D}}_p} = {\{ {\bf {D}}_p^ + ({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p}\} ^{ - 1}}\) since \[{\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}}){{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p} = {\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}})({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p} = {\bf {D}}_p^ + {{\bf {D}}_p} = {{\bf {I}}_{p(p + 1)/2}}.\] Consequently, we can write \[\begin {array}{l} {\bf {D}}_p^ + ({{\bf {L}}^{\rm {T}}} \otimes {{\bf {L}}^{\rm {T}}}){{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}{\bf {D}}_p^ + ({\bf {L}} \otimes {\bf {L}}){{\bf {D}}_p}\\ \equiv {{\bf {B}}^{ - 1}}{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}{\bf {B}} = {\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}, \end {array}\] which shows that the eigenvalues of \({\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}\) are the same as those of \({\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\) (see e.g., Magnus & Neudecker, 1999, Chapter 1, Theorem 5). Employ the double subscript notation as used earlier for the row numbers \(i\) and \(j\) \((p \ge i \ge j \ge 1)\) and column numbers \(k\) and \(l\) \((p \ge k \ge l \ge 1)\) of the \(\{ p(p + 1)/2\} \times \{ p(p + 1)/2\}\) matrix \({\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\). These numbers correspond to the subscripts of the elements of e.g., the \(\{ p(p + 1)/2\} \times 1\) vector \({\bf {s}} = {({s_{11}},\,{s_{21}},{s_{22}},...,{s_{p1}},...,{s_{pp}})^{\rm {T}}}\).

Consider \(({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\), where the \((k, k)\)th columns of \(({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\) \((k = 1,...,p)\) are unchanged from the corresponding ones of \({\bf {\Lambda }} \otimes {\bf {\Lambda }}\) while the \((k, l)\)th columns \((p \ge k > l \ge 1)\) of \(({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\) are combined as the sums of the \((k,l)\)- and \((l,k)\)-th columns of \({\bf {\Lambda }} \otimes {\bf {\Lambda }}\) such that e.g., \[({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_2} = {\rm {diag}}(\lambda _1^2,\,\,{\lambda _1}\,{\lambda _2},\,\,{\lambda _2}\,{\lambda _1},\lambda _2^2)\left ( \begin {array}{l} 1\,\,0\,\,0\\ 0\,\,1\,\,0\\ 0\,\,1\,\,0\\ 0\,\,0\,\,1 \end {array} \right ) = \left ( \begin {array}{l} \lambda _1^2\,\,\,\,0\,\,\,\,\,\,\,\,\,\,0\\ 0\,\,\,\,{\lambda _1}\,{\lambda _2}\,\,\,0\\ 0\,\,\,\,{\lambda _2}\,{\lambda _1}\,\,\,0\\ 0\,\,\,\,\,\,\,0\,\,\,\,\,\,\,\lambda _2^2 \end {array} \right )\] when \(p\) = 2. For the second transformation \({\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\), noting that \({\bf {D}}_p^ + = {({\bf {D}}_p^{\rm {T}}{{\bf {D}}_p})^{ - 1}}{\bf {D}}_p^{\rm {T}}\) consists of 1’s, 1/2’s and 0’s as \({\bf {D}}_2^ + = \left ( \begin {array}{l} 1\,\,\,\,\,0\,\,\,\,\,\,\,\,0\,\,\,\,\,\,\,0\\ 0\,\,1/2\,\,\,1/2\,\,\,0\\ 0\,\,\,\,0\,\,\,\,\,\,\,\,\,0\,\,\,\,\,\,\,1 \end {array} \right )\), we find that \({\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}\) is the \(\{ p(p + 1)/2\} \times \{ p(p + 1)/2\}\) diagonal matrix whose diagonal elements are \(\lambda _i^2\,\,(i = 1,...,p)\) and \({\lambda _i}{\lambda _j}\,\,(p \ge i > j \ge 1)\) as \({\bf {D}}_2^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_2} = {\rm {diag}}(\lambda _1^2,\,\,{\lambda _2}\,{\lambda _1},\lambda _2^2)\). Then, we have \[\begin {array}{l} |{\bf {D}}_p^ + ({\bf {A}} \otimes {\bf {A}}){{\bf {D}}_p}|\,\, = \,|{\bf {D}}_p^ + ({\bf {\Lambda }} \otimes {\bf {\Lambda }}){{\bf {D}}_p}|\\ = \left ( {\prod \limits _{i = 1}^p {\lambda _i^2} } \right )\prod \limits _{p \ge i > j \ge 1} {{\lambda _i}{\lambda _j}} = {\left ( {\prod \limits _{i = 1}^p {{\lambda _i}} } \right )^{p + 1}} = \,\,|{\bf {A}}{|^{p + 1}}. \end {array}\] \(\sqcap \)\(\sqcup \)
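Lemma 7 itself is also easy to probe numerically. A minimal sketch (ours, assuming Python with NumPy) compares the two determinants for a random positive definite \(\bf {A}\):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
B = rng.standard_normal((p, p))
A = B @ B.T + p * np.eye(p)      # positive definite; eigenvalues distinct a.s.

# Duplication matrix D_p (columns ordered (1,1),(2,1),...,(p,p)) and D_p^+
D = np.zeros((p * p, p * (p + 1) // 2))
for k, (i, j) in enumerate(zip(*np.tril_indices(p))):
    D[i + p * j, k] = D[j + p * i, k] = 1.0
Dplus = np.linalg.pinv(D)

lhs = np.linalg.det(Dplus @ np.kron(A, A) @ D)
rhs = np.linalg.det(A) ** (p + 1)
print(np.isclose(lhs, rhs))      # True
```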

Proof 5 of the Wishart density in Theorem 2 The Jacobian of the transformation \({{\bf {S}}_{\bf {\Sigma }}} \to {\bf {S}}\) or equivalently \({{\bf {s}}_{\bf {\Sigma }}} \to {\bf {s}}\) is given by Lemma 7 as \(|{\bf {D}}_p^ + ({{\bf {\Sigma }}^{1/2}} \otimes {{\bf {\Sigma }}^{1/2}}){{\bf {D}}_p}{|_ + } = \,\,|{\bf {\Sigma }}{|^{(p + 1)/2}}\). Consequently, \(J({\bf {s}} \to {{\bf {s}}_{\bf {\Sigma }}})\) becomes \(|{\bf {\Sigma }}{|^{ - (p + 1)/2}}\). Then, the pdf of \({\bf {S}}_{\bf {\Sigma }}\) is obtained from that of \({\bf {S}} = {{\bf {\Sigma }}^{ - 1/2}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {\Sigma }}^{ - 1/2}}\) in Theorem 1 and \(J({\bf {s}} \to {{\bf {s}}_{\bf {\Sigma }}}) = \,\,|{\bf {\Sigma }}{|^{ - (p + 1)/2}}\) as \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = \cfrac {{\exp \{ - {\rm {tr}}({\bf {S}})/2\} |{\bf {S}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}|{\bf {\Sigma }}{|^{ - (p + 1)/2}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1/2}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {\Sigma }}^{ - 1/2}})/2\} |{{\bf {\Sigma }}^{ - 1/2}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {\Sigma }}^{ - 1/2}}{|^{(n - p - 1)/2}}|{\bf {\Sigma }}{|^{ - (p + 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}}. \end {array}\] \(\sqcap \)\(\sqcup \)
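As a numerical companion to Proof 5 (a sketch of ours, assuming Python with SciPy, whose scipy.stats.wishart uses the df \(n\) and scale \(\bf {\Sigma }\) parameterization of \({w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n)\)), the last line can be compared with a library implementation:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln   # log of the multivariate gamma function

rng = np.random.default_rng(2)
p, n = 3, 8
B = rng.standard_normal((p, p))
Sigma = B @ B.T + p * np.eye(p)
S_Sigma = wishart.rvs(df=n, scale=Sigma, random_state=rng)

# log of the final expression in Proof 5
_, logdetS = np.linalg.slogdet(S_Sigma)
_, logdetSig = np.linalg.slogdet(Sigma)
log_w = (-0.5 * np.trace(np.linalg.solve(Sigma, S_Sigma))
         + 0.5 * (n - p - 1) * logdetS
         - 0.5 * n * p * np.log(2.0) - 0.5 * n * logdetSig
         - multigammaln(n / 2.0, p))
print(np.isclose(log_w, wishart.logpdf(S_Sigma, df=n, scale=Sigma)))  # True
```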

Remark 4 Once Lemma 7 for the Jacobian of \({{\bf {S}}_{\bf {\Sigma }}} \to {\bf {S}}\) is given, Theorem 2 for the Wishart density for general correlated cases is immediately obtained. Conversely, when the Wishart densities for \(\bf {S}\) and \({\bf {S}}_{\bf {\Sigma }}\) are available, the Jacobian is easily given by comparing the two densities using \({\bf {S}} = {{\bf {\Sigma }}^{ - 1/2}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {\Sigma }}^{ - 1/2}}\), which was the approach employed by Anderson (2003, Theorem 7.3.3).

B On Khatri (1963)’s self-contained concise derivation

Khatri (1963) is referred to only by Kshirsagar (1972, p. 59) and Srivastava and Khatri (1979, p. 76), to the author’s knowledge. The derivation depends on the integral \({\pi ^{k/2}}{q^{(k/2) - 1}}/\Gamma (k/2) = \int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}}\), where \(q\) is a positive constant and the \(x_i\)’s with \({\bf {x}} = {({x_1},...,{x_k})^{\rm {T}}}\) independently follow the standard normal. This integral is typically obtained in a proof of the chi-square distribution with \(k\) df using the surface area \({S_{k - 1}} = 2{\pi ^{k/2}}{r^{k - 1}}/\Gamma (k/2)\) of the \((k-1)\)-sphere with the radius \(r = {q^{1/2}}\) in the \(k\)-dimensional Euclidean space and \({\rm {d}}r = \{ 1/(2{q^{1/2}}{\rm {)\} d}}q\): \[\begin {array}{l} \left \{ {\prod \nolimits _{i = 1}^k {(1/\sqrt {2\pi } )\exp ( - x_i^2/2){|_{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q}}} } \right \}\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}} \\ = \cfrac {1}{{{{(2\pi )}^{k/2}}}}\exp \left ( { - \frac {q}{2}} \right )\cfrac {{2{\pi ^{k/2}}{r^{k - 1}}}}{{\Gamma (k/2)}}\cfrac {{{\rm {d}}r}}{{{\rm {d}}q}} = \cfrac {1}{{{{(2\pi )}^{k/2}}}}\exp \left ( { - \frac {q}{2}} \right )\cfrac {{2{\pi ^{k/2}}{q^{(k - 1)/2}}}}{{\Gamma (k/2)}}\cfrac {1}{{2{q^{1/2}}}}\\ = \cfrac {1}{{{2^{k/2}}\Gamma (k/2)}}{q^{(k/2) - 1}}\exp \left ( { - \frac {q}{2}} \right ), \end {array}\] yielding \[\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}} = \frac {{2{\pi ^{k/2}}{q^{(k - 1)/2}}}}{{\Gamma (k/2)}}\frac {1}{{2{q^{1/2}}}} = \frac {{{\pi ^{k/2}}{q^{(k/2) - 1}}}}{{\Gamma (k/2)}}.\] Khatri (1963, p. 53) stated that \(\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}} = {\pi ^{k/2}}{q^{k/2}}/\Gamma (k/2)\) using our notation, where \(q^{k/2}\) rather than \(q^{(k/2) - 1}\) is probably a typo since otherwise the correct factor \(|{\bf {S}}{|^{(n - p - 1)/2}}\) corresponding to \(q^{(k/2) - 1}\) when \(k = n - p + 1\) in his subsequent expression of the Wishart density does not follow. An alternative short derivation of \(\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}}\) was given by Ogasawara (2022) as follows. Suppose that the pdf of the chi-square with \(k\) df, which is equal to that of the gamma with the shape parameter \(k/2\) and the scale parameter 2, is obtained by a different method, e.g., from the property that the sum of independent gamma-distributed variables with a common scale parameter is gamma-distributed with the shape parameter equal to the sum of the individual shape parameters and the same scale. Note that the beta integral or the moment generating function can be used to derive this property. Then, we have \[\left \{ {\prod \nolimits _{i = 1}^k {(1/\sqrt {2\pi } )\exp ( - x_i^2/2){|_{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q}}} } \right \}\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}} = \frac {{{q^{(k/2) - 1}}\exp ( - q/2)}}{{{2^{k/2}}\Gamma (k/2)}},\] which gives \[\int _{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q} {{\rm {d}}{x_1} \cdots {\rm {d}}{x_k}} = \frac {{{q^{(k/2) - 1}}\exp ( - q/2)/\{ {2^{k/2}}\Gamma (k/2)\} }}{{\prod \nolimits _{i = 1}^k {(1/\sqrt {2\pi } )\exp ( - x_i^2/2){|_{{{\bf {x}}^{\rm {T}}}{\bf {x}} = q}}} }}\, = \frac {{{\pi ^{k/2}}{q^{(k/2) - 1}}}}{{\Gamma (k/2)}}.\] We find that this derivation, which does not use the area of the \((k-1)\)-sphere, is similar to that by Anderson (2003) mentioned in Remark 4.
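The shell-integral identity above is easy to confirm numerically. The following sketch (ours, assuming Python with SciPy) multiplies the normal density on the shell \({{\bf {x}}^{\rm {T}}}{\bf {x}} = q\) by \({\pi ^{k/2}}{q^{(k/2) - 1}}/\Gamma (k/2)\) and recovers the chi-square pdf with \(k\) df:

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import gammaln

k, q = 5, 3.7
log_normal_on_shell = -0.5 * k * np.log(2 * np.pi) - 0.5 * q   # density at x^T x = q
log_shell_integral = (0.5 * k * np.log(np.pi)
                      + (0.5 * k - 1) * np.log(q) - gammaln(0.5 * k))
print(np.isclose(log_normal_on_shell + log_shell_integral,
                 chi2.logpdf(q, df=k)))                        # True
```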

Proof 6 of the Wishart density in Theorem 2 (Khatri, 1963) Khatri’s short (1.5-page) derivation owes its brevity partly to his concise description. Since the article is poorly documented (it has no title, the citations mentioned earlier repeat the same incorrect page numbers, and there are several possible typos at important points, including the one above, as well as other minor errors), a corrected proof is provided here with some added explanations. The derivation consists of a \(p\)-step variable transformation with \(p\) Jacobians, most of which cancel after multiplication.

Define the \(p \times n\) matrix \({\bf {X}}_\Sigma \), where each column independently follows \({{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\). Partition \({{\bf {S}}_\Sigma } = {{\bf {X}}_\Sigma }{\bf {X}}_\Sigma ^{\rm {T}} = \left [ \begin {array}{l} {{\bf {S}}_{p - 1}}\,\,{{\bf {s}}_{p - 1}}\,\\ {\bf {s}}_{p - 1}^{\rm {T}}\,\,\,{s_{pp}} \end {array} \right ] = \left [ \begin {array}{l} {{\bf {X}}_{p - 1}}{\bf {X}}_{p - 1}^{\rm {T}}\,\,\,{{\bf {X}}_{p - 1}}{{\bf {x}}_p}\,\\ \,\,{\bf {x}}_p^{\rm {T}}{\bf {X}}_{p - 1}^{\rm {T}}\,\,\,\,\,\,\,\,{\bf {x}}_p^{\rm {T}}{{\bf {x}}_p}\, \end {array} \right ]\), where e.g., \(s_{pp}\) is temporarily used in place of \(s_{\Sigma pp}\) for simplicity. Define the \(n \times n\) matrix \({{\bf {P}}_n} = \left [ \begin {array}{l} {{\bf {X}}_{p - 1}}\\ {{\bf {Y}}_{n - p + 1}}\, \end {array} \right ]\), where the \((n - p + 1) \times n\) submatrix \({\bf {Y}}_{n - p + 1}\) is chosen such that \({{\bf {Y}}_{n - p + 1}}{\bf {X}}_{p - 1}^{\rm {T}} = {\bf {O}}\) and \({{\bf {Y}}_{n - p + 1}}{\bf {Y}}_{n - p + 1}^{\rm {T}} = {{\bf {I}}_{n - p + 1}}\). Then, we have \({{\bf {P}}_n}{\bf {P}}_n^{\rm {T}} = \left [ \begin {array}{l} {{\bf {S}}_{p - 1}}\,\,{\bf {O}}\,\\ {\bf {O}}\,\,\,\,\,\,\,{{\bf {I}}_{n - p + 1}} \end {array} \right ]\), which gives \(|{{\bf {P}}_n}|{\,_ + } = \,|{{\bf {P}}_n}{\bf {P}}_n^{\rm {T}}{|^{1/2}} = \,|{{\bf {S}}_{p - 1}}{|^{1/2}}\). Consider the variable transformation from \({\bf {x}}_p\) to \({{\bf {P}}_n}{{\bf {x}}_p}\) with \({({\bf {s}}_{p - 1}^{\rm {T}},\,\,{\bf {z}}_{n - p + 1}^{\rm {T}})^{\rm {T}}} = {{\bf {P}}_n}{{\bf {x}}_p}\), where \({{\bf {z}}_{n - p + 1}} \equiv {{\bf {Y}}_{n - p + 1}}{{\bf {x}}_p}\) and \(J({{\bf {x}}_p} \to {{\bf {P}}_n}{{\bf {x}}_p}) = \,|{{\bf {P}}_n}|_ + ^{ - 1} = \,|{{\bf {S}}_{p - 1}}{|^{ - 1/2}}\). Since \[\begin {array}{l} {s_{pp}} = {\bf {x}}_p^{\rm {T}}{{\bf {x}}_p} = ({\bf {s}}_{p - 1}^{\rm {T}},\,\,{\bf {z}}_{n - p + 1}^{\rm {T}}){\bf {P}}_n^{{\rm {T}} - 1}{\bf {P}}_n^{ - 1}{({\bf {s}}_{p - 1}^{\rm {T}},\,\,{\bf {z}}_{n - p + 1}^{\rm {T}})^{\rm {T}}}\\ \,\,\,\, = ({\bf {s}}_{p - 1}^{\rm {T}},\,\,{\bf {z}}_{n - p + 1}^{\rm {T}})\left [ \begin {array}{l} {\bf {S}}_{p - 1}^{ - 1}\,\,\,{\bf {O}}\\ {\bf {O}}\,\,\,\,\,\,\,\,{{\bf {I}}_{n - p + 1}} \end {array} \right ]\left [ \begin {array}{l} {{\bf {s}}_{p - 1}}\\ {{\bf {z}}_{n - p + 1}} \end {array} \right ] = {\bf {s}}_{p - 1}^{\rm {T}}{\bf {S}}_{p - 1}^{ - 1}{{\bf {s}}_{p - 1}} + {\bf {z}}_{n - p + 1}^{\rm {T}}{{\bf {z}}_{n - p + 1}}, \end {array}\] we have \({\bf {z}}_{n - p + 1}^{\rm {T}}{{\bf {z}}_{n - p + 1}} = {s_{pp}} - {\bf {s}}_{p - 1}^{\rm {T}}{\bf {S}}_{p - 1}^{ - 1}{{\bf {s}}_{p - 1}} = \,|{{\bf {S}}_\Sigma }|/|{{\bf {S}}_{p - 1}}|\).
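The closing identity \({s_{pp}} - {\bf {s}}_{p - 1}^{\rm {T}}{\bf {S}}_{p - 1}^{ - 1}{{\bf {s}}_{p - 1}} = |{{\bf {S}}_\Sigma }|/|{{\bf {S}}_{p - 1}}|\) is the usual Schur-complement determinant formula for a partitioned matrix. A quick numerical check (ours, assuming Python with NumPy; \({\bf {\Sigma }} = {{\bf {I}}_p}\) is taken only for convenience since the identity is purely algebraic) is:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 9
X = rng.standard_normal((p, n))
S = X @ X.T                                  # plays the role of S_Sigma
S_p1, s_p1, s_pp = S[:p - 1, :p - 1], S[:p - 1, -1], S[-1, -1]
lhs = s_pp - s_p1 @ np.linalg.solve(S_p1, s_p1)
rhs = np.linalg.det(S) / np.linalg.det(S_p1)
print(np.isclose(lhs, rhs))                  # True
```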

Using the multivariate normal density, the joint marginal density of \({\bf {X}}_{p - 1}\), where the random matrix \({\bf {S}}_\Sigma \) is evaluated at a fixed value also denoted by \({\bf {S}}_\Sigma \), becomes \[\begin {array}{l} {f_{{{\bf {X}}_{p - 1}}}}({{\bf {X}}_{p - 1}}) \equiv {f_{{{\bf {X}}_{p - 1}}}}\\ = \int _{ - \infty }^\infty { \cdots \int _{ - \infty }^\infty {{{(2\pi )}^{ - np/2}}|{\bf {\Sigma }}{|^{ - n/2}}} } \exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times J\{ {{\bf {x}}_p} \to {({\bf {s}}_{p - 1}^{\rm {T}},\,\,{\bf {z}}_{n - p + 1}^{\rm {T}})^{\rm {T}}}\} {\rm {d}}{z_1} \cdots {\rm {d}}{z_{n - p + 1}}\\ = \int _{ - \infty }^\infty {{{(2\pi )}^{ - np/2}}|{\bf {\Sigma }}{|^{ - n/2}}} \exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} \,|{{\bf {S}}_{p - 1}}{|^{ - 1/2}}{\rm {d}}{{\bf {z}}_{n - p + 1}}, \end {array}\] where the integrand does not include \({\bf {z}}_{n - p + 1}\). Then, the above integral becomes \[\begin {array}{l} {f_{{{\bf {X}}_{p - 1}}}} = {(2\pi )^{ - np/2}}|{\bf {\Sigma }}{|^{ - n/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} \,|{{\bf {S}}_{p - 1}}{|^{ - 1/2}}\\ \,\,\,\,\,\,\,\,\,\,\,\,\, \times \int _{{\bf {z}}_{n - p + 1}^{\rm {T}}{{\bf {z}}_{n - p + 1}} = \,|{{\bf {S}}_\Sigma }|/|{{\bf {S}}_{p - 1}}|} {{\rm {d}}{{\bf {z}}_{n - p + 1}}} \\ = {(2\pi )^{ - np/2}}|{\bf {\Sigma }}{|^{ - n/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} \,|{{\bf {S}}_{p - 1}}{|^{ - 1/2}}\\ \,\,\,\,\,\,\,\,\,\,\,\, \times \cfrac {{{\pi ^{(n - p + 1)/2}}\,{{(|{{\bf {S}}_\Sigma }|/|{{\bf {S}}_{p - 1}}|)}^{\{ (n - p + 1)/2\} - 1}}}}{{\Gamma \{ (n - p + 1)/2\} }}\\ = \cfrac {{{\pi ^{(n - p + 1)/2}}|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{{(2\pi )}^{np/2}}|{\bf {\Sigma }}{|^{n/2}}\Gamma \{ (n - p + 1)/2\} }}\cfrac {1}{{|{{\bf {S}}_{p - 1}}{|^{(n - p)/2}}}}, \end {array}\] where Khatri’s (p. 54) expression \(|{{\bf {S}}_{p - 1}}{|^{(n - p - 2)/2}}\) using our notation in place of \(|{{\bf {S}}_{p - 1}}{|^{(n - p)/2}}\) is incorrect. Define \({{\bf {X}}_{p - i}}\{ (p - i) \times n\}\), \({{\bf {Y}}_{n - p + i}}\{ (n - p + i) \times n\}\), \({{\bf {S}}_{p - i}}\{ (p - i) \times (p - i)\} ,\) \({{\bf {s}}_{p - i}}\{ (p - i) \times 1\}\) and \({{\bf {z}}_{n - p + i}}\{ (n - p + i) \times 1\} \,\,(i = 2,...,p - 1)\) similarly to those when \(i = 1\), respectively.
Then, using these matrices and vectors in a similar manner, we have the successive transformations as \[\begin {array}{l} {f_{{{\bf {X}}_1}}} = \cfrac {{{\pi ^{(n - p + 1)/2}}|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{{(2\pi )}^{np/2}}|{\bf {\Sigma }}{|^{n/2}}\Gamma \{ (n - p + 1)/2\} }}\cfrac {1}{{|{{\bf {S}}_{p - 1}}{|^{(n - p)/2}}}}\\ \,\,\,\,\,\,\,\,\,\,\,\, \times \prod \nolimits _{i = 2}^{p - 1} {\int _{{\bf {z}}_{n - p + i}^{\rm {T}}{{\bf {z}}_{n - p + i}} = \,|{{\bf {S}}_{p - i + 1}}|/|{{\bf {S}}_{p - i}}|} {{\rm {d}}{{\bf {z}}_{n - p + i}}} } \\ = \cfrac {{{\pi ^{(n - p + 1)/2}}|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{{(2\pi )}^{np/2}}|{\bf {\Sigma }}{|^{n/2}}\Gamma \{ (n - p + 1)/2\} }}\cfrac {1}{{|{{\bf {S}}_{p - 1}}{|^{(n - p)/2}}}}\\ \,\,\,\, \times \prod \limits _{i = 2}^{p - 1} {\cfrac {{{\pi ^{(n - p + i)/2}}\,\,|{{\bf {S}}_{p - i + 1}}{|^{(n - p + i - 2)/2}}/|{{\bf {S}}_{p - i}}{|^{(n - p + i - 1)/2}}}}{{\Gamma \{ (n - p + i)/2\} }}} \\ = \cfrac {{{\pi ^{[(n - p)(p - 1) + \{ p(p - 1)/2\} - np]/2}}|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}\prod \nolimits _{i = 1}^{p - 1} {\Gamma \{ (n - p + i)/2\} } }}|{{\bf {S}}_1}{|^{ - (n - 2)/2}}\\ = \cfrac {{|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\pi ^{p(p - 1)/4}}\prod \nolimits _{i = 1}^p {\Gamma \{ (n - p + i)/2\} } }} \times \cfrac {{{{({{\bf {X}}_1}{\bf {X}}_1^{\rm {T}})}^{ - (n - 2)/2}}}}{{{\pi ^{n/2}}/\Gamma (n/2)}}. \end {array}\] Noting that \({({{\bf {X}}_1}{\bf {X}}_1^{\rm {T}})^{ - (n - 2)/2}} = \,|{{\bf {S}}_1}{|^{ - (n - 2)/2}} = s_{11}^{ - (n - 2)/2}\) is a fixed quantity, the last step is the integral with respect to the row vector \({\bf {X}}_1\): \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {f_{{{\bf {X}}_1}}}\int _{{{\bf {X}}_{\rm {1}}}{\bf {X}}_{\rm {1}}^{\rm {T}}{\rm { = }}{{\rm {s}}_{{\rm {11}}}}} {\rm {d}} {{\bf {X}}_1} = {f_{{{\bf {X}}_1}}}{\pi ^{n/2}}s_{11}^{(n/2) - 1}/\Gamma (n/2) \\ = \cfrac {{|{{\bf {S}}_\Sigma }{|^{(n - p - 1)/2}}\exp \{ - {\rm {tr(}}{{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_\Sigma })/2\} }}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\pi ^{p(p - 1)/4}}\prod \nolimits _{i = 1}^p {\Gamma \{ (n - p + i)/2\} } }}. \end {array}\] \(\sqcap \)\(\sqcup \)

C The exchanged Bartlett decomposition

The Bartlett decomposition \({\bf {S}} = {\bf {T}}{{\bf {T}}^{\rm {T}}}\) has been used in this paper as well as in the literature. Let \({\bf {S}} = {\bf {U}}{{\bf {U}}^{\rm {T}}}\), where \({\bf {U}}( \ne {{\bf {T}}^{\rm {T}}})\) is the upper-triangular matrix whose non-zero elements are random variables. Note that \(\bf {U}\) can be obtained by rotating \(\bf {T}\) as \({\bf {U}} = {\bf {TV}}\) using an orthonormal matrix \(\bf {V}\). Define the upper-triangular matrix \(\bf {C}\) satisfying \({\bf {\Sigma }} = {\bf {C}}{{\bf {C}}^{\rm {T}}}\) with \({c_{ii}} > 0\,\,(i = 1,...,p)\), where \(\bf {C}\) is obtained by \({\bf {C}} = {\bf {B}}{{\bf {V}}^*}\) and \({\bf {V}}^*\) is another orthonormal matrix. Recall that the Cholesky decomposition \({\bf {\Sigma }} = {\bf {B}}{{\bf {B}}^{\rm {T}}}\) was used earlier. The form \({\bf {\Sigma }} = {\bf {C}}{{\bf {C}}^{\rm {T}}}\) is also called the exchanged (reversed) Cholesky or upper-lower (UL) decomposition in this paper.

Remark 5 Consider the distribution of \({u_{ij}}\,\,(i = 1,...,p;j = i,...,p)\), which are assumed to be mutually independent. As in the case of the usual Bartlett decomposition, Lemma 1 shows that when \(u_{ii}\) is chi-distributed with \(n - p + i\) df \((i = 1,...,p)\) and \(u_{ij}\) is standard normal \((i = 1,...,p;j = i + 1,...,p)\), the distribution of \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}( = {\bf {T}}{{\bf {T}}^{\rm {T}}})\) is the same as that of \({\bf {U}}{{\bf {U}}^{\rm {T}}}\). Note that \(t_{ii}\) is chi-distributed with \(n - i + 1\) df rather than \(n - p + i\). The joint pdf of \(\bf {U}\) denoted by \({f_p}({\bf {U}})\) becomes \[\begin {array}{l} {f_p}({\bf {U}}) = \left [ {\prod \limits _{i = 1}^p {\cfrac {{u_{ii}^{n - p + i - 1}\exp ( - u_{ii}^2/2)}}{{{2^{\{ (n - p + i)/2\} - 1}}\Gamma \{ (n - p + i)/2\} }}} } \right ] \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times \frac {1}{{{{(\sqrt {2\pi } )}^{({p^2} - p)/2}}}}\left \{ {\prod \limits _{1 \le i < j \le p} {\exp \left ( { - u_{ij}^2/2} \right )} } \right \}\\ = \cfrac {{\left \{ {\prod \limits _{i = 1}^p {u_{ii}^{n - p + i - 1}\exp ( - u_{ii}^2/2)} } \right \}\left \{ {\prod \limits _{1 \le i < j \le p} {\exp \left ( { - u_{ij}^2/2} \right )} } \right \}}}{{{2^{\frac {{(n - p)p}}{2} + \frac {{p(p + 1)}}{4} - p}} \times {2^{\frac {{p(p - 1)}}{4}}}{\pi ^{\frac {{p(p - 1)}}{4}}}\prod \limits _{i = 1}^p {\Gamma \{ (n - p + i)/2\} } }} \\ = \cfrac {{\left ( {\prod \limits _{i = 1}^p {u_{ii}^{n - p + i - 1}} } \right )\exp \{ - {\rm {tr}}({\bf {U}}{{\bf {U}}^{\rm {T}}})/2\} }}{{{2^{\frac {{np}}{2} - p}}{\Gamma _p}(n/2)}}. \end {array}\]
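The claimed distributional equality can be probed by simulation. A rough sketch (ours, assuming Python with NumPy) generates \(\bf {U}\) with the stated chi and standard normal elements and checks that the empirical mean of \({\bf {U}}{{\bf {U}}^{\rm {T}}}\) approximates \(n{{\bf {I}}_p}\), which is the mean of \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}\):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 3, 7, 100_000
S_mean = np.zeros((p, p))
for _ in range(reps):
    U = np.triu(rng.standard_normal((p, p)), k=1)   # u_ij ~ N(0,1) for i < j
    for i in range(p):                              # u_ii ~ chi with n - p + (i+1) df (0-based i)
        U[i, i] = np.sqrt(rng.chisquare(n - p + i + 1))
    S_mean += U @ U.T / reps
print(np.round(S_mean, 2))                          # approx n * I_p
```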

Proof 7 of the Wishart density in Theorem 2 Consider the one-step transformation from \(\bf {U}\) to \({{\bf {S}}_\Sigma } = {\bf {CX}}{{\bf {X}}^{\rm {T}}}{{\bf {C}}^{\rm {T}}} = {\bf {CS}}{{\bf {C}}^{\rm {T}}} = {\bf {CU}}{{\bf {U}}^{\rm {T}}}{{\bf {C}}^{\rm {T}}}\), where it is found that \({\bf {C}}{({\bf {X}})_{ \cdot j}}\mathop \sim \limits ^{{\rm {i}}{\rm {.i}}{\rm {.d}}{\rm {.}}} {{\rm {N}}_p}({\bf {0}},{\bf {\Sigma }})\) \((j = 1,...,n)\). Redefine the vector of the non-duplicated elements in \({\bf {S}}_\Sigma \) as \({{\bf {s}}_\Sigma } = {({s_{\Sigma 11}},...,{s_{\Sigma 1p}},{s_{\Sigma 22}},...,{s_{\Sigma 2p}},...,{s_{\Sigma pp}})^{\rm {T}}}\), whose elements are lexicographically ordered. Similarly, define the \(\{ p(p + 1)/2\} \times 1\) vectors \(\bf {c}\) and \(\bf {u}\) using the corresponding elements of \(\bf {C}\) and \(\bf {U}\), respectively.

The proof is similar to Proof 1 of Lemma 5. Since \(\bf {C}\), \(\bf {U}\) and \(\bf {CU}\) are upper-triangular, the Jacobian matrix \(\partial {{\bf {s}}_\Sigma }/\partial {{\bf {u}}^{\rm {T}}} = \left \{ {\partial {s_{\Sigma ij}}/\partial {u_{kl}}} \right \}\) \((1 \le i \le j \le p;1 \le k \le l \le p)\) becomes upper-triangular, whose diagonal elements are \[\frac {{\partial {s_{\Sigma ij}}}}{{\partial {u_{ij}}}} = {\{ {\bf {C}}({{\bf {E}}_{ij}}{{\bf {U}}^{\rm {T}}} + {\bf {U}}{{\bf {E}}_{ji}}){{\bf {C}}^{\rm {T}}}\} _{ij}} = {({\bf {C}}{{\bf {E}}_{ij}}{{\bf {U}}^{\rm {T}}}{{\bf {C}}^{\rm {T}}})_{ij}} = {c_{ii}}{u_{jj}}{c_{jj}}\,\,(1 \le i < j \le p)\] and \[\frac {{\partial {s_{\Sigma ii}}}}{{\partial {u_{ii}}}} = {\{ {\bf {C}}({{\bf {E}}_{ii}}{{\bf {U}}^{\rm {T}}} + {\bf {U}}{{\bf {E}}_{ii}}){{\bf {C}}^{\rm {T}}}\} _{ii}} = 2c_{ii}^2{u_{ii}}\,\,(i = 1,...,p).\] Since the determinant of the Jacobian matrix or \(J({{\bf {S}}_\Sigma } \to {\bf {U}})\) becomes \[\begin {array}{l} \prod \nolimits _{i = 1}^p {\prod \nolimits _{j = i}^p {\frac {{\partial {s_{\Sigma ij}}}}{{\partial {u_{ij}}}}} } = \left ( {\prod \nolimits _{i = 1}^p {\prod \nolimits _{j = i + 1}^p {\frac {{\partial {s_{\Sigma ij}}}}{{\partial {u_{ij}}}}} } } \right )\prod \nolimits _{i = 1}^p {\frac {{\partial {s_{\Sigma ii}}}}{{\partial {u_{ii}}}}} = {2^p}\prod \nolimits _{i = 1}^p {\prod \nolimits _{j = i}^p {{c_{ii}}{u_{jj}}{c_{jj}}} } \\ = {2^p}\left ( {\prod \nolimits _{i = 1}^p {c_{ii}^{p - i + 1}} } \right )\prod \nolimits _{j = 1}^p {u_{jj}^jc_{jj}^j} = {2^p}\prod \nolimits _{i = 1}^p {c_{ii}^{p + 1}u_{ii}^i} = {2^p}|{\bf {C}}{|^{p + 1}}\prod \nolimits _{i = 1}^p {u_{ii}^i} \\ = {2^p}|{\bf {\Sigma }}{|^{(p + 1)/2}}\prod \nolimits _{i = 1}^p {u_{ii}^i} , \end {array}\] \(J({\bf {U}} \to {{\bf {S}}_\Sigma })\) is given by the reciprocal of the above quantity.
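The closed-form Jacobian can be verified by numerical differentiation. The sketch below (ours, assuming Python with NumPy; the helper name s_of_u is hypothetical) compares a central-difference Jacobian determinant of the map \({\bf {u}} \mapsto {{\bf {s}}_\Sigma }\) with \({2^p}|{\bf {\Sigma }}{|^{(p + 1)/2}}\prod \nolimits _{i = 1}^p {u_{ii}^i}\):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
m = p * (p + 1) // 2
iu = np.triu_indices(p)                      # lexicographic (i,j) with i <= j

C = np.triu(rng.standard_normal((p, p)))
np.fill_diagonal(C, np.abs(np.diag(C)) + 1)  # upper-triangular C with c_ii > 0
U0 = np.triu(rng.standard_normal((p, p)))
np.fill_diagonal(U0, np.abs(np.diag(U0)) + 1)

def s_of_u(u):
    # map u = nonzero elements of U to the non-duplicated elements of C U U^T C^T
    M = np.zeros((p, p)); M[iu] = u
    return (C @ M @ M.T @ C.T)[iu]

u0, eps = U0[iu].copy(), 1e-6
Jmat = np.empty((m, m))
for k in range(m):                           # central differences, column by column
    d = np.zeros(m); d[k] = eps
    Jmat[:, k] = (s_of_u(u0 + d) - s_of_u(u0 - d)) / (2 * eps)

Sigma = C @ C.T
closed = (2.0**p * np.linalg.det(Sigma)**((p + 1) / 2)
          * np.prod(np.diag(U0)**np.arange(1, p + 1)))
print(np.isclose(np.linalg.det(Jmat), closed, rtol=1e-6))   # True up to rounding
```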

The Wishart density is given by \({f_p}({\bf {U}})\) and \(J({\bf {U}} \to {{\bf {S}}_\Sigma })\): \[\begin {array}{l} {w_p}({{\bf {S}}_{\bf {\Sigma }}}|{\bf {\Sigma }},n) = {f_p}({\bf {U}})J({\bf {U}} \to {{\bf {S}}_{\bf {\Sigma }}})\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {U}}{{\bf {U}}^{\rm {T}}})/2\} \prod \limits _{i = 1}^p {u_{ii}^{n - p + i - 1}} }}{{{2^{(np/2) - p}}{\Gamma _p}(n/2)}}\,\cfrac {{|{\bf {\Sigma }}{|^{ - (p + 1)/2}}}}{{{2^p}\prod \nolimits _{i = 1}^p {u_{ii}^i} }}\\ = \cfrac {{\exp \{ - {\rm {tr}}({\bf {U}}{{\bf {U}}^{\rm {T}}})/2\} |{\bf {\Sigma }}{|^{ - (p + 1)/2}}\prod \limits _{i = 1}^p {u_{ii}^{n - p - 1}} }}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {C}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {C}}^{{\rm {T}} - 1}})/2\} |{\bf {\Sigma }}{|^{ - (p + 1)/2}}|{{\bf {C}}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}}{{\bf {C}}^{{\rm {T}} - 1}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}{\Gamma _p}(n/2)}}\\ = \cfrac {{\exp \{ - {\rm {tr}}({{\bf {\Sigma }}^{ - 1}}{{\bf {S}}_{\bf {\Sigma }}})/2\} |{{\bf {S}}_{\bf {\Sigma }}}{|^{(n - p - 1)/2}}}}{{{2^{np/2}}|{\bf {\Sigma }}{|^{n/2}}{\Gamma _p}(n/2)}} \end {array}\] as expected. \(\sqcap \)\(\sqcup \)

Remark 6 Though \({\bf {U}} \ne {{\bf {T}}^{\rm {T}}}\) as noted earlier, \(\bf {U}\) is obtained by reversing the row indexes of \(\bf {T}\) followed by the similar reversal of the column ones. When \(p\) = 3, this transformation proceeds as \[{\bf {T}} = \left [ \begin {array}{l} {t_{11}}\,\,\,0\,\,\,\,\,\,\,0\,\\ {t_{21}}\,\,\,{t_{22}}\,\,\,0\\ {t_{31}}\,\,\,{t_{32}}\,\,\,{t_{33}} \end {array} \right ] \to \left [ \begin {array}{l} {t_{31}}\,\,\,{t_{32}}\,\,\,{t_{33}}\\ {t_{21}}\,\,\,{t_{22}}\,\,\,0\\ {t_{11}}\,\,\,\,0\,\,\,\,\,\,0\, \end {array} \right ] \to \left [ \begin {array}{l} {t_{33}}\,\,\,{t_{32}}\,\,\,{t_{31}}\\ 0\,\,\,\,\,\,\,{t_{22}}\,\,\,{t_{21}}\\ 0\,\,\,\,\,\,\,\,0\,\,\,\,\,\,{t_{11}}\, \end {array} \right ] \equiv \left [ \begin {array}{l} {u_{11}}\,\,\,{u_{12}}\,\,\,{u_{13}}\\ 0\,\,\,\,\,\,\,\,{u_{22}}\,\,\,{u_{23}}\\ 0\,\,\,\,\,\,\,\,\,\,0\,\,\,\,\,\,\,{u_{33}}\, \end {array} \right ] = {\bf {U}}.\] The above example indicates other decompositions \({\bf {S}} = {{\bf {T}}^*}{{\bf {T}}^{{\rm {*T}}}} = {{\bf {U}}^*}{{\bf {U}}^{{\rm {*T}}}}\) with the unchanged distribution of \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}\), where \({{\bf {T}}^*}({{\bf {U}}^*})\) is a lower (upper)-triangular matrix defined with the non-zero elements on and below (above) the minor diagonals. Note that \({\bf {T}}^*\) and \({\bf {U}}^*\) are obtained from \(\bf {T}\) and \(\bf {U}\) by reversing the row or column indexes. When \(p\) = 3, \({\bf {T}}^*\) and \({\bf {U}}^*\) are \(\left [ \begin {array}{l} 0\,\,\,\,\,\,0\,\,\,\,\,\,{t_{11}}\\ 0\,\,\,\,\,\,{t_{22}}\,\,{t_{21}}\\ {t_{33}}\,\,{t_{32}}\,\,{t_{31}}\, \end {array} \right ] \equiv \left [ \begin {array}{l} 0\,\,\,\,\,\,0\,\,\,\,\,\,t_{13}^*\\ 0\,\,\,\,\,t_{22}^*\,\,\,t_{23}^*\\ t_{31}^*\,t_{32}^*\,\,\,t_{33}^*\, \end {array} \right ]\) and \(\left [ \begin {array}{l} {u_{13}}\,\,\,{u_{12}}\,\,\,{u_{11}}\\ {u_{23}}\,\,\,{u_{22}}\,\,\,0\\ {u_{33}}\,\,\,\,\,\,0\,\,\,\,\,\,0\, \end {array} \right ] \equiv \left [ \begin {array}{l} u_{11}^*\,\,\,u_{12}^*\,\,\,u_{13}^*\\ u_{21}^*\,\,\,u_{22}^*\,\,\,0\\ u_{31}^*\,\,\,\,\,0\,\,\,\,\,\,\,0\, \end {array} \right ]\), respectively.
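In matrix software the index reversal is a double flip. A small illustration (ours, assuming Python with NumPy) shows that flipping a lower-triangular \(\bf {T}\) in both directions yields an upper-triangular matrix with \({\bf {U}}{{\bf {U}}^{\rm {T}}} = {\bf {J}}{\bf {T}}{{\bf {T}}^{\rm {T}}}{\bf {J}}\) for the exchange (index-reversal) matrix \(\bf {J}\); note also that the \((i,i)\)th entry of the flipped matrix is \({t_{p + 1 - i,p + 1 - i}}\), whose chi df \(n - (p + 1 - i) + 1 = n - p + i\) matches the df of \(u_{ii}\) in Remark 5.

```python
import numpy as np

T = np.array([[1., 0., 0.],
              [2., 3., 0.],
              [4., 5., 6.]])                  # a lower-triangular T, p = 3
U = np.flip(T)                                # reverse row indexes, then column indexes
print(U)                                      # [[6. 5. 4.] [0. 3. 2.] [0. 0. 1.]]

J = np.fliplr(np.eye(3))                      # exchange matrix
print(np.allclose(U @ U.T, J @ T @ T.T @ J))  # True, since U = J T J and J J = I
```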

Actually, we have infinitely many transformations with the unchanged distribution of \(\bf {S}\), including the above ones, using various orthonormal \(p \times p\) matrices denoted by \(\bf {V}\)’s since each column of \(\bf {VX}\) independently follows \({{\rm {N}}_p}({\bf {0}},{{\bf {I}}_p})\) (see e.g., Anderson, 2003, Theorem 3.3.1). In other words, the distributions of \(\bf {VX}\) and \(\bf {X}\) are the same. Then, \({\bf {S}} = {\bf {X}}{{\bf {X}}^{\rm {T}}}\) can be replaced by \({\bf {S}} = {\bf {VX}}{{\bf {X}}^{\rm {T}}}{{\bf {V}}^{\rm {T}}}\). Note that each of the decomposed matrices, e.g., \({\bf {T}},\,\,{{\bf {T}}^*},\,\,{\bf {U}}\) and \({\bf {U}}^*\), is given by the others using some \(\bf {V}\), as \({\bf {T}} = {\bf {V}}{{\bf {U}}^{\bf {*}}}\). This indeterminacy of transformation is similar to the rotational indeterminacy in orthogonal rotation in factor analysis and canonical correlation analysis or, more generally, in transformations in structural equation modeling (Ogasawara, 2007; Schuberth, 2021; Yu, Schuberth, & Henseler, 2023).