For what it is worth, I personally believe it is very likely that the BMW calculation that matches the experimentally measured value is the correct one, and that the apparent muon g-2 anomaly hinting at a variety of possible "new physics" is, in fact, merely due to a flaw in the Theory Initiative determination of the Standard Model predicted value of muon g-2.
The Theory Initiative determination might be flawed because the experimental data it is using to substitute for some lattice QCD calculations is itself flawed. This is something that was found previously to have caused the muonic proton radius problem.
If that is the problem, it could be resolved by redoing the electron-positron collider experiments conducted at the Large Electron-Positron Collider (LEP) from 1989 to 2000, upon which the Theory Initiative is mostly relying, with the greater precision and quality control methods that the subsequent two decades of high energy physics have made possible.
On the other hand, if the problem with the Theory Initiative calculation is that the way this experimental data was incorporated into the overall calculation has some subtle flaw, a new theoretical paper could point out the source of the error. This task would be advanced by a better understanding of which part of the Theory Initiative determination is most likely to be flawed, allowing scientists to focus on what kind of methodological error might be involved. That is what this new paper helps to do.
Perspective On The Precision Of These Measurements
In order to maintain perspective it is also important to note that both experimental measurements of muon g-2, and both leading Standard Model theoretical predictions, are identical to the first six significant digits. Thus, they are in perfect agreement up to the one part per million level. Only at the parts per ten million level do discrepancies emerge.
This actually underestimates the precision, because the quantity that is actually measured is the full magnetic moment of the muon, called g(µ), as opposed to merely its anomalous component, called muon g-2. The value of g(µ) is approximately 2.00233184(1) (i.e. double the anomalous magnetic moment plus two). This adds three more significant digits to its value, making it a parts per billion agreement, with discrepancies arising only at the parts per tens of billions level.
This is greater precision than, for example, the empirically determined precision of theoretically much easier tasks, such as a first-round count of the ballots cast in a statewide or national election, or the count of the number of people residing in the United States on a particular day every ten years in the decennial census.
The discrepancies are arising at a precision equivalent to one millimeter per ten kilometers.
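As a rough check on the figures above, the following Python sketch recovers the anomalous component from the quoted central value of g(µ) and confirms that one millimeter per ten kilometers is one part in ten million, the level at which the muon g-2 discrepancies emerge. It uses only the value 2.00233184 quoted above, the variable names are illustrative, and the "(1)" uncertainty on the last digit is dropped.

```python
from decimal import Decimal
from fractions import Fraction

# Central value of g(mu) quoted in the text; the "(1)" uncertainty is ignored here.
g_mu = Decimal("2.00233184")

# g is "double the anomalous magnetic moment plus two", so a = (g - 2) / 2.
a_mu = (g_mu - 2) / 2
print(a_mu)  # 0.00116592

# One millimeter per ten kilometers, with both lengths expressed in meters.
ratio = Fraction(1, 1000) / Fraction(10_000)
print(ratio)  # 1/10000000
```

Exact decimal and fraction types are used instead of floats so that the trailing digits are not blurred by binary rounding.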
Footnote Regarding Statistical Significance
In physics (and most fields) a discrepancy of less than two sigma (i.e. two standard deviations in a "normal" distribution of data) is considered statistically insignificant and constitutes results that are "consistent" with each other.
In physics, a discrepancy with a global statistical significance of five sigma or more that is replicated and has some plausible theoretical reason is the standard for a definitive scientific discovery.
The focus on "global significance" is due to the "look elsewhere effect", which observes that if you do enough experiments of the same kind, you expect some of the results to be statistical flukes that would be statistically significant if you were only doing one experiment. For example, if you do twenty experiments of the same kind, you expect to have, on average, one outlier that is more than two sigma from the true value. But correctly calculating global significance is more art than science, because you need to determine how to count the total number of experiments you have done that measure the same thing. This turns out to be very hard to define in any complex, multifaceted context like particle collider experiments, which record millions or billions of collisions or more over the lifetime of the experiment, not all of which are comparable to each other.
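The twenty-experiment example can be checked with a short Python sketch. It assumes the experiments are independent and that their errors follow a normal distribution, which is the idealized case and not a model of any real collider analysis. The function names are illustrative.

```python
import math

def two_sided_tail(n_sigma: float) -> float:
    """Probability that a normally distributed result lands more than
    n_sigma standard deviations from the true value, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))

def expected_outliers(n_experiments: int, n_sigma: float) -> float:
    """Average number of experiments expected to exceed n_sigma by chance."""
    return n_experiments * two_sided_tail(n_sigma)

def prob_at_least_one(n_experiments: int, n_sigma: float) -> float:
    """Chance that at least one of n independent experiments exceeds n_sigma."""
    return 1 - (1 - two_sided_tail(n_sigma)) ** n_experiments

print(f"single experiment, 2 sigma: {two_sided_tail(2.0):.4f}")
print(f"expected outliers in 20:    {expected_outliers(20, 2.0):.2f}")
print(f"at least one outlier in 20: {prob_at_least_one(20, 2.0):.2f}")
print(f"single experiment, 5 sigma: {two_sided_tail(5.0):.2e}")
```

Under these assumptions the expected number of two sigma outliers in twenty experiments comes out a little below one, in line with the rule of thumb above.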
In physics, a discrepancy of more than two but less than five sigma, or a discrepancy that hasn't been replicated, or doesn't have any plausible theoretical explanation, is considered a "tension" between theory and experiment, that is stronger if the number of sigma differing between experiment and theory is larger, but doesn't constitute a definitive scientific discovery. Scientists spend a lot of time seeing if tensions that they observe go away with further research, or solidify into higher significance scientific discoveries.
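The conventions in this footnote can be summarized as a small decision rule. The Python sketch below is only an illustration of the thresholds as described here. The function name and arguments are hypothetical, and treating a discrepancy of exactly two sigma as a tension is an assumption, since the text does not cover that boundary.

```python
def classify_discrepancy(global_sigma: float, replicated: bool,
                         plausible_theory: bool) -> str:
    """Label a discrepancy using the rule-of-thumb thresholds in the text."""
    if global_sigma < 2.0:
        # Below two sigma the results are considered consistent.
        return "consistent"
    if global_sigma >= 5.0 and replicated and plausible_theory:
        # Five sigma or more, replicated, with a plausible theoretical reason.
        return "discovery"
    # Everything else, including exactly two sigma (an assumption), is a tension.
    return "tension"

print(classify_discrepancy(1.5, replicated=True, plausible_theory=True))
print(classify_discrepancy(4.2, replicated=True, plausible_theory=True))
print(classify_discrepancy(5.3, replicated=False, plausible_theory=True))
print(classify_discrepancy(5.3, replicated=True, plausible_theory=True))
```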
Why Is The QCD Calculation So Difficult?
The QCD calculation is much more difficult than the QED and EW calculations for two main reasons.
Coupling constant strength
One is that all of the calculations involve terms for each power of the coupling constant (a dimensionless number) of the force in question, and the magnitude of these coupling constants is very different for the respective forces.
In other words, they take the form:
a*g + b*g^2 + c*g^3 . . .
where a, b, and c are real numbers that come from adding up the calculations for the terms with the same power of the coupling constant for the force in question, and g is the coupling constant for the force in question.
The strong force coupling constant at the muon mass is in the ballpark of:
0.7 to 0.9
and even at the fourth power it is still about 0.24 (taking the low end of that range).
It gets significantly weaker at higher energy scales, reaching 0.1184 at the energy scale of the Z boson mass (about 91.1 GeV).
The QED coupling constant is, in the low energy limit:
0.007 297 352 569 3(11)
and at the fourth power it is about 0.000 000 002 8, which is about one hundred million times smaller than the fourth power of the strong force coupling constant at the muon mass.
The QED coupling constant gets slightly stronger at higher energy scales, reaching about 0.00787 at the energy scale of the Z boson mass.
The weak force coupling constant is on the order of:
0.000001
and at the fourth power it is about 10^-24.
Converted into terms comparable to the coupling constants above, at the proton mass, the gravitational coupling constant is about
6 * 10^-39
which is roughly comparable in magnitude to the sixth power of the weak force coupling constant, the eighteenth power of the QED coupling constant, and a far higher power of the QCD coupling constant.
As a result, terms with higher powers of the QCD coupling constant can't be ignored (especially in low energy interactions, where the perturbative methods used for QED and weak force calculations break down and different methods, called lattice QCD, need to be used), while higher order terms in QED (typically calculated to the fifth power of the QED coupling constant) and in the weak force calculation can be ignored.
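To make the comparison concrete, here is a minimal Python sketch that takes the coupling constant values quoted above and asks how many powers are needed before g^n drops below a chosen threshold. It deliberately ignores the coefficients a, b, c and so on (treating them all as 1), so it only illustrates the scaling and is not an estimate of any real calculation. The threshold of one part per billion and the function name are assumptions chosen for illustration.

```python
# Coupling constant values quoted in the text. The strong force value is the
# low end of the quoted 0.7 to 0.9 range at the muon mass.
COUPLINGS = {
    "strong": 0.7,
    "QED": 0.0072973525693,
    "weak": 1e-6,
}

def powers_needed(g: float, threshold: float) -> int:
    """Smallest n for which g**n falls below threshold.

    Assumption: the coefficients of the series are ignored (treated as 1).
    """
    if not (0.0 < g < 1.0) or threshold <= 0.0:
        raise ValueError("need 0 < g < 1 and threshold > 0")
    n = 1
    while g ** n >= threshold:
        n += 1
    return n

for name, g in COUPLINGS.items():
    n = powers_needed(g, 1e-9)
    print(f"{name:>6}: g^4 = {g ** 4:.2e}, powers needed for g^n < 1e-9: {n}")
```

With these inputs the QED series reaches the threshold at the fifth power and the weak force series at the second, while the strong force series needs dozens of powers, which is the qualitative point made above.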
Gluon self-interactions
Let's return to our formula for each contribution in the form
a*g + b*g^2 + c*g^3 . . .
where a, b, and c are real numbers that come from adding up the calculations for the terms with the same power of the coupling constant for the force in question, and g is the coupling constant for the force in question.
This formula is really the sum of terms for every possible way that a process can happen (each of which is described by a Feynman diagram), and the power of the coupling constant in each term is a function of how many interactions with the force in question occur in that particular way of the process happening.
In the case of QED, the electromagnetic force is carried by the photon, which interacts with electromagnetically charged particles, but not with other photons.
In the case of the weak force, which is carried by W and Z bosons, these force carrying particles can interact with each other, but those interactions are very weak, so their contributions are very small and need to be considered only at the first or second order level.
But, in the case of the strong force, which is carried by gluons, gluons interact with each other with a strength on the same order of magnitude as interactions between gluons and quarks in the strong force. This means that at each power of the strong force coupling constant, there are far more terms to be considered than in the QED or EW calculations, and that the rate at which the number of terms grows with each additional power of the strong force coupling constant is much greater than in the QED or EW calculations.
Conclusion
So, the bottom line is that to get comparable precision, you need to consider far higher powers of the coupling constant to do strong force calculations than QED or EW calculations, and the number of terms that have to be calculated at each power of the coupling constant in strong force calculations is also profoundly greater than in QED or EW calculations, with the disparity getting worse with each additional power of the coupling constant you try to consider.
Once the calculations are set up for the QED or EW cases, those calculations can be made to maximal precision in less than a day with an ordinary desktop computer with a single processor, and the limiting factor on the number of calculations you do is the precision of the coupling constant measurement, which leaves you with spurious accuracy beyond the fifth power of that coupling constant in QED and sooner in the EW calculation.
In contrast, the strong force calculations done to three or four powers of the strong force coupling constant, which still aren't very precise, take weeks of non-stop calculations with the equivalent of millions of single processor desktop computers working together.