Dispatches From Turtle Island: A New W Boson Mass Measurement From CDF

Friday, April 8, 2022

A New W Boson Mass Measurement From CDF

The CDF collaboration, one of the two main experimental groups (together with D0) at Fermilab's Tevatron collider, which ceased operations in 2011, released a paper in the journal Science yesterday reporting a new measurement of the W boson mass. It has attracted attention because it is in strong tension with the current global average of experimental measurements of the W boson mass and with the global electroweak fit expectation for the W boson mass.

But the paper greatly exaggerates what its result shows, inaccurately asserting that the result "is in significant tension with the standard model expectation."

There Is No Standard Model W Boson Mass Prediction

The 80,357 ± 6 MeV value to which the paper compares its new measurement is not a "prediction of the Standard Model" as the paper claims.

Instead, it is a global electroweak fit of the Standard Model physical constants utilizing data points like the Higgs boson mass and top quark mass, neither of which have a direct functional relationship to the W boson mass in the electroweak portion of the Standard Model of Particle Physics. See, e.g., this 2018 global electroweak fit paper.

The same global electroweak fit procedure suggested that the Higgs boson had a mass of 90,000 ± 20,000 MeV, with contributing estimates from data used in that fit that ranged from 35,000 MeV to 463,000 MeV, each with huge error bars, when the current inverse error weighted global average of the measured real value of the Higgs boson mass is 125,250 ± 170 MeV. A global electroweak fit is not analogous to a Standard Model physics calculation or prediction.

The W boson's mass is an experimentally determined free parameter of the Standard Model (in other words, it is an input to the model, not an output).

More precisely, the W boson mass, the Z boson mass, the electromagnetic coupling constant, the weak force coupling constant, and the Higgs vacuum expectation value are five experimentally determined Standard Model physical constants related to each other in the electroweak portion of the Standard Model that have three degrees of freedom. You can take your pick to some extent which of them you treat as input parameters that are measured, and which you treat as derived values.
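At tree level (lowest order, ignoring the radiative corrections that a global fit tries to account for), this relationship can be written out explicitly. The symbols below are standard textbook notation rather than anything taken from the CDF paper: g and g' are the SU(2) and U(1) gauge couplings, v is the Higgs vacuum expectation value, e is the electromagnetic coupling, and θ_W is the weak mixing angle.

```latex
\begin{aligned}
m_W &= \tfrac{1}{2}\, g\, v, \\
m_Z &= \tfrac{1}{2}\, \sqrt{g^{2} + g'^{2}}\; v, \\
e   &= \frac{g\, g'}{\sqrt{g^{2} + g'^{2}}}, \\
\frac{m_W}{m_Z} &= \cos\theta_W = \frac{g}{\sqrt{g^{2} + g'^{2}}}.
\end{aligned}
```

In this sketch the five quantities (m_W, m_Z, e, g, v) are all fixed by the three parameters (g, g', v), which is the sense in which there are three degrees of freedom.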

The W boson mass is the least precisely determined of these five electroweak constants, but all five of these related Standard Model parameters are known quite precisely (note that the table below which I put together uses the Particle Data Group global averages).

The global electroweak fit process is not part of the Standard Model and is really not all that much more than a sophisticated informed guessing game.

Calling a global electroweak fit the "standard model expectation" is nothing more or less than misleading, and the fact that the results were spun this way suggests that the authors want to direct attention away from the real story, which is that their measurement is an outlier with respect to other experimental measurements, just as one of their original measurements in 2001 was. If I were a peer reviewer of the Science article that was published yesterday, I would have objected strenuously to that assertion.

Likewise, the paper's early discussion of the mysteries of the Higgs mechanism, dark matter, and extensions of the Standard Model, while not quite as problematic, is gratuitous window dressing and doesn't belong in a paper that is merely reporting an update of a Standard Model constant measurement from 11-year-old data.

Additional Details From The New Paper

The body text of the newly announced CDF result clarifies that the bottom line number for their new W boson mass measurement is 80,433.5 ± 6.4 statistical ± 6.9 systematic MeV (a combined uncertainty of ± 9.4 MeV).

According to the paper, this implies a combined Tevatron value of 80,427.4 ± 8.9 MeV, and a combined Tevatron and LEP value of 80,424.2 ± 8.7 MeV. The new result is exactly the same as the 2001 measurement by CDF (which was also an outlier that was included in, but diluted in, the current global average), but with a claimed uncertainty of 9.4 MeV instead of 79 MeV.

According to Fermilab's press release related to the paper:

This result uses the entire dataset collected from the Tevatron collider at Fermilab. It is based on the observation of 4.2 million W boson candidates, about four times the number used in the analysis the collaboration published in 2012.

But, to be honest, my intuition is that a claim to shift the combined Tevatron average up by 40.4 MeV (from 80,387 MeV to 80,427.4 MeV) using four times as much data (all at least 11 years old and 25% of it exactly the same data) from the very same machine while reducing the uncertainty by 44% (7 MeV) raises yellow flags.

It is harder to tell than it should be if the newly calculated CDF number included both the D0 experiment data and the CDF data from Tevatron (as the press release seems to imply), or just the CDF data (as the way the data is talked about in the paper itself seems to imply), but as best as I can tell, except in the combined Tevatron number noted above, only the CDF data from Tevatron is being used.

The paper also provides an updated Z boson mass measurement of:

91,192.0 ± 6.4 stat ± 4.0 syst MeV [ed. combined error 7.5 MeV] (stat, statistical uncertainty; syst, systematic uncertainty), which is consistent with the world average of 91,187.6 ± 2.1 MeV.

This is also a source of doubt, rather than confirmation as claimed in the paper. My intuition is that the Z boson uncertainty should be lower than the W boson measurement uncertainty to a larger extent than it is, and instead it was only slightly smaller.
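The combined uncertainties quoted for the W and Z measurements follow from adding the statistical and systematic components in quadrature. A minimal sketch, assuming the two components are independent (the function name is just illustrative):

```python
import math

def combine_in_quadrature(*components):
    """Combine independent uncertainty components by summing their squares."""
    return math.sqrt(sum(c * c for c in components))

# Statistical and systematic uncertainties quoted by CDF, in MeV.
w_total = combine_in_quadrature(6.4, 6.9)  # W boson mass
z_total = combine_in_quadrature(6.4, 4.0)  # Z boson mass

print(f"W combined uncertainty: {w_total:.1f} MeV")
print(f"Z combined uncertainty: {z_total:.1f} MeV")
print(f"Z / W uncertainty ratio: {z_total / w_total:.2f}")
```

This reproduces the 9.4 MeV and 7.5 MeV figures, and shows that the Z boson uncertainty is only about 20% smaller than the W boson uncertainty.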

Rather than overturning the Standard Model, all this result should do, at most, is replace the old combined Tevatron value of 80,387 ± 16 MeV with a new combined Tevatron value of 80,427.4 ± 8.9 MeV which will pull the global average a little higher than it used to be and tweak the old global electroweak fit.

But, in addition to shifting up the global average, this result will probably actually increase rather than decrease the uncertainty in the overall global average because the contributing data points are now a lot less tightly clustered than they were before relative to their claimed uncertainties, which again undermines the credibility of the assertion that the claimed uncertainties of the new CDF value are correct.
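To make that point concrete, here is a rough sketch of the error-inflation procedure the Particle Data Group uses for inconsistent measurements (scale all errors until the chi-squared per degree of freedom is 1). It is only an approximation under stated assumptions: it uses just three inputs (the new combined Tevatron value, plus the LEP average and the ATLAS value quoted later in this post), treats them as uncorrelated, and is not the PDG's full method.

```python
import math

def scaled_average(measurements):
    """Inverse-variance average of (value, sigma) pairs plus an error scale factor."""
    weights = [1.0 / s**2 for _, s in measurements]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, measurements)) / total
    sigma = 1.0 / math.sqrt(total)
    # Chi-squared of the inputs around the weighted mean.
    chi2 = sum(((v - mean) / s) ** 2 for v, s in measurements)
    dof = len(measurements) - 1
    # Inflate errors only when the inputs scatter more than they claim to.
    scale = max(1.0, math.sqrt(chi2 / dof))
    return mean, sigma, scale

inputs = [
    (80427.4, 8.9),   # new combined Tevatron value (MeV)
    (80376.0, 33.0),  # LEP average
    (80370.0, 18.0),  # ATLAS
]
mean, sigma, scale = scaled_average(inputs)
print(f"naive:    {mean:.1f} +/- {sigma:.1f} MeV")
print(f"inflated: {mean:.1f} +/- {sigma * scale:.1f} MeV (S = {scale:.2f})")
```

Under these assumptions the scale factor comes out a little above 2, so the inflated uncertainty is larger than the ± 12 MeV of the pre-CDF global average even though the naive uncertainty is smaller.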

Prior Experimental Data Compared

The disagreement with prior experiments is real. See the Particle Data Group's W Boson Mass entry. See also their narrative explanation.

If I were inclined to attribute bad motives, which to some extent I am in this case, I'd say that spinning this result as a deviation from the Standard Model is an attempt to distract attention away from how badly their result deviated from other experimental measurements, which is the real story here.

When your result, which claims only modestly less uncertainty than prior measurements of the same quantity by multiple independent groups, is a huge outlier with respect to everyone else, it is more likely that you, or the scientists who are the source of your data, have done something wrong than it is that you are right and they are wrong. Perhaps, for example, CDF is underestimating the true uncertainty of their measurement, which is very easy to do even for the most sophisticated High Energy Physics (HEP) scientists, since estimating systematic error is as much an art as it is a science (even though estimating statistical error is almost perfect, except for issues related to the assumption that the true distribution of error is Gaussian, when in reality it usually has fatter tails in studies of past HEP data).

The inverse error weighted global average of the nine best and most recent independent measurements of the W boson mass prior to this paper is 80,379 ± 12 MeV.

Where does that come from?

Two of those nine measurements are from CDF (80,433 ± 79 MeV in 2001 and 80,387 ± 19 MeV in 2012) and two more are from CDF's sister experiment at the Tevatron, called D0 (80,483 ± 84 MeV from 2002 and 80,375 ± 23 MeV from 2014), with the older values (in each case) made at 1.8 TeV and the newer values (in each case) made at 1.96 TeV. The four data point inverse error weighted combined Tevatron average was 80,387 ± 16 MeV. Three more superseded W boson masses from CDF and D0 were ignored in the global average and ranged from 80,367 MeV to 80,413 MeV.
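As a rough check on how an inverse error weighted average works, here is a minimal sketch applied to the four Tevatron data points. It assumes the four measurements are uncorrelated, which the official combination does not, so it is not expected to reproduce the quoted uncertainty exactly.

```python
import math

def inverse_variance_average(measurements):
    """Return (mean, uncertainty) for uncorrelated (value, sigma) pairs."""
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, 1.0 / math.sqrt(total)

# W boson mass measurements in MeV, as listed above.
tevatron = [
    (80433.0, 79.0),  # CDF 2001
    (80387.0, 19.0),  # CDF 2012
    (80483.0, 84.0),  # D0 2002
    (80375.0, 23.0),  # D0 2014
]
mean, sigma = inverse_variance_average(tevatron)
print(f"{mean:.1f} +/- {sigma:.1f} MeV")
```

The central value lands within 1 MeV of the quoted 80,387 MeV, while the naive uncertainty (a bit over 14 MeV) is smaller than the quoted 16 MeV, presumably because shared uncertainties between the measurements are ignored here.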

Another four measurements are from the defunct LEP (Large Electron-Positron Collider) from 2006 to 2008 at energies from 161 to 209 GeV, with an error weighted average of 80,376 ± 33 MeV. The range of the LEP measurements was 80,270 MeV to 80,440 MeV.

Many far less precise measurements from 1983 to 2018 were ignored in determining the inverse error weighted world average.

One of the measurements is from ATLAS at the Large Hadron Collider (LHC): 80,370 ± 18 MeV at an energy of 7 TeV. It shares 7 MeV of systematic uncertainty with the Tevatron average.
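The size of the disagreement can be estimated by dividing the difference between two values by their uncertainties combined in quadrature. A minimal sketch, assuming the values being compared are independent and Gaussian (which, given shared systematic uncertainties and fat tails, is only approximately true):

```python
import math

def tension_in_sigma(a, sigma_a, b, sigma_b):
    """Difference between two independent values in units of their combined sigma."""
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

cdf = (80433.5, 9.4)  # new CDF measurement, MeV
comparisons = {
    "ATLAS": (80370.0, 18.0),
    "LEP average": (80376.0, 33.0),
    "global electroweak fit": (80357.0, 6.0),
}
for label, (value, sigma) in comparisons.items():
    print(f"CDF vs {label}: {tension_in_sigma(*cdf, value, sigma):.1f} sigma")
```

The older Tevatron combination is left out of the comparison because it shares data with the new CDF result.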

We should be seeing a Run-2 W boson mass determination from ATLAS, and both Run-1 and Run-2 W boson mass determinations from CMS before too long.

My predisposition is to expect that those results will be more credible than this lagging Tevatron value because the actual experimental apparatus is more state of the art at LHC than it was at Tevatron. Also, fairly or not, the best scientists with the most rigorous quality control get assigned to the new shiny data, rather than analysis of eleven year old archived data from an experiment that is no longer operating.

Chart via this blog which also has quality commentary, which notes that:

The main problem . . . is that the new measurement is in disagreement with all other available measurements. I think this could have been presented better in their paper, mainly because the measurements of the LEP experiments have not been combined, secondly because they don't show the latest result from LHCb. Hence I created a new plot (below), which allows for a more fair judgement of the situation. I also made a back-of-envelope combination of all measurements except of CDF, yielding a value of 80371 ± 14 MeV. It should be pointed out that all these combined measurements rely partly on different methodologies as well as partly on different model uncertainties. The likelihood of the consistency of such a (simple) combination is 0.93. Depending (a bit) on the correlations you assume, this value has a discrepancy of about 4 sigma to the CDF value.

In fact, there are certainly some aspects of the measurement which need to be discussed in more detail (Sorry, now follow some technical aspects, which most likely only people from the field can fully understand): In the context of the LHC Electroweak Working Group, there are ongoing efforts to correctly combine all measurements of the W boson mass; in contrast to what I did above, this is in fact also a complicated business, if you want to do it really statistically sound. My colleague and friend Maarten Boonekamp pointed out in a recent presentation, that the Resbos generator (which was used by CDF) has potentially some problems when describing the spin-correlations in the W boson production in hadron collisions. In fact, there are remarkable changes in the predicted relevant spectra between the Resbos program and the new version of the program Resbos2 (and other generators) as seen in the plot below. On first sight, the differences might be small, but you should keep in mind, that these distributions are super sensitive to the W boson mass. I also attached a small PR plot from our last paper, which indicates the changes in those distributions when we change the W boson mass by 50 MeV, i.e. more than ten times than the uncertainty which is stated by CDF. I really don't want to say that this effect was not yet considered by CDF - most likely it was already fixed since my colleagues from CDF are very experienced physicists, who know what they do and it was just not detailed in the paper. I just want to make clear that there are many things to be discussed now within the community to investigate the cause of the tension between measurements.

Difference in the transverse mass spectrum between Resbos and Resbos2 (left); impact of different W boson mass values on the shapes of transverse mass.

And this brings me to another point, which I consider crucial: I must admit that I am quite disappointed that it was directly submitted to a journal, before uploading the results on a preprint server. We live in 2022 and I think it is by now good practice to do so, simply because the community could discuss these results beforehand - this allows a scientific scrutiny from many scientists which are directly working on similar topics.

More commentary from Matt Strassler is available at his blog (also here). The money quote is this one (emphasis in the original, paragraph breaks inserted editorially for ease of reading):

A natural and persistent question has been:

“How likely do you think it is that this W boson mass result is wrong?”

Obviously I can’t put a number on it, but I’d say the chance that it’s wrong is substantial.

Why?

This measurement, which took several many years of work, is probably among the most difficult ever performed in particle physics. Only first-rate physicists with complete dedication to the task could attempt it, carry it out, convince their many colleagues on the CDF experiment that they’d done it right, and get it through external peer review into Science magazine. But even first-rate physicists can get a measurement like this one wrong. The tiniest of subtle mistakes will undo it.

Physicist Tommaso Dorigo chimes in at his blog. He is firmly in the measurement error camp and identifies some particularly notable technical issues that could cause the CDF number to be too high.

I already answered the question of whether in my opinion the new CDF measurement of the W boson mass, standing at seven standard deviations away from the predictions of the Standard Model, is a nail in the SM coffin. Now I will elaborate a little on part of the reasons why I have that conviction. I cannot completely spill my guts on the topic here though, as it would take too long - the discussion involves a number of factors that are heterogeneous and distant from the measurement we are discussing. Instead, let us look at the CDF result.

One thing I noticed is that the result with muons is higher than the result with electrons. This may be a fluctuation, of course (the two results are compatible within quoted uncertainties), but if for one second we neglected the muon result, we would get a much better agreement with theory: the electron W mass is measured to 80424.6+-13.2 MeV, which is some 4.5 sigmaish away from theory prediction of 80357+-6 MeV. Still quite a significant departure, but not yet an effect of unheard-of size for accidentals.

Then, another thing I notice is that CDF relied on a custom simulation for much of the phenomena involving the interaction of electrons and muons with the detector. That by itself is great work, but one wonders why not using the good old GEANT4 that all of us know and love for that purpose. It's not like they needed a fast simulation - they had the time!

A third thing I notice is that the knowledge of backgrounds accounts for a significant systematic effect - it is estimated in the paper as accounting for a potential shift of two to four MeV (but is that sampled from a Gaussian distribution or can there be fatter tails?). In fact, there is one nasty background that arises in the data when you have a decay of a Z to a pair of muons, and one muon gets lost for some reconstruction issue or by failing some quality criteria. The event, in that case, appears to be a genuine W boson decay: you see a muon, and the lack of a second leg causes an imbalance in transverse momentum that can be interpreted as the neutrino from W decay. This "one-legged-Z" background is of course accounted for in the analysis, but if it had been underestimated by even only a little bit, this would drive the W mass estimate up, as the Z has a mass larger than the W (so its muons are more energetic).

Connected to that note is the fact that CDF does show how their result can potentially shift significantly in the muon channel if they change the range of fitted transverse masses - something which you would indeed observe if you had underestimated the one-legged Z's in your data. This is shown in the two graphs below, where you see that the fitted result moves down by quite a few MeV if you change the upper and lower boundaries:

A fourth thing I notice is that the precision of the momentum scale determination, driven by studies of low-energy resonances (J/psi and Upsilon decays to muon pairs) is outstanding, but the graph that CDF shows to demonstrate it is a bit suspicious to my eyes - it is supposed to demonstrate a flat response as a function of inverse momentum, but to my eyes it in fact shows the opposite. Here is what I am talking about:

I took the liberty to take those data points and fit them with a different assumption - not a constant, but a linear slope, and not the full spectrum, but only up to inverse momenta of 0.3; and here is what I get:

Of course, nobody knows what the true model of the fitting function should be; but a Fisher F test would certainly prefer my slope fit to a constant fit. Yes, I have neglected the points above 0.3, but who on earth can tell me that all these points should line up in the same slope? So, what I conclude from my childish exercise is that the CDF calibration data points are not incompatible (but IMHO better compatible) with a slope of (-0.45+-0.1)*10^-3 GeV.

What that may mean, given that they take the calibration to be -1.4 from a constant fit, is to get the scale wrong by about a part in ten thousand at the momentum values of relevance for the W mass measurement in the muon channel. This is an effect of about 8 MeV, which I do not see accounted for in the list of systematics that CDF produced. [One caveat is that I have no idea whether the data points have correlated uncertainties among the uncertainty bars shown, which would invalidate my quick-and-dirty fit result.]

I could go on with other things I notice, and you clearly see we would not gain much. My assessment is that while this is a tremendously precise result, it is also tremendously ambitious. Taming systematic uncertainties down to effects of a part in ten thousand or less, for a subnuclear physics measurement, is a bit too much for my taste. What I am trying to say is that while we understand a great deal about elementary particles and their interaction with our detection apparatus, there is still a whole lot we don't fully understand, and many things we are assuming when we extract our measurements. . . .

I can also say that the CDF measurement is slamming a glove of challenge on ATLAS and CMS faces. Why, they are sitting on over 20 times as much data as CDF was able to analyze, and have detectors built with a technology that is 20 years more advanced than that of CDF - and their W mass measurements are either over two times less precise (!!, the case of ATLAS), or still missing in action (CMS)? I can't tell for sure, but I bet there are heated discussions going on at the upper floors of those experiments as we speak, because this is too big a spoonful of humble pie to take on.

He also reminds us of the fact that real world uncertainties don't have a Gaussian (i.e. "normal") distribution and instead have "fat tails" with extreme deviations from expected values being more common than expected in a Gaussian distribution.

Likewise physicist Sabine Hossenfelder tweets:

Could this mean the standard model is wrong? Yes. But more likely it's a problem with their data analysis. Fwiw, I don't think this is a case for theorists at all. Theorists will explain whatever data you throw at them.

Footnote Regarding Definitional Issues

The CDF value and all of the other values (except the global electroweak fits) are probably also all about 20 MeV too high due to a definitional issue in how the W boson mass is extracted from the experimental data. See Scott Willenbrock, "Mass and width of an unstable particle" arXiv:2203.11056 (March 21, 2022).

9 comments:

andrew said...: A systematic uncertainty of 19.7 MeV for the new CDF result and a combined uncertainty of 20.7 MeV for the new CDF result would make this roughly a 90th percentile outlier experimentally (as it should be as the outlier of 11 data points) and perhaps 3.1 sigma or so from the global electroweak fit (which isn't far from the mark for a one in 11 data point fit).

Thus, if 12.8 MeV of systematic uncertainty is overlooked in the paper, you get a result that makes sense. Overlooking sources of systematic uncertainty with this combined uncertainty is very easy to do.; April 14, 2022 at 4:50 AM
andrew said...: A new post from Jester notes that what the Particle Data Group does in situations like this is basically what I did in my previous comment (i.e. to inflate error bars to a point where the results can be reconciled, but unlike me, inflating all error bars of all experiments proportionately). http://resonaances.blogspot.com/2022/04/how-large-is-w-boson-anomaly.html

"Applying this procedure to the W mass measurements, it is necessary to inflate the errors by the factor of S=2.1, which leads mW = 80.410(15) GeV."

This is a tension with the global electroweak fit, but three sigma instead of seven sigma.; April 19, 2022 at 4:41 PM
andrew said...: Jester also notes in the same post:

"The tension between CDF and the combination of the remaining mW measurements is whopping 4.1 sigma. What value of mW should we then use in the Standard Model fits and new physics analyses? Certainly not the CDF one, some 6.5 away from the Standard Model prediction, because that value does not take into account the input from other experiments. At the same time we cannot just ignore CDF. In the end we do not know for sure who is right and who is wrong here. While most physicists tacitly assume that CDF has made a mistake, it is also conceivable that the other experiments have been suffering from the confirmation bias. Finally, a naive combination of all the results is not a sensible option either. Indeed, at face value the Gaussian combination leads to mW = 80.410(7) GeV. This value is however not very meaningful from the statistical perspective: it's impossible to state, with 68 percent confidence, that the true value of the W mass is between 80.403 and 80.417 GeV. That range doesn't even overlap with either of the most precise measurements from CDF and ATLAS! . . . Due to the disagreement between the experiments, our knowledge of the true value of mW is degraded, and the combination should somehow account for that.

The question of combining information from incompatible measurements is a delicate one, residing at a boundary between statistics, psychology, and arts. Contradictory results are rare in collider physics, because of a small number of experiments and a high level of scrutiny. However, they are common in other branches of physics, just to mention the neutron lifetime or the electron g-2 as recent examples. To deal with such unpleasantness, Particle Data Group developed a totally ad hoc but very useful procedure. The idea is to penalize everyone in a democratic way, assuming that all experimental errors have been underestimated. More quantitatively, one inflates the errors of all the involved results until the χ^2 per degree of freedom in the combination is equal to 1."; April 19, 2022 at 4:44 PM
andrew said...: I added this comment to his post which I think makes more sense than treating a global electroweak fit as a "Standard Model prediction":

"Is there something to be said for incorporating a global electroweak fit that doesn't include any of the mW measurements as one additional low uncertainty "indirect measurement" in computing a global average value, rather than using it as a set of goalposts with which to judge the other experimental measurements?"; April 19, 2022 at 4:52 PM
andrew said...: Woit says:

"There has been a lot of coverage in the press of claims by a group analyzing old CDF data to have come up with a dramatically better value for the W mass (one seven sigma away from the SM value). While this would be really wonderful if it were true, unfortunately that doesn’t seem very likely. There isn’t a well-motivated theoretical reason for this discrepancy, this is a very challenging measurement, and the new value seriously disagrees with several previous measurements at CERN. For an informed discussion of this from someone who was on CDF and has worked on these sorts of analyses, see Tommaso Dorigo’s blog post." https://www.math.columbia.edu/~woit/wordpress/?p=12775; April 25, 2022 at 12:44 PM
andrew said...: A powerpoint look at the error estimation by the latest CDF result (probably too low). https://indico.cern.ch/event/1108518/contributions/4691380/attachments/2392473/4090175/combi_160222_EWWG.pdf; April 25, 2022 at 12:48 PM
andrew said...: Commentary from 4gravitons:

https://4gravitons.com/2022/04/22/w-is-for-why/

"This result shakes my faith in that a little. Probably, the analysis team got something wrong. Possibly, all previous analyses got something wrong. Either way, a lot of very careful smart people tried to estimate their precision, got very confident…and got it wrong. . . .

If some future analysis digs down deep in precision, and finds another deviation from the Standard Model, should we trust it? What if it’s measuring something new, and we don’t have the prior experiments to compare to? . . .

Statistics are supposed to tell us whether to trust a result. Here, they’re not doing their job. And that creates the scary possibility that some anomaly shows up, some real deviation deep in the sigmas that hints at a whole new path for the field…and we just end up bickering about who screwed it up. Or the equally scary possibility that we find a seven-sigma signal of some amazing new physics, build decades of new theories on it…and it isn’t actually real.

We don’t just trust statistics. We also trust the things normal people trust. Do other teams find the same result? (I hope that they’re trying to get to this same precision here, and see what went wrong!) Does the result match other experiments? Does it make predictions, which then get tested in future experiments?"; April 25, 2022 at 4:12 PM
andrew said...: More from Matt Strassler discussing some BSM options.

https://profmattstrassler.com/2022/04/13/the-simplest-way-to-shift-the-w-boson-mass/

and his first post on the topic that I didn't notice before:

https://profmattstrassler.com/2022/04/07/the-w-boson-isnt-behaving/; April 25, 2022 at 4:16 PM
andrew said...: A new post from Sabine Hossenfelder:
http://backreaction.blogspot.com/2022/04/did-w-boson-just-break-standard-model.html

"Is it correct? I don’t know. It could be. But in all honesty, I am very skeptical that this result will hold up. More likely, they have underestimated the error and their result is actually compatible with the other measurements."; May 3, 2022 at 7:26 PM
