Wednesday, October 27, 2021

Tags Rather Than Boxes: A Good, Widely Applicable Big Data Methodology

The concept of using tags rather than boxes is a broadly applicable and valuable way of categorizing data that has rarely been used historically, but should be used widely. With tags, a data point can be put in multiple categories, each of which constitutes a set that may overlap with other categories to a greater or lesser degree.
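As a rough illustration of the idea, here is a minimal Python sketch in which each data point carries a set of tags and each tag, in turn, defines a set of data points. The source names and tag names are made up for illustration, and the Jaccard index is just one convenient way I have picked to express how much two tag sets overlap; nothing here comes from the paper discussed below.

```python
from collections import defaultdict

# Hypothetical records: each data point carries a set of tags
# instead of being assigned to exactly one box.
records = {
    "source_A": {"double-lobed", "bent-tail", "cluster-member"},
    "source_B": {"double-lobed", "compact-core"},
    "source_C": {"bent-tail", "cluster-member"},
}


def build_tag_index(records):
    """Invert the records so each tag maps to the set of data points carrying it."""
    index = defaultdict(set)
    for name, tags in records.items():
        for tag in tags:
            index[tag].add(name)
    return dict(index)


def jaccard(a, b):
    """Overlap of two sets as |intersection| / |union| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


tag_index = build_tag_index(records)

# Two tags define two sets that overlap only partially.
overlap = jaccard(tag_index["double-lobed"], tag_index["bent-tail"])
print(sorted(tag_index["double-lobed"]), sorted(tag_index["bent-tail"]), overlap)
```

In a box scheme, source_A would have to be filed under either "double-lobed" or "bent-tail"; here it simply belongs to both sets.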

The concept has been popularized by blogging software and social media platforms, so what used to be a novel and revolutionary idea that would be hard to explain to the average person is now familiar to almost everyone who regularly reads blogs or uses social media to even a modest degree.

There is a place for fitting data into boxes when the data genuinely have that inherent structure. But scientists should refrain from setting those boxes early on, based on incomplete analysis, before confirming that the data really fit into them. Instead, they should let the data establish the scope of any boxes by forming natural clusters.

This approach is also an excellent way to allow scientists with very different big picture approaches to a field to collaborate with each other using common data sets, without creating unresolvable standoffs over issues that only more data can resolve, and without impairing the usefulness of the data for each collaborator's purposes.
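To make the collaboration point concrete, here is a small Python sketch under my own assumptions: a shared tagged catalog that nobody edits, plus two hypothetical collaborators who each derive their own classes from it with their own rules. The catalog entries, tag names, rule lists, and the `classify` helper are all invented for illustration.

```python
# Shared, hypothetical tagged catalog. The frozensets signal that
# no collaborator modifies the common data.
catalog = {
    "source_A": frozenset({"double-lobed", "edge-brightened"}),
    "source_B": frozenset({"double-lobed", "edge-darkened", "bent-tail"}),
    "source_C": frozenset({"compact-core"}),
}


def classify(catalog, rules, default="unclassified"):
    """Apply an ordered list of (label, required_tags) rules to each source.

    A source gets the label of the first rule whose required tags are all
    present; otherwise it gets the default label.
    """
    result = {}
    for name, tags in catalog.items():
        result[name] = next(
            (label for label, required in rules if required <= tags),
            default,
        )
    return result


# Two hypothetical collaborators with different big-picture schemes.
morphology_rules = [
    ("edge-bright", {"edge-brightened"}),
    ("edge-dark", {"edge-darkened"}),
]
environment_rules = [("bent", {"bent-tail"})]

by_morphology = classify(catalog, morphology_rules)
by_environment = classify(catalog, environment_rules)
print(by_morphology)
print(by_environment)
```

Each collaborator's boxes live in their own rule list, so a disagreement about which scheme is right never has to be settled in the shared data.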

This paper suggests this method for radio astronomy galaxy data sets, but its potential extends well beyond that, to a variety of fields.

After six decades of studying radio galaxies, we are now being delightfully overwhelmed by their exponentially expanding numbers, and the complexity of their structures. Similarly, the ways we classify radio galaxies have exploded, often leading to conflicting terminology, ambiguous classifications, and historical schemes that may or may not match with our current physical understanding. After discussions with more than 100 radio astronomers over the last several years, listening to their ideas and aspirations, I propose that we reconceptualize the classification of radio galaxies. Instead of trying to put them into "boxes", we should assign them #tags, a system that is easy to understand and apply, flexible and evolving, and can accommodate conflicting ideas about what is relevant and important. Here, I outline the basis of such a #tag system; the rest is up to the community.

Lawrence Rudnick, "Radio Galaxy Classification: #Tags, not Boxes" arXiv:2110.13733 (October 26, 2021) (accepted for publication in special issue of Galaxies, the conference proceedings from Radio Galaxies in the Cosmic Web, March, 2021).

3 comments:

neo said...

MicroBooNE experiment’s first results show no hint of a sterile neutrino

so Majorana?

andrew said...

"MicroBooNE experiment’s first results show no hint of a sterile neutrino"

It appears so. I've only skimmed the papers it released today, but I didn't see any claims of such hints at any meaningful significance.

"so Majorana?"

No. Or, more accurately, MicroBooNE's results don't tell us anything about that.

andrew said...

Commentary (more optimistic about new physics than I am): https://twitter.com/joachimkopp/status/1453618106950160385