Transition to YOLOv8, always improving the algorithm, collecting more labels…

3 minute read

Published:

Bonjour, everyone! Time feels like it’s slipping away faster than ever these days. With paternity leave in full swing, finding work hours here and there has been a challenge (and honestly, it’s hard not to get distracted by Baby Finn’s cuteness). Despite the chaos, I’ve managed to make a major leap forward in the project: transitioning from Detectron2 Mask R-CNN to YOLOv8. Let me tell you—it’s been a game-changer.

From the start, YOLOv8 results were much crisper and more reliable than what I was achieving with Mask R-CNN. One of the issues I faced with Mask R-CNN was the finicky tuning of anchor sizes, which felt like a never-ending battle. I experimented endlessly, defining anchors in different ways, but the results always plateaued. In contrast, YOLOv8 just worked. The transition itself wasn’t too bad, though I had to tweak the input formats for images and labels since the two frameworks handle masks differently. Once that hurdle was cleared, training a YOLOv8 model on the dataset published in my previous article was a breeze.

Expanding the Dataset with Fresh Lunar Impact Craters

Focusing on the freshest impact craters on the Moon, I used YOLOv8 to run predictions around these sites and quickly added new labels based on the output. Of course, this meant revisiting my edge-artifact problem from earlier (again!). This time, I found an excellent GitHub repository—SAHI (https://github.com/obss/sahi) —and with a few tweaks, I implemented a much better method for handling predictions over large images. Problem solved!

With this setup, I could efficiently map boulders around fresh lunar impact craters, review the predictions, clean up false positives, and add any missing boulders. After a few months of refining this process, I now have 1,988,087 boulders mapped across 82 young and fresh lunar impact structures. Let that sink in—nearly 2 million boulders! To put that into perspective, manually mapping this dataset would have taken 5–10 years at best. This level of automation is truly a luxury, and it’s exciting to push the boundaries of what’s possible in planetary science.

While there are other great groups out there, like the team in Freiburg, Germany (credit where it’s due—they’re doing fantastic work too), I’m proud of how far we’ve come. This dataset is groundbreaking in its scale, and I’ve already re-trained the YOLOv8 model with these new labels to make it even better for future use by others.

The Next Challenge: Finding Patterns

The difficult part begins now: extracting meaningful patterns from this vast database. I’m investigating how target properties (like surface composition or crater size) influence boulder ejection patterns. It’s a challenging task to untangle these relationships, and progress has been slow. I’m currently drafting the manuscript and working on figures, but juggling parenthood and research has proven more complicated than I’d anticipated. Efficiency has never been more crucial!

Wrapping Things Up

The end of the project is looming—just around the corner, in fact. Thanks to an extension for my paternity leave, I’ll have until January 2025 to wrap everything up. These final months are going to be hectic, but I’m determined to finish strong.

That’s it for now—back to the manuscript (or diaper duty… it’s a toss-up). Take care, everyone! Cheerios! 😊