EXPLORING THE SONIC UNIVERSE: WHEN FM SYNTHESIS MEETS DATA SCIENCE
For years, we listened to Sega Genesis music as memories. Songs. Levels. Moments.
But what if, instead of hearing them as isolated tracks, we explored them as a complete landscape? What if we treated presets not as sounds, but as data that tells stories?
DAFMExplorer was born from that question: the idea that FM synthesis and data science can coexist, not to explain the past, but to open new creative doors. This project transforms 93,000+ presets from the Sega Genesis era into an interactive universe where every sound is a point on a map, every cluster a territory to discover, and every parameter a thread in a larger narrative. The web app is designed for desktop or tablet (landscape) use; it is not intended for mobile phones.
Before jumping in: DAFMExplorer’s notebooks, methodology, and web app are completely open source. Everything is on GitHub: the Jupyter notebooks we’ll discuss, the interactive web application shown above, and the complete dataset. This project exists thanks to the community and is for the community. We invite everyone to try it, work with it, modify it, and make it their own.
Why We Built This
I often get questions like “How did you start a project like this?” For DAFMExplorer, it wasn’t nostalgia. It was curiosity.
We wanted to understand how these sounds worked not as individual presets, but as a system. What patterns emerged when you looked at thousands of them together? Could you hear the invisible connections between a bass from Streets of Rage and a lead from Sonic? Was there something in the parameters themselves that told a story about the composers, the tools they used, the regions they came from?
The YM2612/YM3438 chip—the heart of the Sega Genesis—wasn’t just a sound generator. It was a creative instrument that shaped an entire generation of game music. Every preset is a snapshot of a composer’s choices: which algorithm to use, how much feedback, what attack rate, what detune. These aren’t random numbers. They’re creative decisions frozen in time.
But to see those patterns, you need to step back. You need to look at the forest, not just the trees. That’s where data science comes in—not as a replacement for listening, but as a new way of hearing.
From Presets to Questions
The journey begins with extraction. Not just parsing files, but making conscious decisions about what to keep and what to leave behind.
The data came from VGM files—recordings of every command sent to the YM2612/YM3438 chip during gameplay. Think of them as digital spies that captured every register write, every parameter change, every moment of sound. From these recordings, we extracted OPM files—presets that could be played back, analyzed, and understood.
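To make the idea of a register log concrete, here is a minimal Python sketch of pulling YM2612 writes out of a VGM command stream. It assumes the file header has already been skipped and handles only a small subset of commands (chip writes, waits, end of data); data blocks, PCM seeks and other chips are deliberately left out. The function names are hypothetical and are not taken from the DAFMExplorer notebooks.

```python
def ym2612_writes(commands: bytes) -> list:
    """Collect (port, register, value) tuples from a VGM command stream.

    Simplified sketch: only the commands listed below are understood.
    """
    writes = []
    pos = 0
    while pos < len(commands):
        op = commands[pos]
        if op in (0x52, 0x53):
            # 0x52 = YM2612 port 0 write, 0x53 = YM2612 port 1 write
            writes.append((op - 0x52, commands[pos + 1], commands[pos + 2]))
            pos += 3
        elif op == 0x50:
            pos += 2  # PSG (SN76489) write: one data byte, ignored here
        elif op == 0x61:
            pos += 3  # wait N samples, N stored as a 16-bit value
        elif op in (0x62, 0x63) or 0x70 <= op <= 0x8F:
            pos += 1  # fixed or short waits (0x8n also pushes one DAC sample)
        elif op == 0x66:
            break  # end of sound data
        else:
            raise ValueError(f"command 0x{op:02X} is not handled in this sketch")
    return writes


def decode_feedback_algorithm(value: int) -> tuple:
    """Split a YM2612 0xB0-0xB2 register value into (feedback, algorithm)."""
    return (value >> 3) & 0x07, value & 0x07


if __name__ == "__main__":
    # Made-up stream: one write per port, two waits, then end of data.
    demo = bytes([0x52, 0xB0, 0x32, 0x61, 0x10, 0x00,
                  0x53, 0x30, 0x71, 0x62, 0x66])
    for port, register, value in ym2612_writes(demo):
        print(port, hex(register), hex(value))
```

A preset is then the state of the instrument registers at the moment a note is keyed on, which is why a change to a single register, such as a volume tweak, shows up as a "new" preset.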
But raw data is messy. Duplicates exist because volume changes create new presets. Game names are inconsistent. Composer information is scattered across websites, forums, and archives. The first notebook—Data Extraction—isn’t just about scraping and cleaning. It’s about making choices: what counts as a duplicate? How do we normalize game names? What do we do when information is missing?
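One possible answer to "what counts as a duplicate?" can be sketched in Python. The assumption here is that two presets are the same sound if they differ only in the total level (TL) of their carrier operators, since carrier TL acts as output volume. The preset layout (a dict with "algorithm", "feedback" and a list of four operator dicts), the function names and the name-normalization rules are all hypothetical, chosen for illustration rather than copied from the notebooks.

```python
import re
import unicodedata

# Zero-based indices of the carrier operators for each YM2612 algorithm.
CARRIERS = {
    0: (3,), 1: (3,), 2: (3,), 3: (3,),
    4: (1, 3), 5: (1, 2, 3), 6: (1, 2, 3), 7: (0, 1, 2, 3),
}


def normalize_game_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def dedup_key(preset: dict) -> tuple:
    """Build a hashable key that ignores carrier volume (TL)."""
    carriers = CARRIERS[preset["algorithm"]]
    ops = []
    for index, op in enumerate(preset["ops"]):
        fields = dict(op)
        if index in carriers:
            fields.pop("tl", None)  # carrier TL only changes loudness
        ops.append(tuple(sorted(fields.items())))
    return (preset["algorithm"], preset["feedback"], tuple(ops))


def deduplicate(presets: list) -> list:
    """Keep the first preset seen for each key, preserving input order."""
    seen = {}
    for preset in presets:
        seen.setdefault(dedup_key(preset), preset)
    return list(seen.values())
```

Modulator TL is kept in the key on purpose: changing it alters the timbre, not just the volume. A stricter or looser rule would give a different dataset, which is exactly the kind of choice described above.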
These aren’t technical questions. They’re philosophical ones. Every decision shapes what the data can tell us. Leave too much noise, and patterns disappear. Clean too aggressively, and you lose the texture that makes the data interesting.
We enriched the dataset with metadata scraped from Sega Retro, VGMrips, and Wikipedia. Composer names, nationalities, which games used GEMS (the sound driver that shaped so many Genesis soundtracks). Each piece of context transforms a number into a story.
The result? A dataset that’s not just clean, but meaningful. 93,000+ presets, each with 58 parameters, each connected to a game, a composer, a moment in gaming history.
Listening With Data
The second notebook—Data Analysis—is where the magic happens. But it’s not magic. It’s mathematics applied to sound.
Before diving into the details, here is a quick overview of the techniques. PCA reveals which parameters account for most of the variation across presets. Dimensionality reduction collapses 58 dimensions into 2, creating a map where similar sounds cluster together. t-SNE and UMAP create different landscapes: t-SNE emphasizes local neighborhoods, while UMAP tends to preserve more of the global structure. Both are valid. Both reveal different truths.
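As a rough illustration of the PCA step, the following Python sketch standardizes a preset matrix and projects it onto its first two principal components using NumPy's SVD. The data is random stand-in data with 58 columns, not the real dataset, and the function name is hypothetical. t-SNE and UMAP are not shown; they would normally come from a library rather than be written by hand.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto the first two principal components.

    Returns (coords, explained, components):
      coords      - (n_samples, 2) map coordinates
      explained   - fraction of variance carried by each component
      components  - (2, n_features) loadings of the two kept components
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    coords = Z @ Vt[:2].T
    return coords, explained, Vt[:2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_presets = rng.integers(0, 32, size=(500, 58))  # stand-in data
    coords, explained, components = pca_2d(fake_presets)
    # Parameters with the largest absolute loading on the first component.
    top = np.argsort(np.abs(components[0]))[::-1][:5]
    print("variance on first two components:", explained[:2].sum())
    print("most influential parameter columns:", top)
```

Reading the largest loadings is what "which parameters matter most" means in practice: those columns move the most along the direction of greatest variation.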
Clustering with KMeans finds the groups. KMeans needs the number of clusters chosen in advance; we settled on seven because that is where the data seemed to organize itself most naturally. We named them: Raw Signals, Neon Action, Polished Arcade, High-Speed Chiptune, Deep Space FM, Fantasy Atmospheres, Experimental Playgrounds.
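To show what KMeans does under the hood, here is a small NumPy sketch of the classic Lloyd iteration: assign each point to its nearest center, move each center to the mean of its points, repeat. This is an illustration on the assumption of plain Euclidean distance and random initialization; in practice a library implementation (for example scikit-learn's KMeans) would be the usual choice, and nothing here is claimed to match the notebooks' exact settings.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm. Returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # assignments have stopped changing
        centers = new_centers
    return _nearest(X, centers), centers


def _nearest(X, centers):
    """Index of the closest center for every row of X."""
    # Builds an (n, k, d) array: fine for a demo, heavy for very large data.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def inertia(X, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    X = np.asarray(X, dtype=float)
    return float(np.sum((X - centers[labels]) ** 2))
```

Running this for several values of k and comparing the inertia (or a score such as the silhouette) is a common way to judge whether a given number of clusters, seven in our case, is a reasonable fit.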
These aren’t rigid categories. They’re territories. Places to get lost. Starting points for new musical ideas.
Embeddings create a space where distance means similarity. Click on a preset, and the system finds its neighbors—sounds that share something in their parameter space. Sometimes the connections are obvious. Sometimes they’re surprising. A bass from one game might be closer to a lead from another than to other basses. That’s not a bug. It’s a feature. It’s the data revealing relationships we couldn’t hear with our ears alone.
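The neighbor lookup can be sketched in a few lines of Python. This assumes each preset is a row in a NumPy array of embedding coordinates and that similarity is plain Euclidean distance; the function name is hypothetical and the web app may use a different metric or index.

```python
import numpy as np


def nearest_presets(embeddings, index, k=5):
    """Return the indices of the k presets closest to embeddings[index]."""
    embeddings = np.asarray(embeddings, dtype=float)
    dists = np.linalg.norm(embeddings - embeddings[index], axis=1)
    dists[index] = np.inf  # never return the query preset itself
    return np.argsort(dists, kind="stable")[:k]


if __name__ == "__main__":
    # Four made-up points in a 2-D embedding.
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [0.0, 0.5]]
    print(nearest_presets(points, index=0, k=2))
```

Nothing in the distance calculation knows whether a preset was used as a bass or a lead, which is why neighbors can cut across those labels.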
Feature engineering transforms raw parameters into musical concepts. Brightness Index measures how much high-frequency content a preset has. Complexity Score captures how many parameters are doing interesting things. These aren’t just numbers. They’re new ways of describing sound.
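This post does not spell out the formulas behind Brightness Index and Complexity Score, so the Python sketch below is purely illustrative. Both formulas, the 0.8/0.2 weights, the field names and the NEUTRAL_OP baseline are assumptions made for this example, not the definitions used in the notebooks. The only facts relied on are the YM2612 parameter ranges: MUL 0-15, TL 0-127 (0 is loudest), DT 0-7, AR 0-31, feedback 0-7.

```python
# Hypothetical "plain" operator used as the reference point for complexity.
NEUTRAL_OP = {"mul": 1, "dt": 0, "tl": 0, "ar": 31}


def brightness_index(preset: dict) -> float:
    """Illustrative 0..1 score: loud operators with high multipliers,
    plus feedback, are treated as adding high-frequency content."""
    ops = preset["ops"]
    weighted = sum(op["mul"] * (127 - op["tl"]) / 127 for op in ops)
    weighted /= 15 * len(ops)  # normalize by the maximum possible value
    return 0.8 * weighted + 0.2 * preset["feedback"] / 7


def complexity_score(preset: dict) -> float:
    """Illustrative 0..1 score: share of operator parameters that
    differ from the NEUTRAL_OP baseline."""
    changed = 0
    total = 0
    for op in preset["ops"]:
        for name, neutral in NEUTRAL_OP.items():
            total += 1
            changed += int(op[name] != neutral)
    return changed / total
```

Whatever the exact formula, the design idea is the same: fold several raw register values into one number that maps to something a musician would recognize.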
The 7 Sonic Realms
The clusters aren’t categories. They’re creative territories.
Raw Signals (Cluster 0) – The foundation. Simple waveforms, minimal processing. The building blocks that everything else is made from.
Neon Action (Cluster 1) – Bright, punchy, energetic. The sound of arcade action games. Fast attack rates, high feedback, algorithms that create immediate impact.
Polished Arcade (Cluster 2) – Refined and balanced. The sound of games that had time to perfect their audio. Careful parameter choices, sophisticated algorithms.
High-Speed Chiptune (Cluster 3) – Fast, rhythmic, driving. The sound of games that move. High multipliers, quick envelopes, sounds that cut through the mix.
Deep Space FM (Cluster 4) – Atmospheric and evolving. Slow attacks, long decays, sounds that breathe and change over time. The soundtracks that create mood.
Fantasy Atmospheres (Cluster 5) – Ethereal and otherworldly. Complex algorithms, unusual parameter combinations. The sounds that transport you to another world.
Experimental Playgrounds (Cluster 6) – The outliers. The presets that don’t fit anywhere else. The creative experiments, the happy accidents, the sounds that break the rules.
Each realm is a starting point. A place to explore. A palette of sounds that share something in common, but each with its own character. The web app lets you navigate these realms visually—click on a point, hear the sound, see its neighbors, understand its place in the larger landscape.
An Invitation to Explore
This project is open source. The notebooks are on GitHub. The web app is live. The data is available.
But more importantly, the methodology is transparent. Every decision is documented. Every assumption is visible. You can see how we extracted the data, how we cleaned it, how we analyzed it. You can change the parameters. Try different clustering algorithms. Create your own visualizations. Connect other datasets.
The notebooks are educational. They teach FM synthesis through data science, and data science through FM synthesis. They show how to scrape websites, clean data, apply machine learning, create visualizations. But they also show how to ask questions, how to make decisions, how to balance technical rigor with creative curiosity.
The web app is interactive. It’s not just a visualization; it’s a synthesizer. You can play the presets, hear them in real time, and understand them as sounds, not just data points. The 6 memory slots let you build your own bank and download it as a DMP file. If you happen to have one of our DAFM SYNTH GENESIS modules (which feature the YM2612/YM3438 chip), you can load these DMP files directly via SD card and use them right away, making DAFMExplorer an ideal companion for exploring and discovering new sounds on hardware.
But the real invitation is to go deeper. To ask your own questions. To find your own patterns. To discover connections we missed. To use this as a starting point, not an endpoint.
Wrapping Up
So that’s most of the important bits of DAFMExplorer. I hope that this has been helpful for you.
DAFMExplorer is a point of departure, not a destination. The Sega Genesis sound library is a universe. 93,000+ presets is just one slice of it. There are other chips, other consoles, other eras. There are other ways to analyze, other questions to ask, other stories to tell. We’re already working on future updates that will bring this same data science approach to ARCADE YM2151 presets and BLASTER YMF262 (OPL3) presets—expanding the exploration to even more corners of the FM synthesis universe.
Data science is a creative tool. It’s not about replacing intuition with algorithms. It’s about augmenting our ability to see patterns, to understand relationships, to discover connections we couldn’t perceive otherwise. It’s about using mathematics to listen more deeply.
The YM2612/YM3438 chip is still alive. Not just in emulators and preservation projects, but in the sounds it created. Every preset is a snapshot of that chip’s creative potential. Every analysis is a new way of understanding that potential.
This project exists to open doors. To show that technical analysis and creative exploration aren’t opposites—they’re partners. To demonstrate that data can be beautiful, that mathematics can be musical, that science can be art.
The universe of FM synthesis is vast. We’ve mapped a small corner of it. The rest is waiting to be explored.
Once again, if you want to know more about something or found something unclear, please reach out! You can find the full notebooks, schematics, and source code on GitHub.
DAFMExplorer is an open-source project by Kasser Synths. Explore the notebooks, use the web app, contribute to the codebase, or simply get lost in the sounds. The universe is yours to discover.
Take your time to discover the sonic universe of the Sega Genesis / Mega Drive.