Interview: Music, AI and medicine with Daniel Kvak

Behind the idea to create an app that detects pneumonia or COVID-19 was a desire to help. And so, in 2021, Carebot was founded, a company that develops software based on artificial intelligence methods for clinical practice. It was founded by Daniel Kvak and his wife Karolina. He has won numerous awards for his work, and his extensive knowledge of artificial intelligence means he is in charge of the entire technology side of the company. In addition to running Carebot, Daniel Kvak is pursuing his PhD at the Faculty of Medicine and the Faculty of Arts at Masaryk University. His research interests include computational creativity, machine learning, and generative modelling, as well as electronic music and music composition. What does music have to do with medicine? Read the interview in which Daniel Kvak talks about recurrent neural networks and their use in musicology.

13 Jun 2025 Natálie Čornyjová Kateřina Hendrychová


You are pursuing your doctoral studies at two faculties of MU, Arts and Medicine. How do you manage to combine music and medicine? Is artificial intelligence the intersection?

Artificial intelligence has fascinated me since I was a child. At first, it was sci-fi comics and TV shows, of course, but the vision of a technological helper assisting us in our daily tasks completely absorbed me. From the beginning, I directed my studies at the Faculty of Arts towards what earned me a living: production music. It wasn't until the COVID-19 pandemic that my wife and I considered taking AI in a medical direction. Technically, there is not much difference between recognizing art movements in paintings and recognizing pathologies on X-rays. Nor are there significant differences across domains in questions such as accountability or the impact of technology on individuals and society. When we look at deepfake recordings, the effect on the public good is not dissimilar to the situation in healthcare.

In your master's thesis, you focused on modeling musical transcription using deep learning. What led you to use artificial intelligence in music?

I've spent many years creating background tracks for commercials, films, and sound banks. I always knew it wasn't high art, so I was interested in using AI to generate musical compositions. Fifteen-second snippets of these background tracks often go unnoticed by the listener, but their presence is perceived subliminally; if they were missing, the listener would notice immediately. My undergraduate thesis focused on the Spotify platform, which, a decade ago, was already using generative AI to create simple "lift" tracks. But it was clear that the industry was only at the very beginning.

How did you go about creating the autonomous generative model? What challenges did you encounter during the development process?

When I started in the generative AI segment, the most popular application was generating Shakespearean texts with recurrent neural networks (RNNs). We are talking about the same family of approaches that underlies ChatGPT, which people now use daily for almost every conceivable task. So when did the rapid change come that turned amateur projects for generating music or poetry into something that completely changed the world? While RNNs have been with us in their current form since sometime in 2007, the biggest problem of generative AI has always been long-term dependencies; put simply, getting neural networks to sustain attention over longer stretches of a sequence. Anyone interested in natural language processing or time series modeling has faced similar difficulties. A significant shift came in 2017, when the attention mechanism was proposed, largely solving this problem.
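
For the technically curious, here is a minimal sketch of the attention idea mentioned above: every position in a sequence is weighted against every other, so even a distant event (say, a motif from the opening bars) can still influence the current prediction. The sequence length, embedding size, and random inputs are illustrative assumptions, not anything from Kvak's own models.

```python
# Minimal sketch of scaled dot-product (self-)attention using only NumPy.
# Dimensions and random inputs are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh every sequence position against every other one, so even a
    distant event can influence the current prediction."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # context-aware mixture of values

seq_len, d_model = 8, 16                             # e.g. 8 note events, 16-dim embeddings
x = np.random.randn(seq_len, d_model)
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V = x
print(out.shape)                                     # (8, 16)
```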

What opportunities do deep neural networks offer today in musicology?

MIR (music information retrieval) is a broad field that includes, among other things, recommender systems that suggest similar songs, as on Spotify, systems for automating the composition and mixing process, and also, for example, tracking systems for distribution companies or copyright societies. The possibilities for the use of AI today are extensive. As with the relatively recent success of image generators (DALL-E, Midjourney), music generators are now emerging that can produce exceptionally high-quality compositions from a text prompt.
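
To illustrate the recommender-system side of MIR, here is a minimal, hypothetical sketch of content-based recommendation: each track is represented by an audio-feature vector and the most similar tracks are suggested. The track names and feature values are invented for illustration and do not describe Spotify's actual system.

```python
# Hypothetical content-based recommender: tracks as audio-feature vectors,
# ranked by cosine similarity. Names and feature values are invented.
import numpy as np

tracks = {
    "track_a": np.array([0.80, 0.10, 0.65]),  # e.g. tempo, acousticness, energy (normalised)
    "track_b": np.array([0.78, 0.15, 0.60]),
    "track_c": np.array([0.20, 0.90, 0.30]),
}

def recommend(query, catalogue, k=2):
    """Return the k tracks most similar to the query track."""
    q = catalogue[query]
    scores = {
        name: float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
        for name, v in catalogue.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(recommend("track_a", tracks))  # "track_b" should rank above "track_c"
```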

Are recurrent neural networks suitable for generating music, or are other models replacing them, and why?

Today's neural networks using the attention mechanism are not significantly different in their logic from, say, the original simple recurrent neural networks. Above all, a model's predictions must be contextual, and that is what has received the most attention in recent years. In musical composition, however, specific rules come into play: some are genre-specific, violating others results in cacophony, and still others are deliberately broken in the name of improvisation and creativity. There is no consensus in musical composition on which approaches should be universally applied. From relatively recent history we know examples of cellular automata that generate compositions by combining simple patterns into abstract, complex structures, as well as generative adversarial networks, which have significantly impacted image generation. But a much more fruitful question is how we should evaluate the outputs of such models. The topic of automatic music generation (musical metacreation), defined by Philippe Pasquier et al. in 2017, is relatively peripheral and is outweighed by the attention that text and image generators receive.
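
As an illustration of the cellular-automaton idea mentioned above, here is a minimal sketch in which a simple local rule (the elementary Rule 90) evolves a row of cells and the active cells are mapped onto a pentatonic scale to produce note events. The rule choice and pitch mapping are illustrative assumptions, not a reconstruction of any specific published system.

```python
# Minimal sketch of rule-based symbolic generation: an elementary cellular
# automaton (Rule 90) evolves a row of cells, and active cells are mapped
# onto a pentatonic scale. Rule and pitch mapping are illustrative choices.
PENTATONIC = [60, 62, 64, 67, 69, 72, 74, 76]  # MIDI pitches, C major pentatonic

def step(cells):
    """Rule 90: each cell becomes the XOR of its two neighbours (wrapping)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

def generate(bars=8, width=8):
    """Evolve the automaton and emit one chord (list of pitches) per bar."""
    cells = [0] * width
    cells[width // 2] = 1                      # a single seed cell
    score = []
    for _ in range(bars):
        score.append([PENTATONIC[i] for i, c in enumerate(cells) if c])
        cells = step(cells)
    return score

for bar in generate():
    print(bar)
```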

What role do human interaction and feedback play in developing and tuning recurrent neural networks aimed at music production?

A big one. Most projects that have boasted compelling artificially generated music in recent years have worked with a simple symbolic transcription of songs, i.e., only the basic "skeleton" was generated, while an experienced team of musicians took care of the rest. The question is whether this is wrong or whether we have simply set our expectations inappropriately. Natural language processing (NLP), where experience with machine translation and now with generative models is much broader, helps us understand this better. If I translate a text using DeepL, how often do I edit it? What if it is a technical text? If I generate text through ChatGPT, how domain-specific is it? Would I include it in my thesis in its generated form, without any changes? If I edit the text, does that mean the model I'm using "doesn't work," or simply that my preferences are set differently? These are the questions we need to ask ourselves.

You founded Carebot as an arts student interested in modeling music transcription using deep learning. How did you move from music into healthcare?

During the COVID-19 pandemic, my wife Karolina and I had a vision of helping healthcare professionals cope with the barrage of examinations they faced. Our original idea was to evaluate various kinds of image data, but the situation at the time indicated where the potential for AI might be most significant. As the pandemic gradually faded, it became clear that the problem we wanted to solve was systemic. Having previously worked more with music and text processing, I found the transition to computer vision challenging. Time has shown that artificial intelligence in medicine is a very complex issue; there is something to the saying, "we do these things not because they are easy, but because we thought they would be easy." After three years of work, we managed to obtain approval from the European regulator, and today we are proud to be one of the few in Europe to have it.

How does Carebot use artificial intelligence?

We work primarily with pattern recognition in image data. We have a team of more than 80 radiologists from all over Europe who work with us to annotate training data and participate in validation. Transparency is key for us: not just which models we use or how many training images we have, but, more importantly, how clearly and verifiably we can demonstrate the actual clinical benefit of these models in independent tests.

What are your future plans for the app (and the company)?

We are expanding our focus into mammography screening and bone X-rays to detect fractures and bone lesions. Thanks to regulatory approval, we are also venturing into international markets with our system for detecting abnormalities on chest X-rays. Above all, our vision is to ensure an even quality of care across regions, whether in a large teaching hospital or a small rural hospital.

X-ray image: Carebot
