StrumStart: A Dataset of 1-on-1 Guitar Lessons for Speech-Music Co-reasoning

Abstract

Recent attempts to integrate speech, music, and audio processing into Large Audio Language Models (LALMs) have relied on combining large datasets, each typically designed or collected for a single audio sub-domain. Such datasets rarely contain data that involves multiple domains simultaneously. To help introduce datasets that specifically target co-reasoning abilities across speech, music, and audio, we focus on collaborative music production as a natural speech-music co-reasoning environment and introduce StrumStart, a collection of ~3.5 hours of 1-on-1 guitar lessons and a preliminary set of question-answer pairs. Furthermore, we evaluate how two LALMs, LTU and LTU-AS, handle questions derived from the StrumStart dataset, demonstrating that model performance decreases on questions that specifically target speech-music co-reasoning. We end with a discussion of future plans to extend StrumStart as a training dataset through synthetic data generation and additional data collection and labeling.

Samples

Below we provide examples from the StrumStart dataset. Along with the audio clip and Whisper-generated transcript, we also provide the question and answer that were created for the given audio. Our questions are split into two categories: Speech-Only Reasoning and Speech-Music Co-reasoning. Examples are provided for both categories.
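As an illustration of how one such data point could be represented in code, here is a minimal Python sketch. It assumes the four fields and two categories described above; the class names, field names, and audio file names are hypothetical and do not reflect a published StrumStart schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The two question categories described in the text."""
    SPEECH_ONLY = "Speech-Only Reasoning"
    SPEECH_MUSIC = "Speech-Music Co-reasoning"


@dataclass(frozen=True)
class DataPoint:
    audio_path: str    # hypothetical file name, not an actual dataset path
    transcript: str    # Whisper-generated transcript of the clip
    question: str
    answer: str
    category: Category


def group_by_category(points):
    """Group data points by question category, preserving input order."""
    grouped = defaultdict(list)
    for point in points:
        grouped[point.category].append(point)
    return dict(grouped)


# Two samples from this page; the audio file names are placeholders.
examples = [
    DataPoint(
        audio_path="clip_001.wav",
        transcript="Okay, let's go ahead and tune it. Cool, that's easy enough.",
        question="Why is the guitar being played based on the dialogue?",
        answer="The guitar is being played to make sure it is tuned.",
        category=Category.SPEECH_ONLY,
    ),
    DataPoint(
        audio_path="clip_002.wav",
        transcript="Better",
        question="Does the student improve?",
        answer="Yes, the student improves.",
        category=Category.SPEECH_MUSIC,
    ),
]

grouped = group_by_category(examples)
```

Grouping by category makes it straightforward to report results separately for speech-only and speech-music questions, which is the comparison the abstract describes.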

Examples of Speech-Only Reasoning Data Points

Transcript | Question | Correct Answer
And so just go ahead and hit the sixth string first. This will also be a good review. That is the sixth string, yeah. That's good, that one's tuned. Try hitting it a little bit harder. There you go. | Based on the dialogue, is the sixth string tuned? | Yes, it is correctly tuned.
So we have F on the first fret of the sixth string, G, A, and B. And then on the fifth string, starting on the third fret, we have C, D, and E on the seventh. | What is being played on the guitar based on the dialogue? | The teacher is playing F, G, A, and B on the sixth string, and then C, D, E on the fifth string.
We're back to the C, D, E. Where is the C on the third string? Oh, it's on the fourth. The fifth. Or the fifth. That's close. Yeah, B is on the fourth. B is on the fourth, yeah, okay. That is correct. | Does the student answer the question correctly? | No, the student does not answer correctly.
So you notice how this sounds different than this. Because the thing that this third string in the E minor is the minor third. So without it, you lose the minor sound. So it's really important with E minor to get that third string to ring out because otherwise you don't have E minor. You just have a power chord. | What string is important for making sure an E minor chord is played correctly? | The third string is important for the E minor chord.
And so my question for you is. My question for you is. Why are we using this B7 chord? There's two different answers that we can get rid of. Why are we using this B7 chord here? It like, it sounds like musically nicer with the other one in comparison to using another chord. That's one, yeah. Because when we say musically nicer, it's more about relieving tension. | Based on the dialogue, why is the B7 chord used? | The B7 chord sounds musically nicer, meaning it relieves tension.
Yep. You got to make sure that your left hand is off of the strings because otherwise it will be a muted sound. | What is the teacher's response to the guitar in the beginning? | The teacher says that the student should take off the left hand to avoid a muted sound.
My fingers are colliding now. Versus if we bring it out. If you also notice your middle and ring are on the same finger, or on the same string. There you go. That's good. | Based on the teacher's feedback, does the student play correctly? | Yes, in the end, the student plays correctly.
The only reason why I want to like be insistent on learning the fretboard early is that it's just better to know it. Because you don't need to actually like just stick to learning the basic chords, especially since you're familiar with stuff. | Based on the dialogue, why is the teacher insistent on the student learning about the fretboard? | It can allow the student to go past basic chords.
So where's your thumb right now? It's like, like, just like behind, like the seventh fret. Cool. For me, it's like in between, I always forget what it's called, the knuckle or whatever. For me, it's in between the sixth and the seventh fret with my hand. | What is the student's answer to the teacher's question? | The student answers that their thumb is behind the seventh fret.
Okay, let's go ahead and tune it. Cool, that's easy enough. | Why is the guitar being played based on the dialogue? | The guitar is being played to make sure it is tuned.

Examples of Speech-Music Co-reasoning Data Points

Transcript | Question | Correct Answer
This is an F sharp diminished chord. And the notes that we have in this, the reason why I'm pausing is because we have an F sharp, which, if we were just following along with the exact progression that I was just doing, doesn't make any sense. | Is the guitar playing at the end directly related to the dialogue? | No, it is not related to the dialogue.
Let's go ahead and just, what I want you to do is I just want you to move back and forth between E minor and A minor. Three, four, one, two, three, four, one, two, three, four, one, two. | Why is the teacher speaking while the student is playing guitar? | The teacher is trying to help the student keep the time and rhythm.
So let's try making all those ring out. You're not going to want to hit the sixth string because E is not in this chord. I think you're doing the thing right where you're hitting that string below it. | Why does the teacher speak while the student is playing? | The student is hitting a string below, which is incorrect.
So that's C. Remember, when you play the note, you want to be right up against the fretboard, so the fret so it rings out. What note is that gonna be? This one starts at C, and then this one is gonna be D. D? Yep. E? Yep. | Does the student play every note correctly? | Yes, the student plays all three notes correctly.
Let's try it one more time with just doing fifth to eighth. | Based on the dialogue, who is playing the guitar first? | The teacher plays the guitar first and the student plays the same thing second.
Better | Does the student improve? | Yes, the student improves.
And the reason we do that is so that we don't touch any of the other strings. Okay, hold on. It's still not ringing out really well. Okay, now it's ringing out. So I think it must be something with my middle finger. | Is the student playing correctly consistently? | No, the student plays incorrectly and then correctly.
a high E, what is the shape, what is the mnemonic that we're going to want to use? I guess it's not a mnemonic, but what's the series of notes that we want to use? Is it like F, G, E, B? Yep. So where is the F on the high E? Also on the first fret? That is correct. So then it's F on the first, G on the third, A on the fifth, and then B on the seventh. | Is the student or teacher playing the notes at the end? | The student is playing at the end.
So you skipped your middle finger. Oh yeah. Does your thumb move at all? It moves like a little bit like, like, like that, but it doesn't move like other directions. You ever take it off the back of the board? Not, not that time, no. | Why does the teacher interrupt the student's playing in the beginning? | The student skipped their middle finger.
The simple, I believe it was 1, 4, 6, 5 progression, right? So to just go over that again, that was E minor, A minor, and then because we're going to 6, the 6 in E minor is? Um... Is it B? 6 is a major, yeah. 6 is, 6 is a G major. | What is the teacher playing during the dialogue? | The teacher is playing each of the chords that are mentioned.