It's even more complicated than that when two people with vastly different native languages are speaking and listening to each other while one is trying to learn the language of the other.
The problem is that our brain is a powerful tool and it learns phonemes. When recognizing speech, it operates in some ways the way Soundex software does: it files what it hears under the closest sound category it already knows.
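To make the Soundex comparison concrete, here is a rough sketch of the classic American Soundex coding in Python. It is only an analogy for what the brain does, not a model of hearing, and the function name `soundex` is just my own choice. The point is that different spellings collapse into the same code, the way different sounds can collapse into one familiar phoneme.

```python
# Classic American Soundex: keep the first letter, encode the following
# consonants as digits, drop vowels, and pad/truncate to 4 characters.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first]
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return "".join(result)[:4].ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # both names get the same code
```

"Robert" and "Rupert" are different words, but the coding throws away exactly the detail that distinguishes them.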
So if you say something in your native language to me (and mine is very different) and I say it back to you, often you will say 'No, no, no! That's not what I said!' Then there will be cycles of you repeating it to me, me hearing it, me saying it back, and you telling me I still haven't got it.
Why? Because not all languages have the same set of phonemes. That matters because your brain, when it hears a word, will try to decode it using the *existing* set of phonemes it has. It will give you the best version it can, but that version can be missing phonemes, accents, stresses, etc. because your brain is not familiar with them.
So you say X to me.
I hear X in a way that is governed by my normal language.
I then try to repeat X as I heard it back to you.
You hear my X and it sounds wrong to you.
If we could record both the audible word you said and what my brain made it sound like to me after its audio processing, the two would not be the same.
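The say/hear/repeat loop above can be sketched as a toy program. Everything in it is an assumption made for illustration: the feature numbers in `FEATURES`, the `perceive` function, and the listener's inventory are all invented, and real perception is far messier. It only shows the shape of the problem, where a sound outside my inventory gets replaced by the nearest one I already have.

```python
# Toy articulatory features: (place, manner, voiced).
# The numbers are illustrative assumptions, not a real phonetic model.
FEATURES = {
    "f": (1, 1, 0), "v": (1, 1, 1),  # labiodental fricatives
    "θ": (2, 1, 0), "ð": (2, 1, 1),  # dental fricatives
    "t": (3, 0, 0), "d": (3, 0, 1),  # alveolar stops
    "s": (3, 1, 0), "z": (3, 1, 1),  # alveolar fricatives
}


def distance(a, b):
    """Sum of absolute feature differences between two sounds."""
    return sum(abs(x - y) for x, y in zip(FEATURES[a], FEATURES[b]))


def perceive(sounds, inventory):
    """Map each incoming sound onto the listener's own inventory.

    Unfamiliar sounds must appear in FEATURES; ties are broken
    arbitrarily (alphabetical order), which is part of the simplification.
    """
    heard = []
    for s in sounds:
        if s in inventory:
            heard.append(s)  # familiar sound: heard as spoken
        else:
            candidates = sorted(p for p in inventory if p in FEATURES)
            heard.append(min(candidates, key=lambda p: distance(s, p)))
    return heard


you_said = ["θ", "i", "n"]  # roughly English "thin"
my_inventory = {"f", "v", "t", "d", "s", "z", "i", "n"}  # no dental fricatives

i_heard = perceive(you_said, my_inventory)
i_repeat = i_heard  # I can only say back what I heard
print(you_said, i_heard, you_said == i_repeat)
```

What comes back differs from what was said, and I cannot tell the difference from the inside, because the substitution happened before I ever "heard" the word.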
In order to get past this, you basically have to focus very hard on hearing the sounds as sounds and disengage the mental audio post-processing that is mucking up your 'heard' word. If you can get to just picking up the audible sound, as alien as it might be, you can then try to pronounce it correctly back.
Of course, you'll still fail, because you likely don't have the lip/tongue/breath etc. combo to get it out. But you are a bit closer.
Then you learn to form the particular sound you missed. You say it enough times, your ear hears it enough times, and you get told to improve it enough times... and then you have that new phoneme, with its accent/stress/breathing/etc., as part of your raw audio post-processing.
This isn't an easy process if the two languages are far apart in their range of phonemes. Unfortunately, you can miss many bits of the communication.
I've tried to get a sense of some languages that use tonality for key information and it kills me. I have just never developed the sense of tonality that many Asian languages use and frankly, a lot of the pitches and tonalities give me that 'nails on a chalkboard' response. I find them grating or annoying. It's nothing other than my brain's reaction to the unfamiliarity and the pitches. They come off as angry, sharp, or hectoring... not because they are, but because of pace, cadence, and tone. In English, someone speaking really fast, in sharp pitches, and with sounds like a cat fight comes across as hostile, angry, bellicose... and that's the way my ears hear it. I know that is totally the result of having grown up in a slower, less tonality-dependent language. It's just not easy for me to decouple that response (as compared to trying to learn Broad Scots, Latin, French, Spanish, or Russian).