Question about how stereo samples were made #4

Open
arseniiv opened this issue Oct 10, 2023 · 4 comments

@arseniiv

In the readme file, you write:

The stereo effect is a mild pitch-shift doubling to create a stereo image, applied in mid-side effect so that it cancels out when summed to mono.

How exactly was this done? I first thought of chorus, but chorus doesn’t usually sum to the original mono (I may be misremembering). I also thought about pitch-shifting the side channel, but a mono signal has nothing in the side channel. And even if one adds a pitch-shifted side channel, it would seem a good idea to pitch-shift the mid channel by the same amount in the other direction (to balance the perceived pitch), and then again it won’t sum to the original mono. At first I thought I understood at least the idea; now I think I have no clue at all.

I want to try to recreate this effect as live processing and save a bit of space on stereo samples. 😄 And just to experiment. (Though in my case I think mono summing to original isn’t essential at all, so I’ll probably just compare how chorus behaves.)

@jlearman
Collaborator

jlearman commented Oct 20, 2023

I used different tools at different times. My hope was to recreate the effect produced by an MXR Pitch Shift Doubler, which I used back in the 80's. I didn't quite achieve it, because delay-based pitch shifting (the MXR) ends up sounding quite a bit different from FFT-based pitch-shifting (modern digital processing.)

Regardless of the specific tools, the approach is to use the original (dry) sample as the Mid channel and a slightly pitch-shifted version as the Side channel, and feed that into a mid-side decoder. (Or, more simply, put the original signal in both sides, add the shifted to left and subtract it from right.)
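In symbols, the decode described above can be written as follows. The names $M$ (dry sample), $S$ (pitch-shifted copy), $L$ and $R$ are shorthand introduced here, and any gain staging is left out:

```latex
\begin{aligned}
L &= M + S \\
R &= M - S \\
\tfrac{1}{2}(L + R) &= M
\end{aligned}
```

The shifted copy appears with opposite signs in the two channels, so it drops out of the mono sum, while each channel on its own still contains it.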

Of course, that process causes issues for low frequencies, causing dead-zones and high peaks. So, I applied a high-pass filter to the effect channel. I don't remember the cutoff frequency but I'd guess around 100-200 Hz.

Reasons I prefer having the effect baked into the samples rather than added as an effect:

  • When using chorus, if it's enough effect for low notes to sound good, high notes sound terrible. Baking in the effect allows tailoring the effect to the note. However, I didn't need to do this for this particular effect.
  • Pitch shifting in real-time just doesn't sound as good as when done off-line. Now, this contradicts the whole idea of using samples as non-root-note samples, so I must be wrong about this somehow. Further research required. But so far I haven't had success getting this effect on-the-fly.
  • It allows me to adjust the stereo width in real time (well, that's just a function of using mid-side, where mid is dry and FX is side.)
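The width adjustment in the last bullet can be sketched offline with SoX's `remix` effect, assuming a two-channel file with mid in channel 1 and the effect (side) in channel 2. The helper name `ms_to_lr` and its arguments are hypothetical, not part of the project; this is only an illustration of the idea, not the live processing itself:

```shell
# Hypothetical helper (not from the project): decode a 2-channel mid-side
# file (ch1 = mid, ch2 = side) to left/right with a chosen side level.
# side_db is the side level in dB relative to mid; lower means narrower.
ms_to_lr() {
    ms_in=$1
    lr_out=$2
    side_db=${3:-0}
    # gain -6 leaves headroom so that mid + side is less likely to clip.
    # In remix, "p" sets a level in dB for channel 2; "i" does the same
    # and also inverts the channel.
    sox "$ms_in" "$lr_out" gain -6 remix "1,2p${side_db}" "1,2i${side_db}"
}
```

For example, `ms_to_lr note-MS.wav note-ST.wav -6` would give a narrower image than `ms_to_lr note-MS.wav note-ST.wav 0` (the file names are placeholders).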

So, my reasons seemed good at the time but I suspect there's a better way, and someday maybe I'll find it.

Regardless, these days memory and drive space are cheap, so it hardly matters -- reducing the file size by 50% just doesn't matter for something that's well under 100MB already.

Don't underestimate the value of FX that work well when summed to mono. Well, it depends on the reason for the FX. In this case, the intent wasn't to change timbre, but to just create a spatial image. (As it worked out, it does do a bit of timbre change, but I'd prefer that it didn't.) Regardless, if the reason for the effect is purely for stereo image, and you don't want the squishy/squashy effect you get from chorus, then mid-side is particularly important, as whenever the result gets summed to mono, the side channel cancels out. While you may think we don't use mono much, it happens in reality all the time, such as when you happen to hear the music in another room, through a door.

BTW, mid-side and clean summing to mono is a nice trick, but not without potential compromises. With normal stereo FX (that don't sum well to mono) you double check them by occasionally monitoring in mono. But when you're using mid-side (whether it's mid-side miking, or mid side artificial effects), you check by listening to just one side. Either case can have unpleasant comb filtering, you just have to check differently. So, it's not a magic bullet.
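Both checks can be rendered to files with SoX for quick listening. This is a minimal sketch; the function names are made up for illustration:

```shell
# Hypothetical helpers for auditioning a stereo file both ways.

# Mix all channels down to mono (the check for ordinary stereo FX).
mono_check() {
    sox "$1" "$2" remix -
}

# Keep only the left channel (the check for mid-side material).
left_check() {
    sox "$1" "$2" remix 1
}

# Keep only the right channel.
right_check() {
    sox "$1" "$2" remix 2
}
```

For example, `left_check note-ST.wav note-left.wav` writes just the left side so any comb filtering in it can be heard in isolation (file names are placeholders).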

@jlearman
Collaborator

If you want specifics, I can find the SoX code I use to process the samples.

@arseniiv
Author

Thanks much!

If you want specifics, I can find the SoX code I use to process the samples.

Why not, if it won’t be much work for you. I hope it’s not too much bother; I don’t know if I’ll end up using it, but I’m a bit curious.

@jlearman
Collaborator

Sorry for the delay. It might be tricky to follow this since it has my workflow and directory arrangements baked in.

CENTS=3                 # cents pitch shift for stereoized samples

# OLDSFDIR, NEWSFDIR, FMT and SHIFTED_SUFFIX are set elsewhere in the
# workflow (not shown).
for ORIG in "$OLDSFDIR/$FMT"/*."$FMT" ; do
    BASE=$(basename -s ".$FMT" "$ORIG")

    # To create the stereo effect, pitch-shift and
    # apply as mid-side (add shifted to left, subtract
    # from right).  Also, to avoid clipping and moving
    # the apparent image left, highpass-filter.  The
    # highs give the image cues.

    SHIFTED="$NEWSFDIR/$FMT/$BASE$SHIFTED_SUFFIX.$FMT"
    echo "+ sox $ORIG $SHIFTED pitch $CENTS highpass 800"
    sox "$ORIG" "$SHIFTED" pitch "$CENTS" highpass 800

    MID=$ORIG
    MS="$NEWSFDIR/$FMT/$BASE-MS.$FMT"
    LR="$NEWSFDIR/$FMT/$BASE-ST.$FMT"

    # merge to temporary mid-side stereo (channel 1 = mid, channel 2 = shifted)

    echo "+ sox -M $MID $SHIFTED $MS"
    sox -M "$MID" "$SHIFTED" "$MS"

    # convert to LR: attenuate everything by 6 dB, then mix the shifted
    # channel in at -3 dB (added to left, inverted into right)
    echo "+ sox $MS $LR gain -6 remix 1,2p-3 1,2i-3"
    sox "$MS" "$LR" gain -6 remix 1,2p-3 1,2i-3
    rm "$SHIFTED"
    rm "$MS"
done

FMT is either wav or flac.
I think it'd work better with a higher cutoff than 800 Hz. The lower the Hz, the more the sound seems to be shifted to one side, and also the more likely to get clipping. I should have fiddled with that more at the time.
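One way to do that fiddling is to render the shifted signal at several cutoffs and compare. The sketch below reuses the same `pitch` and `highpass` effects as the script; the function name `render_cutoffs`, its arguments and the output naming scheme are hypothetical:

```shell
# Hypothetical experiment, not part of the original script: render the
# shifted (side) signal at several high-pass cutoffs for comparison.
# Usage: render_cutoffs FILE CENTS CUTOFF_HZ...
render_cutoffs() {
    orig=$1
    cents=$2
    shift 2
    base=${orig%.*}     # file name without its extension
    ext=${orig##*.}     # extension, e.g. wav or flac
    for hz in "$@"; do
        sox "$orig" "${base}-hp${hz}.${ext}" pitch "$cents" highpass "$hz"
    done
}
```

For example, `render_cutoffs note.wav 3 800 1200 2000` would write `note-hp800.wav`, `note-hp1200.wav` and `note-hp2000.wav`, each of which can then be merged with the dry sample as in the loop above.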
