True, the first one took hundreds of screens and thousands of shifts. A little bit annoying to test too.
I guess a little programming would be in order if you use more combinations than this (thousands of screens and tens of thousands of shifts). It is easy to end up with so many possible combinations that even a million screens wouldn't suffice.
But... you only get three shifts.
Per level you have at most three shifts, yes. But by using many screens, three shifts are in fact more than enough. Even two shifts would be sufficient. The key is to use the shifts to steer into the right goal screen in the end. If we have a keyboard with 9 keys, for example, the keys can be divided into three groups of three, with one group each on shift A, B, and C (for instance keys 1-3 on A, 4-6 on B, and 7-9 on C). Juni plays key 2, which activates shift A and shifts to a new screen that is a copy of the first. In this second screen, key 1 has shift A, key 2 has shift B, and key 3 has shift C. Shift B is therefore activated, and this leads to a new screen dedicated to key 2, where its sound is played. In this way any special activation can be made, steering the behavior to a dedicated screen. If there are more than nine choices, more intervening screens have to be used, though.
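To make the routing concrete, here is a rough sketch in Python of how the chain of shifts for a given key could be worked out. It is only an illustration of the grouping idea described above, not anything the game or its editor provides: the function name shift_path, the 1-based key numbering, and the assumption that keys are grouped in consecutive order are all made up for this example.

```python
def shift_path(key, num_keys, shifts="ABC"):
    """Return the list of shifts that steers `key` to its own goal screen.

    Hypothetical helper: keys are numbered from 1 and grouped in
    consecutive order, so the path is just the key index written in
    base len(shifts), one digit per screen in the chain.
    """
    if not 1 <= key <= num_keys:
        raise ValueError("key must be between 1 and num_keys")

    base = len(shifts)

    # Number of screens in the chain: smallest depth with base**depth >= num_keys.
    depth = 1
    while base ** depth < num_keys:
        depth += 1

    index = key - 1  # zero-based position of the key
    path = []
    for level in range(depth - 1, -1, -1):
        group_size = base ** level
        # Which group the key falls into on this screen decides the shift.
        path.append(shifts[index // group_size])
        # Continue with the key's position inside that group.
        index %= group_size
    return path


print(shift_path(2, 9))        # the example from the post: shift A, then shift B
print(shift_path(5, 9, "AB"))  # the same idea with only two shifts, using a longer chain
```

With three shifts, 9 keys need a chain of two screens and 27 keys need three. With only two shifts the chain gets longer, which also means more of the small delays between shift activations.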
So, you can use shifts as if there were as many different shifts as you need. The drawback is that each shift activation takes a little time, so Juni can appear a little bit jumpy. I hope this explains how to simulate 'more than three shifts in one screen'.