Of course, a good mix starts with a good recording. Buzzes, clicks, mouth noise, pronunciation problems, phrasing and irritating transients can all spoil the core of your song. For me in the home studio it’s just a question of starting over again, but for you it might mean booking more studio hours, always hoping to get the same setup as before and the same flow.
After many years of working in the studio I think I have all the tools ready to fix all of the problems I described above and more. I had one article before about Fixing phase problems in a mix. Another time I got to fix the audio of a precious video recording that was completely blown out and clipped. After having fixed maybe thousands of instrument and vocal recordings, I can now truly say:
The start of this year is already well on its way and I wanted to start it right with an upgrade to the studio. As you know I am into making music, but also video content that goes with the music. In the end these are video clips, but I like to think of them more as “visuals for the music”. A way to tell the story of the music again, but differently. Working with 4K content is quite normal for me now, even though the end result might simply be an HD 1920×1080 YouTube video, or even a 1080×1080 Instagram post. In the end 4K can really make the difference and will also affect the quality of your lower resolution end result.
A 4K display has now become a no-brainer. I invested in a 32 inch ergonomic screen with good, but not high-end, color reproduction. The LG Ergo 32UN88A also fitted nicely on my desk. Immediately after connecting the screen to both my studio PC and a Thunderbolt laptop dock the problems started. Blackouts. Every minute or so the screen would just black out on both devices. Both should be able to drive a 4K screen, but nonetheless it seemed to fail. Maybe you immediately know what happened, but I was stumped.
My fault was that I was just too new to 4K upgrades like this. So I had to find out the hard way that there is more to hooking up a higher end display like this. Yes, there are limits to driving a 4K screen. One part of the chain is the video output, but the other is the cabling. I had to learn that HDMI cables have specifications. Up to now I only had to drive 1920×1440 at most and that turned out to be easy. I had to run to the shop and buy new cables. Cables with specs that could handle 3840×2160 at 60Hz.
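To get a feeling for why the cable matters, here is a rough back-of-the-envelope sketch. The numbers in it are assumptions that are not from my own measurements: 24 bits per pixel (8-bit RGB), the commonly quoted link rates of 10.2 Gbit/s for HDMI 1.4 and 18 Gbit/s for HDMI 2.0, and 8b/10b encoding, which leaves 8/10 of the link for pixel data. It only counts visible pixels, so the real signal (with blanking) needs a bit more than this.

```python
# Rough bandwidth estimate for uncompressed video over HDMI.
# Assumptions: 24 bits per pixel and commonly quoted HDMI link rates.
# Blanking intervals are ignored, so real requirements are somewhat higher.

HDMI_LINKS_GBPS = {"HDMI 1.4": 10.2, "HDMI 2.0": 18.0}


def active_pixel_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Data rate of the visible pixels only, in Gbit/s."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


def usable_gbps(link_gbps):
    """Share of the link left for pixel data after 8b/10b encoding."""
    return link_gbps * 8 / 10


def fits(width, height, refresh_hz, link_name):
    """True if the visible pixel data fits within the usable link rate."""
    needed = active_pixel_gbps(width, height, refresh_hz)
    return needed <= usable_gbps(HDMI_LINKS_GBPS[link_name])


if __name__ == "__main__":
    for hz in (30, 60):
        rate = active_pixel_gbps(3840, 2160, hz)
        print(f"3840x2160 at {hz}Hz needs roughly {rate:.2f} Gbit/s")
        for name in HDMI_LINKS_GBPS:
            print(f"  {name}: {'fits' if fits(3840, 2160, hz, name) else 'too slow'}")
```

Under these assumptions 4K at 60Hz does not fit through an HDMI 1.4 class link, while 4K at 30Hz does. That matches what I saw: an older cable or output can cope with 30Hz but not with 60Hz.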
After connecting the new cables, only the laptop dock kept flickering and I had to turn down the refresh rate to 30Hz. A dock like this is not the same as a video card. I do have a Thunderbolt external video card, but I only want to start that up when playing games. It makes quite some noise and is not suited for studio use. So just as I found out in live streaming that not every PC USB bus can drive multiple HD cameras, using 4K displays is a good way to tax any connected PC or device and the cabling. So if you are thinking about upgrading your studio workhorse, be prepared!
Another thing might be that the picture I shot above is from editing video in Blackmagic DaVinci Resolve Studio. The moment I started Resolve for the first time on a 4K screen the UI was microscopically small! It was completely usable, but totally not how I expected to work with Resolve. After some googling I found out that in order to see the normal layout on a 4K screen, you need to make the following changes to your system environment variables:
Last Friday I had a Halloween themed livestream in OBS. I wanted to point the viewers to my upcoming song release and I wanted to play back a video with an interview I had with Pi.A about our collaboration on the new song. I tested everything in the afternoon and in the evening it turned out that there was no sound on the stream from my audio system. Bummer. After restarting, the stream seemed to work again. Just a touch of real Halloween horror? I hope I can help you troubleshoot or make more advanced use of audio mixing and routing in OBS.
I spent this whole afternoon checking and rechecking my audio mixing and routing in OBS and I found no problem at all. It turns out that these things just happen. Let’s say that that is the charm of performing live, lol. However, it was something that had long been on my list to check out thoroughly, because I also had problems with audio earlier. So here is a recap of everything I know now.
Since you are probably a musician reading this, I won’t bore you with mixer basics like dBs and I’ll assume that you know how clipping sounds on audio output. I will also assume that you have no interest in Twitch Video On Demand stuff that is baked into OBS, because it is more geared towards gaming streamers.
It all starts with the sources
Every OBS source that outputs sound will appear in your center Audio Mixer panel. You can adjust the levels to your needs. Green is the safe area. Yellow is where your speech and music should ideally be. Red is the danger zone where the dreaded clipping might occur. You can mute a source to make sure it will not be recorded or streamed. By default all audio sources will be mixed to your stream or recordings.
OBS for now misses a master output level indicator. It might be added in the future. So if you mix in many different sources you end up guessing if the output will be OK. For now I exclusively mix the ASIO source, so that makes it easy to make sure the output is right. I mix in the Desktop audio just in case I have a sound from the PC I want to quickly mix in.
Just to be sure I added the Limiter as a filter. It’s not impossible to get an overcooked sound in the livestream, but the risk of clipping or overcooking goes down.
On the right track
If you right click in the Audio Mixer section you can go to the Advanced Audio Properties. Here you see all your sources, although you can choose to see only the active ones. I have more complex scenes where I choose per song I play live which sources are active. On the right side you will see a block of six Tracks. These are stereo tracks you can mix down to, so in fact you have a mixer before you with six stereo tracks.
Your live stream will use only one track. You can choose which one in the Advanced Streaming Settings for Output. By default it will be track 1. Now what is the use of having separate tracks to mix to? These tracks can be used for recording video with OBS. The tracks will end up written separately in the video file it records.
Gamers use this to have a track with the sound of the game and a track with their voice-over. Maybe also a separate track for sound effects and one for music. Then when they stream to Twitch they can leave out copyrighted music and still upload to YouTube with music. In the Advanced Recording Settings for Output you can choose which tracks will be written into the video output file.
Monitoring the output
If you leave everything set up as it is by default, all audio sources will output to your stream or recordings as you have just set it up. However there will not be any monitoring of the streamed or recorded output. This makes it hard to find the balance between different sources, if you have them. If you want to start monitoring you will have to select a Monitor option in the Audio Mixer settings. So there is also a stereo monitoring channel next to the six tracks.
In the Audio Settings, you will find the monitoring section. Here you can choose where to output the monitoring to. Please be aware of the latency that OBS introduces on the monitor output. You can’t use it live. You can just use it to find the right balance between different audio sources.
So what if there is no sound?
If you look in the monitoring options for an audio source in the Audio Mixer, there is even an option to output sound to the monitoring channel, but not to the output tracks! So there is the option to set the level of an audio source to -inf, the option to mute the output and the option to switch off output to any of the output tracks. On top of that you could choose a track that does not get mixed to your stream. Very flexible, but it can also make it hard to find out why you have no sound on your stream.
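To keep all these possible causes apart, it can help to write them down as a checklist. The sketch below is only a mental model in Python, not OBS code and not an OBS API: the `Source` class, its field names and the `silence_reasons` function are made up for illustration. It simply walks through the settings described above and reports why a source would not reach the stream.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical model of one audio source in the OBS mixer."""
    name: str
    muted: bool = False
    level_db: float = 0.0            # float("-inf") means the fader is all the way down
    monitor_only: bool = False       # heard on the monitor channel, not sent to tracks
    tracks: set = field(default_factory=lambda: {1, 2, 3, 4, 5, 6})


def silence_reasons(source, stream_track=1):
    """Return every reason why this source would be silent on the stream."""
    reasons = []
    if source.muted:
        reasons.append("source is muted")
    if source.level_db == float("-inf"):
        reasons.append("level is set to -inf")
    if source.monitor_only:
        reasons.append("source goes to the monitor channel only")
    if stream_track not in source.tracks:
        reasons.append(f"source is not mixed to track {stream_track}")
    return reasons


if __name__ == "__main__":
    asio = Source("ASIO input", monitor_only=True, tracks={2})
    for reason in silence_reasons(asio, stream_track=1):
        print(f"{asio.name}: {reason}")
```

An empty list means that, as far as these settings go, the source should be audible on the stream. It does not replace a real test recording or a private test stream.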
There are only two ways to test your audio before really going live. The first one is to make sure you output the same track as your streaming track on a recording and then record a short video. The second one is to stream to an unlisted (YouTube) or private (FB) stream and then check the result. In short it is a miracle that this was the first time in maybe 50 livestreams that I had audio problems. Because it is live it is hard to know what the audience is hearing. I always ask at the start of a stream. Audio is really a tricky thing in a livestream to get right.
So here I am. I’m back from a short vacation. Just one and a half weeks. Long enough to have this feeling of really having traveled, but probably short enough to pick up the daily routines in no time. Usually I need a full week to slow down to idle mode and then a week to start up again. One and a half weeks does not really cut it then, but it will have to do.
The choice for me was to just step out of the daily routines of practicing and playing live and then step in again, or bring some gear and practice on the road. Actually the only thing you need is anything between a phone and a laptop to be able to sing here and there, but if you also want to practice playing keys and singing there are some choices to make.
This time I chose to bring an iPad, a (Windows) laptop, the Zoom U-24 audio interface and a mini keyboard, the Komplete Kontrol M32. It gave me several options of practicing singing and playing the keyboard. The good thing about the M32 is its build quality and the playability given its limitations.
Is a MIDI keyboard with 32 mini keys something you can play on? Maybe. I found out that it is just a little too cramped and limited for my songs, but it was close to having a keyboard most of the time. It does fit into a backpack. Maybe it’s more suited to just playing around than practicing full songs? Jamming along to some new song ideas? I brought it along, so it would have to do. Your mileage may vary.
The most lightweight option is the iPad and the M32, but I had to bring a small USB-C hub to connect the two. Once connected and loading up Garageband, I was practicing a few songs in no time. Perfect for a few songs I really wanted to practice on. The iPad speaker audio quality is reasonable.
Then there is the option to scale up a little. Bringing the laptop allowed me to load up Ableton Live and the full live sets, or just load a basic setup to play piano sounds with the M32. A Windows laptop however only gives you Windows audio output which is notoriously slow and gives you latency. Unless you load ASIO4ALL drivers of course. I tried it and it worked fine. The laptop speaker audio quality was not very special, maybe even a bit too soft.
The full scale option was also at my disposal. By connecting the audio interface I had my full live set and low latency audio and I could practice any song just like always. Except of course for being limited to the 32 mini keys. The full set was also great for writing songs, or just some playing around. This time I needed a headset or portable speakers to hear something. The audio quality was outstanding.
All in all the experiment was a success. I have practiced a few songs. My vocal coach assured me that a short vacation is actually good for your voice, so I did not practice every day. I must admit I accepted the risk of using the mini keyboard also because I use a microKorg in my live setup. Mini keys are not a no-go area for me. I hope you can use these experiences to choose your own on-the-road-practice-setup.
For more than a year now I have embraced Blackmagic Davinci Resolve as my go-to video editor. Slowly and gradually I found out how to do a bit of color grading. It’s an art form that I do not claim to have mastered, but I know what happens if I turn the dials and it really brings consistency in a video. This then in turn helps to tell a story without distractions. The video editing itself, I just took for granted and I found a way that works in the Edit page of Resolve.
After a year it became clear that it would also be necessary to dive into the full Davinci Resolve Studio product and I found out that the right way to do this would be to buy the Davinci Resolve Speed Editor that comes free with a license. I thought it was just a keyboard with shortcuts to help you navigate the editing process faster. How wrong could I have been?
This keyboard showed me that I had mistakenly skipped one step in the editing process. The process of sorting and selecting source material and trimming it to fill the timeline. It All Happens In The Cut Page. This was the page I always skipped over, because I thought it was just intended to cut stuff. Sorry, you knew this maybe all along. I had to learn because I bought the license and the keyboard came with it for free.
This changes everything. The Cut page is the start of the editing process. The Edit page is only for finetuning the main work done in the Cut page. The Speed Editor keyboard makes the start of the editing process a breeze. The complete edit above was done without touching the mouse or another keyboard. I can tell you, you need this keyboard even though you thought you didn’t. I’m bummed that I found this out late. For now I am just happy that I found the right way to use Davinci Resolve.
In a previous post I discussed how I try to have good audio quality for my livestream with OBS, by linking up a mixing desk I use for all live performances with a studio audio interface that I use for live streaming. So the idea is that when I know how to mix my live performance I can also livestream that mix with good audio quality. OBS supports high quality audio with an ASIO plugin, so all is great.
The mixing desk I use for live shows and streaming is the Zoom LiveTrak L-12. Lately I started using a separate laptop to do the livestreaming, not hooked up to the studio. For a livestream I would switch over the interface cable to the laptop. Only a few days ago I realized that the L-12 itself is an audio interface and I slapped my forehead.
Sure enough, when installing the L-12 driver software and starting up OBS with the ASIO plugin, I could find the Zoom device. After assigning the master output channels to the OBS inputs it worked immediately. So now the setup is way simpler. The livestreaming laptop is hooked up directly to the mixing desk. The master mix now is hooked up directly to OBS.
Now I asked myself, can I use the same trick to hook the L-12 directly to an iPad or iPhone to do livestreaming on Instagram, or other phone based streaming platforms? The L-12 can connect as a class compliant interface, so it’s no problem to hook it up to iOS devices. Software like Garageband will find its way to the Zoom inputs and outputs. You have to set a switch for this on the back next to the USB port.
However, the master outputs are not output channels 1 and 2, so iOS devices cannot pick it up as the default audio input. So no easy live streaming on the iPad or iPhone directly from the L-12 unfortunately. For this you will need to hook up another class compliant interface that picks up the mix desk outputs and does output the master mix on channels 1 and 2.
I am happy to report that setting everything up now is a breeze. Looking back, everything started to work straight out of the box with Ableton Live version 10.0.5. More good news, it still works straight out of the box in Ableton Live versions 11+. Support has become integrated now. From the corner of my eye I did see that there might be problems with Komplete Kontrol S series and Ableton Live 11+ versions, but I am not able to verify that. So, what does the support mean? It means that you can immediately start working with your Komplete Kontrol A series keyboard. Go to the Preferences > MIDI > Control Surface section and select the Komplete Kontrol Surface with the corresponding DAW input and output.
This is just the start. If you downloaded and activated the Komplete Kontrol software from Native Instruments (through Native Access), you will find the Komplete Kontrol VST instrument as a Plug-ins instrument. Drag it into a MIDI track and you will have instant Kontakt instrument browsing from your track. Now that takes some getting used to I must admit. Please note the following. Your A series keyboard display browses much more responsively than the Komplete Kontrol VST, so ignore the screen and focus on the tiny A series display when browsing. Click the Browse button on the A series keyboard to jump back to browsing at any point.
When browsing Kontakt instruments, nudge the browse button left or right to step deeper into or back out of the levels of the browsing process. So at the top level you choose either Kontakt instruments, loops or one shots. At the deepest level you choose your sounds. You will hear the selection audition a sound as you browse. If you push (don’t nudge) the browse button down as a button it will select the auditioned sound. This might take a while, so be patient. After that remember that you can click the Browse button again and nudge left several times to go back to the top level. Keep your eye on the tiny display to see where you are browsing.
Once you are inside, the Plug-in MIDI button will light up and you will notice that the controls on your A series keyboard will automatically control the instrument macros. Again, touch the knob to see on the tiny display which parameter or macro is controlled and tweak and turn to get the perfect sound. This is how your keyboard should have worked from the start of course, but I’m happy to see how it has progressed. For all other plain MIDI control use you can still use the method of placing your instrument in a rack and MIDI mapping the controls to your instrument.
Running a live stream with OBS can be tough if you want to have a little bit of a show and you’re making music at the same time. In OBS you can dynamically change whole scenes or switch individual sources (video clips, images, text, cameras, audio) on and off. In my case I want to launch different video clips for different songs. And I have a panic scene without camera and audio to just show that I’m busy fixing something. Fortunately OBS is full of neat little tricks to allow you to run the show with just keyboard shortcuts, or if you want to, with a push of a button on a remote control. No need to wander around with the mouse to try to hit the right spot.
One of the many options is an Elgato Stream Deck but I always hesitated to buy it. The different sizes cater for different needs, but I find it a bit pricey for just the single purpose of controlling OBS remotely. As a musician it seemed more logical to use a Novation Launchpad or other MIDI controller. The Stream Deck is the Rolls Royce option, no doubt, but it is an investment.
Unfortunately there is no standard MIDI support for OBS and it also needs to be two-way. When you push a button this sends a message, and to light up the button to confirm your choice OBS needs to report back. That is what I would expect to happen on a Launchpad. You also need to map OBS events to MIDI keys. A Launchpad has a very specific key sequence per row of buttons. Up to now I did not find an acceptable plugin or solution for MIDI. If you have, please let me know.
Maybe you noticed the mobile option in the Elgato line-up above? For the past months I controlled OBS remotely using a phone or the iPad, but I didn’t use the Elgato Mobile app. You can opt for Elgato mobile, but it has a monthly or yearly subscription model. Again there are more affordable options. All starting with installation of the websocket plug-in in OBS. There are several to choose from. I’ve used the StreamControl app the longest, but eventually reached its limits. If you have just a set of scenes to control it’s perfect. It couldn’t handle my 20+ video sources to choose from during a live stream.
Please note that if you use Streamlabs OBS you can remote control with the Streamlabs Deck app. The Streamlabs Deck app can be paired with a QR code. I have the pure OBS version running, so this is why I need the websocket plug-in and I can’t use the Deck app.
Eventually I stepped up to Touch Portal and that is what I use now. It can do your dishes, the laundry and reserve a table for your next dinner and also controls OBS. It also needs the OBS websocket plugin by the way. It has a companion Touch Portal app that you can install. I use the paid version on an iPad so I can use the full surface of the iPad as a remote control, so in total I invested 14 euro. I saw no other way to be fully in control and up to now it hasn’t failed me. If you have found a better way to remote control OBS during a live stream show, please let me know in the comments!
This is my first adventure with Deepfake technology. This blog is intended to show you how to get started. In short it’s actually a technology that has a very dark side, where it seems to be possible to show faces of people in videos or photos they’ve never appeared in by swapping faces. It can be done very fast and usually very unconvincingly by some apps on your phone.
The full blown and latest software can actually let politicians or your neighbor do and say crazy things very realistically and this way can corrupt your belief of what is true or fake. Very scary. It also has a very creative side. Why can’t you be a superhero in a movie? I experimented with this creative side.
A new song for me is a new story to tell. Then a second way to tell the story is with a video clip and I like to tinker around with new ideas for video clips. Most musicians leave it at just a pretty cover picture and dump it on YouTube, but I like to experiment with video. There is a new song that is in the making now and I already found beautiful footage with a girl and a boy. The first step I take is to make a pilot with the footage and ask people if they like the concept of the clip.
Then I bumped into someone very creative on Instagram and when I showed the video it triggered some crazy new ideas. Why not make the story stronger with flashbacks? And there I thought why not swap myself in those flashbacks? The idea to use Deepfake technology was born. But how to get going with Deepfake?
First investigations led to two different tools: DeepfaceLab and Faceswap. There are many more tools, but in essence it’s probably all the same. Extraction tools to find faces in pictures. A machine learning engine like Tensorflow to train a model to swap two faces and converter tools to generate the final video. For you machine learning may be magic, but I already knew it from earlier explorations. Simply said, it’s possible to mimic the pattern recognition (read: face and voice recognition here) that we humans are so good at.
Machine learning in the form that we have now in Tensorflow requires somewhere in the range of at least 1000 examples of something to recognize, together with the correct response to output when it is recognized. By feeding these examples into the machine learning engine, it can be trained to output a picture with a face replaced when it recognizes the original face. To be able to make a reliable replacement the original and replacement data have to be formatted and lined up to make automated replacement possible. One aspect of the machine learning process is that it benefits a lot from GPU processing, i.e. a powerful video card in your PC. This is important because current training mechanisms need around a million training cycles.
I chose Faceswap, because for DeepfaceLab it was harder to get all the runtimes. Faceswap has a simple setup tool and a nice graphical user interface. The technology is complex but maybe I can help you get started. By the time you read this there are probably many other good tools, but the idea remains the same. The Faceswap setup first installs a Conda Python library tool. Then all the technology gets loaded and a nice UI can be launched. There is one more step you need to do. You need to find out which GPU tooling you can use to accelerate machine learning. For an NVidia graphics card you will need to have CUDA installed.
Step 1: Extraction
The first step is actually getting suitable material to work with. The machine learning process needs lots of input and desired output in the form of images. At least around 1000 is a good start. This could mean 40 seconds of video at 25 fps, but 10 minutes of video will work even better of course. You can expect the best results if these match up as closely as possible. Even to the point of lighting, beards, glasses etc. If you know the target to do the face swap on you should find source material that matches as closely as possible.
Then it’s extraction time. This means already applying machine learning to find faces in the input and then extract these as separate images. These images contain only the faces, straightened up and formatted to get them ready to be used for the face swap training process. You need to extract faces from both the target and source video. For every face image the extraction process also records where the extracted image is found and how to crop and rotate the face to place it back. These are stored in Alignment files.
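The arithmetic behind those numbers is simple enough to sketch. This is just an illustration: the function names are made up, and 25 fps is the frame rate from my example, so use the frame rate of your own footage.

```python
def frames_in_clip(seconds, fps=25):
    """Number of frames (candidate face images) in a clip of this length."""
    return int(seconds * fps)


def seconds_needed(target_frames, fps=25):
    """Length of footage needed to reach a target number of frames."""
    return target_frames / fps


if __name__ == "__main__":
    print(frames_in_clip(40))        # 40 seconds at 25 fps
    print(frames_in_clip(10 * 60))   # 10 minutes at 25 fps
    print(seconds_needed(1000))      # footage needed for about 1000 frames
```

Keep in mind that this is an upper bound. Not every frame will contain a usable face, so in practice you will want more footage than the bare minimum.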
After extraction you need to single out only the faces that you’re interested in in case there are multiple faces in either source or target. From that point you can go to the next step, but the quality of the end result depends very much on the extraction process. Check the extracted images and check them again. Weed out all images that the learning process should not use. Then regenerate the associated Alignment files. Faceswap has a separate tool for this.
Step 2: Training
By passing in the locations of the target (A) and source (B) images and Alignment files you are ready for the meat of the face swap process, the machine learning training. Default settings dictate that training should involve 1,000,000 cycles of matching faces in target images to be replaced by faces in the source images. By default for all machine learning the software hopes that you have a powerful video card. In my case I have an NVidia card and CUDA and this works by default. If you don’t have a video card you can work without one. I found it slows the process down by a factor of 7. My GPU went from 35% usage to 70% usage.
In my experiments I had material that took around 8 hours to train 100,000 times, so it would take 80 hours to train 1,000,000 times. Multiply that by 7 and you know it’s a good idea to have a powerful video card in your PC. During training you can see previews of the swap process and indicators for the quality of the swaps. These indicators should show improvement and the previews should reflect that. Note that the previews show face swaps in both directions. So even at this point you can switch source and target.
I saw indicators going up and down again, so at some point I thought that it was a good time to stop training. I quickly found out that the training results, the models, were absolutely useless. Bad matches and bad quality. At that point I went back to fixing the extractions again and rerunning the training. Put more simply: if the previews show a fuzzy swap, the final result will also be fuzzy. So keeping track of the previews gives you a good idea of the quality of the final result. The nice thing about Faceswap is that it allows you to save an entire project. This makes it easier to go back and forth in the process.
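If you want to plan your own training run, the estimate above is easy to put into a small calculation. The function below is a sketch with made-up names. The 8 hours per 100,000 cycles and the factor 7 for training without a video card are just my own measurements, so plug in your own.

```python
def training_hours(total_cycles, hours_per_100k=8.0, slowdown=1.0):
    """Estimated training time in hours.

    hours_per_100k: measured time for 100,000 training cycles on your machine.
    slowdown: multiply by this factor when training without a GPU.
    """
    return total_cycles / 100_000 * hours_per_100k * slowdown


if __name__ == "__main__":
    with_gpu = training_hours(1_000_000)
    without_gpu = training_hours(1_000_000, slowdown=7)
    print(f"With GPU: {with_gpu:.0f} hours ({with_gpu / 24:.1f} days)")
    print(f"Without GPU: {without_gpu:.0f} hours ({without_gpu / 24:.1f} days)")
```

With my numbers that comes down to a bit more than three days with a video card and over three weeks without one.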
Step 3: Converting
This is the fun part. The training result, the model, will be used to swap the faces in the target video. Faceswap generates the output video in the form of a folder with the image sequences. You will need a tool to convert this to a video. The built-in tool to convert images to video didn’t work for me. I used stop motion functionality from Corel VideoStudio. If the end result disappoints, it’s time to retrace steps in extraction or training. Converting is not as CPU/GPU intensive as training. You can at any point stop the training and try conversion out. Then when you start training again it builds on the last saved state of the model. If the model is crap, delete it and start over.
Here is a snip of the first fuzzy results. The final end result is not yet ready. Mind you, the song for the video clip is not yet ready. I will share the results here when it is all done. I hope this is a start for you to try this technology out for your videos! Please note that along the way there are many configuration options and alternative extraction and training models to choose from. Experimenting is time consuming, but worth it.
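If you don’t have a video editor with stop motion functionality, a general purpose tool like ffmpeg can also turn an image sequence into a video. This is not what I used, so treat it as a sketch under assumptions: it assumes ffmpeg is installed, that the frames are numbered PNG files, and the file name pattern `frame_%06d.png`, the folder name and the output name are all made up, so adjust them to what your conversion actually produced.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(frames_dir, output_file, fps=25, pattern="frame_%06d.png"):
    """Build an ffmpeg command that encodes a numbered image sequence as H.264 video.

    The pattern is a hypothetical example; match it to your own file names.
    """
    return [
        "ffmpeg",
        "-framerate", str(fps),               # frame rate of the image sequence
        "-i", str(Path(frames_dir) / pattern),
        "-c:v", "libx264",                    # widely supported video codec
        "-pix_fmt", "yuv420p",                # pixel format most players accept
        str(output_file),
    ]


def convert(frames_dir, output_file, fps=25):
    """Run ffmpeg; raises an error if ffmpeg is missing or the conversion fails."""
    subprocess.run(build_ffmpeg_command(frames_dir, output_file, fps), check=True)


if __name__ == "__main__":
    # Only prints the command, so you can check it before running anything.
    print(" ".join(build_ffmpeg_command("converted_frames", "swap.mp4")))
```

Use the same frame rate as the original target video, otherwise the result will not line up with the audio anymore.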
One more thing. Don’t use it to bend the truth. Use it artistically.
So this is what one of the interviewers said when I visited the local radio station here: “why not a cartoon video?” It was a passing remark when going over my video channel after the radio interview. It’s something that this person, working with lots of creatives at the art academy in Den Haag, can easily say. But what if you’re just this guy in the attic? How to make a cartoon video? Not easy. This is how I got close to the result I was looking for with my video release for Perfect (Extended Remix).
A go-to place is of course Fiverr. Here you can find animation artists and have your cartoon video in no time. There are actually animation sites that allow you to make your own animated video with stock figures and objects and I tried it. The first results were promising, but you need to go on a paid subscription to have maximum freedom. Even then you’ll find it’s mostly targeted towards business animations and infographics. A fun video clip animation is still hard to make. If you want you can try it: Animaker.
Eventually I stumbled upon this Video Cartoonizer. It’s not free, but it seemed like it could do some pretty amazing stuff with “cartoonizing” existing video. You can see parts of the original video material here. It’s quite funky and in many ways old fashioned software. It takes agonizing days to process video recordings like this, but the end result was quite amazing. Model Sara was also pretty pleased with the result. So there you have it. My first “cartoon” video.