In a previous post I discussed how I try to have good audio quality for my livestream with OBS, by linking up a mixing desk I use for all live performances with a studio audio interface that I use for live streaming. So the idea is that when I know how to mix my live performance I can also livestream that mix with good audio quality. OBS supports high quality audio with an ASIO plugin, so all is great.
The mixing desk I use for live shows and streaming is the Zoom LiveTrak L-12. Lately I started using a separate laptop to do the livestreaming, not hooked up to the studio. For a livestream I would switch over the interface cable to the laptop. Only a few days ago I realized that the L-12 itself is an audio interface and I slapped my forehead.
Sure enough, when installing the L-12 driver software and starting up OBS with the ASIO plugin, I could find the Zoom device. After assigning the master output channels to the OBS inputs it worked immediately. So now the setup is way simpler. The livestreaming laptop is hooked up directly to the mixing desk. The master mix now is hooked up directly to OBS.
Now I asked myself, can I use the same trick to hook the L-12 directly to an iPad or iPhone to do livestreaming on Instagram, or other phone based streaming platforms? The L-12 can connect as a class compliant interface, so it’s no problem to hook it up to iOS devices. You have to set a switch for this on the back next to the USB port. Software like GarageBand will then find its way to the Zoom inputs and outputs.
However, the master outputs are not output channels 1 and 2, so iOS devices cannot pick them up as the default audio input. So unfortunately there is no easy live streaming on the iPad or iPhone directly from the L-12. For this you will need to hook up another class compliant interface that picks up the mix desk outputs and outputs the master mix on channels 1 and 2.
I am happy to report that setting everything up now is a breeze. Looking back, everything started to work straight out of the box with Ableton Live version 10.0.5. More good news: it still works straight out of the box in Ableton Live versions 11+. Support is now integrated. From the corner of my eye I did see that there might be problems with the Komplete Kontrol S series and Ableton Live 11+ versions, but I am not able to verify that. So, what does the support mean? It means that you can immediately start working with your Komplete Kontrol A series keyboard by selecting the Komplete Kontrol Surface in the Preferences > MIDI > Control Surface section, together with the corresponding DAW input and output.
This is just the start. If you downloaded and activated the Komplete Kontrol software from Native Instruments (through Native Access), you will find the Komplete Kontrol VST instrument as a Plug-ins instrument. Drag it into a MIDI track and you will have instant Kontakt instrument browsing from your track. Now that takes some getting used to, I must admit. Please note the following. Your A series keyboard display browses much more responsively than the Komplete Kontrol VST, so ignore the screen and focus on the tiny A series display when browsing. Click the Browse button on the A series keyboard to jump back to browsing at any point.
When browsing Kontakt instruments, nudge the browse button left or right to step deeper into and back out of the levels of the browsing process. At the top level you choose either Kontakt instruments, loops or one shots. At the deepest level you choose your sounds. You will hear each sound auditioned as you browse. If you push (don’t nudge) the browse button down as a button, it will select the auditioned sound. This might take a while, so be patient. After that, remember that you can click the Browser button again and nudge left several times to go back to the top level. Keep your eye on the tiny display to see where you are browsing.
Once you are inside, the Plug-in MIDI button will light up and you will notice that the controls on your A series keyboard automatically control the instrument macros. Again, touch a knob to see on the tiny display which parameter or macro it controls, then tweak and turn to get the perfect sound. This is how your keyboard should have worked from the start of course, but I’m happy to see how it has progressed. For all other plain MIDI control use you can still use the method of placing your instrument in a rack and MIDI mapping the controls to your instrument.
Running a live stream with OBS can be tough if you want to have a little bit of a show and you’re making music at the same time. In OBS you can dynamically change whole scenes or switch individual sources (video clips, images, text, cameras, audio) on and off. In my case I want to launch different video clips for different songs. And I have a panic scene without camera and audio to just show that I’m busy fixing something. Fortunately OBS is full of neat little tricks to allow you to run the show with just keyboard shortcuts, or if you want to, with a push of a button on a remote control. No need to wander around with the mouse to try to hit the right spot.
One of the many options is an Elgato Stream Deck but I always hesitated to buy it. The different sizes cater for different needs, but I find it a bit pricey for just the single purpose of controlling OBS remotely. As a musician it seemed more logical to use a Novation Launchpad or other MIDI controller. The Stream Deck is the Rolls Royce option, no doubt, but it is an investment.
Unfortunately there is no standard MIDI support in OBS, and it also needs to be two-way. When you push a button this sends a message, and to light up the button to confirm your choice OBS needs to report back. That is what I would expect to happen on a Launchpad. You also need to map OBS events to MIDI keys. A Launchpad has a very specific key sequence per row of buttons. Up to now I did not find an acceptable plugin or solution for MIDI. If you have, please let me know.
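To make the mapping problem a bit more concrete, here is a minimal sketch in Python of how pad presses could be translated into scene names. It is only an illustration: it assumes the classic Launchpad "X-Y" layout, where the note number of a pad is 16 times the row plus the column, and the scene names are made up. It does not talk to a MIDI device or to OBS.

```python
from typing import Optional, Tuple

# Hypothetical mapping from (row, column) pads to OBS scene names.
# Assumption: the pad grid uses the classic "X-Y" layout, where
# note number = 16 * row + column (rows and columns count from 0).
SCENES = {
    (0, 0): "Intro",
    (0, 1): "Song 1 clip",
    (0, 2): "Song 2 clip",
    (7, 7): "Panic",
}


def pad_from_note(note: int) -> Tuple[int, int]:
    """Translate a MIDI note number into a (row, column) pad position."""
    return divmod(note, 16)


def scene_for_note(note: int) -> Optional[str]:
    """Return the scene mapped to the pressed pad, or None if unmapped."""
    return SCENES.get(pad_from_note(note))
```

The two-way part, lighting the pad once OBS confirms the switch, would need a message going back to the controller and is left out here.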
Maybe you noticed the mobile option in the Elgato line-up above? For the past months I controlled OBS remotely using a phone or the iPad, but I didn’t use the Elgato Mobile app. You can opt for Elgato mobile, but it has a monthly or yearly subscription model. Again there are more affordable options. All starting with installation of the websocket plug-in in OBS. There are several to choose from. I’ve used the StreamControl app the longest, but eventually reached its limits. If you have just a set of scenes to control it’s perfect. It couldn’t handle my 20+ video sources to choose from during a live stream.
Please note that if you use Streamlabs OBS you can remote control with the Streamlabs Deck app. The Streamlabs Deck app can be paired with a QR code. I have the pure OBS version running, so this is why I need the websocket plug-in and I can’t use the Deck app.
Eventually I stepped up to Touch Portal and that is what I use now. It can do your dishes, the laundry and reserve a table for your next dinner and also controls OBS. It also needs the OBS websocket plugin by the way. It has a companion Touch Portal app that you can install. I use the paid version on an iPad so I can use the full surface of the iPad for remote control; in total I invested 14 euro. I saw no other way to be fully in control and up to now it hasn’t failed me. If you have found a better way to remote control OBS during a live stream show, please let me know in the comments!
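All the remote control apps above talk to OBS through the websocket plug-in. As a rough illustration of what such an app sends when you tap a scene button, here is a small Python sketch that builds the request message. It assumes the 5.x version of the obs-websocket protocol (older versions of the plug-in use a different message format), and the scene name "Panic" is just an example. It only builds the text of the message; it does not open a connection.

```python
import json
import uuid


def scene_switch_message(scene_name: str) -> str:
    """Build the JSON text for a 'switch to this scene' request."""
    message = {
        "op": 6,  # opcode 6 means "Request" in the 5.x protocol
        "d": {
            "requestType": "SetCurrentProgramScene",
            "requestId": str(uuid.uuid4()),  # any unique string will do
            "requestData": {"sceneName": scene_name},
        },
    }
    return json.dumps(message)


# "Panic" is a made-up scene name for this example.
print(scene_switch_message("Panic"))
```

A real client first has to connect to the websocket port and complete the identification handshake before OBS accepts a request like this, which is exactly the part the apps take care of for you.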
This is my first adventure with Deepfake technology. This blog is intended to show you how to get started. In short, it’s a technology with a very dark side: it makes it possible to put people’s faces into videos or photos they’ve never appeared in, by swapping faces. It can be done very fast, and usually very unconvincingly, by some apps on your phone.
The full blown and latest software can actually let politicians or your neighbor do and say crazy things very realistically, and this way can corrupt your belief of what is true or fake. Very scary. It also has a very creative side. Why can’t you be a superhero in a movie? I experimented with this creative side.
A new song for me is a new story to tell. Then a second way to tell the story is with a video clip and I like to tinker around with new ideas for video clips. Most musicians leave it at just a pretty cover picture and dump it on YouTube, but I like to experiment with video. There is a new song that is in the making now and I already found beautiful footage with a girl and a boy. The first step I take is to make a pilot with the footage and ask people if they like the concept of the clip.
Then I bumped into someone very creative on Instagram and when I showed the video it triggered some crazy new ideas. Why not make the story stronger with flashbacks? And there I thought why not swap myself in those flashbacks? The idea to use Deepfake technology was born. But how to get going with Deepfake?
First investigations led to two different tools: DeepFaceLab and Faceswap. There are many more tools, but in essence it’s probably all the same. There are extraction tools to find faces in pictures, a machine learning engine like TensorFlow to train a model to swap two faces, and converter tools to generate the final video. For you machine learning may be magic, but I already knew it from earlier explorations. Simply said, it’s possible to mimic the pattern recognition (read: face and voice recognition here) that we humans are so good at.
Machine learning in the form that we have now in TensorFlow requires somewhere in the range of at least 1000 examples of something to recognize, together with the correct response to output when it is recognized. By feeding these examples into the machine learning engine, it can be trained to output a picture with a face replaced when it recognizes the original face. To be able to make a reliable replacement, the original and replacement data have to be formatted and lined up to make automated replacement possible. One aspect of the machine learning process is that it benefits a lot from GPU processing, i.e. a powerful video card in your PC. This is important because current training mechanisms need around a million training cycles.
I chose Faceswap, because for DeepFaceLab it was harder to get all the runtimes. Faceswap has a simple setup tool and a nice graphical user interface. The technology is complex, but maybe I can help you get started. By the time you read this there are probably many other good tools, but the idea remains the same. The Faceswap setup first installs a Conda Python library tool. Then all the technology gets loaded and a nice UI can be launched. There is one more step you need to do. You need to find out which GPU tooling you can use to accelerate machine learning. For an NVidia graphics card you will need to have CUDA installed.
Step 1: Extraction
The first step is actually getting suitable material to work with. The machine learning process needs lots of input and desired output in the form of images. At least around 1000 is a good start. This could mean 40 seconds of video at 25 fps, but 10 minutes of video will work even better of course. You can expect the best results if source and target match up as closely as possible, even to the point of lighting, beards, glasses etc. If you know the target to do the face swap on, you should find source material that matches as closely as possible.
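The numbers above are simple frame arithmetic. As a small sketch in Python (the function names are my own, purely for illustration):

```python
def frames_in_clip(seconds: float, fps: float = 25.0) -> int:
    """Number of video frames in a clip of the given length."""
    return int(seconds * fps)


def seconds_needed(target_images: int, fps: float = 25.0) -> float:
    """Clip length needed to get at least this many frames."""
    return target_images / fps


print(frames_in_clip(40))       # the 40 second example at 25 fps
print(frames_in_clip(10 * 60))  # 10 minutes of video
print(seconds_needed(1000))     # seconds of video for 1000 frames
```

Keep in mind that this counts frames, not usable faces. Frames without a clear face will be weeded out later, so in practice you want some margin.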
Then it’s extraction time. This means already applying machine learning to find faces in the input and then extract these as separate images. These images contain only the faces, straightened up and formatted to get them ready to be used for the face swap training process. You need to extract faces from both the target and source video. For every face image the extraction process also records where the extracted image is found and how to crop and rotate the face to place it back. These are stored in Alignment files.
After extraction you need to single out only the faces that you’re interested in in case there are multiple faces in either source or target. From that point you can go to the next step, but the quality of the end result depends very much on the extraction process. Check the extracted images and check them again. Weed out all images that the learning process should not use. Then regenerate the associated Alignment files. Faceswap has a separate tool for this.
Step 2: Training
By passing in the locations of the target (A) and source (B) images and Alignment files you are ready for the meat of the face swap process, the machine learning training. Default settings dictate that training should involve 1,000,000 cycles of matching faces in target images to be replaced by faces in the source images. By default, for all machine learning the software hopes that you have a powerful video card. In my case I have an NVidia card and CUDA and this works by default. If you don’t have a video card you can work without one. I found it slows the process down by a factor of 7. My GPU went from 35% usage to 70% usage.
In my experiments I had material that took around 8 hours to train 100,000 times, so it would take 80 hours to train 1,000,000 times. Multiply that by 7 and you know it’s a good idea to have a powerful video card in your PC. During training you can see previews of the swap process and indicators for the quality of the swaps. These indicators should show improvement and the previews should reflect that. Note that the previews show face swaps vice versa. So even at this point you can switch source and target.
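Spelled out as a small Python sketch, the estimate looks like this. The defaults are the numbers from my own experiment (8 hours per 100,000 cycles, a slowdown of 7 without a video card); yours will differ per machine and per material.

```python
def training_hours(
    total_cycles: int,
    measured_cycles: int = 100_000,
    measured_hours: float = 8.0,
    slowdown: float = 1.0,
) -> float:
    """Scale a measured training time up to the full number of cycles."""
    return total_cycles / measured_cycles * measured_hours * slowdown


with_gpu = training_hours(1_000_000)
without_gpu = training_hours(1_000_000, slowdown=7)

print(f"With video card:    {with_gpu:.0f} hours")
print(f"Without video card: {without_gpu:.0f} hours "
      f"(about {without_gpu / 24:.0f} days)")
```

This assumes the time per cycle stays constant during the whole training, which is only a rough approximation.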
I saw indicators going up and down again, so at some point I thought that it was a good time to stop training. I quickly found out that the training results, the models, were absolutely useless. Bad matches and bad quality. At that point I went back to fixing the extractions again and rerunning the training. A much simpler rule: if the previews show a fuzzy swap, the final result will also be fuzzy. So keeping track of the previews gives you a good idea of the quality of the final result. The nice thing about Faceswap is that it allows you to save an entire project. This makes it easier to go back and forth in the process.
Step 3: Converting
This is the fun part. The training result, the model, will be used to swap the faces in the target video. Faceswap generates the output video in the form of a folder with an image sequence. You will need a tool to convert this to a video. The built-in tool to convert images to video didn’t work for me. I used the stop motion functionality of Corel VideoStudio. If the end result disappoints, it’s time to retrace steps in extraction or training. Converting is not as CPU/GPU intensive as training. You can at any point stop the training and try conversion out. Then when you start training again it builds on the last saved state of the model. If the model is crap, delete it and start over.
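If you don’t have a video editor with a stop motion function, a command line tool like ffmpeg can also turn an image sequence into a video. This is not what I used, so treat the following as a sketch: a bit of Python that only builds the ffmpeg command. The folder name, the file name pattern and the output name are assumptions; check how your converted images are actually named.

```python
import shlex
from typing import List


def ffmpeg_command(frame_pattern: str, output_file: str, fps: int = 25) -> List[str]:
    """Build an ffmpeg command that joins numbered images into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # frame rate of the image sequence
        "-i", frame_pattern,     # numbered input images
        "-c:v", "libx264",       # encode as H.264
        "-pix_fmt", "yuv420p",   # pixel format most players accept
        output_file,
    ]


# Hypothetical folder and file names, adjust to your own output.
cmd = ffmpeg_command("converted/frame_%06d.png", "swap.mp4")
print(shlex.join(cmd))
```

You can paste the printed command into a terminal, or run it from Python with subprocess.run(cmd). Use the same frame rate as the original target video, and note that the result has no audio yet.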
Here is a snip of the first fuzzy results. The final end result is not yet ready. Mind you, the song for the video clip is not yet ready either. I will share the results here when it is all done. I hope this is a start for you to try this technology out for your own videos! Please note that along the way there are many configuration options and alternative extraction and training models to choose from. Experimenting is time consuming, but worth it.
One more thing. Don’t use it to bend the truth. Use it artistically.
So this is what one of the interviewers said when I visited the local radio station here: “why not a cartoon video?” It was a passing remark when going over my video channel after the radio interview. It’s something that this person, working with lots of creatives at the art academy in Den Haag, can easily say. But what if you’re just this guy in the attic? How to make a cartoon video? Not easy. This is how I got close to the result I was looking for with my video release for Perfect (Extended Remix).
A go-to place is of course Fiverr. Here you can find animation artists and have your cartoon video in no time. There are also animation sites that allow you to make your own animated video with stock figures and objects, and I tried one. The first results were promising, but you need to go on a paid subscription to have maximum freedom. Even then you’ll find it’s mostly targeted towards business animations and infographics. A fun video clip animation is still hard to make. If you want you can try it: Animaker.
Eventually I stumbled upon this Video Cartoonizer. It’s not free, but it seemed like it could do some pretty amazing stuff with “cartoonizing” existing video. You can see parts of the original video material here. It’s quite funky and in many ways old fashioned software. It takes agonizing days to process video recordings like this, but the end result was quite amazing. Model Sara was also pretty pleased with the result. So there you have it. My first “cartoon” video.
This is a glance in my kitchen where I will tell you my kitchen secret: the sauce. You will find it somewhere on almost every song I released, the Molekular effects inside a Reaktor FX chain. This is an effect powerhouse that I use to bring life to otherwise maybe repetitive or otherwise uninteresting sounds. It’s well hidden somewhere in the infinite sound and effect library of Native Instruments. However, if you use Reaktor as part of your workflow, you might already know it. It’s sound experimentation to the max.
It’s hard to dive into the features of Molekular, because it’s really overflowing with possibilities. Just a look at the interface can already make your brain explode. Imagine that underneath that interface all kinds of wires are running to connect everything with anything. Reaktor users will be used to it, because it will be just a set of modules like all other modules. Please check out all the videos explaining the Molekular effects chain on the Native Instruments site.
I will try to make a start though. It starts with putting a Reaktor FX plugin in your effects chain. Then inside the FX plugin you load Molekular. Then in essence it starts on the bottom row. There you will see a chain of effects, that you can start modulating. The chain connections are depicted in the top right section. Effects can be chained one after the other, or parallel, or a combination of serial and parallel mixed. Then in the top left and middle you can choose how to modulate all the effect parameters.
The effects are just plain awesome. Hard filters, delays, reverbs, pitch shifters. Everything you need to bring bland sounds to life. You can make a rhythmic track tonal, or vice versa. You can drown sounds in distorted delays or otherwise alienating effects, or bring subtle life to a sound.
On the left side there are LFOs, envelopes, a step sequencer and a complex form of logic modulation. The modulation methods kind of overlap here and there and can then be interconnected to multiply or randomize the modulation of the effect chain. Then in the middle is a centerpiece, an X-Y modulator that can be set in motion by logic or the step sequencer, or by you.
The greatest power of this all is that if you replay your song you will have all modulations, no matter how complex, take place exactly the same way. The modulation can have complexity, but also repeatability in time. If you are a fan of totally random every time, this is always an option. For me the magic is the repeatability.
It means that I can just try some alchemy in effect chains and mess around with the modulation. If I find something that sounds cool, I can let it sound as cool every time. Assuming that you, like me, start the render from the same point every render time, the modulation of the effects will be the same. I find it inviting for experimentation, because it is rewarding if I find something that works.
There is only one problem now. With my luck, now that I tell you about it, it will probably jinx everything and it will be discontinued or stop functioning soon. This will really mean that I will have to freeze a machine software wise to allow it to keep running Molekular. With this in mind I will just tell you about it, so you can do the same.
Yesterday I did a live stream with a new head microphone or headset mic and for the first time since using it, something went wrong. Kind of spoiling an hour long live stream. Before this I used my old faithful AKG D330 on a microphone stand, but when streaming, visually this was kind of a pole with a big thing in my face. So, enter the Samson Wireless Concert 88x. I chose this mic because it was affordable and suited for singing. Worth an experiment.
A lot of these affordable headsets are for sport instructors, so more intended for the frequency range of the spoken word. Also, a lot of the smaller, more invisible headset mics have an omnidirectional sensitivity. I was worried that such a mic would pick up the key clicks and foot pedal stomps. This mic has cardioid sensitivity that seems to only pick up my voice and not any of the noise from playing. Comfort while wearing is also an aspect, as is adjustability. On most aspects this mic is fine for me. Audio quality is a little less transparent than the AKG, but acceptable.
The first reactions on the looks in the live stream are positive. Visually this is an improvement over a big round mic on a stand. One aspect of these mics is that, because they’re stuck to your face, you can’t vary the distance to the mic anymore. Any intention or emotion you want to add, by yelling with the mic far away, or whispering with the mic close by is impossible. Some singers that want to belt with the mic far away will feel limited. In my dreamy pop songs I am missing it a little, but not a lot.
The first real pitfall I fell into was yesterday. Because I wanted to drink some water before going live I moved the mic a little bit from my face. Then in the live stream someone remarked that my voice volume was so low. I started fiddling with the faders for the mic, but only after watching back the live stream I saw that it was too far from my face. Caught by the cardioid sensitivity!
Another downside: when I breathe through my nose, the wind blows straight into the mic, resulting in a rumbling sound. Also, one of my songs starts with a part where it’s like I’m calling a friend and speaking into the answering device. The design of this mic more than ever makes me look like a call center employee hahaha.
Another aspect is that it is a wireless model. I chose this because eventually I want to play really live again and it would be convenient. It means however that I now have to rely totally on a set of batteries. When you buy an inexpensive set like this, there is no battery indicator. For now it seems reliable in battery life and there have been no problems with the wireless connection. I’ve had maybe 6 hours of operation from the first set of batteries. I hope it won’t fail on me while playing live. Knock on wood.
I’m also the kind of person that immediately starts using a new gadget like this. Tossing aside the manual. But browsing through it after some days I found out that you should not skip reading it. Here in the studio it works out of the box on the default frequency. Live however you and I will undoubtedly have to fiddle around to find the best frequency and you need instructions from the manual to set up right.
For now this little and affordable gadget sounds good enough, really adds convenience and just looks better.
A vocal pitch trainer. Any guitarist can get a very pocketable guitar tuner for just a few bucks. So why wouldn’t a singer be able to use the same? Well actually, would you as a singer want one? The voice, like a violin, can play any note in any tuning. Why would you want to sing a perfect 440 Hz A when other instruments around you are not in tune? Another one is that sometimes you put some ’emotion’ and ‘glides’ in your singing. That would be lost if you would sing perfectly pitched.
To set you up right. I’m now in the vocal coaching program of Tiffany van Boxtel. I wanted to improve my live singing. Her main goal is to give you confidence while singing. Singing in tune is just one aspect and in her program it’s NOT the main focus. Better sing with confidence and connect with your audience than sing totally in tune is the motto. The coaching program is awesome for me.
Enter the Korg VPT-1. It’s not very expensive, but then again it’s 4 times as expensive as an entry level guitar tuner. When you switch it on, it immediately shows a level, starting at Easy. The top control toggles between Easy, Medium and Hard. Then when you sing, a note appears on the bars on screen. For me it was more useful to see the note letter and octave. For this you can use the middle control. It also sets your center note. It starts at A4 but I set it to C4. Then the bottom control plays the note, but with a simple toy-like sound.
Then there is a blue indicator, a sharp red indicator and a flat red indicator. Blue lighting up shows you that you are singing in perfect tune. Red sharp means: higher than perfect tune. Red flat means: lower than perfect tune. The idea is that if you sing scales, the right notes show and the indicator is mostly blue. On level Easy that is easy and on Hard it’s hard. Simple as that.
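For the curious, this is roughly what a pitch trainer has to calculate once it has measured the frequency of your voice. It is a sketch in Python based on standard tuning with A4 at 440 Hz, not the actual workings of the VPT-1. The tolerance values per level are hypothetical; I don’t know how strict the device really is on Easy, Medium or Hard.

```python
import math
from typing import Tuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical tolerances in cents (1/100 of a semitone) per level.
TOLERANCE_CENTS = {"Easy": 30.0, "Medium": 20.0, "Hard": 10.0}


def nearest_note(freq_hz: float, a4_hz: float = 440.0) -> Tuple[str, float]:
    """Return the nearest note (like 'C4') and the deviation in cents."""
    # In MIDI numbering A4 is note 69 and each semitone is one step.
    exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(exact)
    cents = (exact - nearest) * 100
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


def indicator(cents: float, level: str = "Easy") -> str:
    """Translate a deviation in cents into the indicator that lights up."""
    tolerance = TOLERANCE_CENTS[level]
    if cents > tolerance:
        return "red sharp"
    if cents < -tolerance:
        return "red flat"
    return "blue"


note, cents = nearest_note(450.0)  # a slightly sharp A4
print(note, round(cents), indicator(cents, "Easy"), indicator(cents, "Hard"))
```

The hard part in a real device is measuring the frequency of a voice reliably in the first place, which this sketch skips completely.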
Now how does this work in practice? One of the most important things I have learned is to warm up the voice before performing. I use a standard warm up exercise with scales. This is where I now pick up the VPT-1 to just check that indeed most notes light up blue, and that gives me confidence. I can see that at the start of the exercise there are more red notes and slowly I get into the blue zone. I do not switch to Medium.
For me, using it this way, it’s not a toy but a gadget. It would probably be no use for me while singing otherwise. You have to hold it close to your face to pick up your voice correctly. For just checking that the warm up is in tune, it’s fine. Then another exercise is lip buzzes. The VPT-1 does not handle that at all. It doesn’t recognize lip buzzes as notes. All in all I hope you find this information useful. Let me know how it works for you if you have it.
The first platform I looked at when starting to stream live was Instagram. Straight from the start it was obvious that Instagram wants you to use a phone. It needs to be upright and there is no out-of-the-box streaming solution for connecting streaming software from a PC. There are some software packages that allow you to stream from your PC to Instagram, like YellowDuck. These always need to jump through some hoops like authentication. I didn’t want to go there.
OK. Streaming from your phone seems to be the way for Instagram. In a previous post I explained that I want a good live audio quality. When live streaming, my starting point is the output from the mixing desk that I would connect to the PA when playing live in real life, if you know what I mean. So I feed the output of the live mixing desk directly into the PC that streams to YouTube etc. Now for me the question is how to feed this into your phone. It could be very ‘live’ to use the microphone of the phone, but I could only see it lead to a noisy and garbled live show.
Fortunately, there are several ways to feed audio into your phone. Just like feeding the audio to a live streaming PC. Isn’t it amazing how phones have become kind of like the modern ultra portable PC? The bad news is that this time your cheap budget phone probably won’t cut it. You either need an iPhone or an Android phone above mid-range.
For an iPhone you can find plenty of audio to Lightning cables. If you want a bit more control you can use most of these iRig devices in the interface product section. Some of these have 2 inputs so they can act as some kind of live mixer for maybe a guitar and a microphone. For Android the situation is slightly more complex. You can check if your phone supports access to the audio by means of the USB C plug, or you can check if your phone supports OTG on its USB plug. If OTG is supported, again most of the iRig devices will work like a charm.
In my case the Samsung Galaxy S10 supports OTG. So the first thing I did was look up all the iRig devices to see which one was most suitable. Then I came across the Zoom U-22 and U-24 devices. There I remembered that my Zoom H1n is actually also an audio interface. Then I tested if the Samsung Galaxy S10 recognized my Zoom H1n as an audio interface and boom! Instant success! No need to buy anything new. Then I got carried away, because my live mixer is also from Zoom and I connected my live mixing desk as an audio interface, but that didn’t work unfortunately. The phone crashed.
So this was the setup for my first Instagram live stream. A special OTG cable connects the USB port of the Zoom H1n with the phone. The Zoom H1n line in is connected to the mix output of the Zoom L-12 LiveTrak mixer. The first results were very promising. Unfortunately I could hear a quite audible hiss. I should tune the signal flow between the live mixer and the audio input. It could also be that the quality of the Zoom H1n as an audio interface is inadequate. Another downside is that you have to rely on the Zoom H1n batteries and/or your phone batteries. Maybe not a good idea if you want to do a live stream marathon. For my purposes now it’s OK. I hope you too can now join the flood of Instagram live streamers!
If you have seen my recent live streams, you will have noticed that I ‘travel around’ these days while live streaming. I’ve started to use the Green Screen effect. With OBS Studio it’s so dead simple that you can start using it with a few clicks in your OBS Studio scenes. Of course there are also some caveats I want to address. The main picture for this post shows you what it can look like. It may not be super realistic, but it is eye catching.
So what do you need to get this going? A Green Screen is the first item you need. It does not have to be green. It can be blue or blue-green, but it should not match skin color or something you wear. It should cover most of the background, so it will need to be at least 2 meters by 1.6 meters, which is kind of a standard size you can find in shops. It should be smooth and solid. Creases and folds can result in folds in the backdrop, but some rippling is OK.
Then you need to set up OBS Studio. It’s as simple as right-clicking your camera in the scene and selecting Filters. In the dialog add the Chroma Key filter and select the color of your green screen. Then slide Similarity to somewhere around 100-250 to get a good picture. Everything within the color range becomes transparent, which shows as black as long as there is nothing behind it. Then add a backdrop image (or video!) somewhere below the camera in the scene list and you will have your Green Screen effect.
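To give an idea of what a chroma key filter does with the Similarity setting, here is a very simplified sketch in Python. It is not the algorithm OBS uses and the similarity number here is not on the same scale as the OBS slider. It just treats every pixel that is close enough to the key color as transparent and fills those spots with the backdrop.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # red, green, blue, each 0-255


def color_distance(a: Pixel, b: Pixel) -> float:
    """Straight-line distance between two colors in RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def chroma_key(pixels: List[Pixel], key: Pixel, similarity: float) -> List[Optional[Pixel]]:
    """Make every pixel close to the key color transparent (None)."""
    return [None if color_distance(p, key) <= similarity else p for p in pixels]


def composite(camera: List[Optional[Pixel]], backdrop: List[Pixel]) -> List[Pixel]:
    """Show the backdrop wherever the camera pixel is transparent."""
    return [b if c is None else c for c, b in zip(camera, backdrop)]


green = (0, 255, 0)
camera = [(10, 250, 5), (200, 150, 120)]  # a screen pixel and a skin pixel
backdrop = [(30, 30, 90), (30, 30, 90)]   # two pixels of the backdrop image

print(composite(chroma_key(camera, green, similarity=60), backdrop))
```

A higher similarity value removes more shades of green, including shadows in the folds of the screen, but set it too high and parts of you start to disappear as well.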
The first caveat I bumped into was that I set it up during daytime and it kind of worked, but then I found that I stream at night time, and then you need light. In fact it turned out that 2 photo studio lights came in handy. When you use at least 2 studio lights they also cancel out shadows from folds and creases in the green screen. The light does however bleed a little onto you as a subject, so you will be strangely highlighted as well. This is something you can also see in my first Amsterdam subway picture. Because of the uneven lighting in subways it does not really show. Not every picture is suitable as a backdrop. Photos with people or animals don’t work, because you expect them to move.
The second effect you see is that instruments with reflective surfaces also reflect the green screen. This will result in the background shining through reflecting surfaces. My take is that it’s a minor distraction, so I accept some shining through of the backdrop. It’s also possible that some parts of your room don’t fit well with the Green Screen, like doorways or cupboards. In that case you can choose to crop the camera in the scene by dragging the sides of the camera in the scene with the Alt-key (or Apple key) down. The cropped camera borders will be replaced by the backdrop.