AI generated images are still not perfect, but why?

So I enjoy making pretty visuals that go along with the music, photography and video. It is always tricky to get what you are looking for. And what if you want a video of a burning car, or a picture of a woman falling from the sky? Meet the new kid on the block: AI. Wouldn’t it be great if you could prompt your perfect AI generated image or video? Anything you wish for and then get it, ready for use in any music visual? Sign me up!

Maybe you tried it like I did. Prompt something interesting or funny in DALL-E and the results were either amazing or hilariously bad. Well, let’s look at it this way: Pandora’s box is open and there is no way it will close again. At first I dismissed AI imagery as uncontrollable, but these last months I found a way to control it. I will explain how, and then you will probably see why I still feel like I’m collaborating with a sorcerer’s apprentice.

In this article I will try to explain how generating images with AI works, and I will cover the recent progress that has been made, so you see that much more is possible than you might suspect. All my findings resulted in the AI generated images you find in this article, and yes, they are sometimes amazing and sometimes hilarious. I did at one time write about faceswap techniques, another way to use AI in much the same way, but that article is now a bit dated. Let’s first understand more about image generation.

AI generated image of “a grand piano on a beach”

Training the sorcerer’s apprentice

There are many online services that offer magic straight out of the box, but what happens behind the curtains is well hidden. For all of these services it starts with tagged visual content: images with textual descriptions that correctly describe what’s in the picture. AI is just about recognizing patterns and classifying them. So how is it possible that an AI generates something?

The clever trick is that people found out you could pit two AIs against each other. Imagine a student and a teacher. The student AI (the generator) is good at messing around with an image, starting from just random noise. In endless cycles it messes with the image and shows the result to the teacher AI (the discriminator). The teacher then gives a grade that indicates how well the result matches the given prompt.
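
To make the student/teacher idea a bit more concrete, here is a minimal, purely illustrative sketch of such a generator/discriminator training loop in Python with TensorFlow/Keras. It works on toy one-dimensional numbers instead of images, all layer sizes and training settings are made up for the example, and modern image services use far bigger models and variations on this idea.

```python
# A minimal sketch of the "student/teacher" (generator/discriminator) loop
# described above, on toy 1-D data instead of images. Illustrative only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 8

# Student: turns random noise into a "sample" (here: a single number).
generator = tf.keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(1),
])

# Teacher: grades a sample as real (1) or generated (0).
discriminator = tf.keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(1,)),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model used to train the generator to fool the discriminator.
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real_data = np.random.normal(loc=4.0, scale=0.5, size=(1000, 1))  # stands in for "real images"

for step in range(2000):
    noise = np.random.normal(size=(32, latent_dim))
    fake = generator.predict(noise, verbose=0)
    real = real_data[np.random.randint(0, len(real_data), 32)]

    # Teacher learns to tell real from generated.
    discriminator.train_on_batch(
        np.vstack([real, fake]),
        np.vstack([np.ones((32, 1)), np.zeros((32, 1))]))
    # Student learns to produce samples the teacher grades as real.
    gan.train_on_batch(noise, np.ones((32, 1)))
```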

But the problem is that the AI is mindless. Even though the image can match the prompt, it doesn’t have to be realistic. Five fingers or six? Eyes a bit unfocused? An extra limb? It is best compared with dreaming, sometimes even hallucinating. There is no logic that corrects the result, just random luck or accident. The human touch, it seems, is being able to imagine things while keeping them within the bounds of reality. The sorcerer’s apprentice is doing some uncontrollable magic.

AI generated images come at a cost

The whole training process and the work of generating images cost a lot of time and computing power. You will always have to pay in one way or another for generating images, let alone video, which is just lots and lots of images. In the case of video another AI trick is needed: guessing what image will follow another image, and having this pre-trained and processed. More training, more time, more computing power.

Create your own magic

Getting a lot of images and having them all perfectly described is the magic sauce of AI imagery. This is what the student and the teacher need to make the right images. It determines what kind of images can be generated and also the quality. All images are turned into small markers and then there are clever algorithms to find and combine the markers (vectors) to recreate an image. By letting the student and the teacher use sampling to navigate these markers you can start to generate images. The teacher tries to make sure it matches the prompt for the image.

Now, is it possible to have an AI generate a specific person? Specific clothing? Specific settings? Yes, this is possible. If I pre-train with only images of me, doing all kinds of things, the result will be that I’m the main character in all generated images. If you then prompt “A man playing a violin on the beach”, it will be me playing the violin. Hopefully there were also images of playing the violin and images of the beach. But so many images with correct descriptions? An almost impossibly complex task, it seems.

Or tame the magic

Let’s assume you don’t have these large sets of perfectly described images to pre-train with. Here is where AI image generation has progressed in the past year. Imagine that you can start with a well pre-trained set of markers built from lots of well described images. Clever people found out that you can add a few hints on top of this: a Low Rank Adaptation (LoRA). For images, imagine this: take a smaller set of well described images and let these tweak the results of the original sampling of the pre-trained set. You then trigger the results with new keywords in the descriptions.
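
For the curious, the core trick of LoRA can be written down in a few lines. This is only a conceptual sketch in Python/NumPy with made-up sizes; in reality the small matrices are trained inside the image model’s layers, but it shows why a LoRA is so much cheaper than retraining the whole model.

```python
# Conceptual LoRA sketch: keep the big pre-trained weights frozen and learn
# only a small low-rank correction on top. All sizes here are made up.
import numpy as np

d = 1024          # size of one layer in the pre-trained model
r = 8             # the "low rank" of the adaptation, much smaller than d
alpha = 16        # scaling factor commonly used in LoRA implementations

W = np.random.randn(d, d)          # frozen pre-trained weights (e.g. from Flux)
A = np.random.randn(r, d) * 0.01   # small trainable matrix
B = np.zeros((d, r))               # small trainable matrix, starts at zero

# The adapted layer: the original behaviour plus a cheap, trainable correction.
W_adapted = W + (alpha / r) * (B @ A)

x = np.random.randn(d)             # an activation flowing through the layer
y = W_adapted @ x                  # identical to W @ x until B and A are trained
```

Training only A and B (d * r numbers each) instead of the full d * d matrix is what makes it feasible to teach the model a new face or style from a small set of images.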

The results are stunning. I found a way to get this all working on servers with enough computing power. Starting with a good pre-trained set of images called Flux, I then learned how to train LoRAs with my own images and descriptions. Now I can simply ask for an AI generated image of me playing the piano on the beach. Flux knows how to get to an image of someone playing the piano on the beach; the LoRA knows how to make this picture look like me. Of course I trained a LoRA with a set of pictures of myself, and I trained another with a set of pictures of Alma, my imaginary persona. When Swiss DJ Oscar Pirovino visited me for Amsterdam Dance Event he wanted to participate, so I generated another LoRA.

AI generated image of “oscar in a black suit playing a black grand piano on the beach”

The computing power needed is still immense, but now the results are much more controllable. The Flux pre-trained set is good enough to make sure that I can generate a nice photographic setting. A small set of well described additional images can tweak the results to have me, Alma or Oscar in the picture every time. The results are still unpredictable in the details of course. In a way the results of a photoshoot are also a bit unpredictable, but in a real life photoshoot I can rearrange stuff and try different poses on the spot. No doubt AI generated images are getting there and maybe in a few years it will be just like a photoshoot. We will see!

AI generated image of “thewim in a black suit playing a black grand piano on the beach”

Make MIDI control a reliable part of your live stream

Yes, I am using MIDI control as part of my live streaming. How? In a very straightforward way. Playing a new song will trigger a video clip as a backdrop for the stream. It can also trigger a scene in the lighting unit. Live streaming is my way to improve my live performances, even though a live stream is not the same as a live show. Hence my endeavors to keep improving my live streams and make MIDI control a reliable part of them.

The old way: MIDIControl

Up to now I had to rely on a separate program, MIDIControl, to catch MIDI events and relay these to OBS. I can tell you that any chain of devices or software is easily broken in a live stream. More than once I was in a situation where it simply didn’t work. No harm done musically, but the show does look a bit more bland. Lately a new version of OBS broke the link permanently, and I had to wait for an update of the MIDIControl program.

This triggered a new search for alternatives, and I found one in the form of a true OBS plugin: obs-midi-mg. The first version I used was a 1.x version; the latest version 2.2 has lots of improvements in the UI. In the first version you had to step into a binding, step out and step into an action to make a scene change work. Now it’s all on one page, as you can see in the screenshot in the header. The big improvement for me is that it is simply there when you start OBS. No need for a chain of programs to be up and running and connected.

For capturing MIDI I use a simple and cheap USB MIDI interface; I don’t think it even has a brand. MIDI on Windows is very sensitive to plugging a device into a different USB port, which is why I put it on a separate interface that never moves. The plugin is set up to listen for a note being played and then trigger a video source to be shown. The video source starts playing from the start when it becomes visible and never loops. I keep a spreadsheet with all the bindings to keep track of the note numbers used on different channels.
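
If you are curious what such a binding boils down to, here is a small illustrative sketch in Python using the mido library. It only listens for notes and looks them up in a table, much like my spreadsheet; in my actual setup the obs-midi-mg plugin does all of this inside OBS, so the mapping values and the trigger_source function below are purely hypothetical.

```python
# Illustrative only: listen for MIDI notes and map them to OBS sources,
# the way the obs-midi-mg bindings (and my spreadsheet) do.
import mido

# (channel, note) -> video source or scene to show; made-up example values.
bindings = {
    (0, 36): "backdrop_song_one",
    (0, 37): "backdrop_song_two",
    (1, 36): "lighting_scene_intro",
}

def trigger_source(name):
    # Hypothetical placeholder: obs-midi-mg performs the actual OBS action.
    print(f"Would show source: {name}")

# Open the USB MIDI interface; the port name differs per system.
with mido.open_input() as port:   # or mido.open_input("USB MIDI Interface")
    for msg in port:
        if msg.type == "note_on" and msg.velocity > 0:
            source = bindings.get((msg.channel, msg.note))
            if source:
                trigger_source(source)
```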

I hope I have inspired you to make MIDI control a reliable part of your live stream. There are many more useful applications possible, for instance using a Launchpad to trigger actions on sources in your scenes.

Understanding audio mixing and routing in OBS

Last Friday I had a Halloween themed livestream in OBS. I wanted to point the viewers to my upcoming song release and I wanted to play back a video with an interview I had with Pi.A about our collaboration on the new song. I tested everything in the afternoon, and in the evening it turned out that there was no sound on the stream from my audio system. Bummer. After restarting, the stream seemed to work again. Just a touch of real Halloween horror? I hope I can help you troubleshoot or make more advanced use of audio mixing and routing in OBS.

I spent this whole afternoon checking and rechecking my audio mixing and routing in OBS and I found no problem at all. It turns out that these things just happen. Let’s say that’s the charm of performing live, lol. However, it was something that had long been on my list to check out thoroughly, because I also had problems with audio earlier. So here is a recap of everything I know now.

Since you are probably a musician reading this, I won’t bore you with mixer basics like dBs and I’ll assume that you know how clipping sounds on audio output. I will also assume that you have no interest in Twitch Video On Demand stuff that is baked into OBS, because it is more geared towards gaming streamers.

It all starts with the sources

Every OBS source that outputs sound will appear in your center Audio Mixer panel. You can adjust the levels to your needs. Green is the safe area. Yellow is where your speech and music should ideally be. Red is the danger zone where the dreaded clipping might occur. You can mute a source to make sure it will not be recorded or streamed. By default all audio sources will be mixed to your stream or recordings.

OBS currently lacks a master output level indicator. It might be added in the future. So if you mix in many different sources you end up guessing whether the output will be OK. For now I exclusively mix the ASIO source, so that makes it easy to make sure the output is right. I mix in the Desktop audio just in case there is a sound from the PC I want to quickly mix in.

To be safe I added a Limiter as a filter. It’s still not impossible to get an overcooked sound in the livestream, but the risk of clipping or overcooking goes down.

On the right track

If you right-click in the Audio Mixer section you can go to the Advanced Audio Properties. Here you see all your sources (you can also choose to show only the active sources). I have more complex scenes where I choose, per song I play live, which sources are active. On the right side you will see a block of six Tracks. These are stereo tracks you can mix down to, so in fact you have a mixer with six stereo tracks in front of you.

Your live stream will use only one track. You can choose which one in the Advanced Streaming Settings for Output. By default it will be track 1. Now what is the use of having separate tracks to mix to? These tracks can be used for recording video with OBS. The tracks will end up written separately in the video file it records.

Gamers use this to have a track with the sound of the game and a track with their voice-over. Maybe also a separate track for sound effects and one for music. Then when they stream to Twitch they can leave out copyrighted music and still upload to YouTube with music. In the Advanced Recording Settings for Output you can choose which tracks will be written into the video output file.

Monitoring the output

If you leave everything set up as it is by default, all audio sources will output to your stream or recordings. However, there will not be any monitoring of the streamed or recorded output. This makes it hard to find the balance between different sources, if you have them. If you want to start monitoring you will have to select a Monitor option in the Audio Mixer settings. So there is also a stereo monitoring channel next to the six tracks.

In the Audio Settings, you will find the monitoring section. Here you can choose where to output the monitoring to. Please be aware of the latency that OBS introduces on the monitor output. You can’t use it live. You can just use it to find the right balance between different audio sources.

So what if there is no sound?

If you look in the monitoring options for an audio source in the Audio Mixer, there is even an option to output sound to the monitoring channel, but not to the output tracks! So you can set the level of an audio source to -inf, mute the output, or switch off output to any of the output tracks. On top of that you could choose a track that does not get mixed into your stream. Very flexible, but it can also make it hard to find out why you have no sound on your stream.

There are only two ways to test your audio before really going live. The first one is to make sure a recording outputs the same track as your streaming track and then record a short video. The second one is to stream to an unlisted (YouTube) or private (FB) stream and then check the result. In short, it is a miracle that this was the first time in maybe 50 livestreams that I had audio problems. Because it is live, it is hard to know what the audience is hearing; I always ask at the start of a stream. Audio is really a tricky thing to get right in a livestream.

Controlling Ableton Live 10+ with the Komplete Kontrol A49 revisited

A long time ago I wrote something about getting my, then brand new, Komplete Kontrol A49 to work. I played around with it and soon found out it was still a work in progress with control surface tweaks and drivers. I also found out that the article about my struggle to get it working is still the number one article on this blog. When you look for instructions in your favorite search engine on how to get the Komplete Kontrol A49 keyboard to work, you will get here. Now it’s several versions later for both Ableton Live and the Native Instruments Komplete Kontrol software, so it was a good moment to revisit the matter and see how things have progressed.

I am happy to report that setting everything up now is a breeze. Looking back, everything started to work straight out of the box with Ableton Live version 10.0.5. More good news: it still works straight out of the box in Ableton Live 11+. Support is now integrated. From the corner of my eye I did see that there might be problems with the Komplete Kontrol S series and Ableton Live 11+, but I am not able to verify that. So, what does the support mean? It means that you can immediately start working with your Komplete Kontrol A series keyboard by selecting the Komplete Kontrol surface in the Preferences > MIDI > Control Surface section and choosing the corresponding DAW input and output.

Ableton Live MIDI Preferences settings

This is just the start. If you downloaded and activated the Komplete Kontrol software from Native Instruments (through Native Access), you will find the Komplete Kontrol VST instrument under Plug-ins. Drag it into a MIDI track and you will have instant Kontakt instrument browsing from your track. That takes some getting used to, I must admit. Please note the following: the A series keyboard display browses much more responsively than the Komplete Kontrol VST, so ignore the screen and focus on the tiny A series display when browsing. Click the Browse button on the A series keyboard to jump back to browsing at any point.

Browsing the Strummed Acoustic instrument inside the Komplete Kontrol VST

When browsing Kontakt instruments, nudge the browse button left or right to step deeper into, and back out of, the levels of the browsing process. At the top level you choose either Kontakt instruments, loops or one-shots. At the deepest level you choose your sounds. You will hear a sound auditioned as you browse. If you push the browse button down (don’t nudge) it will select the auditioned sound. This might take a while, so be patient. After that, remember that you can click the Browse button again and nudge left several times to get back to the top level. Keep your eye on the tiny display to see where you are browsing.

Once you are inside, the Plug-in MIDI button will light up and you will notice that the controls on your A series keyboard automatically control the instrument macros. Again, touch a knob to see on the tiny display which parameter or macro is controlled, and tweak and turn to get the perfect sound. This is how your keyboard should have worked from the start of course, but I’m happy to see how it has progressed. For all other plain MIDI control use you can still use the method of placing your instrument in a rack and MIDI mapping the controls to your instrument.

Swapping faces in video with Deepfake

This is my first adventure with Deepfake technology. This blog is intended to show you how to get started. In short, it’s a technology with a very dark side: by swapping faces it is possible to make photos or videos that show faces of people in scenes they’ve never appeared in. It can be done very fast and usually very unconvincingly by some apps on your phone.

The full blown and latest software can actually let politicians or your neighbor do and say crazy things very realistically, and this way it can corrupt your belief of what is true or fake. Very scary. It also has a very creative side. Why can’t you be a superhero in a movie? I experimented with this creative side.

A new song for me is a new story to tell. A second way to tell the story is with a video clip, and I like to tinker around with new ideas for video clips. Most musicians leave it at just a pretty cover picture and dump it on YouTube, but I like to experiment with video. There is a new song in the making now and I already found beautiful footage with a girl and a boy. The first step I take is to make a pilot with the footage and ask people if they like the concept of the clip.

Then I bumped into someone very creative on Instagram, and when I showed the video it triggered some crazy new ideas. Why not make the story stronger with flashbacks? And then I thought: why not swap myself into those flashbacks? The idea to use Deepfake technology was born. But how to get going with Deepfake?

Tools

First investigations led to two different tools: DeepfaceLab and Faceswap. There are many more tools, but in essence it’s probably all the same: extraction tools to find faces in pictures, a machine learning engine like TensorFlow to train a model to swap two faces, and converter tools to generate the final video. For you machine learning may be magic, but I already knew it from earlier explorations. Simply said, it’s possible to mimic the pattern recognition (read: face and voice recognition here) that we humans are so good at.

Machine learning

Machine learning in the form that we have now in TensorFlow requires at least somewhere in the range of 1000 examples of something to recognize, plus the correct response to output when that something is recognized. By feeding this into the machine learning engine, it can be trained to output a picture with the face replaced whenever it recognizes the original face. To be able to make a reliable replacement, the original and replacement data have to be formatted and lined up to make automated replacement possible. One aspect of the machine learning process is that it benefits a lot from GPU processing, i.e. a powerful video card in your PC. This is important because current training mechanisms need around a million training cycles.
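
To give you an idea of what is actually being trained, here is a heavily simplified Python/Keras sketch of the kind of model face swap tools use: one shared encoder and two decoders, one per face. All layer sizes are arbitrary and this is nowhere near the real Faceswap models, but the overall structure is the same.

```python
# A condensed sketch of the shared-encoder / two-decoder autoencoder shape
# used by face swap tools. Sizes are arbitrary; illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG = (64, 64, 3)   # extracted, aligned face images

def build_encoder():
    inp = layers.Input(shape=IMG)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    return Model(inp, x, name="shared_encoder")

def build_decoder(name):
    inp = layers.Input(shape=(256,))
    x = layers.Dense(16 * 16 * 128, activation="relu")(inp)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(inp, x, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_face_a")   # learns to rebuild person A
decoder_b = build_decoder("decoder_face_b")   # learns to rebuild person B

# Each autoencoder reconstructs its own face through the shared encoder.
inp = layers.Input(shape=IMG)
autoencoder_a = Model(inp, decoder_a(encoder(inp)))
autoencoder_b = Model(inp, decoder_b(encoder(inp)))
autoencoder_a.compile(optimizer="adam", loss="mae")
autoencoder_b.compile(optimizer="adam", loss="mae")

# After training, the swap is: encode a frame of A, decode it with decoder B.
# swapped = decoder_b.predict(encoder.predict(face_of_a))
```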

Faceswap software

I chose Faceswap, because for DeepfaceLab it was harder to get all the runtimes. Faceswap has a simple setup tool and a nice graphical user interface. The technology is complex, but maybe I can help you get started. By the time you read this there are probably many other good tools, but the idea remains the same. The Faceswap setup first installs a Conda Python library tool. Then all the technology gets loaded and a nice UI can be launched. There is one more step you need to do: find out which GPU tooling you can use to accelerate machine learning. For an NVIDIA graphics card you will need to have CUDA installed.

Step 1: Extraction

The first step is actually getting suitable material to work with. The machine learning process needs lots of input and desired output in the form of images. At least around 1000 is a good start. That could mean 40 seconds of video at 25 fps, but 10 minutes of video will work even better of course. You can expect the best results if source and target match up as closely as possible, even to the point of lighting, beards, glasses etc. If you know the target to do the face swap on, you should find source material that matches as closely as possible.

Then it’s extraction time. This means already applying machine learning to find faces in the input and then extract these as separate images. These images contain only the faces, straightened up and formatted to get them ready for the face swap training process. You need to extract faces from both the target and the source video. For every face image the extraction process also records where the extracted image was found and how to crop and rotate the face to place it back. These are stored in Alignment files.

After extraction you need to single out only the faces that you’re interested in, in case there are multiple faces in either source or target. From that point you can go to the next step, but the quality of the end result depends very much on the extraction process. Check the extracted images and check them again. Weed out all images that the learning process should not use, then regenerate the associated Alignment files. Faceswap has a separate tool for this.
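
As an illustration of what “extraction” boils down to, here is a tiny Python/OpenCV sketch that pulls face crops out of a video. It uses a simple Haar cascade detector for brevity; Faceswap itself uses better neural detectors and also stores the alignment data, which this sketch skips. The file names are assumptions.

```python
# Illustrative only: walk through a video and save cropped face images,
# roughly what the extraction step produces (minus the Alignment data).
import os
import cv2

os.makedirs("faces", exist_ok=True)
video = cv2.VideoCapture("source_video.mp4")      # hypothetical input file
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

count = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(frame[y:y + h, x:x + w], (256, 256))
        cv2.imwrite(f"faces/face_{count:05d}.png", face)
        count += 1
video.release()
```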

Step 2: Training

By passing in the locations of the target (A) and source (B) images and Alignment files you are ready for the meat of the face swap process: the machine learning training. Default settings dictate that training should involve 1.000.000 cycles of matching faces in target images to be replaced by faces in the source images. For all machine learning, the software hopes that you have a powerful video card. In my case I have an NVIDIA card and CUDA, and this works by default. You can also work without a powerful video card, but I found it slows the process down by a factor of 7. My GPU went from 35% usage to 70% usage.

Deepfake GPU usage

In my experiments I had material that took around 8 hours to train 100.000 cycles, so it would take 80 hours to train 1.000.000 cycles. Multiply that by 7 and you know it’s a good idea to have a powerful video card in your PC. During training you can see previews of the swap process and indicators for the quality of the swaps. These indicators should show improvement and the previews should reflect that. Note that the previews show the face swaps both ways, so even at this point you can switch source and target.

Training process with previews

I saw indicators going up and down again, so at some point I thought it was a good time to stop training. I quickly found out that the training results, the models, were absolutely useless. Bad matches and bad quality. At that point I went back to fixing the extractions and rerunning the training. Much simpler: if the previews show a fuzzy swap, the final result will also be fuzzy. So keeping track of the previews gives you a good idea of the quality of the final result. The nice thing about Faceswap is that it allows you to save an entire project, which makes it easier to go back and forth in the process.

Step 3: Converting

This is the fun part. The training result, the model, will be used to swap the faces in the target video. Faceswap generates the output video in the form of a folder with image sequences, so you will need a tool to convert this to a video. The built-in tool to convert images to video didn’t work for me; I used the stop motion functionality from Corel VideoStudio. If the end result disappoints, it’s time to retrace your steps in extraction or training. Converting is not as CPU/GPU intensive as training, and you can stop the training at any point and try conversion out. When you start training again it builds on the last saved state of the model. If the model is crap, delete it and start over.
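
If the built-in converter fails for you too, stitching the image sequence back into a video can also be done with ffmpeg. Here is a hedged example via Python; the frame naming pattern, frame rate and file names are assumptions that depend on your own output folder.

```python
# Illustrative: turn an image sequence into a video with ffmpeg via Python.
# Frame pattern, frame rate and filenames are assumptions for the example.
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "25",                  # match the frame rate of the target video
    "-i", "converted/frame_%05d.png",    # hypothetical image sequence pattern
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",               # keeps the file playable in most players
    "output.mp4",
], check=True)
```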

Deepfake sample (video DanielDash)

Here is a snip of the first fuzzy results. The final result is not yet ready; mind you, the song for the video clip is not yet ready either. I will share the results here when it is all done. I hope this is a start for you to try this technology out for your own videos! Please note that along the way there are many configuration options and alternative extraction and training models to choose from. Experimenting is time consuming, but worth it.

One more thing. Don’t use it to bend the truth. Use it artistically.

Why not a cartoon video?

So this is what one of the interviewers said when I visited the local radio station here: “why not a cartoon video?” It was a passing remark when going over my video channel after the radio interview. It’s something that this person, working with lots of creatives at the art academy in Den Haag, can easily say. But what if you’re just this guy in the attic? How do you make a cartoon video? Not easy. This is how I got close to the result I was looking for with my video release for Perfect (Extended Remix).

A go-to place is of course Fiverr. Here you can find animation artists and have your cartoon video in no time. There are actually animation sites that allow you to make your own animated video with stock figures and objects, and I tried it. The first results were promising, but you need a paid subscription to have maximum freedom. Even then you’ll find it’s mostly targeted towards business animations and infographics. A fun video clip animation is still hard to make. If you want you can try it: Animaker.

Eventually I stumbled upon this Video Cartoonizer. It’s not free, but it seemed like it could do some pretty amazing stuff with “cartoonizing” existing video. You can see parts of the original video material here. It’s quite funky and in many ways old-fashioned software. It takes agonizing days to process video recordings like this, but the end result was quite amazing. Model Sara was also pretty pleased with the result. So there you have it: my first “cartoon” video.

The secret sauce: Molekular effects

This is a glance into my kitchen where I will tell you my kitchen secret: the sauce. You will find it somewhere on almost every song I released: the Molekular effects inside a Reaktor FX chain. This is an effect powerhouse that I use to bring life to otherwise repetitive or uninteresting sounds. It’s well hidden somewhere in the infinite sound and effect library of Native Instruments. However, if you use Reaktor as part of your workflow, you might already know it. It’s sound experimentation to the max.

Sound experimentation to the max!

It’s hard to dive into the features of Molekular, because it’s really overflowing with possibilities. Just a look at the interface can already make your brain explode. Imagine that underneath that interface all kinds of wires are running to connect everything with anything. Reaktor users will be used to it, because it is just a set of modules like all the other modules. Please check out all the videos explaining the Molekular effects chain on the Native Instruments site.

Molekular effects

I will try to make a start though. It starts with putting a Reaktor FX plugin in your effects chain and loading Molekular inside it. In essence it then starts on the bottom row. There you will see a chain of effects that you can start modulating. The chain connections are depicted in the top right section: effects can be chained one after the other, in parallel, or in a combination of serial and parallel. In the top left and middle you can choose how to modulate all the effect parameters.

The effects are just plain awesome. Hard filters, delays, reverbs, pitch shifters. Everything you need to bring bland sounds to life. You can make a rhythmic track tonal, or vice versa. You can drown sounds in distorted delays or otherwise alienating effects, or bring subtle life to a sound.

On the left side there are LFOs, envelopes, a step sequencer and a complex form of logic modulation. The modulation methods overlap here and there and can be interconnected to multiply or randomize the modulation of the effect chain. In the middle is the centerpiece, an X-Y modulator that can be set in motion by the logic, by the step sequencer, or by you.

The greatest power of this all is that if you replay your song you will have all modulations, no matter how complex, take place exactly the same way. The modulation can have complexity, but also repeatability in time. If you are a fan of totally random every time, this is always an option. For me the magic is the repeatability.

It means that I can just try some alchemy in effect chains and mess around with the modulation. If I find something that sounds cool, I can let it sound as cool every time. Assuming that you, like me, start the render from the same point every time, the modulation of the effects will be the same. I find it inviting for experimentation, because it is rewarding if I find something that works.

There is only one problem now. With my luck, now that I’m telling you about it, I will probably jinx everything and it will be discontinued or stop functioning soon. That would really mean that I have to freeze a machine software-wise to allow it to keep running Molekular. With this in mind I will just tell you about it, so you can do the same.

OBS: With Green Screen

If you have seen my recent live streams, you will have noticed that I ‘travel around’ these days while live streaming. I’ve started to use the Green Screen effect. With OBS Studio it’s so dead simple that you can start using it with a few clicks in your OBS Studio scenes. Of course there are also some caveats I want to address. The main picture for this post shows you what it can look like. It may not be super realistic, but it is eye catching.

So what do you need to get this going? A green screen is the first item you need. It does not have to be green: it can be blue or blue-green, but it should not match skin color or something you wear. It should cover most of the background, so it will need to be at least 2 by 1.6 meters, which is kind of a standard size you can find in shops. It should be smooth and solid. Creases and folds can show up as folds in the backdrop, but some rippling is OK.

Green Screen selfie

Then you need to set up OBS Studio. It’s as simple as right-clicking your camera in the scene and selecting the Filters properties. In the dialog, add the Chroma Key filter and select the color of your green screen. Then slide Similarity to somewhere around 100-250 to get a good picture. Everything outside the color range will become black. Then add a backdrop image (or video!) somewhere below the camera in the scene list and you will have your Green Screen effect.
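
For the curious, this is roughly what a chroma key does under the hood. A small Python/OpenCV sketch, not how OBS implements it, and the HSV range for “green” is a guess you would tune, much like the Similarity slider. File names are made up.

```python
# Illustrative chroma key: mask out green-ish pixels and paste in a backdrop.
import cv2
import numpy as np

frame = cv2.imread("camera_frame.png")       # hypothetical webcam frame
backdrop = cv2.imread("backdrop.png")        # hypothetical background image
backdrop = cv2.resize(backdrop, (frame.shape[1], frame.shape[0]))

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
lower_green = np.array([40, 60, 60])         # rough green range, tune like "Similarity"
upper_green = np.array([80, 255, 255])
mask = cv2.inRange(hsv, lower_green, upper_green)

# Where the mask hits the green screen, show the backdrop; elsewhere, the camera.
result = np.where(mask[:, :, None] == 255, backdrop, frame)
cv2.imwrite("keyed_frame.png", result)
```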

OBS Camera Filters
OBS Chroma Key Filter settings

The first caveat I bumped into was that I set it up during daytime and it kind of worked, but then I found that I stream at night, and then you need light. In fact it turned out that 2 photo studio lights came in handy. When you use at least 2 studio lights they also cancel out shadows from folds and creases in the green screen. The green does however bleed a little onto you as the subject, so you will be strangely highlighted as well. This is something you can also see in my first Amsterdam subway picture; because of the uneven lighting in subways it does not really show. Not every picture is suitable as a backdrop. Photos with people or animals don’t work, because you expect them to move.

The second effect you see is that instruments with reflective surfaces also reflect the green screen. This will result in the background shining through reflective surfaces. My take is that it’s a minor distraction, so I accept some shining through of the backdrop. It’s also possible that some parts of your room don’t fit well with the green screen, like doorways or cupboards. In that case you can crop the camera in the scene by dragging its sides with the Alt key (or Apple key) down. The cropped camera borders will be replaced by the backdrop.

OBS: Live streaming with good audio quality

In a previous post I mentioned that I use OBS Studio for my live streaming, and a little bit about how. That post shows that I use an ASIO plugin for audio, but why is it needed? In the live stream I want to recreate the studio quality sound, but with a live touch. After all, why listen to a live stream when you could just as well listen to the album or single in your favorite streaming app? Let’s first see where the ASIO plugin comes into play.

Live Streaming Setup

My setup in the studio is divided into two parts. One part is dedicated to studio producing and recording, with a Focusrite Scarlett 18i8, a digital Yamaha mixing desk and a MIDI master keyboard. For recording I use Ableton Live. The other part is the live setup, with (again) Ableton Live, another Focusrite Scarlett 18i8, a Clavia Nord, a Micro Korg and the Zoom L12 mixing desk. The live setup connects directly to the PA with a stereo output. Both sides run on separate PCs (laptops).

Home Studio Live Side

For OBS Studio and the live streaming setup, I chose to use the PC on the studio recording side. It’s directly connected to the Internet (cabled) and can easily handle streaming when it doesn’t have to run studio work. I play the live stream on the set dedicated to playing live, and I use the live side stereo PA audio out to connect it to the studio side to do the live streaming. This means the live side of the setup is exactly as I would use it live.

Home Studio Recording Side

It all starts with the stereo output on the Zoom L12 mixing desk, that normally connects to the PA. On the mixing desk there is vocal processing and some compression on all channels to make it sound good in live situations. To get this into the live stream as audio I connect the stereo output to an input of the Yamaha mixing desk. This is then routed to a special channel in the studio side audio interface. This channel is never used in studio work.

Of course it could be that your live setup is simpler than mine, maybe only a guitar and a microphone. But the essential part is that you probably have some way to get these audio outputs to a (stereo) PA. If you don’t have a mixing desk yourself and you usually plug into the mixing desk at the venue, this is the time to consider your own live mixing desk for streaming live, with vocal effects and the effects that you want to have on your instruments, and maybe even some compression to get more power out of the audio and make it sound more live.

But let’s look at where the ASIO plugin comes into play. The ASIO plugin takes the input of the special live channel from the Yamaha mixing desk using the studio side audio interface, and that becomes the audio of the stream. Because I have full control over the vocal effects on the live side, I can just use a dry mic to address the stream chat and announce songs, then switch on delay and reverb when singing. Just like when I play live, without even the need for a technician.

Playing a live stream is different from playing live, because it has a different dynamic. In a live stream it’s OK to babble and chat for minutes on end; this is probably not a good idea live. I find, however, that when it comes to the audio it helps to start out with a PA-ready output signal, similar to the audio you would send to the PA in a real live show. It also helps to have full hands-on control over your live audio mix, to prevent having to dive into hairy OBS controls while streaming live. Lastly, for me it’s also important that streaming live is no different from playing live at a venue, in that you can break the mix, miss notes, mix up lyrics and feel the same nerves while playing.

Streaming live with OBS Studio

Okay, like everybody else I started streaming too. I had a planned live show, but live shows will not be possible for at least another half year. Every evening my social timelines start buzzing with live streams and all the big artists have also started to stream live. No place for me, with my newly created and sometimes shaky solo live performance, to make a stand? After some discussions with friends I decided to make the jump.

But how to go about it? If you already have experience with live streaming, you can skip this entire article; this is here just for the record, so to speak. After some looking around I came to this setup:

OBS Studio with ASIO plugin
Restream.io for casting to multiple streaming platforms
Logitech C920 webcam
Ring light
Ayra ComPar 2 stage light (see this article)

OBS is surprisingly simple to set up. It has its quirks. Sometimes it does not recognize the camera, but some fiddling with settings does the trick. You define a scene by adding video and audio sources. Every time you switch from scene to scene it adds a nice cross fade to make the transition smooth. You can switch the cross fade feature off of course.

OBS Main scene setup

I only use one scene. The video clip is there to promote any YouTube video clip; it plays in a corner and disappears when it has played out. The logo is just “b2fab” somewhere in a corner. The HD cam is the C920 and the ASIO source is routed from my live mixer to the audio interface on the PC. I set up a limiter at -6 dB on the ASIO audio as a filter to make sure I don’t get distortion on audio peaks.

I also had to choose my platform. From the start I also wanted to stream live on Facebook and Instagram. Instagram however kind of limits access to live streaming to phones only. There is software to stream from a PC, but then you have to set it up again for every session and you need to switch off two-factor authentication. For me that is one bridge too far for now.

I chose Restream.io as a single platform to set up for streaming from OBS. It then allows you to stream to multiple platforms and bundle all the chats from the different platforms into a single session. For Facebook pages, however, you need a paid subscription tier. For now I selected the free options: YouTube, Twitch and Periscope. YouTube because it is easy to access for my older audience, Twitch because it seemed quite fun and I also like gaming, and Periscope because it connects to Twitter.

If the live show takes shape I might step into streaming from my Facebook page. Another plan is to try the iRig Stream solution and start making separate live streams on Instagram, with high quality audio from the live mixer. I will surely blog about it if I start working with it.

For now it all works. Restream.io allows me to drop a widget on my site. It’s a bit basic and only comes alive when I am live, so I have to add relevant information to it to make it interesting. If you want to drop in and join my live musings, check my YouTube, Twitch and Periscope channels or my site at around 21:00 CEST.