AI generated images are still not perfect, but why?

So I enjoy making pretty visuals that go along with the music, photography and video. It is always tricky to get what you are looking for. And what if you want a video of a burning car, or a picture of a woman falling from the sky? Meet the new kid on the block: AI. Wouldn't it be great if you could prompt your perfect AI generated image or video? Anything you wish for, and then get it, ready for use in any music visual? Sign me up!

Maybe you tried it like I did: prompt something interesting or funny in Dall-E and the results were either amazing or hilariously bad. Well, let's look at it this way. Pandora's box is open and there is no way it will close again. At first I dismissed AI imagery as uncontrollable, but in these last months I found a way to control it. I will explain how, and then you will probably see why I still feel like I'm collaborating with a sorcerer's apprentice.

In this article I will try to explain how generating images with AI works. I will also try to explain the recent progress that has been made, so you can see that much more is possible than you might suspect. All my findings resulted in the AI generated images you find in this article, and yes, they are sometimes amazing and sometimes hilarious. I did at one time write about faceswap techniques, another way to use AI in much the same way, but that old article is now a bit dated. Let's first understand more about image generation.

AI generated image of “a grand piano on a beach”

Training the sorcerer’s apprentice

There are many online services that offer magic straight out of the box, but what happens behind the curtain is well hidden. For all services it starts with tagged visual content: images with textual descriptions that correctly describe what's in the picture. AI is all about recognizing patterns and classifying them. So how is it possible that an AI generates something?

The clever trick is that people found out you could pit two AIs against each other. Imagine a student and a teacher. The student AI (the generator) is good at messing around with an image, starting from just random noise. In endless cycles it shows the result of its messing around to the teacher AI (the discriminator). The teacher then gives a grade that indicates how well the result matches the given prompt.
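To make the student/teacher loop a bit more concrete, here is a deliberately tiny sketch of the idea in Python with PyTorch. Everything in it is an assumption for illustration (toy network sizes, random stand-in "images", no prompt conditioning); it only shows the basic back-and-forth between generator and discriminator.

```python
# Toy sketch of the student (generator) vs teacher (discriminator) loop.
# Stand-in data and tiny networks; illustration only, not a real trainer.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 28 * 28), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real_images = torch.rand(64, 28 * 28)    # stand-in for the tagged training pictures

for step in range(1000):
    noise = torch.randn(64, 16)
    fakes = generator(noise)             # student: turn random noise into an "image"

    # teacher: learn to tell real pictures from the student's attempts
    d_loss = loss_fn(discriminator(real_images), torch.ones(64, 1)) + \
             loss_fn(discriminator(fakes.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # student: try to fool the teacher into grading its fakes as real
    g_loss = loss_fn(discriminator(fakes), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```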

But the problem is that the AI is mindless. Even though the image can match the prompt, it doesn't have to be realistic. Five fingers or six? Eyes a bit unfocused? An extra limb? It is best compared with dreaming, sometimes even hallucinating. There is no logic that corrects the result, just random luck or accident. It seems that the human touch is being able to imagine things but still keep them within the bounds of reality. The sorcerer's apprentice is doing some uncontrollable magic.

AI generated images come at a cost

The whole training process and the work of generating images cost a lot of time and computing power. You will always have to pay in one way or another for generating images, let alone video, which is just lots and lots of images. In the case of video another AI trick is needed: guessing what image will follow another image, which also has to be pre-trained and processed. More training, more time, more computing power.

Create your own magic

Getting a lot of images and having them all perfectly described is the magic sauce of AI imagery. This is what the student and the teacher need to make the right images. It determines what kind of images can be generated and also their quality. All images are turned into small markers, and clever algorithms find and combine those markers (vectors) to recreate an image. By letting the student and the teacher use sampling to navigate these markers you can start to generate images, while the teacher tries to make sure the result matches the prompt.
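You can get a feel for these "markers" with an openly available model that maps both images and text into the same vector space. This is a minimal sketch, assuming the open CLIP model from Hugging Face and a hypothetical local image file; it only shows how a description and a picture can be compared as vectors.

```python
# Sketch: compare an image against two captions via shared text/image vectors.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("piano_on_beach.jpg")   # hypothetical example image
texts = ["a grand piano on a beach", "a cat on a sofa"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher score = the image is a better match for that description
print(outputs.logits_per_image.softmax(dim=1))
```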

Now, is it possible to have an AI generate a specific person? Specific clothing? Specific settings? Yes, this is possible. If I pre-train with only images of me, doing all kinds of things, the result will be that I'm the main character in all generated images. If you then prompt “A man playing a violin on the beach”, it will be me playing the violin. That is, hopefully there were also images of playing the violin and images of the beach. But so many images with correct descriptions? An almost impossibly complex task, it seems.

Or tame the magic

Let's assume you don't have these large sets of perfectly described images to pre-train with. Here is where AI image generation has progressed in the past year. Imagine that you can start with a well pre-trained set of markers built from lots of well described images. Clever people found out that you can add a few hints on top of all this: a Low-Rank Adaptation (LoRA). For images, imagine this: take a smaller set of well described images and allow these to just tweak the results of the original sampling of the pre-trained set. You then trigger those results with new keywords in the descriptions.
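As a rough illustration of what "a few hints" means technically: a LoRA leaves the big pre-trained weights untouched and learns two small matrices whose product is added on top. This is a minimal sketch of that idea in PyTorch, with the rank and scaling values chosen arbitrarily for illustration.

```python
# Sketch of the LoRA idea: freeze the pre-trained layer, learn a small low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                          # frozen pre-trained layer
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # original output + the low-rank "hint" learned from the small image set
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))   # only A and B are trained from here on
```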

The results are stunning. I found a way to get this all working on servers with enough computing power. Starting with a good pre-trained set of images called Flux, I then learned how to start training LoRAs with my own images and descriptions. Now I can simply ask for an AI generated image of me playing the piano on the beach. Flux knows how to get to an image of someone playing the piano on the beach; the LoRA knows how to make this picture look like me. Of course I trained a LoRA with a set of pictures of myself. I also trained one with a set of pictures of Alma, my imaginary persona. And when Swiss DJ Oscar Pirovino visited me for Amsterdam Dance Event he wanted to participate, so I generated another LoRA.
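For those who want to try something similar, here is roughly what the generation step looks like with the open-source diffusers library. Treat it as a sketch under assumptions: the exact Flux checkpoint, the LoRA folder and file name and the trigger word "thewim" are placeholders for whatever you trained, and you need a GPU with plenty of memory.

```python
# Sketch: generate an image with a Flux base model plus a personal LoRA.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("my_lora_folder", weight_name="thewim.safetensors")  # hypothetical LoRA

prompt = "thewim in a black suit playing a black grand piano on the beach"
image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("thewim_piano_beach.png")
```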

AI generated image of “oscar in a black suit playing a black grand piano on the beach”

The computing power needed is still immense, but now the results are much more controllable. The Flux pre-trained set is good enough to make sure that I can generate a nice photographic setting. A small set of well described additional images can tweak the results to have me, Alma or Oscar in the picture every time. The results are still unpredictable in the details, of course. In a way the results of a photoshoot are also a bit unpredictable, but in a real life photoshoot I can rearrange things and try different poses on the spot. No doubt AI generated images are getting there and maybe in a few years it will be just like a photoshoot. We will see!

AI generated image of “thewim in a black suit playing a black grand piano on the beach”

The secret sauce: Molekular effects

This is a glance into my kitchen, where I will tell you my kitchen secret: the sauce. You will find it somewhere on almost every song I released: the Molekular effects inside a Reaktor FX chain. This is an effect powerhouse that I use to bring life to otherwise repetitive or uninteresting sounds. It's well hidden somewhere in the infinite sound and effect library of Native Instruments. However, if you use Reaktor as part of your workflow, you might already know it. It's sound experimentation to the max.

Sound experimentation to the max! A messy kitchen

It's hard to dive into the features of Molekular, because it's really overflowing with possibilities. Just a look at the interface can already make your brain explode. Imagine that underneath that interface all kinds of wires are running to connect everything with anything. Reaktor users will be used to it, because to them it is just a set of modules like all the others. Please check out the videos explaining the Molekular effects chain on the Native Instruments site.

Molekular effects

I will try to make a start though. It starts with putting a Reaktor FX plugin in your effects chain. Then, inside the FX plugin, you load Molekular. In essence it starts on the bottom row. There you will see a chain of effects that you can start modulating. The chain connections are depicted in the top right section: effects can be chained one after the other, in parallel, or in a combination of serial and parallel. Then in the top left and middle you can choose how to modulate all the effect parameters.

The effects are just plain awesome. Hard filters, delays, reverbs, pitch shifters. Everything you need to bring bland sounds to life. You can make a rhythmic track tonal, or vice versa. You can drown sounds in distorted delays or otherwise alienating effects, or bring subtle life to a sound.

On the left side there are LFOs, envelopes, a step sequencer and a complex form of logic modulation. The modulation methods overlap here and there and can be interconnected to multiply or randomize the modulation of the effect chain. In the middle sits the centerpiece, an X-Y modulator that can be set in motion by the logic, by the step sequencer, or by you.

The greatest power of this all is that if you replay your song you will have all modulations, no matter how complex, take place exactly the same way. The modulation can have complexity, but also repeatability in time. If you are a fan of totally random every time, this is always an option. For me the magic is the repeatability.

It means that I can just try some alchemy in effect chains and mess around with the modulation. If I find something that sounds cool, I can make it sound just as cool every time. Assuming that you, like me, start the render from the same point every time, the modulation of the effects will be the same. I find it inviting for experimentation, because it is rewarding when I find something that works.

There is only one problem now. With my luck, now that I'm telling you about it, I will probably jinx everything and it will be discontinued or stop functioning soon. That would really mean I have to freeze a machine software-wise to allow it to keep running Molekular. With this in mind I will just tell you about it, so you can do the same.

Why you should start using a 360 camera

I started using a 360 camera four years ago. At that time I wanted to create those video clips where you are really in the set, and I wanted viewers to experience the video. Video quality was an issue then and for me it still is, unless you have a solid budget to spend. At the 3.000 euro price point video quality is no longer a big issue. At the lower end, however, things have only improved slightly. I have now invested in an Insta360 ONE X at a fraction of that price, 400 euro. So what persuaded me to invest in this camera if the quality is only slightly better?

First off, it comes with software that allows you to take your full 360 degree recording and cut out a flat rectangle that looks like you recorded it with a normal camera. Where is the advantage in that? It is actually intended to let you use it as an action camera and then, in the video editing, cut out, pan and zoom into any action around you. You can see samples of this on the product page. What use is that to me as a musician, you might ask. Well, how about filming a whole gig from several points and cutting, panning and zooming into all the action on stage and in the crowd? The software also has some really captivating special effects like speeding up, turning the 360 view into a ball, fish eye and so on.

Secondly, it has rock-solid stabilization, because it uses gyroscopes to record all movements. This also ensures that the recording is perfectly horizontal, even when recording at an angle. You will find that if there is too much movement in your recording, most viewers become seasick really fast. A smooth and stable recording makes the difference. I can now confidently record while walking. Also freaky: if you use a selfie stick to hold the camera, the software will remove the stick, so it appears as if the camera is hovering above you.

Schemerstad

Thirdly, it actually matters that the quality of the recordings is at least slightly better than that of the first generation of 360 cameras. The performance in low light is dramatically better and the 25% increase in pixels over cameras in the same price range does make a difference. Am I completely happy? No, of course not. But I can really and wholeheartedly recommend the ONE X at the lower price tier. It has made some impossible recordings possible and I will keep using 360 as part of my video recording to capture the action and experiences.

So this is why you too (as a musician) should start using a 360 camera. Not because you want people to experience VR, but to capture everything and decide how to use the recording later. On stage and everywhere the action happens.

Komplete Kontrol A49, you’re not using it right

Please note: starting with version 10.0.5, support for the A49 keyboard is integrated in Ableton Live. If you use a new version of Ableton Live, please read this article.

After a month of working on singing and performing, everything but working in the studio, I wanted to get up and running again with making music. As always, I started by updating the studio software. When updating the Native Instruments (NI) suite I am using, the A49 was part of the updates. When playing around in Ableton Live after that, it soon became obvious that things did not work quite right. So it was time to reserve some hours to dive into this.

The NI Native Access manager was updated, and the first step was then of course to check all the software installations inside it. It soon turned out that the VST installation path of Komplete Kontrol was not correct anymore. NI likes to think that it is the only source of plugins on your computer, so I needed to tell it that VSTs are located elsewhere. The Komplete Kontrol installation was then fixed by reinstalling. Nice.

After checking that the version of Komplete Kontrol inside Live and the standalone Komplete Kontrol application were matching, things started working again. A plugin rescan was needed to pick up all NI instruments in both versions, because a lot of instrument settings were apparently not matching up. A quick scan of the MIDI integration settings revealed that the integration was still correct.

I use the Komplete Kontrol Rack VST in Ableton Live, but when you update your NI software this is not automatically updated in the Ableton Host Integration. Time to copy the files all over again from C:\Program Files\Common Files\Native Instruments\Host Integration\Ableton Live to D:\Documents\Ableton\Library\Presets\Instruments\Instrument Rack, or some equivalent on a Mac.
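If you have to repeat this after every update, a small script can do the copying. This is only a sketch, assuming the Windows paths mentioned above and that you simply want every file from the Host Integration folder mirrored into the Instrument Rack folder; adjust both paths to your own setup.

```python
# Sketch: copy the Ableton Host Integration files into the Live user library.
import shutil
from pathlib import Path

src = Path(r"C:\Program Files\Common Files\Native Instruments\Host Integration\Ableton Live")
dst = Path(r"D:\Documents\Ableton\Library\Presets\Instruments\Instrument Rack")

for f in src.iterdir():
    if f.is_file():
        shutil.copy2(f, dst / f.name)   # overwrite the old copies
        print("copied", f.name)
```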

This Komplete Kontrol instrument rack can host any plugin instrument and map the A49 knobs, via macros, to controls in the instrument. Please note: only use this for instruments other than NI instruments! You must manually map each knob to a control inside the instrument. Not very pretty, but once you've set it up it works.

And what if you do want to use an NI instrument? I also found out that instead of adding Kontakt to a track to start working with an NI instrument, as I always did, it's better to use the Komplete Kontrol plugin. This immediately gives you full control with the A49 and allows you to quickly switch instruments on the fly. Oh well. Never too old to learn.

Zoom H1n – Singing in the car

When there is an opportunity to practice singing I take it, and is there a better place to sing than in the car? Probably not. Technically it's the wrong position for your body to sing in, but somehow singing along in the car just sounds better. It is probably the closed space and the close-in mix with the sound system that makes it work. The question then is, because practicing and recording go hand in hand: can I also record in the car?

To that end I've tried the voice recorder of my phone and, to put it bluntly: that doesn't work. Only car sounds, no music. Fortunately I found something that does work: the Zoom H1n recorder. Now I can practice singing and at the end of the journey hear if I am on the right track and which songs need more work. It's also a great way to experiment with new ideas along an otherwise boring trip from A to B.

The Digital Signal Processing (DSP) of modern voice recorders is not tuned to music recording; probably anything that works as a voice recorder simply does not work for singing in a car. You need a broader frequency range, otherwise the car sounds will just take over. On top of that you need something that can be operated while driving, so it has to be one-button start/stop operation and the recording device must be mounted really securely.

Enter the Zoom H1n. It has a camera mount, so any camera stand that can be used in the car will hold it. Then it's one button to start and one button to stop. You can even feel your way through the operations, so there is no need to take your eyes off the road. All the other models in the Zoom portable recorder range will probably also do the job, but not at the price point of the H1n. Did you know you can also use it as an ASIO device? Other brands might offer the same experience, but you should check the mounts and the capabilities for recording music.

Trying out the Spitfire eDNA Earth instrument

I will try to write about my impressions of the Earth instrument. However, I will not completely review it. For in-depth reviews please check MusicRadar or TheAudioSpotlight or others. For me, ever since Camel Audio was bought by Apple and its Alchemy synthesizer disappeared as a standalone virtual instrument, I have felt lost. Alchemy had a granular synthesis engine and a unique way to parameterize its sounds. The unique sound of this instrument disappeared and there was nothing to replace it. Omnisphere apparently is capable of recreating some of the sounds, but that is mainly because it can synthesize anything, and it's priced accordingly. The moment I heard a demo for Earth, I heard some of that Alchemy sound again.

Technically it's a completely different beast compared to Alchemy. The Earth sounds are based on an orchestral sample library, but are then processed by the Kontakt engine to sound cinematic, otherworldly and sometimes electronic. Yes, it's a Kontakt instrument, so you need at least the Kontakt Player. Inside Kontakt you will find the eDNA interface of this instrument. As an owner of a Komplete Kontrol A series keyboard, this is very convenient. It means I can use the Komplete Kontrol browser to quickly browse through the sounds and immediately tweak parameters once a sound is loaded.

The Kontakt engine and the eDNA interface of Earth take some getting used to. To make sure you fully understand their workings it's a good idea to go through the walkthrough on the Spitfire Audio site. In short, every sound consists of two samples from the library, which are mangled, then mixed, then chopped up and finally processed by a set of effects. It is important to see that there are sounds, but also full versions of the same sound. The full version contains the full range of orchestral samples. This allows you not only to start with a fixed pair of samples, but to eventually switch out one of the samples for another.

The result is that you get a sound that is usually cinematic. Sometimes a wash or a drone in the background, sometimes a sharp stab in the foreground. Because of the mangling and the chopping, sounds can really get that grainy Alchemy sound, or a dirty sound. None of the patches is really clean. I can only say: I love it. All sounds immediately inspire you to build a soundscape. Even better, with a Komplete Kontrol keyboard you can also immediately start changing the sound, bringing it even more to life.

If you are looking for cinematic sounds, drones, or dirty stabs and you want an affordable synth, then I invite you to take a look at this Kontakt library. In most reviews you will find some comments on the eDNA interface of this instrument, and I have to agree that it can be hard to find your way around elements that do not invite you to click or drag. Once you get used to it, though, it is not that bad. All in all: recommended!

SoundBrenner Pulse wearable metronome, first impressions

What people say

This product appears everywhere in timelines on social media when you're interested in making music. I must say it immediately got my attention when I saw it. For me the appeal is that it would solve the problems of playing along with the computer when practicing or playing live. I don't always have live musicians to play along with, and the computer is unforgiving. Any metronome is welcome there, and the SoundBrenner Metronome app is already a great help.

But now the Pulse is here and it adds a haptic, vibrating metronome you can feel. Now you don't have to look at blinking lights while playing. Also, I use Ableton Live, including live on stage, and there is even the possibility to use Ableton Link with the Metronome app. If this all works together as one integrated haptic metronome that lets me feel the tempo while playing along with Ableton? Perfection! The ultimate gadget heaven!

Before buying I always look around for reviews and more info. One big complaint is that it is not an actual watch kind of thing. A lot of people hoped that it would also display the tempo. It doesn't. You have to look at the screen of your phone (or tablet) to see settings and tempo. This also means that you have to keep the phone screen on. At the same time the Pulse is connected via Bluetooth. The phone is the brains, so you must keep it charged and connected at all times. A challenge, especially live.

Then there is some word going around about it not being accurate, but I think that has already been fixed through firmware updates. Another complaint is that it takes time to get used to 'feeling' the tempo. I guess a lot of people send it back immediately, but I am more patient. Most new skills take time to learn and I am quite convinced that this Pulse is a good idea. But now for first impressions.

What I say now

When you first start using the Pulse you will find that it is a bit fiddly to operate. You have to tap the watch face to start using it, but it's not really touch sensitive: you have to really press it for it to pick up the taps. Then, straight out of the box it is set to buzz the rhythm very strongly, and audibly too. Fortunately you can immediately go back to the app to set it to a friendlier and shorter vibration. In the lightest mode it really feels okay, but I play keyboards. When playing a more physical instrument, like drums, I can imagine you need the stronger buzz.

Charging it is also fiddly. It is a small kind of dock that has to connect properly to the device. After popping the Pulse into the band it gets even harder to connect it to the charging dock, because the band pushes it off the dock and the dock can easily slip away, because it's so light. People complain about how long the device lasts on a full charge, but I don't have enough experience yet to say if that is really a problem for me.

Then it's time to start practicing and linking it up with Ableton Link. That's where it all starts to get a little flaky for now. Ableton Link somehow goes in and out of the connection with the app. That is okay for practicing in my case, but I don't think this is ready for playing live. Also, my phone sometimes loses the connection with the Pulse after several minutes of playing. My phone is an Android phone running Oreo, and I know it can be very aggressive in killing background processes, specifically if they draw power. That is probably not helping here, so I also want to try it with an iOS device.

One other thing to mention: it is quite a big device, maybe better suited for male wrists. There is a bigger strap in the box for your leg or upper arm, but this device will have a hard time looking elegant on fragile ladies' arms.

First conclusions now:

  • Big. Don’t expect this device to be light to operate, you really have to tap hard
  • Dive in to the settings to tune it to your preferences
  • Great for practicing, but for playing live this is a really complex setup to get and keep running

I hope this helps you appreciate the device for what it is right now. I will keep using it and I'll keep you up to date. Please note that there is also a new Soundbrenner device on Kickstarter that is actually more like a watch, the Core.

Editing a music video for IGTV

From lying down to upright

Instagram took everyone by surprise by introducing IGTV, a new video channel with an upright video format, for all users. Shooting a video has always been a horizontally oriented widescreen experience, matching the orientation of TVs and cinema. Instagram Stories, however, were always vertically oriented to match the way you naturally hold your phone, and IGTV nicely cultivates that. Some people always record vertically and that footage is then hard to show on TV, YouTube and such. Now you have a new outlet for it: enter IGTV.

If you have the material for your music video already recorded in an upright position, then you are so ready to edit it for IGTV! What I can see, however, is that not many existing recordings were ready for IGTV, so many people decided to just clip off some footage from the left and right and keep the middle bit. The worst ones cut off parts of the titling, so you can clearly see that it's not really made for IGTV. As a viewer you feel cheated, because you're obviously missing parts of the video.

But what if you have already recorded a video clip to be shown on YouTube and its in the landscape format? How to reuse that recording to make something that looks right on IGTV? What are the technicalities of the new IGTV video format?

Tunnel vision

The first step for me, with the landscape clip for the Just a Game video, was to render it without the titling, or at least without all titling that does not fit the vertical format. The format to go for is HD, but with the horizontal and vertical resolutions reversed, so 1080×1920. With lengthy music video clips you will find that upright portrait HD results in files that are too big. There is a limit for regular video uploads: a maximum of 10 minutes and 650MB. The error messages from IGTV are unfortunately not at all revealing. A clip of 4 minutes or more, however, can easily go over 650MB. Then you will have to consider HD Ready, 720×1280.
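A quick back-of-the-envelope calculation tells you in advance whether a render will stay under that limit. This is just a sketch; the bitrates are assumptions you should replace with whatever your editor actually renders at.

```python
# Sketch: estimate the file size of a rendered clip against the ~650 MB IGTV limit.
def fits_igtv(length_minutes: float, video_kbps: float, audio_kbps: float = 320) -> bool:
    size_mb = (video_kbps + audio_kbps) * length_minutes * 60 / 8 / 1000
    print(f"estimated size: {size_mb:.0f} MB")
    return size_mb <= 650

fits_igtv(4, video_kbps=25000)   # a 4-minute 1080x1920 render at a high bitrate: too big
fits_igtv(4, video_kbps=12000)   # the same clip at a lower bitrate or 720x1280: fits
```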

IGTV Pan and Zoom

If you removed the titling because of the landscape format, now is the time to redo it for the vertical format, to show the viewer that you intended this clip to be in IGTV format. After that, all you have to do is use pan and zoom to cut out the upright sections of the clip that really show the viewer all the action. This way you don't give away that the recording was not originally intended for IGTV. As always I am using Corel VideoStudio for the simple work, and it's capable of rendering the required output for IGTV. Now it's time to upload! Tell me about your experiences!

Connecting the Logitech Craft knob to Ableton Live

Just in, the gadget of the month: the Logitech Craft. I was looking for some more control over the mixing process, and of course there are many controllers. When you already have an Ableton Push, what more do you need? Well, actually there is a thing about me and Push. I cannot use it blindly, so I always have to look at either the screen, the controls, or the display. When mixing in the Ableton Live Arrangement View it gets worse: mouse, keyboard, screen, Push… Push is at its best in Session View.

There were two things I was looking for. A high-quality 'chiclet' keyboard like the one on my new Lenovo, and this one has an extra: a knob. A dial that is touch sensitive and clickable, to perform specific actions in whichever program has focus on your desktop. I am quite sure that your regular keyboard and a Microsoft Dial controller will also make a good combo, but I chose the Craft to replace my old and clunky keyboard with media controls.

Unpacking and installing was the easy part. The previous keyboard was also a Logitech and it used the same Unifying receiver. Switch it off and on and the keyboard was connected. Then a disappointment! No profile for Ableton Live. With a profile the keyboard recognizes the program it's in and immediately adds some shortcuts for the knob to control. For instance, in a browser you can select a tab with the knob. In Photoshop you can zoom. In Lightroom you can change the exposure, or so I'm told. The standard functionality in other applications is controlling the volume of the PC, and clicking the knob will pause/play music.

So there I was, staring at Ableton without being able to use the knob. I started diving into the settings, and there I found the Development Mode. Click it and you will also need to enable sending stats to Logitech. Tough, but there is no escape.

From there you can select more programs to control with the knob and yes, Ableton Live is there!

And lo and behold, assigning the up and down buttons allows you to control Ableton Live mixing with the knob. A new world opens up, where you can look at the screen, listen to the mix and control a setting in Ableton Live with the knob. This was what I was looking for: more control, and a better keyboard for the daily typing chores. Yay!

Fixing phase problems in a mix

Please note: If you are experienced in mixing this article will probably state the obvious. This blog is mostly intended as a “note to self” and everybody else that cares to take interest.

Recently one of my music friends, Hanny, told me that she had performed at a gig with her new band HannaH (check it out) that was broadcast live on local radio. Of course, she had asked for a recording that could be used for promotional purposes. There seemed to be no problems with the actual broadcast, but the recording behaved very strangely. When listening on a phone, the guitar disappeared. When you listened with headphones, your head would explode. The whole mix seemed strangely unbalanced. Phase problems… But we got out of it with a result that was even better than the original!

How did we get here?

These kinds of phase problems probably have a very simple cause: in a stereo mix, the two wires of one channel were somehow connected the wrong way around. One bad cable can do the trick. What happens if the left or right channel is wired the wrong way? The signals cancel each other out. Simple math shows it:

But the top signal seems okay, right? Well, it's maybe drawn a little incorrectly. The top signal actually has a little more richness to the waveforms and a little cancellation. Used right, you get a phaser effect, something you can find in any DAW and any set of guitar effects: an effect that wobbles the phase to add a little excitement and widening to an otherwise boring signal. Overdo it, or mistreat a stereo signal, and you get cancellations and a stereo image stretched too far to the left and right. It can result in headaches while listening.
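If you want to see the cancellation in plain numbers rather than in a drawing, here is a tiny sketch. It assumes nothing more than NumPy and a made-up test tone: the same signal on both channels, with one side polarity-flipped, sums to silence in mono.

```python
# Sketch: a polarity-flipped channel cancels completely when summed to mono.
import numpy as np

t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t)   # any waveform will do

left, right = signal, -signal        # one side wired the wrong way round
mono_sum = left + right              # what a phone speaker effectively plays

print(np.max(np.abs(mono_sum)))      # ~0.0: the sound disappears in mono
```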

How to get out of here?

So now you have a mix with phase problems and it's not your mix, just a stereo sound file. Aside from plugins built specifically for fixing phase problems, is there a way out with just the tools you have? Fortunately for me there is. Ableton has a Utility effect that centers around treating the left, right and mono signals of a track. I am quite sure that your DAW has similar built-in effects.

The trick here was to duplicate the signal and create one track with the phase difference signal and one track where left and right were mixed into a mono signal.
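In plain numbers, those two tracks are the classic mid/side split. Here is a minimal sketch of what they contain, assuming left and right are NumPy arrays holding the two channels of the recording.

```python
# Sketch: the "mono" and "difference" tracks are a mid/side split of the stereo file.
import numpy as np

def split_mid_side(left: np.ndarray, right: np.ndarray):
    mid = (left + right) / 2    # mono track: everything that is in phase
    side = (left - right) / 2   # difference track: only the out-of-phase parts
    return mid, side
```

Because the out-of-phase guitar lands almost entirely in the difference track, mixing the two tracks back together in different proportions gives you control over the guitar level again.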


Now one track only featured the guitar: proof that there was a phase problem with just the guitar in the stereo mix. The other track featured everything except the guitar. Mixing both in mono, I was suddenly able to remix the recording! Do you want more or less guitar in this song? No problem! All problems solved for HannaH. Good enough in mono, because the audio was for promotional purposes. Once again, as the Dutch football proverb goes: every downside has its upside. Happy mixing!