Caoimhe

I have posted before about using FFmpeg to restore The Big O’s original intro and also about my Jellyfin server. I am going to talk about the former again, but first, some more detail on the latter.

My Jellyfin server is actually just my desktop, which acts as the server for everything on my home network. It has a 12 TiB hard drive for storage, divided into a few partitions: a “library” partition of important files I want to keep and back up regularly, and a “media” partition that holds the files for my Jellyfin server. The media partition was running out of space (I may go a bit extreme with the Blu-ray rips) while the library partition had several times more free space than used, so I decided to try to resize them and grow the media partition into some of the library partition’s empty space.

I just used KDE’s built-in partition manager for this, which successfully shrank the library partition but, for some reason, failed when trying to move and grow the media partition. I don’t know why this happened and am just hoping there are no hardware problems. Nothing from the library partition was lost (and it was all backed up anyway), but the media partition was gone, so I’ve had to rebuild my Jellyfin library. This is not a huge deal, just a little time-consuming, but one of the things lost was, of course, my edited Big O episodes, which meant I had to redo splicing the original intro in. This time I had a fresh head, free of the frustrations accumulated while trying to do this the first time, and I decided to do it better and actually get to grips with FFmpeg’s filter syntax. Here are the commands I ended up with:

mkdir -p tmp
mkdir -p out
for v in The\ Big\ O*.mkv
	set b (basename "$v" ".mkv")
	ffmpeg -i intro.webm -ss 00:01:12.02 -i "$v" -filter_complex "[0:v] scale=1424:1080,setsar=1:1 [intro]; [intro][0:a][0:a][1:v][1:a:0][1:a:1] concat=n=2:v=1:a=2 [outv][outa];" -shortest -map "[outv]" -map "[outa]" -metadata:s:a:0 language=eng -metadata:s:a:1 language=jpn "tmp/$v"

	set subs "$b.ass"
	ffmpeg -itsoffset -4.42 -i "$v" "tmp/$subs"
	ffmpeg -i "tmp/$v" -i "tmp/$subs" -shortest -map 0 -map 1 -c copy -metadata:s:s:0 language=eng "out/$v"
end

Let’s break down what’s happening here. First I create two directories, tmp and out. tmp is where temporary working files are going to be written, and out is for the final files once we’re finished processing.

Then we loop over each episode Matroska file, with the file name for each one assigned to $v inside the loop. The file name without its extension is stored in the variable $b. I’m using the Fish shell here rather than Bash, so the syntax is a little different.
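For anyone more at home in Bash, the inside of the loop translates roughly like this (the filename here is a made-up example):

```shell
# Bash equivalent of the first line of the Fish loop body (hypothetical filename)
v="The Big O - S1E01.mkv"
b=$(basename "$v" ".mkv")  # the file name without its .mkv extension
echo "$b"
```

This prints `The Big O - S1E01`, just as `set b (basename "$v" ".mkv")` leaves that value in $b in Fish.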

Then the big command. We pass in the first input, intro.webm, which is the intro that I downloaded off YouTube. Our second input is the episode, with the seek parameter -ss telling FFmpeg to skip to one minute and twelve point zero two seconds in when reading it. Unintuitively, -ss goes before the input it applies to, not after.
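The effect of input seeking can be sketched with a synthetic clip (this assumes ffmpeg and ffprobe are installed; the paths and durations are throwaway examples, not the real episode):

```shell
# Make a 10-second synthetic clip (hypothetical file names)
ffmpeg -v error -y -f lavfi -i "testsrc=duration=10:rate=25:size=320x240" \
  -c:v mpeg4 /tmp/clip.mkv
# -ss before -i seeks in the input, so roughly 7 seconds remain
ffmpeg -v error -y -ss 3 -i /tmp/clip.mkv -c:v mpeg4 /tmp/trimmed.mkv
# Report the trimmed clip's duration
ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/trimmed.mkv
```

Putting -ss after the -i instead makes FFmpeg decode and discard everything up to that point, which is slower but was historically the more frame-accurate option.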

Then the big -filter_complex. This takes a string of filter definitions separated by semicolons. Each filter has input and output streams identified by labels in square brackets.

The first filter is [0:v] scale=1424:1080,setsar=1:1 [intro]. Its input is [0:v], the video stream from the first input¹, i.e., intro.webm. The filter resizes it to 1424×1080 pixels with a sample aspect ratio of 1:1 and outputs it to a new stream labelled [intro]. The [intro] stream now has the same resolution as our episodes, which will allow us to concatenate them in the next filter.
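To see scale and setsar in isolation, here is a throwaway sketch on a synthetic source (again assuming ffmpeg and ffprobe are installed; the file names are hypothetical):

```shell
# Scale a synthetic test pattern to 1424×1080 with square samples
ffmpeg -v error -y -f lavfi -i "testsrc=duration=1:rate=25:size=640x480" \
  -vf "scale=1424:1080,setsar=1:1" -c:v mpeg4 /tmp/scaled.mkv
# Confirm the resulting geometry of the video stream
ffprobe -v error -select_streams v:0 \
  -show_entries stream=width,height,sample_aspect_ratio -of csv=p=0 /tmp/scaled.mkv
```

The ffprobe line reports the new width, height, and sample aspect ratio, which is a handy way to check your filter did what you meant before running it over a whole library.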

The second filter is [intro][0:a][0:a][1:v][1:a:0][1:a:1] concat=n=2:v=1:a=2 [outv][outa]. Let’s start in the middle here. concat=n=2:v=1:a=2 means that we are going to concatenate two segments (n=2) which each have one video stream (v=1) and two audio streams (a=2). Those two audio streams are going to be the English and Japanese dubs.

The inputs for this filter are [intro][0:a][0:a][1:v][1:a:0][1:a:1], which can be divided into our two segments ([intro][0:a][0:a] and [1:v][1:a:0][1:a:1]), each with one video and two audio streams specified. The first segment has our resized intro video stream, [intro], plus [0:a], the audio from our first input (the intro again), specified twice because we are going to combine the same intro audio with both the English and Japanese episode audio. The second segment has the video and two audio streams from our second input file: the episode itself and its English and Japanese audio tracks.

The concatenation then has two output streams, [outv][outa], the video and audio.
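The same shape of concat, reduced to its minimum (n=2 segments with one video and one audio stream each, i.e. a=1 rather than the a=2 above), can be sketched with synthetic inputs; everything here, including the file names, is a throwaway example, and it assumes ffmpeg and ffprobe are installed:

```shell
# Build a 2-second segment with one video and one audio stream
ffmpeg -v error -y -f lavfi -i "testsrc=duration=2:rate=25:size=320x240" \
  -f lavfi -i "sine=frequency=440:sample_rate=44100:duration=2" \
  -c:v mpeg4 -c:a pcm_s16le -shortest /tmp/seg.mkv
# Concatenate the segment with itself: two segments, v=1, a=1
ffmpeg -v error -y -i /tmp/seg.mkv -i /tmp/seg.mkv \
  -filter_complex "[0:v][0:a][1:v][1:a] concat=n=2:v=1:a=1 [outv][outa]" \
  -map "[outv]" -map "[outa]" -c:v mpeg4 -c:a pcm_s16le /tmp/joined.mkv
# The result should be roughly 4 seconds long
ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/joined.mkv
```

The input labels line up the same way as in the episode command: each segment contributes its streams in order (video first, then audio), and concat hands back one labelled stream per output.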

Then the rest of the command: -shortest trims the output to the length of its shortest stream, i.e. if your output has five minutes of video but only two minutes of audio, the result will be two minutes long rather than five minutes of video with three minutes of silence. I don’t think it’s really needed here, but I was using it while testing, forgot to remove it, and thought it would be dishonest to take it out for the post when it was what I actually ran.

-map "[outv]" -map "[outa]" defines what streams to include in the output, which here is simply the output streams of our concatenation.

-metadata:s:a:0 language=eng -metadata:s:a:1 language=jpn labels the audio output streams as being English and Japanese, respectively, so that media players can display that information.

And then the last part of the command outputs the video to the tmp folder.

This gives us output files with the original intro and both dub tracks preserved, which is more than I had last time, with a lot less processing. But if I am going to include the Japanese audio, I probably also want subtitles for it, and unfortunately the -ss parameter does not seem to correctly offset the subtitle timings. If I want subtitles with correct timing, I will have to fix them with another command.

First we set a variable, $subs, to the file name we want for the subtitles, in the Advanced SubStation Alpha (.ass) format.

Then we read the original episode file into FFmpeg again with a negative offset of 4.42 seconds (-itsoffset -4.42) and write the subtitle data to a file in the tmp folder.
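One way to sanity-check what -itsoffset does to subtitle timings, using a throwaway SRT file rather than the real episode (all the file names and timings here are made up):

```shell
# A single subtitle cue starting at two seconds
printf '1\n00:00:02,000 --> 00:00:03,000\nHello\n' > /tmp/in.srt
# Shift it a second earlier on the way in, then write it back out
ffmpeg -v error -y -itsoffset -1 -i /tmp/in.srt -c:s srt /tmp/out.srt
# The cue should now start at one second
grep '00:00:01,000' /tmp/out.srt
```

The same negative-offset idea is what the command above applies to the episode’s embedded subtitles, just with 4.42 seconds instead of one.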

The last command takes our output video and the subtitle file and recombines them, using another metadata option to label the subtitle track as English, setting the codec (-c) to copy so that the audio and video do not get re-encoded, and writing the finished file to the out folder.

I didn’t bother fixing the chapters this time.

  1. FFmpeg indexes from 0, so [0:v] refers to the video stream from the first input, [1:a:0] refers to the first audio stream from the second input, etc.