Discussion:
[FFmpeg-user] Encoding multiple files for adaptive bit rate streaming using RTMP and HLS
Joel Lopez
2015-05-11 23:25:37 UTC
We allow viewers to select a low-bandwidth or high-bandwidth file
and stream just the selected file. I am trying to move us to one
player that does adaptive bit rate switching. Of course I'd like
the videos to be playable on older iOS and Android devices as well as
on desktops with slow and fast connections.

The files we currently stream have unaligned keyframes, with the following
bit rates and sizes:
738 kb/s 426x240
1340 kb/s 854x480

Can I add a few lower versions to the mix and a higher one? Can I
realign the keyframes of the existing files? In my testing it seems
to switch ok with these 2 files so far so I'm not so sure how crucial
aligned key frames are. What have you seen out there?

I see YouTube offers 6 and sometimes 7 different qualities for
different connections.
1080p
720p
480p
360p
240p
144p

What bitrates and dimensions do you think they're using? What do you
recommend is good for adaptive bit rate switching?

Someone was kind enough to help me out and get me started with this
command. He said I didn't need to change the dimensions when making
different bit rate versions. What's your opinion? Not upscaling, at
least, makes sense. But if I want the videos to play on
older phones, should I make smaller dimensions?

How do I change this command to make the different bitrate or sizes?

ffmpeg [rawvideo demuxer options if required] -i input -c:v libx264
-crf 23 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -vbr 5
-movflags +faststart output.mp4

Thanks for any help.
Werner Robitza
2015-05-12 09:31:18 UTC
Post by Joel Lopez
Can I add a few lower versions to the mix and a higher one?
Yes, you should do that.
Post by Joel Lopez
Can I realign the keyframes of the existing files?
That's going to be hard. You're better off re-encoding from scratch
with a fixed interval.
Post by Joel Lopez
I'm not so sure how crucial aligned key frames are. What have you seen out there?
The recommendation is definitely to keep them aligned; otherwise
you'll run into sync issues.
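As a rough illustration of "a fixed interval": with libx264, one common approach is to pin the GOP length with -g and switch off scene-cut keyframes with -sc_threshold 0, using the same interval in seconds for every rendition. The sketch below assumes a constant frame rate; the helper name `fixed_gop_args` and the 2-second interval are illustrative choices, not something from this thread.

```python
def fixed_gop_args(fps, seconds):
    """Return ffmpeg options that request a keyframe every `seconds` seconds."""
    gop = round(fps * seconds)  # keyframe interval expressed in frames
    return [
        "-g", str(gop),        # maximum distance between keyframes, in frames
        "-sc_threshold", "0",  # do not insert extra keyframes at scene cuts
    ]


# Hypothetical example: 25 fps source, keyframe every 2 seconds.
print(" ".join(fixed_gop_args(fps=25, seconds=2)))
```

If renditions use different frame rates, the frame count has to be recomputed per rendition so that the keyframes still land on the same timestamps.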
Post by Joel Lopez
What bitrates and dimensions do you think they're using? What do you
recommend is good for adaptive bit rate switching?
See https://developer.apple.com/library/ios/technotes/tn2224/_index.html#//apple_ref/doc/uid/DTS40009745-CH1-SETTINGSFILES

It's a good starting point.
Post by Joel Lopez
He said I didn't need to change the dimensions when making
different bit rate versions. What's your opinion?
Typically you should vary both bit rate and resolution, and frame rate
if you're targeting really crappy devices.
Changing just the bit rate but not the resolution may make the video
look worse at lower bit rates than if it had a lower resolution and
the same low bit rate. I recently did a test with bitrate-only
switching and I'm not sure if I'd recommend that.
Post by Joel Lopez
How do I change this command to make the different bitrate or sizes?
ffmpeg [rawvideo demuxer options if required] -i input -c:v libx264
-crf 23 -preset medium -pix_fmt yuv420p -c:a libfdk_aac -vbr 5
-movflags +faststart output.mp4
Check the H.264 encoding guide
(https://trac.ffmpeg.org/wiki/Encode/H.264). Do not use the CRF option
but use "-b:v 1M" for 1 MBit/s, or similar. Ideally, do a two-pass
encode for better quality, if you have the time.
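A minimal sketch of what a two-pass encode with a target bitrate could look like, written as a small Python helper that only builds and prints the two command lines. The file names, the 128k audio bitrate and the use of the native aac encoder (instead of libfdk_aac from the original command) are assumptions for illustration; `two_pass_commands` is a hypothetical helper name.

```python
import shlex


def two_pass_commands(src, dst, video_bitrate, audio_bitrate="128k"):
    """Build the two ffmpeg invocations for a two-pass libx264 encode."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx264",
              "-preset", "medium", "-b:v", video_bitrate,
              "-pix_fmt", "yuv420p"]
    # Pass 1 only gathers statistics: audio is dropped, output is discarded.
    first = common + ["-pass", "1", "-an", "-f", "null", "/dev/null"]
    # Pass 2 reuses those statistics and writes the real file.
    second = common + ["-pass", "2", "-c:a", "aac", "-b:a", audio_bitrate,
                       "-movflags", "+faststart", dst]
    return first, second


for cmd in two_pass_commands("input.mp4", "output_1M.mp4", "1M"):
    print(shlex.join(cmd))
```

On Windows the first pass would write to NUL rather than /dev/null.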
Henk D. Schoneveld
2015-05-12 09:47:44 UTC
Sent from my iPhone
Post by Werner Robitza
Check the H.264 encoding guide
(https://trac.ffmpeg.org/wiki/Encode/H.264). Do not use the CRF option
but use "-b:v 1M" for 1 MBit/s, or similar. Ideally, do a two-pass
encode for better quality, if you have the time.
Would you be so kind to explain why to NOT use the crf option?
_______________________________________________
ffmpeg-user mailing list
http://ffmpeg.org/mailman/listinfo/ffmpeg-user
Werner Robitza
2015-05-12 11:50:22 UTC
Post by Henk D. Schoneveld
Would you be so kind to explain why to NOT use the crf option?
CRF is essentially a constant quality mode, which results in variable
bitrate depending on the spatiotemporal complexity of the scenes. For
streaming purposes, this is not ideal, since your adaptive streaming
client assumes that a segment encoded at a target bitrate of x kBit/s
can actually be transmitted over a link with x kBit/s throughput.

If you want to make sure you're not exceeding a certain bandwidth, the
VBV encoding mode is probably the best option (see also
https://trac.ffmpeg.org/wiki/EncodingForStreamingSites). Therefore,
set -maxrate and -bufsize.
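To make the relation between the two options concrete: -bufsize divided by -maxrate is roughly the length, in seconds, of the window over which the rate cap is enforced. The sketch below builds the flags from a target bitrate. The parameter names `buffer_seconds` and `peak_factor` and their defaults are assumptions for illustration, not recommendations from this thread.

```python
def vbv_args(video_kbps, buffer_seconds=2.0, peak_factor=1.0):
    """Return ffmpeg rate-control flags for a VBV-constrained encode."""
    maxrate = round(video_kbps * peak_factor)  # allowed peak rate, in kbit/s
    bufsize = round(maxrate * buffer_seconds)  # VBV buffer size, in kbit
    return ["-b:v", f"{video_kbps}k",
            "-maxrate", f"{maxrate}k",
            "-bufsize", f"{bufsize}k"]


# A 3000 kbit/s target with a buffer of about 2 seconds.
print(" ".join(vbv_args(3000)))
```

A smaller buffer keeps the bitrate closer to the cap at every moment but leaves the encoder less room to spend bits on difficult scenes.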

A single-pass constant bitrate will not generally be more stable than
CRF, but it should be less spikey.

See this chart for a comparison between CRF, single-pass CBR and
single-pass CBR with -maxrate set: http://i.imgur.com/GxTW4Jy.png
y-axis is the frame size, moving average of 120 frames.
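For readers who want to reproduce that kind of curve, here is a small sketch of a 120-frame moving average over per-frame sizes. How the chart was actually produced is not stated in the thread. One way to obtain the sizes, assumed here, is `ffprobe -select_streams v:0 -show_entries packet=size -of csv=p=0 input.mp4`. The sample numbers in the sketch are made up.

```python
from collections import deque


def moving_average(sizes, window=120):
    """Yield the mean of the last `window` values once the window is full."""
    buf = deque(maxlen=window)
    total = 0
    for size in sizes:
        if len(buf) == window:
            total -= buf[0]  # this value is about to be pushed out
        buf.append(size)
        total += size
        if len(buf) == window:
            yield total / window


# Made-up frame sizes in bytes; a window of 3 keeps the example short.
frame_sizes = [9000, 1200, 1100, 1300, 8800, 1250]
print(list(moving_average(frame_sizes, window=3)))
```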
Henk D. Schoneveld
2015-05-12 13:16:21 UTC
Post by Werner Robitza
Post by Henk D. Schoneveld
Would you be so kind to explain why to NOT use the crf option?
CRF is essentially a constant quality mode, which results in variable
bitrate depending on the spatiotemporal complexity of the scenes.
Your goal is max quality within a given link capacity, I assume.
Choosing an arbitrary bitrate upfront to achieve the max possible quality seems suboptimal and contradictory to me:
A. too many bits for talking heads
B. too few bits for action-dominated events.
Post by Werner Robitza
For
streaming purposes, this is not ideal, since your adaptive streaming
client assumes that a segment encoded at a target bitrate of x kBit/s
can actually be transmitted over a link with x kBit/s throughput.
Both the stream bitrate and the link bitrate are averages, though. A stream consists of I-frames and P-frames and sometimes B-frames, and the individual frames differ in size. For example, choosing a small GOP size, with relatively more I-frames, results in a lower average quality that is more or less avoidable.
Post by Werner Robitza
If you want to make sure you're not exceeding a certain bandwidth, the
VBV encoding mode is probably the best option (see also
https://trac.ffmpeg.org/wiki/EncodingForStreamingSites). Therefore,
set -maxrate and -bufsize.
A single-pass constant bitrate will not generally be more stable than
CRF, but it should be less spikey.
See this chart for a comparison between CRF, single-pass CBR and
single-pass CBR with -maxrate set: http://i.imgur.com/GxTW4Jy.png
y-axis is the frame size, moving average of 120 frames.
I see the difference between the methods, but I don’t really understand what it’s trying to tell me. What does the x-axis show: total stream size, number of frames, or something else?
Werner Robitza
2015-05-12 13:50:43 UTC
Post by Henk D. Schoneveld
Your goal is max quality within a given link capacity, I assume.
Choosing an arbitrary bitrate upfront to achieve the max possible quality seems suboptimal and contradictory to me:
A. too many bits for talking heads
B. too few bits for action-dominated events.
Yes, but that's still what's typically done, unless you choose a CQ
encoding type with a max-rate (which is what libvpx recommends too).

You should've mentioned that you're experienced with this -- otherwise
I would've given a different answer.
Post by Henk D. Schoneveld
I see the difference between the methods, but I don’t really understand what it’s trying to tell me. What does the x-axis show: total stream size, number of frames, or something else?
It's the frame number (index) in a 3 minute sequence.
Henk D. Schoneveld
2015-05-12 14:32:17 UTC
Post by Werner Robitza
Yes, but that's still what's typically done, unless you choose a CQ
encoding type with a max-rate (which is what libvpx recommends too).
But libvpx is much less efficient than libx264 for the same quality.
Post by Werner Robitza
You should've mentioned that you're experienced with this -- otherwise
I would've given a different answer.
Hm, what difference does it make to your answer whether I know a little more or less?
Another option to ‘optimise’ quality for a given bitrate is to use anamorphic encoding. I know YouTube doesn’t accept that, but all modern players I know of handle it without any problem.
For 720p I use -s 880x720, which works pretty well for me. With this you get a reduction of (1280-880)/1280 = 31% in the bits needed, at very reasonable quality. At least it is a lot better than reducing the bitrate by that percentage and encoding 1:1, i.e. with -s 1280x720.
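The arithmetic behind the 880x720 suggestion can be written out as follows. The sketch computes the reduction in stored pixels and the sample aspect ratio a player needs to stretch the picture back to 16:9. Treating pixel count as a stand-in for bits is only an approximation, and the explicit setsar in the printed filter string is an addition for illustration (the scale filter may already set it); the thread itself only uses -s.

```python
from fractions import Fraction

full_w, stored_w, height = 1280, 880, 720

# Share of pixels no longer stored per frame.
reduction = Fraction(full_w - stored_w, full_w)

# Each stored pixel must be displayed this many times wider than it is tall.
sar = Fraction(full_w, stored_w)

print(f"pixels saved: {float(reduction):.1%}")
print(f"sample aspect ratio: {sar.numerator}:{sar.denominator}")
print(f'-vf "scale={stored_w}:{height},setsar={sar.numerator}/{sar.denominator}"')
```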
Henk D. Schoneveld
2015-05-12 17:36:13 UTC
Test results from doing the above:
2442734 May 12 2015 Source_Code-crf22-1280.mp4
1656672 May 12 2015 Source_Code-crf22-880.mp4
2045725 May 12 2015 Source_Code-crf23-1280.mp4
1403091 May 12 2015 Source_Code-crf23-880.mp4

These are video-only files.
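A quick check of those four numbers against the 31% estimate. The sketch assumes the first column is the file size; the relative saving comes out at roughly 31 to 32% for both CRF values.

```python
# Sizes copied from the listing above, keyed by (CRF setting, stored width).
sizes = {
    ("crf22", 1280): 2442734,
    ("crf22", 880): 1656672,
    ("crf23", 1280): 2045725,
    ("crf23", 880): 1403091,
}


def saving(crf):
    """Relative size reduction of the 880-wide file versus the 1280-wide one."""
    return 1 - sizes[(crf, 880)] / sizes[(crf, 1280)]


for crf in ("crf22", "crf23"):
    print(f"{crf}: {saving(crf):.1%} smaller at 880 wide")
```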
Joel Lopez
2015-05-12 19:08:02 UTC
Thanks for the advice. Now I guess I should encode one video using
two-pass and one using CRF and compare them. There seems to be a hot debate
about which is best.

The settings below are what apple.com recommends for 4:3. Forgive
my slowness, but I'm having a tough time converting what sites
recommend into ffmpeg commands, especially now with two-pass and CRF.


From the wiki page on encoding for streaming they show this example
https://trac.ffmpeg.org/wiki/EncodingForStreamingSites

ffmpeg -i input.mkv -vcodec libx264 -preset medium -maxrate 3000k -bufsize 6000k \
-vf "scale=1280:-1,format=yuv420p" -g 50 -acodec libmp3lame -b:a 128k -ac 2 -ar 44100 file.flv

So is this how I should change it? Here is my attempt at the lowest and the highest:

ffmpeg -i input.mp4 -vcodec libx264 -preset medium -maxrate 264k -bufsize 500k \
-vf "scale=400:-1,format=yuv420p" -g 50 -acodec aac -b:a 64k -ac 2 -ar 48000 output_400.mp4

ffmpeg -i input.mp4 -vcodec libx264 -preset medium -maxrate 5120k -bufsize 10000k \
-vf "scale=1280:-1,format=yuv420p" -g 50 -acodec aac -b:a 128k -ac 2 -ar 48000 output_1280.mp4

I'm totally guessing on -bufsize by doubling the -maxrate. What
should it actually be?
Also, in the scale filter, what's the difference between -1 and -2?
Since -g is 50, does this mean my keyframes will be aligned 5 seconds apart?
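On the -1 versus -2 question, as far as I understand the scale filter documentation: -1 picks the missing dimension so that the aspect ratio is kept, and -n (for example -2) does the same but then makes the result divisible by n. That matters because libx264 with yuv420p needs even dimensions. The sketch below approximates that calculation for the existing 854x480 file; the rounding rule is my reading of the filter's behaviour and may differ in edge cases, and `auto_height` is a hypothetical helper name.

```python
def auto_height(in_w, in_h, out_w, n):
    """Approximate the height chosen for scale=out_w:-n (a multiple of n)."""
    # Integer round-to-nearest of out_w * in_h / (in_w * n) ...
    units = (2 * out_w * in_h + in_w * n) // (2 * in_w * n)
    # ... scaled back up so the result is divisible by n.
    return units * n


for n in (1, 2):
    height = auto_height(854, 480, 400, n)
    print(f"scale=400:-{n} on 854x480 -> 400x{height}")
```

With a true 4:3 source such as 640x480, both variants give 400x300, so the difference only shows up when the computed height would otherwise be odd.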

400x300
Frame Rate = 12
Total Bit Rate = 264
Video Bit Rate = 200
Audio Bit Rate = 64
Audio Sample Rate = 48
Keyframe = 36
Profile = Baseline, 3.0
B-frames = 0
Segment Size = 9

480x360
Frame Rate = 15
Total Bit Rate = 464
Video Bit Rate = 400
Audio Bit Rate = 64
Audio Sample Rate = 48
Keyframe = 45
Profile = Baseline, 3.0
B-frames = 0
Segment Size = 9

640x480
Frame Rate = 29.97
Total Bit Rate = 664
Video Bit Rate = 600
Audio Bit Rate = 64
Audio Sample Rate = 48
Keyframe = 90
Profile = Baseline, 3.0
B-frames = 0
Segment Size = 9

640x480
Frame Rate = 29.97
Total Bit Rate = 1296
Video Bit Rate = 1200
Audio Bit Rate = 96
Audio Sample Rate = 48
Keyframe = 90
Profile = Baseline, 3.1
B-frames = 0
Segment Size = 9

960x720
Frame Rate = 29.97
Total Bit Rate = 3596
Video Bit Rate = 3500
Audio Bit Rate = 96
Audio Sample Rate = 48
Keyframe = 90
Profile = Main, 3.1
B-frames = as needed
Segment Size = 9

1280x960
Frame Rate = 29.97
Total Bit Rate = 5128
Video Bit Rate = 5000
Audio Bit Rate = 128
Audio Sample Rate = 48
Keyframe = 90
Profile = Main, 3.1
B-frames = as needed
Segment Size = 9
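One possible way to turn the table above into ffmpeg commands is sketched below as a Python script that only prints the command lines. The values are copied from the table as posted and have not been checked against H.264 level limits. Several choices are assumptions for illustration: -bufsize at twice the video bitrate (mirroring the wiki example), -sc_threshold 0 to keep the keyframe interval fixed, the native aac encoder, 29.97 written as 30000/1001, and the output file names. `rendition_command` is a hypothetical helper name.

```python
import shlex

# (width, height, fps, video kbit/s, audio kbit/s, keyframe interval, profile, level)
LADDER = [
    (400, 300, "12", 200, 64, 36, "baseline", "3.0"),
    (480, 360, "15", 400, 64, 45, "baseline", "3.0"),
    (640, 480, "30000/1001", 600, 64, 90, "baseline", "3.0"),
    (640, 480, "30000/1001", 1200, 96, 90, "baseline", "3.1"),
    (960, 720, "30000/1001", 3500, 96, 90, "main", "3.1"),
    (1280, 960, "30000/1001", 5000, 128, 90, "main", "3.1"),
]


def rendition_command(src, w, h, fps, v_kbps, a_kbps, keyint, profile, level):
    """Build one ffmpeg command line for a single rung of the ladder."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-preset", "medium",
        "-profile:v", profile, "-level", level,  # baseline implies no B-frames
        "-vf", f"scale={w}:{h},format=yuv420p", "-r", fps,
        "-b:v", f"{v_kbps}k", "-maxrate", f"{v_kbps}k",
        "-bufsize", f"{2 * v_kbps}k",            # assumed: 2x the video rate
        "-g", str(keyint), "-sc_threshold", "0",
        "-c:a", "aac", "-b:a", f"{a_kbps}k", "-ar", "48000", "-ac", "2",
        f"output_{w}x{h}_{v_kbps}k.mp4",
    ]


for row in LADDER:
    print(shlex.join(rendition_command("input.mp4", *row)))
```

The keyframe counts in the table work out to about 3 seconds at each listed frame rate (36/12, 45/15, 90/29.97). By the same arithmetic, -g 50 gives 5 seconds only at 10 frames per second; at 29.97 it is about 1.7 seconds.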
Joel Lopez
2015-05-12 19:11:50 UTC
(Did this one not top post? Trying to figure out how to avoid that on Gmail)

