leandromoreira/ffmpeg-libav-tutorial
I was looking for a tutorial/book that would teach me how to start using FFmpeg as a library (a.k.a. libav) and then I found the "write a video player in less than 1k lines" tutorial.
Unfortunately it was deprecated, so I decided to write this one.
Most of the code in here will be in C but don't worry: you can easily understand it and apply it to your preferred language.
FFmpeg libav has lots of bindings for many languages like python, go and even if your language doesn't have one, you can still support it through the ffi
(here's an example with Lua).
We'll start with a quick lesson about what video, audio, codecs and containers are, then we'll go through a crash course on how to use the FFmpeg
command line, and finally we'll write code; feel free to skip directly to the section Learn FFmpeg libav the Hard Way.
Some people used to say that Internet video streaming is the future of traditional TV; in any case, FFmpeg is something worth studying.
Table of Contents
video – what you see!
If you have a sequence of images and change them at a given frequency (for instance 24 images per second), you will create an illusion of movement.
In summary that is the very basic idea behind a video: a series of pictures / frames running at a given rate.
Contemporary illustration (1886)
audio – what you listen!
Although a muted video can express quite an amount of feelings, adding sound to it brings more pleasure to the experience.
Sound is a vibration that propagates as a wave of pressure, through the air or any other transmission medium, such as a gas, liquid or solid.
In a digital audio system, a microphone converts sound to an analog electrical signal, then an analog-to-digital converter (ADC), typically using pulse-code modulation (PCM), converts the analog signal into a digital signal.
https://commons.wikimedia.org/wiki/File:CPT-Sound-ADC-DAC.svg
codec – shrinking data
CODEC is an electronic circuit or software that compresses or decompresses digital audio/video. It converts raw (uncompressed) digital audio/video to a compressed format or vice versa.
https://en.wikipedia.org/wiki/Video_codec
But if we chose to pack millions of images in a single file and called it a movie, we might end up with a huge file. Let's do the math:
Suppose we are creating a video with a resolution of 1080 x 1920 (height x width), that we'll spend 3 bytes per pixel (the minimal at play) to encode the color (or 24 bit color, which gives us 16,777,215 different colors), and that this video runs at 24 frames per second and is 30 minutes long.
toppf = 1080 * 1920 //total_of_pixels_per_frame
cpp = 3 //cost_per_pixel
tis = 30 * 60 //time_in_seconds
fps = 24 //frames_per_second
required_storage = tis * fps * toppf * cpp
This video would require approximately 250.28GB of storage or a bandwidth of 1.11Gbps! That's why we need to use a CODEC.
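If you want to double check these numbers, here is the same arithmetic as a small, self-contained C program (it treats a GB as 1024^3 bytes, which is what yields the 250.28 figure):

#include <stdio.h>

int main(void) {
  long long toppf = 1080LL * 1920; // total of pixels per frame
  long long cpp = 3;               // cost (bytes) per pixel
  long long tis = 30 * 60;         // time in seconds
  long long fps = 24;              // frames per second
  long long required_storage = tis * fps * toppf * cpp; // in bytes
  double gib = 1024.0 * 1024 * 1024;
  printf("storage: %.2f GB\n", required_storage / gib);
  printf("bandwidth: %.2f Gbps\n", required_storage / (double)tis * 8 / gib);
  return 0;
}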
container – a comfy place for audio and video
A container or wrapper format is a metafile format whose specification describes how different elements of data and metadata coexist in a computer file.
https://en.wikipedia.org/wiki/Digital_container_format
A single file that contains all the streams (mostly the audio and video) and also provides synchronization and general metadata, such as title, resolution, etc.
Usually we can infer the format of a file by looking at its extension: for instance a video.webm is probably a video using the container webm.
A complete, cross-platform solution to record, convert and stream audio and video.
To work with multimedia we can use the AMAZING tool/library called FFmpeg. Chances are you already know/use it directly or indirectly (do you use Chrome?).
It has a command line program called ffmpeg, a very simple yet powerful binary.
For instance, you can convert from mp4 to the container avi just by typing the following command:
$ ffmpeg -i input.mp4 output.avi
We just made a remuxing here, which is converting from one container to another one.
Technically FFmpeg could also be doing a transcoding but we'll talk about that later.
FFmpeg command line tool 101
FFmpeg has a documentation that does a great job of explaining how it works.
To keep things short, the FFmpeg command line program expects the following argument format to perform its actions: ffmpeg {1} {2} -i {3} {4} {5}, where:
1. global options
2. input file options
3. input url
4. output file options
5. output url
The parts 2, 3, 4 and 5 can be as many as you need.
It's easier to understand this argument format in action:
$ wget -O bunny_1080p_60fps.mp4 http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4
$ ffmpeg \
-y \ # global options
-c:a libfdk_aac -c:v libx264 \ # input options
-i bunny_1080p_60fps.mp4 \ # input url
-c:v libvpx-vp9 -c:a libvorbis \ # output options
bunny_1080p_60fps_vp9.webm # output url
This command takes an input file mp4 containing two streams (an audio encoded with the aac CODEC and a video encoded with the h264 CODEC) and converts it to webm, changing its audio and video CODECs too.
We could simplify the command above but then be aware that FFmpeg will adopt or guess the default values for you.
For instance, when you just type ffmpeg -i input.avi output.mp4, what audio/video CODEC does it use to produce the output.mp4? (One way to find out is shown right below.)
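One way to check, assuming a reasonably recent FFmpeg build, is to ask the mp4 muxer for its defaults:

$ ffmpeg -h muxer=mp4
# among other things, this prints the default codecs the muxer picks
# (commonly h264 for video and aac for audio, but it depends on your build)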
Werner Robitza wrote a must read/watch tutorial about encoding and editing with FFmpeg.
While working with audio/video we usually perform a set of tasks with the media.
Transcoding
What? the act of converting one of the streams (audio or video) from one CODEC to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc) don't support X but support Y, and newer CODECs provide a better compression rate.
How? converting an H264 (AVC) video to an H265 (HEVC):
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c:v libx265 \
bunny_1080p_60fps_h265.mp4
Transmuxing
What? the act of converting from one format (container) to another one.
Why? sometimes some devices (TVs, smartphones, consoles, etc) don't support X but support Y, and sometimes newer containers provide modern required features.
How? converting a mp4 to a webm:
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-c copy \ # just saying to ffmpeg to skip encoding
bunny_1080p_60fps.webm
Transrating
What? the act of changing the bit rate, or producing other renditions.
Why? people will try to watch your video on a 2G (edge) connection using a less powerful smartphone or on a fiber Internet connection on their 4K TVs, therefore you should offer more than one rendition of the same video with different bit rates.
How? producing a rendition with a bit rate between 3856K and 2000K:
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-minrate 964K -maxrate 3856K -bufsize 2000K \
bunny_1080p_60fps_transrating_964_3856.mp4
Usually we'll be using transrating together with transsizing. Werner Robitza wrote another must read/watch series of posts about FFmpeg rate control.
Transsizing
What? the act of converting from one resolution to another one. As said before, transsizing is often used along with transrating.
Why? the reasons are about the same as for transrating.
How? converting a 1080p to a 480p resolution:
$ ffmpeg \
-i bunny_1080p_60fps.mp4 \
-vf scale=480:-1 \
bunny_1080p_60fps_transsizing_480.mp4
Bonus Round: Adaptive Streaming
What? the act of producing many resolutions (bit rates), splitting the media into chunks and serving them through http.
Why? to provide a flexible media that can be watched on a low end smartphone or on a 4K TV; it's also easy to scale and deploy but it can add latency.
How? creating an adaptive WebM using DASH:
# video streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 160x90 -b:v 250k -keyint_min 150 -g 150 -an -f webm -dash 1 video_160x90_250k.webm
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 320x180 -b:v 500k -keyint_min 150 -g 150 -an -f webm -dash 1 video_320x180_500k.webm
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 640x360 -b:v 750k -keyint_min 150 -g 150 -an -f webm -dash 1 video_640x360_750k.webm
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 640x360 -b:v 1000k -keyint_min 150 -g 150 -an -f webm -dash 1 video_640x360_1000k.webm
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:v libvpx-vp9 -s 1280x720 -b:v 1500k -keyint_min 150 -g 150 -an -f webm -dash 1 video_1280x720_1500k.webm
# audio streams
$ ffmpeg -i bunny_1080p_60fps.mp4 -c:a libvorbis -b:a 128k -vn -f webm -dash 1 audio_128k.webm
# the DASH manifest
$ ffmpeg \
-f webm_dash_manifest -i video_160x90_250k.webm \
-f webm_dash_manifest -i video_320x180_500k.webm \
-f webm_dash_manifest -i video_640x360_750k.webm \
-f webm_dash_manifest -i video_640x360_1000k.webm \
-f webm_dash_manifest -i video_1280x720_1500k.webm \
-f webm_dash_manifest -i audio_128k.webm \
-c copy -map 0 -map 1 -map 2 -map 3 -map 4 -map 5 \
-f webm_dash_manifest \
-adaptation_sets "id=0,streams=0,1,2,3,4 id=1,streams=5" \
manifest.mpd
PS: I stole this example from the Instructions to playback Adaptive WebM using DASH
Going beyond
There are many and many other usages for FFmpeg.
I use it in conjunction with iMovie to produce/edit some videos for YouTube and you can certainly use it professionally.
Don't you wonder sometimes 'bout sound and vision?
David Robert Jones
Since FFmpeg is so useful as a command line tool to perform essential tasks over media files, how can we use it in our programs?
FFmpeg is composed of several libraries that can be integrated into our own programs.
Usually, when you install FFmpeg, it automatically installs all these libraries. I'll be referring to the set of these libraries as FFmpeg libav.
This title is a homage to Zed Shaw's series Learn X the Hard Way, particularly his book Learn C the Hard Way.
Chapter 0 – The infamous hello world
This hello world actually won't show the message "hello world"
in the terminal.
Instead we're going to print out information about the video: things like its format (container), duration, resolution, audio channels and, in the end, we will decode some frames and save them as image files.
FFmpeg libav architecture
But before we start to code, let's learn how the FFmpeg libav architecture works and how its components communicate with each other.
Here's a diagram of the process of decoding a video:
First you need to load your media file into a component called AVFormatContext (the video container is also known as format).
It actually doesn't fully load the whole file: it often only reads the header.
Once we've loaded the minimal header of our container, we can access its streams (think of them as rudimentary audio and video data).
Each stream will be available in a component called AVStream.
Stream is a fancy name for a continuous flow of data.
Suppose our video has two streams: an audio encoded with the AAC CODEC and a video encoded with the H264 (AVC) CODEC. From each stream we can extract pieces (slices) of data called packets that will be loaded into components named AVPacket.
The data inside the packets are still coded (compressed) and in order to decode the packets, we need to pass them to a specific AVCodec.
The AVCodec will decode them into an AVFrame and finally this component gives us the uncompressed frame. Notice that the same terminology/process is used both by the audio and video streams.
Chapter 0 – code walkthrough
TLDR; show me the code and execution.
$ make download
$ make cut_smaller_version
$ make hello_world
We'll skip some details, but don't worry: the source code is available at github.
The first thing we need to do is register all the codecs, formats and protocols.
To do that, we just need to call the function av_register_all:
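av_register_all();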
Now we're going to allocate memory for the component AVFormatContext that will hold information about the format (container).
AVFormatContext *pFormatContext = avformat_alloc_context();
Now we're going to open the file, read its header and fill the AVFormatContext with minimal information about the format (notice that usually the codecs are not opened).
The function used to do this is avformat_open_input. It expects an AVFormatContext, a filename and two optional arguments: the AVInputFormat (if you pass NULL, FFmpeg will guess the format) and the AVDictionary (which are the options to the demuxer).
avformat_open_input(&pFormatContext, filename, NULL, NULL);
We can print the format name and the media duration:
printf("Format %s, duration %lld us", pFormatContext->iformat->long_name, pFormatContext->duration);
To access the streams, we need to read data from the media. The function avformat_find_stream_info does that.
Now, pFormatContext->nb_streams will hold the number of streams and pFormatContext->streams[i] will give us the i-th stream (an AVStream).
avformat_find_stream_info(pFormatContext, NULL);
Now we'll loop through all the streams.
for (int i = 0; i < pFormatContext->nb_streams; i++)
{
//
}
For each stream, we're going to keep the AVCodecParameters, which describes the properties of the codec used by the stream i.
AVCodecParameters *pLocalCodecParameters = pFormatContext->streams[i]->codecpar;
With the codec properties we can look up the proper CODEC querying the function avcodec_find_decoder, which finds the registered decoder for the codec id and returns an AVCodec, the component that knows how to enCOde and DECode the stream.
AVCodec *pLocalCodec = avcodec_find_decoder(pLocalCodecParameters->codec_id);
Now we can print information about the codecs.
// specific for video and audio
if (pLocalCodecParameters->codec_type == AVMEDIA_TYPE_VIDEO) {
printf("Video Codec: resolution %d x %d", pLocalCodecParameters->width, pLocalCodecParameters->height);
} else if (pLocalCodecParameters->codec_type == AVMEDIA_TYPE_AUDIO) {
printf("Audio Codec: %d channels, sample rate %d", pLocalCodecParameters->channels, pLocalCodecParameters->sample_rate);
}
// general
printf("\tCodec %s ID %d bit_rate %lld", pLocalCodec->long_name, pLocalCodec->id, pLocalCodecParameters->bit_rate);
With the codec, we can allocate memory for the AVCodecContext, which will hold the context for our decode/encode process, but then we need to fill this codec context with the CODEC parameters; we do that with avcodec_parameters_to_context.
Once we've filled the codec context, we need to open the codec. We call the function avcodec_open2 and then we can use it.
AVCodecContext *pCodecContext = avcodec_alloc_context3(pCodec);
avcodec_parameters_to_context(pCodecContext, pCodecParameters);
avcodec_open2(pCodecContext, pCodec, NULL);
Now we're going to read the packets from the stream and decode them into frames, but first we need to allocate memory for both components, the AVPacket and the AVFrame.
AVPacket *pPacket = av_packet_alloc();
AVFrame *pFrame = av_frame_alloc();
Let's feed our packets from the streams with the function av_read_frame while it has packets.
while (av_read_frame(pFormatContext, pPacket) >= 0) {
//...
}
Let's send the raw data packet (compressed frame) to the decoder, through the codec context, using the function avcodec_send_packet.
avcodec_send_packet(pCodecContext, pPacket);
And let's receive the raw data frame (uncompressed frame) from the decoder, through the same codec context, using the function avcodec_receive_frame.
avcodec_receive_frame(pCodecContext, pFrame);
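In real code you'd check the return values of both calls, since the decoder may need more packets before it can emit a frame; here's a minimal sketch of that loop (any error handling beyond the EAGAIN/EOF cases is omitted):

int response = avcodec_send_packet(pCodecContext, pPacket);
while (response >= 0) {
  response = avcodec_receive_frame(pCodecContext, pFrame);
  if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
    break; // the decoder needs more input or reached the end of the stream
  // at this point pFrame holds a decoded frame, ready to be used
}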
We can print the frame number, the PTS, DTS, frame type, etc.
printf(
"Physique %c (%d) pts %d dts %d key_frame %d [coded_picture_number%d, display_picture_number %d]",
av_get_picture_type_char(pFrame->pict_type),
pCodecContext->frame_number,
pFrame->pts,
pFrame->pkt_dts,
pFrame->key_frame,
pFrame->coded_picture_number,
pFrame->display_picture_number
);
In the end we can save our decoded frame into a simple gray image. The process is very simple: we'll use pFrame->data, where the index is related to the planes Y, Cb and Cr; we just pick 0 (Y) to save our gray image.
save_gray_frame(pFrame->data[0], pFrame->linesize[0], pFrame->width, pFrame->height, frame_filename);
static void save_gray_frame(unsigned char *buf, int wrap, int xsize, int ysize, char *filename)
{
FILE *f;
int i;
f = fopen(filename,"w");
// writing the minimal required header for a pgm file format
// portable graymap format -> https://en.wikipedia.org/wiki/Netpbm_format#PGM_example
fprintf(f, "P5n%d %dn%dn", xsize, ysize, 255);
// writing line by line
for (i = 0; i < ysize; i++)
fwrite(buf + i * wrap, 1, xsize, f);
fclose(f);
}
And voilà! Now we have a gray scale image of 2MB:
Chapter 1 – syncing audio and video
Be the player – a young JS developer writing a new MSE video player.
Before we move on to code a transcoding example, let's talk about timing, or how a video player knows the right time to play a frame.
In the last example, we saved some frames that can be seen here:
When we're designing a video player we need to play each frame at a given pace, otherwise it would be hard to pleasantly watch the video, either because it's playing too fast or too slow.
Therefore we need to introduce some logic to play each frame smoothly. For that matter, each frame has a presentation timestamp (PTS) which is an increasing number factored in a timebase that is a rational number (where the denominator is known as the timescale) divisible by the frame rate (fps).
It's easier to understand when we look at some examples: let's simulate some scenarios.
For fps=60/1 and timebase=1/60000, each PTS will increase by timescale / fps = 1000, therefore the PTS real time for each frame could be (supposing it started at 0):
frame=0, PTS = 0, PTS_TIME = 0
frame=1, PTS = 1000, PTS_TIME = PTS * timebase = 0.016
frame=2, PTS = 2000, PTS_TIME = PTS * timebase = 0.033
For almost the same scenario but with a timebase equal to 1/60:
frame=0, PTS = 0, PTS_TIME = 0
frame=1, PTS = 1, PTS_TIME = PTS * timebase = 0.016
frame=2, PTS = 2, PTS_TIME = PTS * timebase = 0.033
frame=3, PTS = 3, PTS_TIME = PTS * timebase = 0.050
For fps=25/1 and timebase=1/75, each PTS will increase by timescale / fps = 3 and the PTS time will be:
frame=0, PTS = 0, PTS_TIME = 0
frame=1, PTS = 3, PTS_TIME = PTS * timebase = 0.04
frame=2, PTS = 6, PTS_TIME = PTS * timebase = 0.08
frame=3, PTS = 9, PTS_TIME = PTS * timebase = 0.12
…
frame=24, PTS = 72, PTS_TIME = PTS * timebase = 0.96
…
frame=4064, PTS = 12192, PTS_TIME = PTS * timebase = 162.56
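To see the same arithmetic in C, here's a small sketch using libav's rational helpers (av_q2d converts an AVRational to a double; the values reproduce the last scenario above):

#include <stdio.h>
#include <libavutil/rational.h>

int main(void) {
  AVRational time_base = {1, 75}; // timebase = 1/75, as in the last scenario
  // pts_time = pts * timebase
  for (int pts = 0; pts <= 9; pts += 3)
    printf("PTS %d -> PTS_TIME %.2f\n", pts, pts * av_q2d(time_base));
  return 0;
}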
Now with the pts_time we can find a way to render this synced with the audio pts_time or with a system clock. The FFmpeg libav provides this info through its API:
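fps = AVStream->avg_frame_rate
tbr = AVStream->r_frame_rate
tbn = AVStream->time_base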
Just out of curiosity, the frames we saved were sent in DTS order (frames: 1,6,4,2,3,5) but played back in PTS order (frames: 1,2,3,4,5). Also, notice how cheap B-Frames are in comparison to P or I-Frames.
LOG: AVStream->r_frame_rate 60/1
LOG: AVStream->time_base 1/60000
...
LOG: Frame 1 (type=I, size=153797 bytes) pts 6000 key_frame 1 [DTS 0]
LOG: Frame 2 (type=B, size=8117 bytes) pts 7000 key_frame 0 [DTS 3]
LOG: Frame 3 (type=B, size=8226 bytes) pts 8000 key_frame 0 [DTS 4]
LOG: Frame 4 (type=B, size=17699 bytes) pts 9000 key_frame 0 [DTS 2]
LOG: Frame 5 (type=B, size=6253 bytes) pts 10000 key_frame 0 [DTS 5]
LOG: Frame 6 (type=P, size=34992 bytes) pts 11000 key_frame 0 [DTS 1]
Chapter 2 – transcoding