This project provides a minimal in-browser implementation of a live video and audio encoder and a video / audio player based on MOQT draft-04. The goal is to provide a minimal live platform implementation that helps with learning about low-latency trade-offs and facilitates experimentation.
It is NOT optimized for performance / production at all, since the first goal is experimenting / learning.
Fig1: Main block diagram
For the server / relay side we have used moxygen.
Note: You need to be careful and check that the protocol versions implemented by this code and by moxygen match.
It uses a variation of LOC as the media packager.
The encoder implements the MOQT publisher role. It is based on WebCodecs and AudioContext; see the block diagram in Fig3.
Fig3: Encoder block diagram
Note: We have used WebTransport, so the underlying transport is QUIC (QUIC streams, to be more accurate).
Video encoding config:
```javascript
// Video encoder config
const videoEncoderConfig = {
  encoderConfig: {
    codec: 'avc1.42001e', // Baseline = 66, level 30 (see: https://en.wikipedia.org/wiki/Advanced_Video_Coding)
    width: 320,
    height: 180,
    bitrate: 1_000_000, // 1 Mbps
    framerate: 30,
    latencyMode: 'realtime', // Sends 1 chunk per frame
  },
  encoderMaxQueueSize: 2,
  keyframeEvery: 60,
};
```
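As a hedged illustration (not this repo's exact code), here is how such a config could drive a WebCodecs `VideoEncoder`, forcing a keyframe every `keyframeEvery` frames; the handler bodies are assumptions for the sketch:

```javascript
// Sketch only: driving a WebCodecs VideoEncoder with the config above.
const videoEncoder = new VideoEncoder({
  output: (chunk, metadata) => {
    // In this project the encoded chunk would be handed to the muxer/sender worker.
    console.log(`chunk type=${chunk.type} ts=${chunk.timestamp}`);
  },
  error: (e) => console.error(e),
});
videoEncoder.configure(videoEncoderConfig.encoderConfig);

let frameCount = 0;
function encodeVideoFrame(frame) { // frame: a VideoFrame from the capture stage
  // Force a keyframe every keyframeEvery frames
  const keyFrame = (frameCount % videoEncoderConfig.keyframeEvery) === 0;
  videoEncoder.encode(frame, { keyFrame });
  frame.close(); // Release the frame once handed to the encoder
  frameCount++;
}
```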
Audio encoder config:
```javascript
// Audio encoder config
const audioEncoderConfig = {
  encoderConfig: {
    codec: 'opus', // AAC NOT implemented YET (it is in their roadmap)
    sampleRate: 48000, // To fill later
    numberOfChannels: 1, // To fill later
    bitrate: 32000,
    opus: { // See https://www.w3.org/TR/webcodecs-opus-codec-registration/
      frameDuration: 10000 // In us. Lower latency than default = 20000
    }
  },
  encoderMaxQueueSize: 10,
};
```
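A minimal sketch (assumed usage, not the repo's exact code) of feeding this config to a WebCodecs `AudioEncoder`, after checking browser support; the "to fill later" fields would come from the actual capture track settings:

```javascript
// Sketch only: check support and configure a WebCodecs AudioEncoder
// (run inside an async context).
const { supported } = await AudioEncoder.isConfigSupported(audioEncoderConfig.encoderConfig);
if (!supported) throw new Error('Opus config not supported by this browser');

const audioEncoder = new AudioEncoder({
  output: (chunk) => { /* hand the EncodedAudioChunk to the muxer/sender */ },
  error: (e) => console.error(e),
});
audioEncoder.configure(audioEncoderConfig.encoderConfig);
// Later: audioEncoder.encode(audioData) for each captured AudioData frame.
```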
Muxer config:
```javascript
const muxerSenderConfig = {
  urlHostPort: '',
  urlPath: '',

  moqTracks: {
    "audio": {
      id: 0,
      namespace: "vc",
      name: "aaa/audio",
      maxInFlightRequests: 100,
      isHipri: true,
      authInfo: "secret"
    },
    "video": {
      id: 1,
      namespace: "vc",
      name: "aaa/video",
      maxInFlightRequests: 50,
      isHipri: false,
      authInfo: "secret"
    }
  },
};
```
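For orientation, here is a hedged sketch of how the sender could open its WebTransport session from `urlHostPort` / `urlPath` and push a packaged object; sending one object per unidirectional QUIC stream is an assumption for illustration:

```javascript
// Sketch only: the WebTransport session the muxer/sender relies on.
const url = `https://${muxerSenderConfig.urlHostPort}/${muxerSenderConfig.urlPath}`;
const wt = new WebTransport(url);
await wt.ready; // QUIC connection + WebTransport handshake completed

// Illustrative: send one packaged object per unidirectional QUIC stream.
async function sendObject(bytes /* Uint8Array produced by the LOC packager */) {
  const stream = await wt.createUnidirectionalStream();
  const writer = stream.getWriter();
  await writer.write(bytes);
  await writer.close();
}
```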
- Main encoder webpage: glues all the encoder pieces together.
- `TimeBufferChecker` (used for latency tracking): stores the frame timestamps and the wall-clock generation time of the raw frames on their way from `a_capture` / `v_capture` to `a_encoder` / `v_encoder`. That allows us to keep track of each frame's / chunk's creation time (wall clock).
- `v_capture.js`: WebWorker that waits for the next RGB or YUV video frame from the capture device, augments it by adding the wall clock, and sends it via postMessage to the video encoder.
- `a_capture.js`: WebWorker that receives the audio PCM frame (a few ms, ~10 ms to 25 ms of audio samples) from the capture device, augments it by adding the wall clock, and finally sends it (as a copy) via postMessage to the audio encoder.
- `v_encoder.js`: WebWorker that encodes RGB or YUV video frames into encoded video chunks. It drops incoming frames based on `encodeQueueSize` (that helps when the encoder is overwhelmed; see the sketch below) and inserts a keyframe every `keyframeEvery` frames. Note: We configure `VideoEncoder` in `realtime` latency mode, so it delivers a chunk per video frame.
- `a_encoder.js`: WebWorker that encodes PCM audio frames (samples) into encoded audio chunks, dropping frames based on `encodeQueueSize` in the same way. Note: the `opus.frameDuration` setting helps keep encoding latency low.
- `moq_sender.js`: WebWorker that implements MOQT and sends the video and audio packets produced by `a_encoder.js` and `v_encoder.js` to the server / relay, following MOQT and a variation of LOC as media packager (see `loc_packager.js`). In-flight traffic is limited per track via `maxInFlightRequests`.

Fig4: LOC header structure
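Back to the encoder workers: a hedged sketch of the `encodeQueueSize`-based drop logic mentioned above (illustrative, reusing the `videoEncoder` and config from the earlier sketch):

```javascript
// Sketch only: drop incoming frames when the encoder queue backs up,
// trading frames for latency instead of letting the queue grow.
function encodeOrDrop(frame) {
  if (videoEncoder.encodeQueueSize > videoEncoderConfig.encoderMaxQueueSize) {
    frame.close(); // Encoder overwhelmed: drop this frame
    return false;
  }
  videoEncoder.encode(frame);
  frame.close();
  return true;
}
```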
The player implements the MOQT subscriber role. It uses WebCodecs, AudioContext / Worklet, SharedArrayBuffer, and Atomics.
Fig5: Player block diagram
To keep the audio and video in sync, the following strategy is applied:

- The audio renderer (`audio_circular_buffer.js`) keeps track of the last played timestamp (delivered to the audio device by `source_buffer_worklet.js`) by using the PTS value of the currently playing `AudioData` frame and adding the duration of the samples already delivered. This information is accessible from the player page via `timingInfo.renderer.currentAudioTS`, which also adds the hardware latency provided by `AudioContext`.
- Every time new audio samples are sent to be rendered, `video_render_buffer` (which contains the decoded YUV/RGB frames + timestamps) gets called and returns the frame closest to (older than or equal to) the currently playing audio timestamp (`timingInfo.renderer.currentAudioTS`).
- `AudioDecoder` does NOT track timestamps; it just uses the 1st one sent and, for every decoded audio sample, adds 1/fs (the sample duration). That means that if we drop an audio packet those timestamps get collapsed, creating A/V out of sync. To work around that problem we accumulate the duration of all the audio gaps into `timestampOffset` (computed as last played TS - new TS, ideally 0 if there are NO gaps) and compensate the issued PTS by that amount.

The main player components are:

- `moq_demuxer_downloader.js`: WebWorker that implements MOQT and extracts the video and audio packets from the server / relay, following MOQT and a variation of LOC; it unpacks the payloads (see `loc_packager.js`) into `EncodedVideoChunk` / `EncodedAudioChunk` objects.
- deJitter: since we do not have any guarantee that QUIC streams are delivered in order, we need to reorder the chunks before sending them to the decoder. This is the function of the deJitter, and we create one instance per track, in this case one for audio and one for video. It orders the incoming chunks by `seqID`; when the buffer grows over the configured size (`bufferSizeMs`) we deliver (remove) the 1st element in the list. The `seqID` is also used for detecting discontinuities (lost or duplicated chunks).
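A minimal sketch of such a per-track deJitter (field names are illustrative; the repo's implementation may differ):

```javascript
// Sketch only: reorder chunks by seqID; once the buffered time span exceeds
// bufferSizeMs, deliver (remove) the 1st element in the list.
class DeJitter {
  constructor(bufferSizeMs) {
    this.bufferSizeMs = bufferSizeMs;
    this.items = []; // { seqID, timestampMs, chunk }, kept sorted by seqID
    this.lastDeliveredSeqID = -1;
  }
  push(item) {
    this.items.push(item);
    this.items.sort((a, b) => a.seqID - b.seqID);
    const spanMs = this.items[this.items.length - 1].timestampMs - this.items[0].timestampMs;
    if (spanMs < this.bufferSizeMs) return null; // Still buffering
    const out = this.items.shift();
    // seqID-based discontinuity detection (lost or duplicated chunks)
    if (this.lastDeliveredSeqID >= 0 && out.seqID !== this.lastDeliveredSeqID + 1) {
      console.warn(`Discontinuity: expected ${this.lastDeliveredSeqID + 1}, got ${out.seqID}`);
    }
    this.lastDeliveredSeqID = out.seqID;
    return out; // Hand to the decoder
  }
}
```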
- Audio decoder: WebWorker that, when it receives an audio chunk, decodes it and sends the resulting audio PCM samples to the audio renderer. `AudioDecoder` does NOT track timestamps on the decoded data; it just uses the 1st one sent and, for every decoded audio sample, adds 1/fs (the sample duration), so dropping an audio packet collapses the timestamps and creates A/V out of sync. To work around that problem we calculate the total duration of the audio gaps, `timestampOffset`, and publish it so the other elements in the pipeline keep an accurate idea of the live head position:

```
lostTime = currentChunkTimestamp - lastChunkSentTimestamp;
// where lastChunkSentTimestamp = lastSentChunk.timestamp + lastSentChunk.duration
timestampOffset += lostTime;
```
- `audio_circular_buffer.js`: leverages `SharedArrayBuffer` and `Atomics` to implement the following mechanisms for sharing data in a "multi-thread" environment:
  - Audio samples buffer (`sharedAudiobuffers`): the main buffer, used to share the audio PCM data from the decoder to the renderer (`source_buffer_worklet.js`).
  - States buffer (`sharedStates`): used to share states and data between the renderer (`source_buffer_worklet.js`) and the main thread.
- `source_buffer_worklet.js`: `AudioWorkletProcessor` that implements an audio source Worklet sending the audio samples to the renderer.
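A toy sketch of that `SharedArrayBuffer` + `Atomics` pattern (the slot layout is illustrative, not the repo's actual memory layout):

```javascript
// Sketch only: sharing a small state word between the main thread and the
// audio worklet via SharedArrayBuffer + Atomics.
const sharedStates = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const states = new Int32Array(sharedStates);
const STATE_IS_PLAYING = 0; // Illustrative slot index

// Writer side (e.g. the main thread):
Atomics.store(states, STATE_IS_PLAYING, 1);

// Reader side (e.g. inside the worklet's process() callback):
const isPlaying = Atomics.load(states, STATE_IS_PLAYING) === 1;
```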
- Video decoder: WebWorker that decodes video chunks and sends the decoded data (YUV or RGB) to the next stage (`video_render_buffer.js`).
- `video_render_buffer.js`: buffer that stores the decoded video frames until the renderer asks for the one matching the current audio timestamp (see the sketch below).
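A minimal sketch of the frame selection driven by the current audio timestamp (names are illustrative):

```javascript
// Sketch only: return the newest frame whose timestamp is <= the currently
// playing audio timestamp, freeing every older frame we skip past.
function getFrameForAudioTS(frames /* VideoFrames sorted by timestamp */, currentAudioTS) {
  let chosen = null;
  while (frames.length > 0 && frames[0].timestamp <= currentAudioTS) {
    if (chosen !== null) chosen.close(); // Discard (free) outdated frames
    chosen = frames.shift();
  }
  return chosen; // null: no displayable frame yet
}
```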
We can activate the option "Activate latency tracker (overlays data on video)" in the encoder (CPU consuming). This option adds the encoder's epoch-ms clock to every video frame as soon as it is received from the camera, replacing the first video lines with that clock information. It is encoded in a way that is resilient to video processing / encoding / decoding operations (see `./overlay_processor/overlay_encoder.js` and `./overlay_processor/overlay_decoder.js` in the code).
The player decodes that info from every frame and, when it is about to show the frame, calculates the latency as: `latency_ms = now_in_ms - frame_capture_in_ms`.
Note: This assumes the clocks of the encoder and the decoder are in sync, which is always true if you use the same computer to encode and decode.
- In the encoder, each frame's `timestamp` and `clkms` (wall clock) are added to the `latencyAudioChecker` and `latencyVideoChecker` queues (instances of `TimeBufferChecker`).
- In the player, `renderer.currentAudioTS` (the audio sample currently being rendered) is used to get the closest wall-clock time from `audioTimeChecker`. From there we sync the video and compute: `Latency = Now - whenSampleWasGenerated`.

Note: The encoder and player clocks have to be in sync for this metric to be accurate. If you use the same computer as encoder & player, the metric should be pretty accurate.
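A rough sketch of that lookup (the repo's `TimeBufferChecker` API may differ):

```javascript
// Sketch only: store {ts, clkms} pairs at capture time, then recover the
// wall clock closest to the media timestamp currently being rendered.
class TimeChecker {
  constructor() { this.entries = []; } // [{ ts, clkms }]
  add(ts, clkms) { this.entries.push({ ts, clkms }); }
  getClosestWallClock(ts) {
    let best = null;
    for (const e of this.entries) {
      if (best === null || Math.abs(e.ts - ts) < Math.abs(best.ts - ts)) best = e;
    }
    return best ? best.clkms : undefined;
  }
}
// Latency of the sample being rendered right now:
// latencyMs = Date.now() - checker.getClosestWallClock(renderer.currentAudioTS)
```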
./create_self_signed_certs.sh
Note: The trick here is that this script creates a self-signed certificate for localhost using ECDSA and a validity of 10 days (< 15), which is the type Chrome will accept.
git clone git@github.com:facebookexperimental/moq-encoder-player.git
Install Python (see this guide)
Run local webserver by calling:
./start-http-server-cross-origin-isolated.py
Note: You need to use this script to run the player because it adds some needed headers (more info here)
Copy the `Track Name` from the encoder webpage and paste it into the Receiver demuxer `Track Name` field in the player.
ENJOY YOUR POCing!!! :-)
Fig6: Encoder UI
Fig7: Player UI
Note: This is experimental code that we plan to evolve quickly, so these screenshots could be a bit outdated.
moq-encoder-player is released under the MIT License.