So I’m happy to finally say that Nightly now has initial support for WebVTT! Woot!! To check it out follow these steps:
- Download Nightly.
- Turn the WebVTT pref on by typing ‘about:config’ into your URL bar, search for webvtt, and double clicking on the result to flip it to true.
- Browse to a webpage with WebVTT like this page.
Do all these things and you should see subtitles! It’s been about six months of work on and off now in school and afterwards to get this initial support working on Firefox so it’s pretty amazing to see it working.
What’s Left to Do — Everything
The thing that we have next on our plate is to fix up our current implementation so that we meet the spec requirements. We have a lot left to do to bridge that gap… In regards to this I spent Thursday last week going through the spec, figuring out the outline of what we may need to do to get our implementation up to date, and logged bugs about it. I logged a ton of them and there are still more hiding in the spec, so there really is a lot of work left to do. Hopefully, this will give us more clarity around what needs/may need to happen in implementing the spec. If it turns out we don’t need/want to implement something internally that the spec recommends we should then these bugs will at least serve as a place to discuss and document those decisions.
Just last Friday we had one of these particular incidents. The spec recommends that we use an ‘active flag’ to keep track of which cues are currently active, or showing, on the screen. The problem with this is that it would require us to iterate through the entire cue list and check which cues have the active flag set every time we want to update the rendering of the captions. This isn’t very efficient and it’s an internal detail to the spec i.e. it would never be exposed to the user so we have some leeway to implement this in ways that make sense for Gecko. What we decided on is that we will keep the main cue list of a text track sorted which will make it very easy for us to keep track of the active cues based on the current play back time. I’ve done some work for that here.
Unfortunately, the main bug for tests along with Caitlin’s new tests for making sure that we don’t display cues that have malformed time stamps still haven’t landed. We’ve had problem after problem with these. The newest of them being that they don’t pass on android for some reason. I suspect the track data isn’t being loaded properly so we keep waiting in the test for this to happen and it eventually times out.
The two problems we had before that have finally been solved.The first of them was that we were using the ‘loadedmetadata’ event to determine when the cues were loaded for a VideoElement. The problem with this was that we don’t implement the necessary steps to inform the VideoElement when the cues are ready, so ‘loadedmetadata’ is dispatched before they are actually ready. To solve this Catilin implemented the HTMLTrackElement::ReadyState, which is in the spec, but wasn’t in our implementation, and we can hook into that to determine when they are ready. The second problem was that the Query Interface implementation for HTMLTrackElement had a bug in it where we weren’t able to cast it to an nsIDOMHTMLElement. This was fine on opt builds, but for debug builds we were getting assertions that the HTMLTrackElement wasn’t a DOMNode or DOMWindow when we tried to check it’s content policy. I landed a patch that solved that just last Friday.
Hopefully, we can have these tests landed by this week… we shall see though. It depends on how quickly this Android debugging goes for me..
The rest of my time these last few weeks has been taken up with relatively small things that build up on each other, like commenting on bugs, reading the spec (which can take forever as it’s fairly complicated), helping out Marcus, giving feedback on patches, planning to figure out what is the best thing to do next, etc. Hopefully, we can get WebVTT in Firefox up to spec soon, we’ll have to see though.
If you are interested and wanting to get involved I would really encourage you to come out and become a contributor. There really is a ton of work left to do — for the implementation of the track spec as well as for libwebvtt (the C parser). You can get in touch with the current contributors via a number of ways — Mozilla’s dev media mailing list, the #media IRC channel on irc.mozilla.org, or on the mozilla/webvtt GitHub page. Check out the WebVTT wiki page for more information on how to get involved.
Until next time.
I have some great news. Bug 833382 and 833386, the last pieces needed to get initial support for WEBVTT into Gecko will be landing for sure next week. That being said, I hope I don’t have to eat my words. It’s looking really good though. Bug 833382 has gotten to the point where Boris (:bz) has conditionally r+’ed it and 833386 has reached the same point with Ralph (:rillian). Now all that’s left to do for 833382 is to spin up some try builds and if it’s all green, hopefully we’ll be good to go. 833386 still needs to go through review with Chris Pearce (:cpearce), but I don’t think that will take that long.
I’ve been pushing really hard on landing these two this last week and I’m ecstatic to see that we’ve gotten to the point where we will be landing them in the next few days. The WebVTTLoadListener in particular I’m very happy about as most of that code is mine and I’ve worked really hard at it. It’ll feel good to land that. In the case of 833386 most of that code is Jordan Raffoul’s (:jbraffoul) and Marcus Nsaad’s (:msaad) and the work I did on it was just to consolidate it and get it through the last couple rounds of review. Marcus had been on vacation for a while and we really wanted to get this landed ASAP as it is blocking quite a few things so I asked Marcus if I could step in and he didn’t have a problem with it (Yay Marcus!).
I finally figured out, when asking the right people (bz and Ms2ger, who woulda’ thunk?), that it’s actually impossible to test with two different prefs in one page as there is one prototype for each element on the page and the pref is only applied to it once, when the element is created for the first time. So once you’ve created an element underneath one pref on a page, that’s it, it will behave like it is preffed that way no matter what you do. After that it was pretty quick to get through the rest of the code needed. One of the really good things about this process too is that it allowed me to find a lot of points where our current implementation is not to spec. I’ve filed a few bugs on those this week. I’ve also closed a few bugs that have been fixed with changes in the last while. I also spun off another bug for tests that we will need to implement when the WEBVTT pref finally gets removed.
Try Server Access
The other really awesome news is that I’ve finally got try server access! Earlier this week Daniel Holbert (:dholbert) suggessted that I should apply for it and that he would vouch for me. I probably should have applied for it sooner then this as I could have used it for sure. The process was fairly easy and I’m glad to say I now have Level 1 commit access. Woot! If your interested in applying as well check out this page that describes what you will need to do.
In accordance with this new awesomeness I also had to learn about the process of pushing to the try server. Check out this good page for more information on how to do that. You’ll also probably want to look at Mercurial Manage Queue extension as it helps with managing a bunch of patches that you can move in and out of your branches easily. This really works well with my workflow of working on git and then just applying patches on my hg repository and pushing to the try server.
Until next time.
For the last two weeks we’ve been working steadily on the WEBVTT parser. Most of the work being done now is related to getting the parser integrated into Firefox. We’re building on top of the original bug filed on bugzilla by Ralph and are now using it as a co-ordination bug for the five other bugs we’ve split it up into. The “bug sections” that we’ve split it up into are:
- Integrating the webvtt build system into the Firefox build system.
- Adding a captions div to an nsVideoFrame so the captions can be rendered on screen.
- Creating a “Text Track Decoder” that will function as the entry point for Gecko into the webvtt parser.
- Creating new DOM bindings i.e. TextTrack, TextTrackCue, TextTrackCueList.
- Creating DOM tests using Mochitest for the new DOM bindings.
You can check out a more in depth break down of our bug plan here.
The other major thing that a few of us in the class have been engaged in is the review of the 0.4 parser code. The review is still in it’s early to mid stages, so we have a lot more to do on that. I’ve been participating there by filing and commenting on issues and fixing a few of the bugs that have surfaced.
We’ve also moved the parser code over to the mozilla webvtt repository on GitHub (yay!) and have landed the 0.4 parser code there in a dev branch. After the review is done it will be landed on the master branch.
I’ve been working on the Text Track Decoder for the parser integration into Firefox. This part of the integration functions as an entry point into our parser for Gecko.
How It Works
The short version of how the Text Track Decoder works is that when an HtmlTrackElement receives new data from a vtt byte stream it passes it off to it’s WebVTTLoadListener (Text Track Decoder) which then calls our webvtt parser to parse the chunk of the byte stream it just reveived. The WebVTTLoadListener also provides call back functions to the parser for passing back cues when the parser has finished them or for reporting errors when the parser encounters them. The final function that the WebVTTLoadListener facilitates is converting the cues that have been passed back in the call back function to the various DOM elements that represent a webvtt_cue and then attaching those to either the HtmlTrackElement’s track, in the case of the cue settings, or the HtmlTrackElement’s MediaElement’s video div caption overlay (phew), in the case of the parsed webvtt cue text node tree.
What We’ve Done
The first order of business that we took care of in getting this done was to ask Chris Pearce, who works very closely with Firefox’s media stuff, to give us a high level overview of what we would need to accomplish in order to get this working. That was sent in the form of an email which my Professor, Dave Humphrey, then kindly posted on our bug (I forgot to do so!).
We then quickly went about implementing Chris’s initial steps that he talked about. We’ve done steps 1 – 4 so far:
- The HtmlTrackElement::LoadListener has been moved to it’s own file and renamed WebVTTLoadListener.
- The HtmlTrackElement now has a WebVTTLoadListener reference which is initialized in LoadResource.
- WebVTTLoadListener now manages a single webvtt parser which is created and destroyed along with it.
- WebVTTLoadListener now provides call back functions to the parser for returning finished cues and reporting errors.
We’ve also added three convenience functions to turn webvtt cue stuff into the DOM bindings. These are:
- cCueToDomCue – Transforms a webvtt cue’s settings into a TextTrackCue (almost done).
- cNodeToHtmlElement – Transforms a webvtt node into an HtmlElement; recursively adds HtmlElements to it’s children if it is converting an internal webvtt node (not done at all!).
- cNodeListToDocumentFragment – Transforms the head node’s children into HtmlElements and adds them to a DocumentFragment (pretty much done).
The call back function for returning cues now:
- Calls the cCueToDomCue function and adds the resulting HtmlTextTrackCue to it’s owning HtmlTrackElements cue list.
- Calls the cNodeListToDocumentFragment and adds the resulting DocumentFragment to the caption overlay.
Right now we’ve run into some problems in figuring out how to work with the Firefox code. I’ve listed those in my recent WIP update on the bug. Other then implementing those steps I’ve just been getting acquainted with the Firefox code that we have to touch and figuring out the basics of how it’s all going to fit in. I think we’ve gotten a big chunk of it done so far, mostly the overall frame of how it’s going to work as well as turning a webvtt cue’s settings into a TextTrackCue. I’ve also met the deadlines and goals that I set for myself at the beginning of this semester, so I’m fairly happy. Going forward I think I know enough now to ask intelligent questions about how to solve the problems that I listed in the WIP, so that’s what I will be doing in the coming weeks when I get stuck.
As always, I’m ever confident that we will finish the project!