WEBVTT: Refactoring the Parser

So I’ve been working on refactoring the parser a bit over the last week. I talked about this in the one of my previous posts. You can check out the work I’ve done so far on my GitHub.

Squashing Code is Nice

So I have successfully squashed the node and cuetext token data code and it’s worked out nicely. This reduces the overall amount of code that needs to be written and the complexity of the code that needs to be written to access those structs. Heres what they look like now:


webvtt_cuetext_token_t {
  webvtt_cuetext_token_type token_type;
  webvtt_string tag_name; // Only used for start token and end token types.
  union {
    webvtt_string text;
    webvtt_timestamp time_stamp;
    webvtt_cuetext_start_token_data *start_token_data;

* Represents a start tag in the cue text.
* These take the form of <[TAG_NAME].[CLASSES] [POSSIBLE_ANNOTATION]> in the cue text.
webvtt_cuetext_start_token_data_t {
  webvtt_string_list *css_classes;
  webvtt_string annotations;


webvtt_node_t {
  * The specification asks for uni directional linked list, but we have added a parent
  * node in order to facilitate an iterative cue text parsing solution.
  webvtt_node *parent;
  webvtt_node_kind kind;

  union {
    webvtt_string text;
    webvtt_timestamp time_stamp;
    webvtt_internal_node_data *internal_data;

webvtt_internal_node_data_t {
  webvtt_string annotation;
  webvtt_string_list *css_classes;

  webvtt_uint alloc;
  webvtt_uint length;
  webvtt_node **children;

Here is one example how it simplifies the code:


if( webvtt_create_node_from_token( token_ptr, &temp_node_ptr, current_node_ptr ) != WEBVTT_SUCCESS )
  /* Do something here? */
  webvtt_attach_internal_node( (webvtt_internal_node_ptr)current_node_ptr->concrete_node, temp_node_ptr );
  if( WEBVTT_IS_VALID_INTERNAL_NODE( temp_node_ptr->kind ) )
    current_node_ptr = temp_node_ptr;


if( webvtt_create_node_from_token( token, &temp_node, current_node ) != WEBVTT_SUCCESS )
  /* Do something here? */
  { continue; }
  else {
    webvtt_attach_internal_node( current_node, temp_node );
     if( WEBVTT_IS_VALID_INTERNAL_NODE( temp_node->kind ) )
       { current_node = temp_node; }

Tokens vs Nodes

In my last blog post about refactoring the parser I talked about removing the token stuff and instead just using webvtt_node straight away. I haven’t done that yet because I have discovered a problem with it. The problem is that tokens have a way of representing an end tag
whereas nodes do not. This is important because when the tokenizer returns an end tag token a new node is not instantiated, instead the current node will change to the current nodes parent. Therefore, since a node has no idea of what an end tag is it cannot replace a token… yet.
I’m still not convinced that we can’t do this in some way.

Those Pesky Pointer Typedefs

I’ve also begun to remove all the pointer typedefs I’ve been using throughout my code. I liked it at first, but now that I have been thinking about it just using straight pointer notation is probably better.

  1. It’s more clear as to what is actually going on.
  2. I was typedef-ing pointers as a [object]_ptr. So if I instead just use pointer notation it will decrease the character count everywhere. Instead of [object]_ptr everywhere we will just have *[object].

Memory Leaks in C++ Bindings

The last thing I have done for now is  changed the C++ bindings over to instantiate the entire tree of nodes when the cue is created. This way the cue knows about the nodes and will be able to release them at the end if it’s lifespan. In this way we can avoid memory leaks.

UTF16 to UTF8

The final thing that I need to do is get the parser using UTF8 instead of UTF16. Caitlin is currently doing a major overhaul on the webvtt_string stuff to change it from using UTF16 to instead use our webvtt_byte which is straight UTF8. That way we can have all those nice string functions that we created for UTF16 with our UTF8 stuff.

Forward to Review

I’m looking to get this done as well as debugging the unit tests by Friday. Then I believe the parser will be ready for review.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s