WebVTT 0.3 Release : Final

Today our 0.3 release is due for the WebVTT parser. I’ve completed the cue text parsing portion of the parser and it’s sitting on my GitHub repo. The main places you can look for the code I have added are:

Much of what I discussed in my last blog post has stayed the same with my final version of the 0.3 release. The major structure of the algorithm has stayed the same. However, I have made changes to some of the syntax in order to get rid of minor bugs. I won’t re-post all that slightly changed code as it would make this blog post to long. You can ether look at the GitHub links for that stuff or you can check out my earlier blog post.

I’ll go over what I’ve done in the time since my last post:

I’ve completed the UTF16 append functions:

webvtt_status
append_wchar_to_wchar( webvtt_wchar *append_to, webvtt_uint len, webvtt_wchar *to_append, webvtt_uint start, webvtt_uint stop )
{
	int i;

	if( !append_to || !to_append )
		return WEBVTT_INVALID_PARAM;

	for(i = len; i < len + stop; i++, start++ )
		append_to[i] = to_append[start];
	append_to[i] = UTF16_NULL_BYTE;

	return WEBVTT_SUCCESS;
}

webvtt_status
webvtt_string_append_wchar( webvtt_string *append_to, webvtt_wchar *to_append, webvtt_uint len )
{
	webvtt_status status;

	if( !to_append || !append_to )
		return WEBVTT_INVALID_PARAM;

	if( ( status = grow( (*append_to)->length + len, &(*append_to) ) ) != WEBVTT_SUCCESS )
		return status;

	if( ( status = append_wchar_to_wchar( (*append_to)->text, (*append_to)->length, to_append, 0, len ) ) != WEBVTT_SUCCESS )
		return status;

	(*append_to)->length += len;

	return WEBVTT_SUCCESS;
}

webvtt_status
webvtt_string_append_single_wchar( webvtt_string *append_to, webvtt_wchar to_append )
{
	webvtt_wchar temp[1];

	if( !append_to )
		return WEBVTT_INVALID_PARAM;

	temp[0] = to_append;

	return webvtt_string_append_wchar( append_to, temp, 1 );
}

webvtt_status
webvtt_string_append_string( webvtt_string *append_to, webvtt_string to_append )
{
	webvtt_status status;

	if( ( status = webvtt_string_append_wchar( append_to, to_append->text, to_append->length ) ) != WEBVTT_SUCCESS )
		return status;

	return WEBVTT_SUCCESS;
}

I’ve added in functions that compare two strings or two wchars:

webvtt_uint
webvtt_compare_wchars( webvtt_wchar  *one, webvtt_uint one_len, webvtt_wchar *two, webvtt_uint two_len )
{
	int i;

	/* Should we return a webvtt_status to account for this case here? */
	if( !one || !two )
		return 0;

	if( one_len != two_len )
		return 0;

	for( i = 0; i < one_len; i++ )
	{
		if( one[i] != two[i] )
		{
			return 0;
		}
	}

	return 1;
}

webvtt_uint
webvtt_compare_strings( webvtt_string one, webvtt_string two )
{
	if( !one || !two )
		return 0;

	return webvtt_compare_wchars( one->text, one->length, two->text, two->length );
}

I’ve changed the webvtt_string_list struct and it’s functions since the last blog post:

struct
webvtt_string_list_t
{
	webvtt_uint alloc;
	webvtt_uint list_count;
	webvtt_string *items;
};

webvtt_status
webvtt_create_string_list( webvtt_string_list_ptr *string_list_pptr )
{
	webvtt_string_list_ptr temp_string_list_ptr = (webvtt_string_list_ptr)malloc( sizeof(*temp_string_list_ptr) );

	if( !temp_string_list_ptr )
		return WEBVTT_OUT_OF_MEMORY;

	temp_string_list_ptr->alloc = 0;
	temp_string_list_ptr->items = 0;
	temp_string_list_ptr->list_count = NULL;

	*string_list_pptr = temp_string_list_ptr;

	return WEBVTT_SUCCESS;
}

void
webvtt_delete_string_list( webvtt_string_list_ptr string_list_ptr )
{
	int i;

	for( i = 0; i < string_list_ptr->list_count; i++ )
	{
		webvtt_delete_string( string_list_ptr->items[i] );
	}
}

webvtt_status
webvtt_add_to_string_list( webvtt_string_list_ptr string_list_ptr, webvtt_string string )
{
	if( !string )
	{
		return WEBVTT_INVALID_PARAM;
	}

	if( string_list_ptr->alloc == string_list_ptr->list_count )
		string_list_ptr->alloc += 4;

	if( !string_list_ptr->alloc == 0 )
		string_list_ptr->items = (webvtt_string *)malloc( sizeof(webvtt_string) );
	else
		string_list_ptr->items = (webvtt_string *)realloc( string_list_ptr->items, sizeof(webvtt_string *) * string_list_ptr->alloc );

	if( !string_list_ptr->items )
		return WEBVTT_OUT_OF_MEMORY;

	string_list_ptr->items[string_list_ptr->list_count++] = string;

	return WEBVTT_SUCCESS;
}

I’ve changed the WEBVTT_CALLBACK that will call the webvtt_parse_cuetext function:

static void WEBVTT_CALLBACK
cue( void *userdata, webvtt_cue cue )
{
	webvtt_parse_cuetext( cue->payload->text, cue->node_head );
}

Before it didn’t call the parse cue text function and so the cue text wasn’t parsed.

I’ve changed the function signature for webvtt_parse_cuetext:

WEBVTT_EXPORT webvtt_status
webvtt_parse_cuetext( webvtt_wchar *cue_text, webvtt_node_ptr node_ptr )

I’ve gotten rid of the webvtt_parser pointer, the line number, the length of *cue text, and the length of node_ptr in the function signature that was there previously.

  • For the webvtt_parser pointer and line number I did this because there purpose was to be able to throw an error to the webvtt_parser pointer error call back and reference the line that it happened on, but currently the parser does not support this.
  • I got rid of the length of cue text because it should always be a null terminated pointer. So we can just tell that we are at the end of the line by checking for that. No need for the line length.
  • I got rid of the length of the node_ptr because the parser no longer returns an array of node_ptr it now returns a single node_ptr of type WEBVTT_HEAD_NODE, which contains an array of node_ptr underneath it.

I know we will be changing this in the future, but I got rid of it now to make it more clear.

The other major thing that caitp and I were talking about on IRC last night was the data structure of the nodes. Before caitp had it set up that an internal node and a leaf node would contain a node and so they would be subclasses of node. Then you could just return an array of nodes and cast it to a particular type of node based on it’s node kind.

The way I have it set up right now is similar but slightly different. In my version the node contains a pointer to a leaf node or internal node and based on its node kind you can cast it to either an internal node or a leaf node.

caitp made the case for converting it back to the old format as it might be more readable and possibly take up less space in memory. This is something that we should probably discuss in the future.

Some other things to note that we will need to take care of in the future:

  • I have not had the chance to test the parsing of escape characters, but the code for it is there.
  • It does not parse the new “lang” tag that was recently added to the W3C specification.
  • The memory operations in the node, token, and string list struct do not make use of the allocator functions that we have built into the framework.

Yupp, thats it. See ya.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s