
Yesterday Google began rolling out video support for Gmail. Support comes in the form of a browser plugin, which is currently available for Mac and Windows.
While it’s nice that Gmail gets audio and video capabilities, having to download and install a plugin voids the main webapp benefit (being available anywhere), and raises the question of why videochat wasn’t done as an upgrade to the desktop Google Talk client.
The devil is in the (protocol) details
What’s more problematic is the way Google decided to implement these features in the protocol. As GTalk is actually XMPP (with a couple of Google extensions), the good news is that Google continues using XMPP for the new features. The bad news is that Google completely ignored the XMPP standard for voice and video sessions (a.k.a. Jingle RTP), and invented their own signalling.
To make matters worse, the way they decided to implement it is really screwed up. In a video call, there are typically two media streams: one for video and one for audio (if you don’t want audio, you can set up just the video one). Google’s videochat also uses two streams, but instead of signalling them as two Jingle contents, they bundle all the information into one content, and then have to do ugly hacks with transport candidate naming to distinguish between the ports for each of the streams, and so on.
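To illustrate what "two Jingle contents" means, here is a minimal Python sketch that builds a session-initiate offer with separate audio and video contents. The namespaces are assumed to be the ones from the published Jingle and Jingle RTP specs (earlier drafts may have used different version suffixes). The JID, session ID, codec names and payload-type IDs are made up for illustration, and transport candidates are left out entirely.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces (Jingle and Jingle RTP Sessions specs).
JINGLE_NS = "urn:xmpp:jingle:1"
RTP_NS = "urn:xmpp:jingle:apps:rtp:1"


def build_session_initiate(initiator, sid, media):
    """Build a <jingle/> offer with one <content/> per media type.

    `media` maps a media name ("audio", "video") to a list of
    (payload_id, codec_name, clockrate) tuples. All values passed in
    below are hypothetical examples, not what any real client sends.
    """
    jingle = ET.Element(
        f"{{{JINGLE_NS}}}jingle",
        {"action": "session-initiate", "initiator": initiator, "sid": sid},
    )
    for name, payloads in media.items():
        # Each stream gets its own content, so audio and video can be
        # negotiated (or dropped) independently of each other.
        content = ET.SubElement(
            jingle,
            f"{{{JINGLE_NS}}}content",
            {"creator": "initiator", "name": name},
        )
        description = ET.SubElement(
            content, f"{{{RTP_NS}}}description", {"media": name}
        )
        for payload_id, codec, clockrate in payloads:
            ET.SubElement(
                description,
                f"{{{RTP_NS}}}payload-type",
                {"id": str(payload_id), "name": codec, "clockrate": str(clockrate)},
            )
    return jingle


offer = build_session_initiate(
    "romeo@example.org/orchard",  # hypothetical JID
    "a73sjjvkla37jfea",           # hypothetical session ID
    {
        "audio": [(96, "speex", 16000)],
        "video": [(97, "H264", 90000)],
    },
)
contents = offer.findall(f"{{{JINGLE_NS}}}content")
print(ET.tostring(offer, encoding="unicode"))
```

With the streams kept as separate contents, each one would carry its own transport candidates, so no naming convention is needed to tell the audio ports from the video ports.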
All this can be worked around, and while supporting such a twisted implementation will turn everyone’s client into spaghetti code, it can (and will) be solved. The real problem is that they’ve decided to support only one codec, H.264/SVC, which practically nobody uses (well, up until now) and which has no free implementations (apparently they’re using the codecs from Vidyo). They could’ve added support for any other codec (e.g. H.263 or Theora) that could be used when calling from/to a non-Gmail client, but they didn’t.
While they might have the best of intentions, this all looks very different from the “spirit of open communications” they’re mentioning. While it’s too late for them to change the signalling protocol, I hope they’ll be more sensible about the codecs, support more open (or at least more widely used) ones, and try to play nicer with the rest of us.

For everyone who hasn’t noticed: SVC is an extended profile; it’s more complicated than the AVC profile that everyone implements.
I wonder why they used a special plugin when Flash already supports both webcams and streaming. It seems they could have just used Flash and hit the platforms that Flash is supported on.
Flash only supports streaming to the media server, meaning all the a/v traffic would have to go through Google’s servers. In Jingle, peers in the call communicate directly (unless they’re both behind very restrictive firewalls and have to use relaying).
In general, IMHO Flash streaming is not very well suited for VoIP use cases, and when you do it, you end up with something like Tokbox.com (i.e. you have a central server that clients connect to, instead of a decentralized network). There’s some talk about Flash10 having better p2p support, but I guess that’s a long way off.
Also, RTMP (the protocol used by Flash for streaming) is completely closed and proprietary, and while Red5 (an open-source Flash media server clone) claims to support it, it’s still a much worse choice than XMPP/RTP/H.264 (kudos to Google for at least attempting to do the right thing here).
When we started working on this, the Jingle RTP standards weren’t fully done yet. Rather than try to hit a moving target, we revved our original Google Talk protocol.
Fortunately, as a web application, we can change the XMPP that we send from the server. So it’s not “too late” for us to change the signaling protocol.
Regarding the codec, we will add support for H.264/AVC and other codecs.
It’s great to hear you’ll add support for more common codecs!
It’s true that the signalling protocol is not yet finalized, but it is mature, and you could’ve picked any version and said “we’re sticking to this one”. IMHO it would be closer to the current spec, easier for others to implement, and easier for you to upgrade later, if you so decide.
I’m glad to hear you haven’t given up on syncing with the Jingle XEP in the future. By “too late” I meant that you now have a protocol variant you’ll want to stay backwards compatible with. It’s not only a web app, it’s a client communicating with other clients using a protocol: if you silently change it, all interop breaks. Being backwards compatible is not impossible, but it does mean more work.
It’ll be great if Google can change to match the XMPP specs sometime in the near future. OSS clients have just started properly implementing Jingle (Empathy), and any extension on top of this will be easier to support than having to write stream support from scratch.
Also, will adding H.264/AVC support make it compatible with Firefox 3.1 and Opera? That’ll be interesting as folks with access to those browsers won’t need to install the extension in that case (think cross platform.)
Oops, I meant Ogg Theora support in Opera and Firefox 3.1 may take away the need to install extensions for those browsers.
Ajay, the plugin probably handles not just the codec, but the signalling and media transport, so I think it’s unlikely that it could be done without some sort of extension.
Is there any possibility that Gmail videochat will work with iChat as a client?
Videochat that works with any operating system is something missing on Mac OS!
H.264/SVC content can be played (at least part of it) in any H.264-compatible player (Flash, for instance). Hence it’s a good choice.
The best thing about SVC, of course, is that the same encoded stream can deliver a good user experience to all kinds of clients, both the ones with high bandwidth and the ones with low bandwidth.
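As a rough illustration of the layering idea, the Python sketch below picks how many layers of a single scalable stream a client could receive for a given bandwidth. The layer names and bitrates are hypothetical numbers chosen for the example, not taken from any real encoder or from Google's plugin.

```python
from typing import List, NamedTuple


class Layer(NamedTuple):
    name: str
    cumulative_kbps: int  # bitrate needed for this layer plus all layers below it


# Hypothetical layering of one encoded stream, ordered from base upwards.
LAYERS = [
    Layer("base", 128),
    Layer("enhancement-1", 384),
    Layer("enhancement-2", 1024),
]


def layers_for(bandwidth_kbps: int, layers: List[Layer] = LAYERS) -> List[str]:
    """Return the layers that fit in the given bandwidth.

    An enhancement layer only makes sense on top of the layers below it,
    so selection stops at the first layer that does not fit.
    """
    chosen = []
    for layer in layers:
        if layer.cumulative_kbps > bandwidth_kbps:
            break
        chosen.append(layer.name)
    return chosen


print(layers_for(200))   # a low-bandwidth client
print(layers_for(2000))  # a high-bandwidth client
```

The sender encodes once; each receiver (or a relay in between) simply drops the layers that do not fit, rather than the stream being re-encoded per client.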
Hi Senko,
Justin has committed to Google Talk supporting Jingle signalling as defined in the specs, so I don’t think it does much good to say “Google Talk should have supported one of the interim versions”. In fact, that might have been even more confusing! The most important thing is that we’re all converging on support for what’s in the specs now (also incorporating any feedback we receive at this point). I think the specs are very close to done, but more reviews and implementation experience will help us move them forward to Draft, so keep sending in feedback on the jingle@xmpp.org list! :)
Peter,
I was happy to hear that Google stays committed to supporting the Jingle standard, and also that they’re planning to support more open/widespread codecs.
Upon the initial announcement, it seemed to me they took the NIH approach, but it’s actually great to see I was wrong :), and that they’re going in the right direction (and are very responsive to feedback, I might add).
Senko,
The comments about Flash are slightly out of date. Adobe recently announced support for a peer-to-peer channel (RTMFP) in Flash10, but never announced when support would be available in their media server for the control side of this protocol.
re: GTalk standardization ..
Hopefully libjingle will also be updated when these changes are made, so that developers have a working reference design to code with/from, though I’m not holding my breath.
jon
Jon,
Thanks for clarifying the Flash p2p situation.
I believe Google have an up-to-date version of libjingle in SVN at http://libjingle.googlecode.com/svn/trunk/
(which seems to be a different repo than http://code.google.com/p/libjingle/source/browse/ , leading to some confusion), so I believe it will be updated again in the future when the new changes are made.