The title might be a little misleading, but not by much. Everyone reading has probably heard of Google Meet, or Zoom, or Microsoft Teams, or Discord, or Snapchat; or perhaps all of the above. What do they all have in common? They use an API developed by Google called WebRTC, which stands for Web Real-Time Communication. Ever wonder how all that works? Well, this is the post for you. I’ll start with how WebRTC works in general, then move on to how I’m using it in, you guessed it, ReLife. Onwards!

WebRTC

Alrighty then, let’s get into it. I’ll keep this overview fairly basic, because while it is important to get the basic concepts, networking can get very complicated very quickly. The basic goal of WebRTC is to send a stream of information (your voice) from your computer to another computer. Keeping that in mind, let’s back up. If you’ve never heard about protocols, you’re probably in the majority of people, but it’s time to fix that.

Protocols

You’ve probably heard at least a couple of these acronyms: IP, TCP, UDP, HTTPS. What do they all have in common? They have a ‘P’ in them. That P stands for protocol, and the reason there are so many is that there are several layers to networking. The internet is an incredibly complex thing, and computers need to be able to talk to each other at different levels.

Starting at the lowest-level layer, you have the ‘physical layer’. This one is literally sending stuff like electrical signals, radio waves, and light pulses. This layer doesn’t really have protocols, but rather standards (voltages, frequencies, etc.).

The next layer is the ‘link layer’. This one handles the direct connections your computer has; most importantly, the one to your router. There are a couple of protocols to look at here, mainly Ethernet and Wi-Fi. The fancy name for the Wi-Fi protocol is IEEE 802.11 (the letters after it, like a/b/g/n, name specific versions), but that’s relatively unimportant, and depends on what router you have.

Next, we have the ‘network layer’. You’ve likely heard more about this one, because this is where you deal with IP addresses. IP itself stands for Internet Protocol, and is essentially the protocol computers use to find and talk to one another. When you have another computer’s IP address, you can find a route to talk to that computer.

However, we want to be able to run multiple things on one computer, so we need more than just an IP address. This is where the ‘transport layer’ comes in, and this is where you get the idea of a port. Ports are essentially an additional number you specify when you send data to the computer, so that it knows which program to give the message to. The two most popular protocols here are TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). This is where things start to get really interesting, and we have a couple of options of how to do things.

UDP is what stuff like live video and voice calls use, and it doesn’t establish a connection. One side basically just throws data at the other, hoping most of it sticks. Sometimes you’ll lose some data in transfer, but for stuff like streaming, losing a few frames here or there doesn’t matter too much, and it is much faster. TCP does establish a connection, through a process called handshaking. A handshake is where one end says hello and sends some information about itself, then the other end acknowledges that request and sends some information about itself back. This means that in TCP, you only send data once the other end says they’re ready, and the receiver acknowledges what it gets so that anything lost along the way can be sent again. That is why you don’t lose information like in UDP. Things like file transfers and websites use this.
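To make the difference a bit more concrete, here’s a toy sketch in Java. It doesn’t touch real sockets; `LossyLink` is a made-up stand-in for a network that drops every third packet, and the two `send...` methods are hypothetical names for the two styles. The UDP-style sender fires each packet once and moves on, while the TCP-style sender keeps resending until the packet gets through (real TCP is far more involved than this).

```java
import java.util.ArrayList;
import java.util.List;

class LossyLinkDemo {

    /** A pretend network link that silently drops every third transmission. */
    static class LossyLink {
        private int transmissions = 0;

        /** Sends one packet; returns true if it arrived, false if it was lost. */
        boolean transmit() {
            transmissions++;
            return transmissions % 3 != 0;
        }

        int transmissions() {
            return transmissions;
        }
    }

    /** UDP style: send each packet exactly once and never check if it arrived. */
    static List<Integer> sendFireAndForget(int packetCount, LossyLink link) {
        List<Integer> received = new ArrayList<>();
        for (int seq = 0; seq < packetCount; seq++) {
            if (link.transmit()) {
                received.add(seq);
            }
        }
        return received;
    }

    /** TCP style: keep resending a packet until it is confirmed as delivered. */
    static List<Integer> sendWithRetransmit(int packetCount, LossyLink link) {
        List<Integer> received = new ArrayList<>();
        for (int seq = 0; seq < packetCount; seq++) {
            while (!link.transmit()) {
                // No acknowledgment came back, so try the same packet again.
            }
            received.add(seq);
        }
        return received;
    }

    public static void main(String[] args) {
        LossyLink udpLink = new LossyLink();
        System.out.println("UDP-style got " + sendFireAndForget(6, udpLink)
                + " in " + udpLink.transmissions() + " sends");

        LossyLink tcpLink = new LossyLink();
        System.out.println("TCP-style got " + sendWithRetransmit(6, tcpLink)
                + " in " + tcpLink.transmissions() + " sends");
    }
}
```

The UDP-style run ends up with gaps but only ever sends six packets; the TCP-style run delivers everything at the cost of extra sends and waiting, which is exactly the trade-off that matters for voice.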

The final layer is the ‘application layer’. This layer comes into play after the port has determined which program gets the message, and it defines how the program actually interprets that message. There are a ton of these, but the one you’re probably most familiar with is HTTPS.

With all of that out of the way, WebRTC sends its audio over UDP, which means it has much less overhead, for the small price of potentially dropping some data. You can see this in how, if your internet is slow, whoever you’re talking to will only hear parts of what you’re saying. There are two ways you can do WebRTC: one is directly peer-to-peer (P2P), and the other is using a server as an intermediary. Using an intermediate server has the benefit of added security, since the peers never connect to each other directly (and so never see each other’s IP addresses), but has the downside of adding a little overhead.

My implementation of it uses P2P connections, because it’s a bit easier to manage, and that’s what I found example code for. While there is still a central server, that server mostly exists to tell the peers how to connect to each other. For me specifically, that server serves some additional purposes such as volume control, since this is a proximity voice chat server, not just normal voice chat. Now that you know how it works, let’s get into how I used it.
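Before moving on, here’s a rough sketch of what “telling the peers how to connect to each other” can look like. In WebRTC terms this is usually called signaling: one peer sends an offer, the other sends back an answer, and both trade candidate network addresses to try. The `SignalingRelay` class and its method names are made up for illustration (this is not the actual ReLife code, and a real server would push these messages over something like a WebSocket rather than storing them in a list).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SignalingRelay {

    /** One signaling message: who sent it, what kind it is, and its contents. */
    static final class Signal {
        final String from;
        final String type;
        final String payload;

        Signal(String from, String type, String payload) {
            this.from = from;
            this.type = type;
            this.payload = payload;
        }
    }

    // Messages waiting to be delivered, keyed by the receiving peer's ID.
    private final Map<String, List<Signal>> inboxes = new HashMap<>();

    /** Registers a peer so that others can send signals to it. */
    void join(String peerId) {
        inboxes.putIfAbsent(peerId, new ArrayList<>());
    }

    /** Forwards a signal between two known peers; returns false otherwise. */
    boolean relay(String from, String to, String type, String payload) {
        List<Signal> inbox = inboxes.get(to);
        if (inbox == null || !inboxes.containsKey(from)) {
            return false;
        }
        inbox.add(new Signal(from, type, payload));
        return true;
    }

    /** Returns the signals delivered to a peer so far. */
    List<Signal> inboxOf(String peerId) {
        return inboxes.getOrDefault(peerId, List.of());
    }

    public static void main(String[] args) {
        SignalingRelay relay = new SignalingRelay();
        relay.join("alice");
        relay.join("bob");

        // The server never looks inside the payloads; it only passes them on.
        relay.relay("alice", "bob", "offer", "<alice's session description>");
        relay.relay("bob", "alice", "answer", "<bob's session description>");
        relay.relay("alice", "bob", "ice-candidate", "<an address to try>");

        for (Signal s : relay.inboxOf("bob")) {
            System.out.println("bob received " + s.type + " from " + s.from);
        }
    }
}
```

Once both sides have what they need, the voice data itself flows directly between the two peers and the relay is out of the loop.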

ReLife Integration

There were… a lot of challenges in implementing this. It took stealing code from the internet, fixing said code, giving up and coming back months later, and a lot of problem solving to integrate it. You see, I’ve talked about this before, but my vision for ReLife is for it to require minimal setup, both for the server operator and clients. I also want it to be standalone, which means I don’t want to be hosting just one voice server website that all the active ReLife servers connect to; I want them to host their own. However, remember what I said about ports earlier? Well, any properly secure Minecraft hosting service on the internet is only going to have the one port Minecraft uses open. Which means… on one port, I have to be listening for Minecraft connections, HTTPS connections, and WebSocket connections.

OK, I should explain a little. The Minecraft connection part is fairly obvious: it still has to work as a Minecraft server. The HTTPS part is also pretty obvious, because I need to have a website for people to go to in order to use WebRTC. The WebSocket part may be a little more confusing, but that is just the protocol I’m using to talk to the central server for WebRTC.

With that out of the way, before we can make Minecraft listen for three things, we first must understand how it listens to just the one type of connection. There are a lot of libraries out there, but the one Minecraft chose is called Netty. Before I get into how I modified it, we need to go over some vocab.

Netty has ‘channels’ and ‘pipelines’. Every channel has a pipeline, and channels basically represent connections. The pipeline, as its name implies, is what actually handles the incoming data. The pipeline can have as many ‘handlers’ as you want, in any order. A handler basically takes in data, and can either output modified data or do something with that data. There are a few common types of handlers: ‘codecs’, ‘aggregators’, and just generic ‘handlers’. A codec processes each individual piece of incoming data and converts it from bytes to helpful info. Also, if that data is improperly formatted for whatever protocol the codec is made for, it will generate an error. Aggregators take in a lot of individual pieces of data that have been processed by a codec and then combine those into one ‘packet’ of data. Finally, handlers take that packet the aggregator sends along and do something with it. Here’s an example of what a series of handlers might look like for an HTTPS server:

* SSL Handler
* Http Server Codec
* Http Object Aggregator
* Custom Handler (actually sends webpage data back)
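If the codec, aggregator, handler chain is hard to picture, here’s a tiny imitation of the idea in plain Java. To be clear, none of this is Netty’s real API: `Stage`, `LineCodec`, `HeaderAggregator`, and `ReplyHandler` are names I made up, and the “protocol” is just text lines ending in a blank line, loosely modeled on an HTTP request.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PipelineSketch {

    /** One step in the pipeline: takes a message, passes zero or more along. */
    interface Stage {
        List<Object> process(Object message);
    }

    /** "Codec": turns raw bytes into text lines, buffering unfinished lines. */
    static class LineCodec implements Stage {
        private final StringBuilder buffer = new StringBuilder();

        public List<Object> process(Object message) {
            buffer.append(new String((byte[]) message, StandardCharsets.US_ASCII));
            List<Object> lines = new ArrayList<>();
            int end;
            while ((end = buffer.indexOf("\r\n")) >= 0) {
                lines.add(buffer.substring(0, end));
                buffer.delete(0, end + 2);
            }
            return lines;
        }
    }

    /** "Aggregator": collects lines until a blank one, then emits one packet. */
    static class HeaderAggregator implements Stage {
        private final List<String> pending = new ArrayList<>();

        public List<Object> process(Object message) {
            String line = (String) message;
            List<Object> out = new ArrayList<>();
            if (!line.isEmpty()) {
                pending.add(line);
            } else if (!pending.isEmpty()) {
                List<String> packet = new ArrayList<>(pending);
                pending.clear();
                out.add(packet);
            }
            return out;
        }
    }

    /** Final handler: does something with the whole packet (builds a reply). */
    static class ReplyHandler implements Stage {
        public List<Object> process(Object message) {
            List<?> packet = (List<?>) message;
            List<Object> out = new ArrayList<>();
            out.add("got request '" + packet.get(0) + "' with "
                    + (packet.size() - 1) + " header(s)");
            return out;
        }
    }

    /** Feeds one incoming chunk of bytes through every stage, in order. */
    static List<Object> run(List<Stage> pipeline, byte[] chunk) {
        List<Object> current = new ArrayList<>();
        current.add(chunk);
        for (Stage stage : pipeline) {
            List<Object> next = new ArrayList<>();
            for (Object message : current) {
                next.addAll(stage.process(message));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        List<Stage> pipeline =
                List.of(new LineCodec(), new HeaderAggregator(), new ReplyHandler());
        // The request arrives split across two chunks, as often happens on a network.
        System.out.println(run(pipeline,
                "GET / HT".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(run(pipeline,
                "TP/1.1\r\nHost: example\r\n\r\n".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

The first chunk produces nothing, because the codec is still waiting for a full line. Only when the second chunk completes the request does the final handler see a packet, which is the whole point of splitting the work into stages.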

Basically, you tell the API which of the various handlers to use and in what order. There are a lot of premade handlers for stuff like HTTPS and WebSocket, or you can make your own like Minecraft did. I basically found the code they used for that, and then copy-pasted it into my code so I could modify parts of it. Remember how I said codecs would error if they received bad data? Well, if someone, say, tries to connect to a Minecraft server with an HTTPS connection, I can see that it errors, and then process it as a website instead. Essentially, my code is a giant If-Else of network handlers. If it doesn’t work as a Minecraft connection, try HTTPS. If not that, try WebSocket. If you’re curious, all of those are TCP connections. If you recall what I said earlier about handshakes, that means I don’t have to check every time it receives a message; just once, for the handshake at the beginning.
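Here’s that “giant If-Else” idea boiled down to a sketch. Each pretend decoder throws if the first bytes of a connection don’t look like its protocol, and the code just moves on to the next one. The checks are heavily simplified assumptions on my part for illustration: a Minecraft handshake starts with a length followed by packet ID 0, a TLS (HTTPS) connection starts with the bytes 0x16 0x03, and a WebSocket connection opens with an HTTP `GET` request. The class and method names are hypothetical, and the real thing does this with Netty handlers rather than a loop.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Predicate;

class ProtocolFallback {

    /** A pretend decoder that throws when the bytes are not its protocol. */
    static final class Decoder {
        final String name;
        private final Predicate<byte[]> accepts;

        Decoder(String name, Predicate<byte[]> accepts) {
            this.name = name;
            this.accepts = accepts;
        }

        void decode(byte[] firstBytes) {
            if (!accepts.test(firstBytes)) {
                throw new IllegalArgumentException("not a " + name + " connection");
            }
        }
    }

    // Toy check: a positive length byte, then handshake packet ID 0x00.
    static boolean looksLikeMinecraft(byte[] b) {
        return b.length >= 2 && b[0] > 0 && b[1] == 0x00;
    }

    // Toy check: TLS handshake record (0x16) with major version 3.
    static boolean looksLikeTls(byte[] b) {
        return b.length >= 2 && b[0] == 0x16 && b[1] == 0x03;
    }

    // Toy check: a WebSocket connection begins as an HTTP GET request.
    static boolean looksLikeHttpGet(byte[] b) {
        return new String(b, StandardCharsets.US_ASCII).startsWith("GET ");
    }

    /** The order the post describes: Minecraft, then HTTPS, then WebSocket. */
    static List<Decoder> defaultOrder() {
        return List.of(
                new Decoder("Minecraft", ProtocolFallback::looksLikeMinecraft),
                new Decoder("HTTPS", ProtocolFallback::looksLikeTls),
                new Decoder("WebSocket", ProtocolFallback::looksLikeHttpGet));
    }

    /** Tries each decoder in order and returns the first that does not error. */
    static String pickProtocol(byte[] firstBytes, List<Decoder> inOrder) {
        for (Decoder decoder : inOrder) {
            try {
                decoder.decode(firstBytes);
                return decoder.name;
            } catch (IllegalArgumentException wrongProtocol) {
                // This decoder rejected the data, so fall through to the next.
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        List<Decoder> order = defaultOrder();
        System.out.println(pickProtocol(new byte[] {0x02, 0x00, 0x01}, order));
        System.out.println(pickProtocol(new byte[] {0x16, 0x03, 0x01}, order));
        System.out.println(pickProtocol(
                "GET /voice HTTP/1.1\r\n".getBytes(StandardCharsets.US_ASCII), order));
        System.out.println(pickProtocol(new byte[] {0x7F, 0x7F}, order));
    }
}
```

Because the decision only has to be made once per connection, the cost of trying the wrong decoder first is paid a single time, right at the start.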

That’s the basics! There were several snags I’m not mentioning here, but my initial idea worked very well for the most part. I’m not going to talk about the website itself too much, because it’s pretty simple (though it took a while). Basically, on the website you see a list of online players to pick from, then in Minecraft you confirm that you are in fact the one on the website, and then you’re connected and the central server handles modulating volume!
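For a feel of what “modulating volume” could mean in a proximity chat, here’s a minimal sketch that assumes a simple linear falloff: full volume when two players are on top of each other, fading to silence at some maximum distance. The formula, the `ProximityVolume` name, and the 10-block range in the example are all assumptions for illustration, not necessarily what ReLife actually does.

```java
class ProximityVolume {

    /**
     * Returns a volume between 0.0 (silent) and 1.0 (full) based on the
     * straight-line distance between two players, given as coordinate offsets.
     */
    static double volumeFor(double dx, double dy, double dz, double maxDistance) {
        if (maxDistance <= 0) {
            throw new IllegalArgumentException("maxDistance must be positive");
        }
        double distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
        // Linear falloff, clamped so far-away players are silent, not negative.
        return Math.max(0.0, 1.0 - distance / maxDistance);
    }

    public static void main(String[] args) {
        // A player 3 blocks east and 4 blocks up is 5 blocks away in total.
        System.out.println(volumeFor(3, 4, 0, 10)); // prints 0.5
    }
}
```

The server would work out a number like this for each pair of players whenever they move, and tell each client how loud to play everyone else.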

Wrapup

If you’re still here, thanks for sticking with it. I hope this post didn’t get too complicated; I tried to keep it as simple as I could. I found working on this to be very hard and complicated, but also very rewarding. Personally, I find it very cool to work on a diverse range of projects, and get a little experience in a lot of things. Hopefully you’ve learned a thing or two about networks, ideally without too many glazed eyes. I do already have something to write about next time, since it’s been so long, but until then, happy spring break!