Refactoring the Jungle Disk Gateway Service

Every technology company has some level of technical debt, and Jungle Disk is no exception.

As a newly independent company with newly allocated development resources at the start of 2016, we dedicated part of our work to refactoring several of our back-end services. I spent most of the year on these changes and would like to share one particularly interesting issue we overcame.

Translating from one written language to another isn’t always exact; for example, it may take several words in one language to convey the meaning of a single word in the other. Refactoring from one programming language to another can be similar, sometimes because of differences in primitives or in the standard library.

This time, we’re going to look into the challenges of refactoring our Gateway service from .NET to Go.

Gate-what?

The Jungle Disk Gateway service enables live, indirect communication between our Server Edition Management Client (an app for remotely managing your server backup configuration) and the Server Edition Agent (installed on your servers). Because each of these programs makes outgoing connections to our Gateway service, you don’t have to open incoming ports in your firewalls to remotely control your server’s backup configuration. The Gateway was originally also used for file change notifications (for our Workgroup/Desktop Network Drive features), but that responsibility moved to its own service several years ago.

Receive Connection

When either of the Jungle Disk Server Edition programs connects to our Gateway service, it authenticates and establishes what we call a “Receive Connection” (simply an always-open HTTPS connection for receiving communication data). Some firewalls, though, will close a connection on the client’s end if the client hasn’t received any data for an extended period. To prevent this, we send an empty message to the client once the connection has been idle for a minute or two. When a client disconnects, the Gateway immediately performs some adjustments, messaging, and cleanup.
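To illustrate the idle keepalive, here is a minimal Go sketch. It assumes messages bound for a client arrive on a channel, and it uses a newline byte as a stand-in for the “empty message”, since this post doesn’t describe the Gateway’s wire format. `pump`, `keepalive`, and `demoKeepalive` are illustrative names, not code from our service.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// keepalive stands in for the Gateway's "empty message"; the real payload is an assumption.
var keepalive = []byte("\n")

// pump copies queued messages to w. Whenever nothing has been written for
// idle, it writes a keepalive so firewalls keep seeing traffic on the
// long-lived connection. It returns when messages is closed or a write fails.
func pump(w io.Writer, messages <-chan []byte, idle time.Duration) error {
	timer := time.NewTimer(idle)
	defer timer.Stop()
	for {
		select {
		case msg, ok := <-messages:
			if !ok {
				return nil
			}
			if _, err := w.Write(msg); err != nil {
				return err
			}
		case <-timer.C:
			if _, err := w.Write(keepalive); err != nil {
				return err
			}
		}
		// Any write, real or keepalive, restarts the idle countdown.
		if !timer.Stop() {
			select {
			case <-timer.C: // drain a stale tick, if one is pending
			default:
			}
		}
		timer.Reset(idle)
	}
}

// demoKeepalive sends one message, stays idle for several idle periods,
// then reports whether the message arrived and how many keepalives were sent.
func demoKeepalive() (delivered bool, keepalives int) {
	var buf bytes.Buffer
	messages := make(chan []byte)
	done := make(chan error, 1)
	go func() { done <- pump(&buf, messages, 10*time.Millisecond) }()

	messages <- []byte("hello")
	time.Sleep(55 * time.Millisecond)
	close(messages)
	<-done // pump has returned, so reading buf is now safe

	return bytes.Contains(buf.Bytes(), []byte("hello")), bytes.Count(buf.Bytes(), keepalive)
}

func main() {
	delivered, n := demoKeepalive()
	fmt.Println("delivered:", delivered, "keepalives:", n)
}
```

Resetting a single timer after every write, rather than using a fixed ticker, means keepalives only go out when the connection has actually been quiet, which matches the idle-time rule described above.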

Send Connection

When a client program wants to send some information (for example, if you’re starting a backup manually or changing settings), it’ll open a single-use Send Connection and transmit its message to our Gateways while the Receive Connection is still open.
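A single-use Send Connection maps naturally onto an ordinary HTTP request handler. The sketch below is hypothetical: the `gateway` type, the `to` query parameter, the client IDs, and the status codes are assumptions for illustration, and each recipient’s Receive Connection is modeled as a buffered channel.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// gateway maps a client ID to the queue that feeds that client's open
// Receive Connection. IDs, the "to" parameter, and status codes are hypothetical.
type gateway struct {
	mu        sync.Mutex
	receivers map[string]chan []byte
}

// handleSend serves one single-use Send Connection: read the whole body,
// hand it to the recipient's queue, and end the request.
func (g *gateway) handleSend(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read message", http.StatusBadRequest)
		return
	}
	g.mu.Lock()
	queue, ok := g.receivers[r.URL.Query().Get("to")]
	g.mu.Unlock()
	if !ok {
		http.Error(w, "recipient not connected", http.StatusNotFound)
		return
	}
	select {
	case queue <- body:
		w.WriteHeader(http.StatusAccepted)
	default: // never block a sender on a slow or stuck receiver
		http.Error(w, "recipient queue full", http.StatusServiceUnavailable)
	}
}

// demoSend posts one message to a connected agent and one to an unknown ID.
func demoSend() (okCode int, delivered string, missingCode int) {
	inbox := make(chan []byte, 1)
	g := &gateway{receivers: map[string]chan []byte{"agent-b": inbox}}

	rec := httptest.NewRecorder()
	g.handleSend(rec, httptest.NewRequest(http.MethodPost, "/send?to=agent-b", strings.NewReader("start backup")))
	delivered = string(<-inbox)

	miss := httptest.NewRecorder()
	g.handleSend(miss, httptest.NewRequest(http.MethodPost, "/send?to=nobody", strings.NewReader("ping")))
	return rec.Code, delivered, miss.Code
}

func main() {
	fmt.Println(demoSend())
}
```

Because the send path only enqueues and returns, the sender’s connection can close right away while the recipient’s long-lived Receive Connection does the actual delivery.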

Why Keep This Approach?

While there are some benefits to always-open connections, polling is more common today and is what most of our other services use. Unfortunately, switching to polling was not an option: it would have required significant changes to our Server Edition client and agent programs and extended the timeline too much.

Refactoring

To keep the Receive Connection open, our original .NET service used sockets and generally followed a Comet-style approach. When a message arrives from Client A for Client B, it is forwarded to Client B’s already-open Receive Connection. Client B may respond to Client A, but at that point the response is simply treated as a new message for Client A and the process repeats. If either client disconnects, the remaining client is informed and knows that the conversation is over.
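The relay described above can be sketched as a small routing table. This is an illustration under assumptions, not our .NET or Go code: the type name echoes the `postmaster` in our code later in this post, but its methods here (`connect`, `send`, `disconnect`), the `envelope` type, and the buffered channels standing in for Receive Connections are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// envelope is one forwarded message; PeerGone marks a "your partner left" notice.
type envelope struct {
	From, Body string
	PeerGone   bool
}

// postmaster routes messages to each client's Receive Connection (modeled as a
// buffered channel) and remembers who has talked to whom.
type postmaster struct {
	mu      sync.Mutex
	inboxes map[string]chan envelope
	peers   map[string]map[string]bool
}

func newPostmaster() *postmaster {
	return &postmaster{inboxes: map[string]chan envelope{}, peers: map[string]map[string]bool{}}
}

// connect registers a client's Receive Connection and returns its inbox.
func (p *postmaster) connect(id string) <-chan envelope {
	p.mu.Lock()
	defer p.mu.Unlock()
	inbox := make(chan envelope, 16)
	p.inboxes[id] = inbox
	return inbox
}

// link records a conversation edge; callers must hold p.mu.
func (p *postmaster) link(a, b string) {
	if p.peers[a] == nil {
		p.peers[a] = map[string]bool{}
	}
	p.peers[a][b] = true
}

// send forwards body to the recipient's open Receive Connection; a reply is just
// another send. It reports false if the recipient is absent or its inbox is full.
func (p *postmaster) send(from, to, body string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	inbox, ok := p.inboxes[to]
	if !ok {
		return false
	}
	p.link(from, to)
	p.link(to, from)
	select {
	case inbox <- envelope{From: from, Body: body}:
		return true
	default:
		return false
	}
}

// disconnect closes the client's inbox and tells each conversation partner
// that the conversation is over.
func (p *postmaster) disconnect(id string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if inbox, ok := p.inboxes[id]; ok {
		close(inbox)
		delete(p.inboxes, id)
	}
	for peer := range p.peers[id] {
		delete(p.peers[peer], id)
		if inbox, ok := p.inboxes[peer]; ok {
			select {
			case inbox <- envelope{From: id, PeerGone: true}:
			default: // inbox full; a real service would need a better policy
			}
		}
	}
	delete(p.peers, id)
}

// demoRouting: Client A asks Agent B for status, B replies, then B drops off.
func demoRouting() (request, reply string, notice envelope, bOpen bool) {
	pm := newPostmaster()
	a, b := pm.connect("client-a"), pm.connect("agent-b")
	pm.send("client-a", "agent-b", "status?")
	request = (<-b).Body
	pm.send("agent-b", "client-a", "idle")
	reply = (<-a).Body
	pm.disconnect("agent-b")
	notice = <-a
	_, bOpen = <-b
	return request, reply, notice, bOpen
}

func main() { fmt.Println(demoRouting()) }
```

Removing a client’s inbox from the map under the same lock that `send` uses ensures nothing is ever sent on a closed channel.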

1. Always-Open Connections

In Go, we likewise needed to keep the connection open at all times so we could feed message data (bytes) into it as messages arrived. We accomplished this with Go’s connection hijacking feature (the Hijacker interface in the standard net/http package).
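For readers unfamiliar with hijacking, here is a minimal, self-contained sketch using `http.Hijacker` from Go’s standard `net/http` package. The handler, the `/receive` path, and the two messages are made up for illustration; a real Receive Connection would keep writing from a message queue for as long as the client stays connected. Hijacking is only available on HTTP/1.x connections; HTTP/2 response writers do not implement it.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// receiveHandler takes over the raw connection with http.Hijacker so it can
// keep writing bytes on its own schedule instead of returning a normal HTTP
// response. This sketch streams two made-up messages and then closes.
func receiveHandler(w http.ResponseWriter, r *http.Request) {
	hj, ok := w.(http.Hijacker)
	if !ok {
		http.Error(w, "hijacking not supported", http.StatusInternalServerError)
		return
	}
	conn, rw, err := hj.Hijack()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer conn.Close()

	// After Hijack, net/http no longer manages this connection, so we write
	// the status line and headers ourselves. Without a Content-Length, the
	// client reads the body until the connection closes.
	rw.WriteString("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n")
	for _, msg := range []string{"msg-1\n", "msg-2\n"} {
		rw.WriteString(msg)
		if err := rw.Flush(); err != nil {
			return // the client is gone
		}
	}
}

// demoHijack runs the handler on a test server and reads what a client sees.
func demoHijack() (status int, body string, err error) {
	srv := httptest.NewServer(http.HandlerFunc(receiveHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/receive")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(data), err
}

func main() {
	fmt.Println(demoHijack())
}
```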

2. Trouble Immediately Detecting Client Disconnection

Later in testing, we encountered another issue: a write to a connection could fail without returning an error, sometimes for several writes, and the number of silent failures was inconsistent. In other words, when the client software stopped, was disconnected, or the client computer shut down, our server didn’t find out until it had forwarded a few more messages. That meant silently losing those messages instead of holding them in case the client reconnected shortly afterwards.

The pattern would look something like this:

Message   Received by client?   Server's write result
1         yes                   ok
2         no                    ok
3         no                    error!
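The behavior in the table is straightforward to reproduce on a loopback TCP connection. In the sketch below (the function name is ours, for illustration), the client end closes before the server writes anything. On Linux, for example, the first write typically returns no error because the data is simply copied into the kernel’s send buffer; the error only surfaces after the peer’s reset comes back. Exact counts vary by operating system and timing.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// writesAfterPeerClose opens a loopback TCP connection, closes the client end,
// and then keeps writing from the server end. It returns how many writes
// reported success before the first write error. The exact count depends on
// the operating system's TCP stack.
func writesAfterPeerClose() (okWrites int, writeErr, setupErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, nil, err
	}
	defer ln.Close()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, nil, err
	}
	server, err := ln.Accept()
	if err != nil {
		client.Close()
		return 0, nil, err
	}
	defer server.Close()

	client.Close()                    // the client program stops
	time.Sleep(50 * time.Millisecond) // let its FIN reach the server

	for i := 0; i < 10; i++ {
		if _, err := server.Write([]byte("message\n")); err != nil {
			return okWrites, err, nil
		}
		okWrites++                        // "ok" for the server, yet nobody received it
		time.Sleep(50 * time.Millisecond) // give the peer's reset time to arrive
	}
	return okWrites, nil, nil
}

func main() {
	n, writeErr, setupErr := writesAfterPeerClose()
	if setupErr != nil {
		fmt.Println("setup failed:", setupErr)
		return
	}
	fmt.Printf("%d write(s) reported success before the error: %v\n", n, writeErr)
}
```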

This was harder to diagnose, and we had trouble finding information on the topic, but we eventually came across a Stack Overflow post explaining that a Read on a connection detects a lost connection much more quickly than a Write does. Even though there is never anything to read from a Receive Connection (from the server’s perspective, we only ever write to it), running a regular Read on the connection in the background reliably told us within one second when the connection was severed. We used that signal to tell the Receive (writing) routine that the connection was dead and that it should stop waiting for messages.

Here’s a portion of our code showing how it all fit together:

```go
// ...
// Monitor heartbeat of connection
var (
	heartbeatDone = make(chan bool)
	connected     atomic.Bool // from "sync/atomic" (Go 1.19+); shared with the rest of the handler
	listener      = client.Listener()
)
connected.Store(true)
go func() {
	// Monitor connection and trigger disconnect/shutdown of client when it stops responding.
	// Read blocks until the client sends something (it never should) or the connection fails.
	for primer := make([]byte, 1); connected.Load(); time.Sleep(1 * time.Second) {
		if _, err := conn.Read(primer); err != nil {
			context.Debug("heartbeat of connection no longer detected")
			postmaster.DeactivateClient(client) // stops message pulling; remove from locally connected clients map
			close(listener)                     // stops message pushing; close listener
			connected.Store(false)              // stops heartbeat monitoring
		}
	}
	close(heartbeatDone) // closing never blocks, unlike a send with no receiver
}() // NOTE: "connected" is captured by the closure rather than passed in, so both sides share it

// ...

// Main message receive loop: ends once the heartbeat goroutine closes listener
for message := range listener {
	bytesWritten, err := conn.Write(message)
	if err != nil {
		// ...
		break
	}

	// ...
}
// ...
```
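For experimentation, here is a stripped-down, self-contained variant of the same Read-watching technique. It differs from our code in one respect: instead of sharing a `connected` flag, the watcher goroutine closes a channel that the writing loop selects on. `watchDisconnect`, `forward`, `demoForward`, and `errDisconnected` are hypothetical names, and `net.Pipe` stands in for a hijacked TCP connection so the example runs without a network.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
)

var errDisconnected = errors.New("receive connection lost")

// watchDisconnect runs a background Read on conn. The server never expects
// data on a Receive Connection, so the Read exists only to surface a closed
// or reset connection; any stray bytes are discarded. The returned channel
// is closed once Read fails.
func watchDisconnect(conn net.Conn) <-chan struct{} {
	gone := make(chan struct{})
	go func() {
		defer close(gone)
		buf := make([]byte, 1)
		for {
			if _, err := conn.Read(buf); err != nil {
				return
			}
		}
	}()
	return gone
}

// forward writes queued messages to conn until the queue is closed, a write
// fails, or the watcher reports that the client is gone. It returns how many
// messages were written.
func forward(conn net.Conn, messages <-chan []byte, gone <-chan struct{}) (int, error) {
	sent := 0
	for {
		// Check for a lost connection first, so a known-dead client does not
		// pull another message off the queue. A disconnect that lands mid-write
		// can still slip through; this only narrows the window.
		select {
		case <-gone:
			return sent, errDisconnected
		default:
		}
		select {
		case <-gone:
			return sent, errDisconnected
		case msg, ok := <-messages:
			if !ok {
				return sent, nil
			}
			if _, err := conn.Write(msg); err != nil {
				return sent, err
			}
			sent++
		}
	}
}

// demoForward: the "client" reads one message and then disappears.
func demoForward() (received string, sent int, err error) {
	server, client := net.Pipe()
	defer server.Close()

	messages := make(chan []byte, 2)
	messages <- []byte("hello")

	got := make(chan string, 1)
	go func() {
		buf := make([]byte, 5)
		io.ReadFull(client, buf)
		got <- string(buf)
		client.Close()
	}()

	sent, err = forward(server, messages, watchDisconnect(server))
	return <-got, sent, err
}

func main() {
	received, sent, err := demoForward()
	fmt.Println(received, sent, err)
}
```

Closing a channel broadcasts the disconnect to any number of waiters without extra locking, which is one reason to prefer it over a shared boolean.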

After making those changes, we were able to detect connection loss in an acceptable amount of time and met all of the requirements for this piece of our project.
