I recently had the need for a realtime graph and thought it would be a fun hack. I spent some time thinking about a few ways to optimize the drawing to reduce both the client and server side overhead. Having a way to visualize data without a significant effect on system performance is a must.
The result is UberGraph. For those that care less about the how and why and more about seeing the code, you can find it here. I'm working on a few different ideas in there. But please, keep in mind that the data collectors are ugly hacks and only used to test the widget.
For the impatient, here is a comparison of the drawing overhead to System Monitor.
In it's most simplist form, the graph must be able to present a grid with two scales; the x and y axis. Presented upon these scales are the samples which will animate right-to-left. In the most unoptimized form, this seems like a simple problem and we might as well start to attack it like such and optimize from there.
This is essentially how the graphs are done Gnome System Monitor. I don't mean to pick on G-S-M, but it was an interesting problem to me (being that the CPU usage every time I ran G-S-M skewed my results) and it was a good learning experience.
There are many places for which this can be optimized. To begin with, it is important to consider the design of the X11 protocol and how to take advantage of it. X11 provides pixmaps, which are a "server-side" texture. This means you push your content to the server over the X11 protocol, and then reference by a unique-id going forward. If designed correctly, we can exploit this to do the sliding animation efficiently without sending a new image for every frame. This results in less work for X, less work for our process, and less bandwidth to X (resulting in better performance over ssh -X). When you draw onto a GdkWindow, you are creating the pixmap on the client, and sending the contents over the wire to the X server via the X11 protocol and exposing it to the window. Now, if you are drawing a graph on every exposure (up to say 20 frames per second), you can imagine how this can take a hit on your systems performance. You are rendering the pixmap, encoding the request, delivering it to the server (via UNIX/TCP socket), decoding the request, loading the pixmap, and then copying it into the framebuffer (through the driver infrastructure). Thats a lot of work to do 20x per second. Even more so over ssh forwarding.
If you ever want to get a **quick and dirty** idea of the amount of data you are sending to the X server, use strace. "strace -ewritev" seems to do the trick for me. The X socket typically ends up as FD 3 in my experience. Just grab the last value in the line for number of bytes sent. Sadly, I haven't found an easy way to determine this with xtrace, yet.
To allow us to get some of the desired speedups, we must first split the foreground and background into separate pixmaps. This way, the only content we ever need to continually ship to the X server is the foreground. The background is an opaque pixmap and the foreground will have mostly transparent pixels (which should compress well too!). Thanks to cairo, we can easily composite the transparent pixmap onto the background using gdkcairosetsourcepixmap(). Alternatively, on older systems you can blend using the GDK_XOR operation (which will only blit a portion of the foreground based on its comparison to the background). When using XOR, you need to pre XOR your drawing colors so they result in the proper color.
So now that we have our foreground and background separated, we can move on to getting some decent CPU savings. If we make one small sacrifice, we can keep the CPU from doing a significant amount of work repeatedly. This sacrifice is allowing ourself up to 1 second of latency between the time of a sample and presenting it on the screen. This is useful because it allows us to render the new data outside the visible range and slide the pixmap right-to-left on the server. Doing this means we only draw once per second, rather than 20 times.
Lets take a quick look at what I mean.
Using this model, we can render the new version of the graph and send the pixmap to the server. Then, in our frame-expired callback every (1 / Frames-Per-Second) seconds, we simply adjust the position at which the pixmap is rendered by the horizontal distance between frames.
At 20 frames-per-second, this saves us 19 iterations of rendering, encoding, sending, decoding, and copying to the framebuffer per second. Instead, we simply tell the server that we wish to render the pixmap at a given offset and it will copy it into the framebuffer for us.
However, why should we send the entire content once per second? Surely the historical data hasn't changed. In fact, if we use two pixmaps on the server-side to flip between, we can get another large X bandwidth savings. We start by copying the first pixmap into the second pixmap at its new offset. Then, on the client, we render the new second worth of content (properly clipped with cairo_clip()) and send it to the X server. The content is moved to the "New Data" region of the pixmap and then animated into view over the following second. This saves us ((Width - WidthPerSecond) * Height) pixels to send across the wire every second. This is good since that will often exceed the size of a single packet (again, helping over ssh -X). Locally, the speedup is modest. However, remotely it allows us to provide almost the same experience even on low-bandwidth links (ssh -X from my n900 via AT&T Edge for example).
Autoscaling the graph to handle values outside the visible range is done by checking the value bounds when retrieving the value. Scaling down is done during a callback to see if we can shrink (currently every 5 seconds). No sense in having this in a hot path, it's simply not that important.
It is worth adding, that while I mention X throughout this post, this should work similarly on backends other than X11 since GdkPixmap abstracts that.
Fork me on Github.
Thanks for reading.
-- Christian Hergert 2010-08-06
Back to Index