I’ve decided that I need to get serious about performance testing. The fact is that people are not going to want to play an arcade-style game unless the performance is good. But how do I know if the performance is good? How do I know if it is getting better or worse each time I make changes to the application?
Why bother with performance testing?
It turns out that this is a fairly involved process. Really, it’s a pretty critical step that I suspect many hobby developers skip. But because of the nature of my application and the fact that I want to learn how to do this well I have decided to explore the subject.
Today, my performance testing process looks like this: I make changes and then play the game on my local laptop. If the game still appears to work as well as or better than before, then I deploy the application to the cloud. Then I play it again from the cloud. If it still looks good, then I move on.
If I plan on being the only one who will ever play the game, then I guess that is good enough. But I would really like other people to play the game, so that is not good enough. I got this point when my son Micah told me today that he tried to play the game from his Chromebook at school and it was so laggy he couldn’t play it. I mean, test it, of course!
So where do I even start?
OK, so now I am convinced that I want to do this better, but where do I even start? I’ve never really tried to do any kind of real performance testing on a web application before. Especially not on one with a fair amount going on, like a real-time, multiplayer, browser-based arcade game. Here is the high-level breakdown, I think.
Performance Test Automation
I know enough to know that I want the tests to run automatically whenever changes to the code are made. But do I run them every time I save a code change locally? Like when the unit test runner runs? Or do I only run them when I am checking local changes back to GitHub? Or only when merging a pull request?
I suspect only when I commit to GitHub or merge a pull request. Because of the nature of these tests, I think running them locally every time I change something would be difficult. There will likely be timing involved, UX involved (especially for the canvas rendering), and networking conditions to account for.
Performance Test Benchmarking
So once I figure out when to run my performance tests I’ll need to know if they are passing, right? With a typical unit test it is enough to know if the test passed or failed. But with tests like these it is not that simple. There is no binary pass or fail criterion.
I think what you would want to do is compare the results against the previous test runs. Maybe if the performance degrades then you mark the test as a failure. So do you take the performance results and store them somewhere? Maybe a performance test history stored in a logging system or Mongo database somewhere?
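Here is a minimal sketch of what that comparison could look like. Everything in it is an assumption for illustration: the metric names (`avgFrameMs`, `pageLoadMs`), the idea that lower is better for every metric, and the 10% tolerance are all made up, and where the previous results actually get stored is left open.

```typescript
// Hypothetical metric names mapped to values; lower is assumed to be better.
type PerfResult = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
  changePct: number;
}

// Returns the metrics that got worse than the baseline by more than tolerancePct.
function findRegressions(
  baseline: PerfResult,
  current: PerfResult,
  tolerancePct: number = 10 // assumed tolerance, to absorb normal run-to-run noise
): Regression[] {
  const regressions: Regression[] = [];
  for (const metric of Object.keys(baseline)) {
    // Skip metrics the new run did not record.
    if (!(metric in current)) continue;
    const before = baseline[metric];
    const after = current[metric];
    // Skip baselines that cannot be used as a divisor.
    if (before <= 0) continue;
    const changePct = ((after - before) / before) * 100;
    if (changePct > tolerancePct) {
      regressions.push({ metric, baseline: before, current: after, changePct });
    }
  }
  return regressions;
}

// Made-up numbers: the previous run versus the run for the latest commit.
const previousRun: PerfResult = { avgFrameMs: 16, pageLoadMs: 800 };
const latestRun: PerfResult = { avgFrameMs: 20, pageLoadMs: 820 };
const failures = findRegressions(previousRun, latestRun);
console.log(failures.length === 0 ? "PASS" : "FAIL", failures);
```

The tolerance is the part I would expect to tune. Timing numbers are noisy, so failing on any increase at all would probably make the test fail all the time.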
I guess it is finally time to set up a true CI pipeline. That will be the primary mechanism for running this performance testing each time application changes are committed.
I’m guessing I will do this by spinning up a Travis CI account, configuring webhooks, and then figuring out how to build and run some basic tests to start out.
What to Test?
There are a lot of moving parts in this application and I guess I am going to need to test them all:
- Game renderer running on the client (Canvas rendering in the browser)
- Load response times from the server (HTTP over TCP/IP)
- Game networking performance (DDP over WebSockets)
- Meteor framework traffic (DDP over WebSockets)
I’m not sure how to test all this stuff. But I think I do have some initial thoughts:
- I will need to establish some test data that I can use for some of the tests to ensure they run consistently. For example, the game objects array and any configuration settings.
- I think I am going to need to break the application up into separate NPM modules. That will allow me to isolate changes and pull the modules into test harnesses that I can build to run the tests. The trade off is going to be that it will be less convenient having the game code spread across multiple modules. I mean, from a maintenance perspective. But it is good practice anyway and should lead to a better design.
- There are lots of tools out there that will help with basic load testing for handling HTTP requests coming into the site.
- I’m not really sure how I will test the network traffic that the game produces. This is the web socket traffic that carries player commands to the server and then the game state back to each player. And this is critical too because it may be the single hardest thing to control because of the unpredictability of cloud environments and network traffic of the internet.
- I’m really not too worried about the Meteor framework traffic. This is very minimal right now. Later on, if I have lots of data views to sync across clients I will start to care more.
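For the test data in the first point above, one option is to generate the game objects array from a fixed seed so every test run sees exactly the same input. This is only a sketch: the `GameObject` fields, the 800x600 play area, and the function names are hypothetical and not taken from the actual game code. The generator is a simple linear congruential generator, used here only because `Math.random()` cannot be seeded.

```typescript
// Hypothetical shape for a game object; the real game's fields may differ.
interface GameObject {
  id: number;
  x: number;
  y: number;
  vx: number;
  vy: number;
}

// Small linear congruential generator so the "random" data is repeatable.
// Returns a function that yields numbers in [0, 1).
function seededRandom(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Builds `count` objects inside an assumed width x height play area.
function makeGameObjects(
  count: number,
  seed: number,
  width: number = 800,
  height: number = 600
): GameObject[] {
  const rand = seededRandom(seed);
  const objects: GameObject[] = [];
  for (let id = 0; id < count; id++) {
    objects.push({
      id,
      x: rand() * width,
      y: rand() * height,
      vx: rand() * 2 - 1, // velocity components in [-1, 1)
      vy: rand() * 2 - 1,
    });
  }
  return objects;
}

// The same seed always produces the same fixture.
const fixtureA = makeGameObjects(100, 42);
const fixtureB = makeGameObjects(100, 42);
console.log(JSON.stringify(fixtureA) === JSON.stringify(fixtureB)); // true
```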
I’ll need to think about my test environments too because:
- The server side environments will be changing as I scale horizontally and/or vertically to handle increased loads. For example, once I’ve determined what my units of work are (most likely individual players for now) I may need to tweak my preferred VM size to get the basic performance I would like. Then, I’ll need to scale out as more players join the game.
- The client side is very unpredictable. There are so many operating systems, browsers and devices that can run an HTML application. I’ll just have to start with some low-hanging fruit. Like maybe the major desktop browsers. And Chromebooks too, of course!
How do I really know how the application is performing out in the wild? I’ll need to set up logging and monitoring (maybe through something like Loggly) so that I can get a holistic sense for how the app is really running in production. That is always a great benchmark!
Well, that’s a lot of stuff to think about! I need to think about the most likely causes for the poor performance.
- Is it because slower browsers just can’t handle the canvas rendering at 60fps?
- Is it because slower browsers just can’t run the engine at 60fps?
- Is it because the server can’t handle increased connections?
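To start telling the first two apart from the third, I could record frame timestamps on each device and summarize them against the 60fps budget (about 16.7 ms per frame). The sketch below is just that summary step. The function name, the result fields, and the rule that a frame counts as "slow" when it takes more than 1.5 times the budget are my own assumptions. In the browser the timestamps would come from the value passed to each `requestAnimationFrame` callback; here they are made up.

```typescript
interface FrameSummary {
  avgFps: number;
  slowFrames: number;
  worstFrameMs: number;
}

// Summarizes a list of frame timestamps (in milliseconds, ascending).
function summarizeFrames(timestampsMs: number[], targetFps: number = 60): FrameSummary {
  if (timestampsMs.length < 2) {
    return { avgFps: 0, slowFrames: 0, worstFrameMs: 0 };
  }
  const budgetMs = 1000 / targetFps;
  const slowThresholdMs = budgetMs * 1.5; // assumed cutoff for a visibly late frame
  let slowFrames = 0;
  let worstFrameMs = 0;
  for (let i = 1; i < timestampsMs.length; i++) {
    const delta = timestampsMs[i] - timestampsMs[i - 1];
    if (delta > slowThresholdMs) slowFrames++;
    if (delta > worstFrameMs) worstFrameMs = delta;
  }
  const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  const frameCount = timestampsMs.length - 1;
  const avgFps = elapsedMs > 0 ? (frameCount / elapsedMs) * 1000 : 0;
  return { avgFps, slowFrames, worstFrameMs };
}

// Made-up timestamps: four quick frames and one 50 ms hitch.
const sample = summarizeFrames([0, 16, 32, 48, 98, 114]);
console.log(sample);
```

Running the same summary once around just the engine update and once around just the canvas draw would hint at which of the two is blowing the budget on a slower device.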
I suspect that the really bad performance that my son saw on the Chromebook today is because of the lower power of the device versus my MacBook Pro. And I’m guessing it is the canvas rendering that can’t keep up.
I’ve also observed that the game gets worse as more people join. I suspect that the game server can’t handle the increased networking load from all those game updates going out at 20 per second times the number of players.
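A quick back-of-the-envelope sketch shows why that could hurt. The 20 updates per second comes from the game as it is today. The 50 bytes per player and the idea that every update carries the state of every player are assumptions I have not measured.

```typescript
interface LoadEstimate {
  messagesPerSecond: number;
  bytesPerSecond: number;
}

// Estimates outgoing server traffic for a given number of connected players.
function estimateServerLoad(
  players: number,
  updatesPerSecond: number = 20,
  bytesPerPlayerState: number = 50 // assumed payload size per player, not measured
): LoadEstimate {
  // One update message goes to each player on every tick.
  const messagesPerSecond = players * updatesPerSecond;
  // Assumption: each message contains the state of every player in the game.
  const bytesPerMessage = players * bytesPerPlayerState;
  return {
    messagesPerSecond,
    bytesPerSecond: messagesPerSecond * bytesPerMessage,
  };
}

for (const players of [2, 10, 50]) {
  console.log(players, estimateServerLoad(players));
}
```

If those assumptions hold, the message count grows linearly with players but the bytes sent grow with the square of the player count, which would fit the game getting noticeably worse as people join.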
Once I’ve thought all this through I need to set out a plan of which things I am going to do and in what order. Some of the stuff I need anyway like the CI capabilities.