Deploying a GPU-trained Torch model to JavaScript, for everyone to use

It’s a great time to be into machine learning and artificial intelligence. Since deep learning took the world by storm a few years ago, new and exciting papers have been coming out almost every day. The speed at which the state of the art evolves is startling, and it is very hard to keep up! I’ve had this problem myself, so I decided to spend some time on a fun project[1] to better understand the sequence models everyone is so hyped about. Another requirement I set for myself was to build something that could prove useful to other people.

This work involved LSTMs (Long Short-Term Memory networks), and you can read more about them in this other blog post.

As the field’s boom is still quite recent, there has been little effort so far to let users take advantage of these techniques without dedicated hardware and/or lengthy framework installations. The usual solution is to run the neural network on a server and just serve the output to the user. That works great if you have the resources, but it doesn’t scale very well for hobbyists like me. People visiting my demo will have a working CPU, so why not just use that? There is some previous work in this direction: Andrej Karpathy shared a suite of JavaScript modules that run in the browser. They are completely cross-platform (Windows, Mac, Linux, but also Android and iOS) and can be used for a variety of tasks. He divided his code into three libraries that cover three different applications: ConvNetJS, REINFORCEjs and RecurrentJS.

However, while these tools are great for some simple applications, they take a very long time to train a good model - we are talking days. The reason they are so slow is that they use the CPU for all the calculations. One of the biggest advances in the field - and, in my opinion, the biggest enabler of its recent success - has been the discovery that graphics cards can be used instead of a CPU for these tasks, and they are a lot faster[2]. This is remarkable in itself: a good GPU costs around $300, and that is already enough to train pretty good models! What was missing, however, was the possibility to train a model on a GPU and then deploy it so that everyone could use it, without buying or installing anything[3] and without imposing server costs on the owner.

The code that I’m releasing today does just that and acts as a bridge between two projects Karpathy made: char-rnn and RecurrentJS. I’m releasing all the code on my GitHub, as well as a demo that I’m confident most will find amusing.

It runs quite fast (almost instantaneously on the few machines I have tried it on) despite using a decently sized model (two stacked LSTMs, each with 300 units). I also made a mobile version (keep in mind that it will use your phone’s CPU, so you have to be a bit more patient than with the desktop version)[4].

Some technical challenges that I had to face:

  1. LuaJIT doesn’t handle objects bigger than 1 GB. This caused a problem when unit testing my code on a bigger net, because I was accumulating the model into one big Lua table before writing the whole object out as JSON. I solved it by emitting the JSON on the fly as I read the binary model (a rough sketch of this streaming approach appears after the list).
  2. JSON files are much bigger than binaries. This is somewhat inevitable, as JSON is nowhere near as efficient as a binary representation. That said, something can still be done. The scientific literature shows that storing a network’s weights in single precision instead of double precision causes no loss of accuracy, and that even pushing this to the extreme of only 16 bits per weight (“half precision”) costs very little accuracy. In fact, this approach has been so successful that nVidia now supports the type in its CUDA libraries. However, Torch does not support it yet, so I had to be a bit creative. In the end, I truncated the weights to 5 significant digits (roughly comparable to what an FP16 number can represent), while leaving users the option to choose a different number of digits; the streaming sketch after the list includes this truncation. The model in the demo runs with 5 digits per weight. It is also worth mentioning that most web servers compress assets before sending them to the client. I tested zipping both the binary file and the full-precision JSON, and they end up at a similar size; after truncating the weights to 5 digits, the converted JSON file is actually smaller than the binary when both are zipped.
  3. Lua’s strings are a bit weird. I was never able to print single and double quotes on their own, even though I needed to. In the end, I resorted to a hack to unblock myself: I store those characters as “SINGLEQUOTE” and “DOUBLEQUOTE”, respectively, and I modified ConvNetJS’s code to convert them back to the right characters when the model is read (the quote-placeholder sketch after the list illustrates the substitution). This unfortunately means that my code is not completely transparent to ConvNetJS, but I figure this can easily be fixed in a future release (and maybe the community can help with this).
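
To make the first two points more concrete, here is a minimal sketch of streaming a tensor’s weights straight into a JSON file with truncated precision, so that no giant Lua table is ever built. This is not the released converter: the helper name, the "Wxh" key and the output file name are only illustrative.

```lua
-- A minimal sketch (not the released converter) of streaming weights into JSON
-- with truncated precision, so no giant Lua table is ever built.
require 'torch'

-- Hypothetical helper: write one tensor as a JSON array of numbers,
-- keeping only `digits` significant digits per weight.
local function writeTensorAsJson(out, tensor, digits)
  local fmt = '%.' .. (digits or 5) .. 'g'
  local flat = tensor:contiguous():view(-1)  -- 1-D view over all the weights
  out:write('[')
  for i = 1, flat:size(1) do
    if i > 1 then out:write(',') end
    out:write(string.format(fmt, flat[i]))
  end
  out:write(']')
end

-- Example usage with a dummy matrix standing in for one LSTM parameter.
local weights = torch.randn(300, 300)
local out = io.open('model_weights.json', 'w')
out:write('{"Wxh":')           -- "Wxh" is just an illustrative key name
writeTensorAsJson(out, weights, 5)
out:write('}')
out:close()
```

Writing the values as they are read keeps peak memory proportional to a single tensor rather than to the whole serialized model, which is what sidesteps the 1 GB limit.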
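
The quote workaround from the third point boils down to a simple substitution table. Again, the names below are mine rather than the released code; the idea is that quotes are stored as placeholder tokens in the JSON, and the modified JavaScript reader applies the inverse mapping when the model is loaded.

```lua
-- A minimal sketch of the quote workaround; names are illustrative.
-- Quotes are stored as placeholder tokens in the JSON model file,
-- and the modified JavaScript reader applies the inverse mapping.
local PLACEHOLDERS = {
  ["'"] = 'SINGLEQUOTE',
  ['"'] = 'DOUBLEQUOTE',
}

local function encodeChar(c)
  return PLACEHOLDERS[c] or c
end

-- Example: encoding a tiny character vocabulary before writing it to JSON.
local vocab = { 'a', 'b', "'", '"' }
for i, c in ipairs(vocab) do
  print(i, encodeChar(c))   -- prints: 1 a, 2 b, 3 SINGLEQUOTE, 4 DOUBLEQUOTE
end
```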

Finally, there was another lesson I learned from this project: I should probably have started by publishing a first demo of my Trump LSTM. It was the first thing that came to my mind when tinkering with Karpathy’s Torch code, so I had it ready long before someone else had the same idea. However, the focus of my project has been the Torch-to-JavaScript conversion, so while the demo lost some of its “wow” effect, the code contribution I cared about still has all of its value. That being said, I’m somewhat glad this happened, because I learned a valuable lesson for the future: one should deliver continuously, even if the first iterations are not that satisfying (in my case, I thought that just releasing a few hand-picked samples was not very impressive technically, so I shouldn’t aim for that). This seems like a lesson better learned sooner rather than later in my career.

All in all, this was a fun project. Getting funny quotes does take a while, but in the end I got my robot Trump to say that Italians “want to take our jobs”. That did genuinely make me laugh, and it was a nice reward in itself.

  1. An important disclaimer: while I currently work for Microsoft, they had nothing to do with this project, either in terms of opinions or of technology. In fact, one of the aims of this project was to gain confidence with non-Microsoft technologies.

  2. nVidia claims GPUs are usually between 10 and 100 times faster than a CPU.

  3. Running a pre-trained model takes a fraction of the time required to train it, so CPUs can do just fine on this task.

  4. You might notice a slight difference in design quality between the desktop and the mobile version. For the desktop one, I was helped by a mysterious designer we will call DesignerX. The other was done by me, and I guess it shows.