It’s a great time to be into machine learning and artificial intelligence. Since deep learning took the world by storm a few years ago, new and exciting papers have been coming out almost every day. The speed at which the state of the art evolves is startling, and it is genuinely hard to keep up! I’ve had this problem myself, so I decided to spend some time on a fun project1 to better understand the sequence models everyone is so hyped about. Another requirement I set for the project was to build something that could actually prove useful to people.
However, while these tools are great for simple applications, they take a very long time to train a good model - we are talking days. They are so slow because they do all the calculations on the CPU. One of the biggest advances in the field - and, in my opinion, the biggest enabler of its recent success - has been the discovery that graphics cards can be used instead of a CPU for these tasks, and they are a lot faster2. This is an amazing success in itself: a good GPU costs ~$300, and that is already enough to train pretty good models! What was missing, however, was a way to train a model on a GPU and then deploy it so that everyone could use it, without buying or installing anything3, while also not imposing server costs on the owner.
The code that I’m releasing today does just that: it acts as a bridge between two of Karpathy’s projects, char-rnn and RecurrentJS. I’m releasing all the code on my GitHub, as well as a demo that I’m confident most will find amusing.
It runs quite fast (almost instantaneously on the few machines I have tried it on) despite using a decently sized model (two stacked LSTMs with 300 units each). I also made a mobile version (bear in mind that it uses your phone’s CPU, so you will have to be a bit more patient than with the desktop version)4.
Some technical challenges that I had to face:
- LuaJIT doesn’t handle objects bigger than 1 GB. This caused a problem when unit testing my code on a bigger net, because I was loading the whole model into one big Lua table before writing the entire object out as JSON. I solved it by generating the JSON on the fly, while reading the binary model file.
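The streaming idea can be sketched as follows - in Python rather than the original Lua, and with a made-up little-endian float64 layout standing in for the project's actual binary format:

```python
import io
import json
import struct

def stream_weights_to_json(binary_blob: bytes, out) -> None:
    """Write a (possibly huge) array of weights as a JSON array,
    emitting one value at a time instead of first materialising
    the whole structure in memory."""
    out.write("[")
    first = True
    # iter_unpack walks the binary buffer lazily, one float64 at a time
    for (w,) in struct.iter_unpack("<d", binary_blob):
        if not first:
            out.write(",")
        out.write(json.dumps(w))
        first = False
    out.write("]")

# Tiny demo: three double-precision weights round-trip through the stream
blob = struct.pack("<3d", 0.5, -1.25, 2.0)
buf = io.StringIO()
stream_weights_to_json(blob, buf)
print(buf.getvalue())  # [0.5,-1.25,2.0]
```

The point is that the peak memory footprint stays near a single weight, rather than the whole model, which is what sidesteps a per-object size limit like LuaJIT's.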
- JSON files are much bigger than binaries. This is somewhat inevitable, as JSON is nowhere near as efficient as a binary representation. That said, something can still be done. The scientific literature shows that not only is there no loss of accuracy in storing a network’s weights in single precision instead of double precision, but there is also only a very small loss when taking this to the extreme and using just 16 bits per weight (“half precision”). In fact, this approach has been so successful that NVIDIA now supports the type in its CUDA libraries. Torch, however, does not support it yet, so I had to be a bit creative. In the end, I truncated the weights to 5 digits (close to what an FP16 number would look like), while leaving users the option to choose a different number of digits. The model in the demo runs with 5 digits per weight. It is also worth mentioning that most web servers compress assets before sending them to the client. I tested zipping both the binary file and the JSON, and they compress to files of similar size; after truncating the weights to 5 digits, the zipped JSON is actually smaller than the zipped binary.
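To illustrate the truncation step, here is a small Python sketch. The original code is Lua/Torch, and interpreting “5 digits” as 5 significant figures via `%g` formatting is my assumption, not necessarily what the released code does:

```python
import json

def truncate_weights(weights, digits=5):
    # Format each weight with `digits` significant figures, then parse it
    # back, so the serialised JSON carries only the short representation.
    return [float(f"%.{digits}g" % w) for w in weights]

weights = [0.123456789, -3.141592653589793, 1.5e-07]
short = truncate_weights(weights)
print(short)  # [0.12346, -3.1416, 1.5e-07]

# The shortened strings are what actually shrink the JSON payload
assert len(json.dumps(short)) < len(json.dumps(weights))
```

Fewer characters per weight directly means a smaller JSON file, and it also gives the zip compressor more repetitive input to work with.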
- Lua strings are a bit weird. I was never able to print single and double quotes on their own, even though I needed to. In the end, I resorted to a hack to unblock myself: I store those characters as “SINGLEQUOTE” and “DOUBLEQUOTE”, respectively, and I modified ConvNetJS’s code to convert them back to the right characters when it reads the model. This unfortunately means that my code is not completely transparent to ConvNetJS, but I figure this can easily be fixed in a future release (and maybe the community can help with it).
All in all, this was a fun project. Getting funny quotes does take a while, but in the end I got my robot Trump to say that Italians “want to take our jobs”. That genuinely made me laugh, and it was a nice reward in itself.
An important disclaimer: while I currently work for Microsoft, they had nothing to do with this project, in terms of either opinions or technology. In fact, one of the aims of this project was to gain confidence with non-Microsoft technologies. ↩
Running a pre-trained model takes a fraction of the time required to train it, so CPUs can do just fine on this task. ↩
You might notice there is a slight difference in design quality between the desktop and the mobile version. For the desktop part, I was helped by a mysterious designer we will call DesignerX. The other is done by me, and I guess it shows. ↩