A programming language for AI

I am curious about which programming language is most useful for Artificial Intelligence. “Choose the language that you are most proficient in” is not an option for me; choosing the right tool for the right problem suits me better.

I was looking on Quora and found some results. “What is the best programming language for Artificial Intelligence projects?” is one of the most interesting questions, so I read through the answers. The consensus among them is: Python (because it is fast for developing things and has interesting libraries), C/C++ (because of speed and performance), or Java.

Taking a look on Google, I found a tutorial written by Günter Neumann, from the German Research Center for Artificial Intelligence, entitled Programming Languages in Artificial Intelligence. In it, you can read why functional and symbolic programming languages are more useful for AI, followed by an introduction to Lisp and a short section on Prolog.

It is a simple introduction to Lisp, but I couldn’t help remembering µlisp (a small Lisp interpreter that I wrote, based on another book, Build Your Own Lisp). I built a simple version of Lisp using C. At that point there were no standard libraries; you had to build them yourself, and I wondered whether, from that point, you could start to create a language that helps you represent the world.
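
Just to give an idea of what I mean, here is a minimal sketch of the eval core of a toy Lisp, written in Python for brevity rather than the C of µlisp; the names and the tiny environment are my own illustration, not the actual µlisp code:

```python
# A minimal sketch of the eval core of a toy Lisp, in Python rather than C.
# The names and the tiny environment are my own illustration, not µlisp.

import math
import operator as op

def tokenize(source):
    """Split an s-expression string into tokens."""
    return source.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    """Build a nested list from a token stream."""
    token = tokens.pop(0)
    if token == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ')'
        return expr
    try:
        return float(token)  # a number literal
    except ValueError:
        return token         # a symbol

# The "standard library" has to be built by hand, one primitive at a time.
ENV = {'+': op.add, '-': op.sub, '*': op.mul, '/': op.truediv, 'pi': math.pi}

def evaluate(expr, env=ENV):
    """Evaluate a parsed expression against an environment."""
    if isinstance(expr, str):        # symbol lookup
        return env[expr]
    if not isinstance(expr, list):   # number literal
        return expr
    proc = evaluate(expr[0], env)    # function application
    args = [evaluate(arg, env) for arg in expr[1:]]
    return proc(*args)

print(evaluate(parse(tokenize('(* pi (+ 1 1))'))))  # 6.283185...
```

Everything beyond those few primitives, whatever you need to describe a domain, has to be added to the environment by hand.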

As always, it was a crazy idea: create a programming language that experts say is useful for artificial intelligence, and build the standard libraries to represent the part of the world the system has to work with. The further you take the idea, the clearer it becomes that you cannot represent the complete real world with that approach, but… could you somehow mix implementing part of the world in the programming language with learning part of it from experience? That was my thought; maybe there would be a way to do it.

By the way, this led me to start reading the new Deep Learning book by Ian Goodfellow, Aaron Courville and Yoshua Bengio. In the introduction, there is a reference to Cyc (Lenat and Guha, 1989) and the knowledge base approach.

A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). [extracted from the draft]
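
To make that quote concrete, here is a toy sketch of what reasoning with logical inference rules can look like; the facts and the single rule are my own example, nothing like Cyc’s actual representation:

```python
# A toy forward-chaining inference engine: a minimal sketch of the
# knowledge base approach, not Cyc's actual representation.

facts = {('parent', 'alice', 'bob'), ('parent', 'bob', 'carol')}

def grandparent_rule(facts):
    """parent(X, Y) and parent(Y, Z) implies grandparent(X, Z)."""
    derived = set()
    for (p1, x, y1) in facts:
        for (p2, y2, z) in facts:
            if p1 == p2 == 'parent' and y1 == y2:
                derived.add(('grandparent', x, z))
    return derived

# Apply the rule until no new facts can be derived (a fixed point).
while True:
    new_facts = grandparent_rule(facts) - facts
    if not new_facts:
        break
    facts |= new_facts

print(('grandparent', 'alice', 'carol') in facts)  # True
```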

I still think it could work, because my approach is not to write every single rule of the world in the programming language, but rather to have some base in a language prepared for that specific problem, like a DSL, while going further and without any limitation from the language itself. Either way, it is just an idea. I will continue reading the deep learning book; it looks really promising. I will also take a look at the review of the Lenat & Guha book; maybe I can figure out more.

Deep learning in a large-scale distributed system

Deep learning is interesting in many ways. But when you consider doing it on thousands of cores processing millions of parameters, the problem becomes more interesting and more complex at the same time.

Google Datacenter (via Google)

Google ran an interesting experiment: training a deep network with millions of parameters on thousands of CPUs. The goal was to train on very large datasets without limiting the form of the model.

The paper describes DistBelief, a framework created for distributed parallel computing applied to deep learning training. Among the features the framework manages by itself:

The framework automatically parallelises computation in each machine using all available cores, and manages communication, synchronisation and data transfer between machines during both training and inference.
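
DistBelief itself was never released, so the following is only a rough sketch, with toy names of my own, of the basic idea of parallelising gradient computation across the cores of one machine:

```python
# A rough sketch of splitting gradient computation across worker processes.
# This is not DistBelief; the function and variable names are my own toy
# example of data-parallel training on the cores of a single machine.

import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    """Gradient of squared error for a linear model on one data shard."""
    weights, x_shard, y_shard = args
    error = x_shard @ weights - y_shard
    return x_shard.T @ error / len(y_shard)

if __name__ == '__main__':
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(10_000, 20)), rng.normal(size=10_000)
    weights = np.zeros(20)

    n_workers = 4
    shards = list(zip(np.array_split(x, n_workers),
                      np.array_split(y, n_workers)))

    with Pool(n_workers) as pool:
        for step in range(100):
            # Each core computes the gradient on its own shard in parallel...
            grads = pool.map(partial_gradient,
                             [(weights, xs, ys) for xs, ys in shards])
            # ...and the results are combined into a single update.
            weights -= 0.1 * np.mean(grads, axis=0)
```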

I couldn’t find much more information about it, only what is written in the paper.

They applied two algorithms: SGD (Stochastic Gradient Descent) and L-BFGS. These algorithms usually work well, but in their standard form they do not scale to very large data sets, which is why the authors introduce some modifications to them; the paper calls the resulting variants Downpour SGD and Sandblaster L-BFGS. The paper gives more details about the optimisations in both algorithms, which you may find interesting.
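
To give a feeling for the asynchronous flavour of Downpour SGD, here is a minimal single-process simulation of the parameter-server idea; the class and the round-robin schedule are a deliberate simplification of mine, not the paper’s implementation:

```python
# A minimal single-process simulation of the parameter-server idea behind
# Downpour SGD: workers fetch possibly stale parameters, compute gradients
# on their own shard, and push updates back without waiting for each other.
# The class and the round-robin schedule are my simplification.

import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.weights = np.zeros(dim)
        self.lr = lr

    def pull(self):
        return self.weights.copy()          # a snapshot that may go stale

    def push(self, gradient):
        self.weights -= self.lr * gradient  # apply the update immediately

def worker_gradient(weights, x_shard, y_shard):
    """Squared-error gradient for a linear model on one worker's shard."""
    error = x_shard @ weights - y_shard
    return x_shard.T @ error / len(y_shard)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(9_000, 10)), rng.normal(size=9_000)
shards = list(zip(np.array_split(x, 3), np.array_split(y, 3)))

server = ParameterServer(dim=10)
for step in range(300):
    # Round-robin stands in for workers running concurrently; each one may
    # compute its gradient against parameters that are already out of date.
    xs, ys = shards[step % 3]
    stale_weights = server.pull()
    server.push(worker_gradient(stale_weights, xs, ys))
```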

I found the idea of distributed parallel computing working on very large datasets with such algorithms really interesting.

You can read “Large Scale Distributed Deep Networks”, or the PDF version if you prefer. Have fun!