Music Maker: How Evolved LSTMs Improvise on a Melody You Specify

Use the piano keyboard below to input a sequence of 5 to 15 notes (or use your computer keyboard instead). The timing does not matter; the sequence is automatically converted into quarter notes. The network then generates an output sequence inspired by your input. Try out different melodies and see whether the network catches your drift/riff! Use the download button to save your joint human+AI creation.
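The input normalization can be sketched roughly as follows. The representation below (pitch/onset-time pairs) is an assumption for illustration, not the demo's actual data format: the point is simply that onset times are used only to order the notes and are then discarded, leaving a plain quarter-note sequence.

```python
# Rough sketch: discard timing, keeping only the order of the played notes.
# Each note is represented as a (midi_pitch, onset_time) pair (assumed format).

def to_quarter_notes(played):
    """Map (pitch, onset_time) pairs to an ordered pitch sequence."""
    return [pitch for pitch, _onset in sorted(played, key=lambda n: n[1])]

print(to_quarter_notes([(64, 0.8), (60, 0.1), (67, 1.5)]))  # [60, 64, 67]
```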

What is going on?

In Paper 3, the evolved LSTM networks were trained on the language modeling benchmark, i.e., the task of predicting the next word in a large text corpus. They performed 10.8 perplexity points better than the standard LSTM, and 2.8 points better than the previous state of the art. While performance on that benchmark is not easy to illustrate, the same approach can be applied to a related domain: predicting the next note in a large corpus of music. The prediction can be fed back as the next input, and by iterating this process an entire musical sequence can be generated from a given starting sequence. That is what this demo does.
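The iterate-to-generate loop can be sketched as follows. Here `predict_next` is a hypothetical stand-in for the trained network (a trivial placeholder that repeats the last note), not the model actually used in the demo:

```python
# Minimal sketch of the autoregressive loop described above: the model's
# prediction for the next note is appended to the sequence and fed back
# in as input, one step at a time.

def generate(seed, predict_next, n_steps):
    """Extend `seed` by `n_steps` notes, feeding each prediction back in."""
    sequence = list(seed)
    for _ in range(n_steps):
        sequence.append(predict_next(sequence))
    return sequence

# Placeholder "model" that just repeats the last note; the real network
# would return its most probable next note given the whole history.
melody = generate([60, 62, 64], lambda seq: seq[-1], 3)
print(melody)  # [60, 62, 64, 64, 64, 64]
```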

The LSTM structures evolved in the language-modeling domain were used to build a two-layer network with 100 nodes in each layer. This network was then trained on piano rolls from the Université de Montréal dataset, which consists of 124 pieces of 100–300 eighth notes in duration, with up to 15 notes played simultaneously. In the demo, the network applies this experience to improvise from the notes you give it as a starting point. At each eighth-note timestep, the network outputs a probability for each note to be played (so multiple notes can sound at the same time). Your input sequence is included at the beginning, and repeated notes are removed to produce the final human+AI masterpiece.
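Turning the per-note probabilities at one timestep into a set of notes to play can be sketched as below. The threshold value and the probabilities are made-up illustrations; the point is that each pitch is decided independently, which is how chords emerge:

```python
# Sketch: the network emits an independent probability for every pitch at
# each eighth-note timestep; any pitch above a threshold sounds, so
# several notes can be played simultaneously.

def notes_at_timestep(probabilities, threshold=0.5):
    """Return the pitches whose probability exceeds the threshold."""
    return [pitch for pitch, p in probabilities.items() if p > threshold]

probs = {60: 0.9, 64: 0.7, 67: 0.55, 72: 0.1}  # hypothetical model output
print(notes_at_timestep(probs))  # [60, 64, 67] -- a C-major chord
```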