AI Fractal Art Generator

Introduction

Human-like artistic creativity remains an elusive goal for artificial intelligence. Because the notion of creativity is so broadly defined, the problem is nearly intractable unless work is restricted to a subset of the domain. This project investigated the use of machine learning techniques to generate images of three-dimensional fractal art by adaptively selecting features that are visually desirable. Desirability is determined via user feedback, which is fed into the machine learner to yield increasingly pleasing images over time.

This work attempts to provide insight into the question, “How can computers be used to generate art that is visually pleasing to humans?” Even when restricting the definition of art to digital imagery, this problem remains nebulous and ill-defined. Digital images are capable of representing everything from abstract shapes to outdoor environments, and this wide range of representable objects yields a highly difficult problem. For this reason, the domain for this work was restricted further to digital images of three-dimensional fractals. Fractals were chosen because they allow for a large variety of complex geometry to be deterministically generated from very few initial parameters. Specifically, this project explored the use of the quaternion Julia set, which has four floating-point initial parameters that define the shape of the fractal. In order to generate the images, a ray tracing procedure is performed using the scene description, which consists of the fractal, light sources, and camera.

In order to learn the types of images that users find pleasing, a machine learning algorithm is employed that maps attributes of the fractal and its surrounding scene to the desirability of the rendered image. Two separate machine learning algorithms were explored in order to determine the degree to which an abstract concept such as a pleasing image can be mapped to the artificial intelligence domain.

The main inspiration for this project was Dave McAllister’s Genetic Art project.

Background

Ray tracing is a technique for determining point-to-point visibility, and it applies very naturally to rendering for computer graphics. Given a mathematical description of a scene (such as a sphere with its location and radius), a ray tracing renderer casts rays from the camera “eye” point and calculates the closest intersection with the scene geometry for each ray. If an intersection is found, the color shading at that point can be calculated, and more complex effects such as shadows and reflections can be simulated by recursively generating a new ray from the point of intersection. Ray tracing renderers are typically more flexible and more physically accurate than rasterization-based renderers, but these advantages come at a large performance penalty. Despite being an embarrassingly parallel algorithm (since each ray can be processed independently of all others), ray tracing is used sparingly in applications that require real-time performance, such as video games. Currently, ray tracing is most commonly used in off-line applications such as special effects rendering for film.
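The closest-intersection step described above can be sketched for the sphere example. This is a minimal illustration rather than the renderer used in this project; the function names and the tuple-based vectors are assumptions made for the example, and the ray direction is assumed to be unit length.

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit, or None on a miss.

    `direction` is assumed to be a unit vector.
    """
    # Vector from the sphere center to the ray origin.
    oc = tuple(o - c for o, c in zip(origin, center))
    # Solve |oc + t * direction|^2 = radius^2 for t (a quadratic with a = 1).
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    sqrt_disc = math.sqrt(disc)
    # Try the nearer root first; use the farther one if the origin is inside.
    for t in ((-b - sqrt_disc) / 2.0, (-b + sqrt_disc) / 2.0):
        if t > 1e-9:
            return t
    return None  # the sphere is behind the ray origin

def closest_hit(origin, direction, spheres):
    """Return (distance, sphere_index) for the closest intersection, or None."""
    best = None
    for index, (center, radius) in enumerate(spheres):
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```

A full renderer would call something like closest_hit once per pixel, then shade the hit point and optionally spawn shadow or reflection rays from it.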

Fractal geometry is commonly used in applications that require large amounts of procedurally generated visual complexity with little human involvement. Despite being produced from simple mathematical functions, fractals can create geometry that is nontrivial and seemingly random at first glance. These properties are desirable for this project because they provide enough geometric complexity to produce meaningful results. Of the many fractals known in mathematics, the quaternion Julia set was used in this program for two reasons. Firstly, the family of Julia sets is very general and configurable, and the well-known Mandelbrot set is closely related to it: the Mandelbrot set records the parameter values for which the corresponding Julia set is connected. Secondly, the unbounding volumes method developed by John Hart is an efficient technique for generating images of quaternion Julia sets, even though no known way exists to analytically find the closest intersection point of a ray with the fractal.
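To make the four-parameter definition concrete, the sketch below iterates q ← q² + c over quaternions and returns a distance estimate of the kind used by distance-based ray marchers. It is an illustration under stated assumptions, not the code used in this project: the estimate 0.5·|q|·ln|q| / |q′| is the commonly cited form for this family of fractals, and the function names, iteration count, and escape radius are chosen for the example.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_norm(q):
    return math.sqrt(sum(v * v for v in q))

def julia_distance(q, c, max_iter=12, escape_radius=4.0):
    """Estimate a lower bound on the distance from q to the Julia set of c.

    Returns 0.0 when q does not escape within max_iter iterations, in which
    case the point is treated as lying inside the set.
    """
    dq = (1.0, 0.0, 0.0, 0.0)  # running derivative of the iterate
    for _ in range(max_iter):
        # Update the derivative first, because it needs the previous q.
        dq = tuple(2.0 * v for v in quat_mul(q, dq))
        q = tuple(a + b for a, b in zip(quat_mul(q, q), c))
        r = quat_norm(q)
        if r > escape_radius:
            dr = max(quat_norm(dq), 1e-12)  # guard against a zero derivative
            return 0.5 * r * math.log(r) / dr
    return 0.0
```

A ray marcher can step along each ray by the returned distance, since no part of the fractal is expected to lie closer than that, and stop once the estimate falls below a small tolerance.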

Machine Learning Algorithms

Two types of machine learning algorithms were explored in this work, namely MATLAB’s Support Vector Machine and Feed-Forward Net.

The Support Vector Machine is a linear classifier that is widely used in classification tasks. Its training algorithm attempts to maximize the distance between the training samples and the decision surface, thereby yielding a “cushion” (the margin) around the decision surface that allows new samples to be classified more accurately. The classifier performs well even in high-dimensional scenarios, and low-dimensional tasks are often mapped to a higher number of dimensions in order to produce more accurate classification. This property is desirable given the high-dimensional nature of the classification in this project.
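The margin idea can be illustrated with a small linear classifier trained by subgradient descent on the regularised hinge loss. This is a stand-in written for illustration, not MATLAB’s implementation; the function names, learning rate, and regularisation strength are assumptions.

```python
def train_linear_svm(samples, labels, lam=0.01, epochs=200, lr=0.1):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    Each label must be +1 ("Liked") or -1 ("Not Liked").
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation shrinks w, which corresponds to a wider margin.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1.0:
                # The sample is inside the cushion or misclassified,
                # so move the decision surface away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def classify(w, b, x):
    """Return +1 or -1 depending on which side of the surface x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

In this project’s setting, each sample would be the vector of scene generation parameters and each label the user’s vote.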

The Feed-Forward Net is a variant of an artificial neural network. Artificial neural networks have been used for a variety of tasks such as curve fitting, pattern recognition, and clustering. The Feed-Forward Net is designed to be a general-purpose neural network implementation that can fit any finite input-output mapping function given enough neurons in the hidden layer. This capability to fit an arbitrary mapping is useful for this project since the user’s ratings of a set of images are not necessarily predictable as a function of the scene generation parameters.
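As a rough picture of what such a network computes, the sketch below evaluates a network with a single hidden layer. The tanh hidden units, the linear output, and all weight values are assumptions made for the example; they are not taken from the network trained in this project.

```python
import math

def feed_forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Evaluate a one-hidden-layer network and return a scalar rating."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        # Each hidden neuron takes a weighted sum of the inputs...
        activation = sum(w * x for w, x in zip(weights, features)) + bias
        # ...and squashes it with a nonlinearity.
        hidden.append(math.tanh(activation))
    # The output is a linear combination of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Hypothetical weights for a network with two inputs and one hidden neuron.
rating = feed_forward([0.5, 0.5], [[1.0, -1.0]], [0.0], [2.0], 0.5)
```

Adding neurons to the hidden layer adds more of these squashed weighted sums, which is what lets the network approximate more irregular mappings from scene parameters to ratings.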

Program Operation

The first step of the process consists of the user initiating the scene generation procedure, in which values are assigned to the various parameters. These parameters include:

  • Four values for the fractal quaternion
  • One RGB color for each of three light sources
  • One RGB color for the scene background
  • One value to control the fractal specularity level
  • One 3-D vector representing the camera position
  • One 3-D vector representing the camera point of interest
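
The parameter list above amounts to a 23-value feature vector (4 + 9 + 3 + 1 + 3 + 3). A minimal sketch of one way to represent and randomly generate it follows; the class name, field names, and value ranges are all assumptions made for illustration, not the ranges used by this project.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class SceneParams:
    quaternion: Tuple[float, float, float, float]
    light_colors: Tuple[Vec3, Vec3, Vec3]
    background: Vec3
    specularity: float
    camera_position: Vec3
    camera_target: Vec3

    def to_features(self) -> List[float]:
        """Flatten the scene into the vector handed to the learner."""
        features = list(self.quaternion)
        for color in self.light_colors:
            features.extend(color)
        features.extend(self.background)
        features.append(self.specularity)
        features.extend(self.camera_position)
        features.extend(self.camera_target)
        return features

def random_scene(rng: random.Random) -> SceneParams:
    """Draw every parameter uniformly from an assumed range."""
    def vec3(lo: float, hi: float) -> Vec3:
        return (rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform(lo, hi))

    return SceneParams(
        quaternion=tuple(rng.uniform(-1.0, 1.0) for _ in range(4)),
        light_colors=(vec3(0.0, 1.0), vec3(0.0, 1.0), vec3(0.0, 1.0)),
        background=vec3(0.0, 1.0),
        specularity=rng.uniform(0.0, 1.0),
        camera_position=vec3(-3.0, 3.0),
        camera_target=vec3(-0.5, 0.5),
    )
```

Passing a seeded random.Random makes a generated scene reproducible, which is convenient when a liked image needs to be re-rendered later.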

Once these parameters have been generated, they are fed into the learning algorithm. This algorithm is pre-trained by the user via a starting set of 15 images representing a variety of image types. Using this knowledge, the learning algorithm is able to receive the parameters from the scene generation program and output a predicted classification or quality score, depending on the approach used.

After the classification or quality score is obtained, a thresholding operation is performed to determine whether or not those scene parameters correspond to a desirable image. For the Support Vector Machine based implementation, this thresholding procedure is trivial, since the classifier segments the space into two regions, namely “Liked” and “Not Liked.” The Feed-Forward Net approach operates using a floating-point rating scale, and an example threshold might be 0.5, approximately representing the half-way point in the quality domain (although this may not necessarily be the case if the user consistently rates images significantly above or below this value).

If the thresholding procedure determines that the image should be rendered, the generated scene parameters are fed into a ray tracing renderer which produces the output image. The user then judges the quality of the output image and assigns a vote/rating to it, and this assignment is then used for further improvement of the classification/mapping algorithms for future use.
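
Taken together, the steps above form a generate, predict, threshold, render, rate, and retrain loop. The sketch below shows that control flow for the rating-based variant, with every stage passed in as a callable; the function name, its parameters, and the stopping rule are hypothetical and stand in for the real generator, learner, and renderer.

```python
def generation_loop(generate, predict, render, ask_user, retrain,
                    threshold=0.5, wanted=3, max_attempts=1000):
    """Collect `wanted` rated images, rendering only promising candidates."""
    history = []  # (features, rating) pairs gathered from the user
    for _ in range(max_attempts):
        if len(history) >= wanted:
            break
        features = generate()
        if predict(features) < threshold:
            continue  # predicted to be undesirable, so skip the costly render
        image = render(features)
        rating = ask_user(image)
        history.append((features, rating))
        retrain(history)  # fold the new rating back into the learner
    return history
```

For the vote-based variant, predict could return 1.0 for “Liked” and 0.0 for “Not Liked”, which leaves the same loop unchanged.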

Conclusions and Future Work

Because the machine learning algorithms used the scene generation parameters directly as the feature vector, their success in generating similar images was tied almost directly to the consistency of the user’s initial votes/ratings of the images in the starting set. For example, the program performed best when the user consistently selected images based on criteria such as black and white only, a smooth fractal, or a highly chaotic fractal. Given the extra control of the continuous rating system over the binary voting system, the Feed-Forward Net resulted in fewer off-target images than the Support Vector Machine. This was because when, for example, one image was given a high rating and two images were given mediocre ratings, the neural network approach was more readily able to estimate the contribution of the different scene parameters toward that rating. Since the Support Vector Machine operates using discrete classes rather than a continuous mapping, it was less able to discern these more subtle differences.

Although it worked, the single-stage learning procedure employed here falls short of the original goals outlined in the introduction, because meeting those goals requires a deeper understanding of which non-obvious qualities of an image a user finds desirable. The approach used here simply operated on the scene generation parameters; a more robust implementation (though it would require orders of magnitude more data) would have the learning algorithms extract various qualities of the rendered images, such as variances in color or the way the fractal shape fills the image. In this way, the learning algorithms would be better able to produce images that the user finds desirable even if those images have wildly different scene descriptions from the images the user initially liked.

My Favorites