Derived Cinematic Environments: Machine Learning Algorithm

7461 words (30 pages) Dissertation

9th Dec 2019

Disclaimer: This work has been submitted by a student.

Derived Cinematic Environments



1: Introduction

  • 1.1: Expanded Cinema, Post-dramatic Theatre

2: Scenography

  • 2.1: Seven Elements of Scenography

3: Design Work

  • 3.1: Pixel Sorting
  • 3.2: Photogrammetry
  • 3.3: Machine Learning

4: Immersive Environments

5: Conclusion


As Lev Manovich writes, "the computer fulfils the promise of cinema as a visual Esperanto – a goal that preoccupied many film artists and critics in the 1920s. In cinema, where most users are able to understand cinematic language but not speak it (make films), all computer users can speak the language of the interface. They are active users of the interface, employing it to perform many tasks" [1]. Can we not, today, use the same computer interface to reinterpret cinema and the cinematic experience for the twenty-first century?

The paper looks at using machine learning algorithms to interpret and create cinema, and to express the results through immersive, multisensory, experiential environments as a medium of storytelling. It analyses the immersive environment from the perspective of scenography, taking expanded cinema and post-dramatic theatre as visual and experiential references. Towards the end, the document critiques the machine learning output and questions the creative ability of a machine: can a machine's creation induce a sense of emotion in the user through storytelling and visual aesthetics in immersive environments?


Keywords: Cinema, Environments, Machine Learning, Experiential, Memory, Narratives

1: Introduction

The history of the cinema is one of technological experiment, spectator/spectacle relations and production, distribution, and presentation mechanisms [2]. From its earliest days cinema was an experimental medium and continues to be so as it is being technologically revolutionized. The new capabilities of the technological medium for the production and presentation of cinematic content are opening up newer avenues for further evolution of independent, experimental and expanded cinema.

The conventional cinematic mode of immersion derives from the darkened enclosure of the movie theatre. Immersive cinematic experience relies mainly on the realistic qualities of filmic illusion, which exist thanks to technological equipment. The interplay between the visuals and the technology that projects them engages the spectator on an emotional level. Similarly, the immersion in a theatrical play operates on an emotional level: a play is the live narration of a story, with the act unfolding in a real space with characters, props and sets. Large-format cinemas produce an immersive experience in which the audience participates in the film through a series of technologically driven simulations on the screens. This projection lacks physical contact between the audience and the visuals on the screen, which leads to a lack of immersion on a physical level. The physical and operational separation of the audience from these big screens remains a disengaging constraint in cinema and theatre [2]. Advancements in technology have improved visualization and projection techniques, but the framework for experiencing them still has the proscenium that separates the spectator from the spectacle.

The darkened enclosure of a cinema theatre has to evolve to incorporate newer methods of experiencing cinematic content: a new relation between the spectator and the spectacle, and a new level of physical and imaginative assimilation of the viewer within the image space.

The paper looks at extending these darkened enclosures into an immersive realm through the construction of navigable, immersive cinematic environments that are quite distinct from the types of representation we are used to in conventional cinema or theatre, challenging the fundamental aspects of film direction, the role of the scenographer in theatre and the progression of the narrative. The design project investigates creating a virtual extension of the image space that the viewer has to explore to discover its narrative subjects; the navigable cinematic space allows the interactive viewer to assume the roles of both actor and viewer, thereby experiencing immersion on a psychological and physiological level.

1.1: Expanded Cinema, Post-dramatic Theatre

Expanded Cinema

The term 'Expanded Cinema' was coined amid the experiments of early twentieth-century avant-garde filmmaking, media technologies and performance art. It is a film and video practice which activates the live context of watching, transforming cinema's historical and cultural 'architectures of reception' into sites of cinematic experience that are heterogeneous, performative and non-determined [3]. Cinema is "expanded" in more than one sense: it could use multiple screens or involve non-cinematic mediums of expression. Radical experimentation with every aspect of film making and film viewing started around the 1960s. The material aspect of film itself was analyzed: instead of exposing the celluloid, artists and filmmakers scratched it, perforated it, processed it with grease and paint, and punched holes into it to create abstract imagery on the screen. Tools like the camera and the projector were deconstructed, reassembled, augmented and used in different ways. The extension from a single screen to multiple screens represented not only an expansion of the visual horizon but also an overwhelming visual experience in which viewers had to interact with, or juggle between, different screens to conceive the whole image. Early successful examples by Charles and Ray Eames were the slides projected onto multiple screens at the Moscow World Fair in 1959 and the fourteen screens at the IBM pavilion in New York.

The multiple-screen experience gave a new dimension to the way narratives were communicated and understood: the presence of multiple screens started to create the experience of being in an environmental setup. Expanded Cinema opened up questions surrounding the spectator's construction of time/space relations, activating the spaces of cinema and narrative as well as other contexts of media reception [4]. Artists and directors wanted to move away from the conventional way of narrating a story; in doing so they displayed an alternative by challenging the perspectives of filmmaking, visual arts practices and the narratives of social space, everyday life and cultural communication.

Post Dramatic Theatre

Around the same time as expanded cinema, drama also started to signal its dissatisfaction with the conventional methods of narrative and direction. Post-dramatic plays offer actors and audiences a theatrical experience that is not tied to the development of either character or plot but one that aims to investigate broader issues, free of drama's limitations [5]. The plays involve the audience more deeply in the meaning-making dramatic process. The orientation provided by recognizable characters or plotlines dissolves, and spectators have to negotiate the production of post-dramatic plays by working through a new set of conventions. The stage becomes a generator of shared experiences rather than knowledge, and spectators are confronted with the question of how they deal with such phenomena.

For example, Peter Handke's Offending the Audience (1966) features no named characters, just four speakers who playfully articulate what the audience may or may not encounter in the performance itself. Heiner Müller's Hamletmachine (1977) is a dense fusion of text and inter-text, much of which is not directly attributed to a single named speaker. This kind of approach was termed 'post-dramatic theatre' by Hans-Thies Lehmann in his book Postdramatic Theatre [6].

Immersive Theatre

A thought process similar to that of post-dramatic plays gave rise to immersive theatre. Immersive theatre is a theatrical experience in which the audience is not a passive viewer but is actively involved in the act. Audience members are not confined to their seats with the proscenium between them and the actors; they are part of the story at large, can move around the space and can participate in the theatrical act. Punchdrunk, one of the most celebrated immersive theatre companies in London, has exhibited numerous performances. The company creates an exploratory interior landscape with performances scattered over different spaces that engage with the environment; this relation between the actors and the real-life environment adds to the audience's experience. The company has explored various locations in London, from railway arches and warehouses to old school buildings, developing site-sympathetic performances where the act is based on the site. These sites significantly contribute to the audience's experience and bring authenticity to the performance. Because of this informal setup, away from the conventional theatre, immersive theatre blurs the boundary between viewer and actor and between life and performance. It transports the viewer into an imaginative world crafted by the scenographer and conducted by the director. Viewers can freely roam around the space, choose what to see, and even enter different spaces of the setup, which gives them a sense of freedom within the story of the act. The immersion in immersive theatre is purely due to the visual aesthetics of the setup; in the overwhelming response to experiencing a play in a new environment, viewers do not notice the subtleties of the setup. The immersive theatre director wants to create elaborate surroundings for every scene, like a cinematic frame.
What one views in a cinematic frame is just the cinematographer's perspective, but in a play the scenographer does not have this advantage. The scenography must sit somewhere between being as explicit as a cinematic frame and working within the limitations of theatre. If one observes, all seven elements of scenography are expressed to the fullest in immersive theatre.

Both of the above examples look at the involvement of the audience in the act. The actor follows the narrative and maneuvers around the space while the spectators pursue them. A strong presence in the environment and physical proximity to the characters give the viewer a sense of being inside something normally hidden from them: something behind the cinema screen or behind the stage, as immersive theatre is equally dependent on the movement of the audience and the movement of the characters. In a very real sense the viewer is physically inside the crafted interior environment, but the immersion lies in the perception of the space in the viewer's mind. The viewer perceives the fake interior setups as real and soaks in the drama of the space in real time. This gives the viewer the perception of being in a known environment, where he transcends the boundary of who he himself is.

The design project draws inspiration from both examples, in which the respective movements deconstructed the fundamental aspects of film and theatre. It deconstructs the fundamentals of film making, the linearity of storytelling and the ways of perceiving a film. The resulting cinematic experience is viewer-driven, with the viewer and the film in a personified conversation.

2: Scenography

Scenography, as defined by Josef Svoboda, is "the interplay of space, time, movement and light on the stage" [7]. In theatre and cinema, scenography relates to the study and development of the visual, experiential and spatial composition of performance. The main objective of the scenographer is to immerse the audience and elicit their emotional and rational engagement in the performance. For a scenographer, the organization of characters around the plot (story) and the spatial arrangement are two equal dramatic means of translating the narrative quality of the picture. These two major aspects can be broken down into seven elements: space, research, text, colour and composition, direction, performers and spectators.

In theatre and cinema, the spatial structuring of the scene is more important than an aesthetic depiction; a setting can stimulate and open the idyllic horizon of the beholder. It tends towards simplicity so that it can be grasped by its beholder with ease [8]. It also gives the actors space to emote and express their characters, and gives the audience breathing space in the visual frame. A chaotic yet aesthetically beautiful frame can still fail to engage the audience compared with a well-structured frame. In addition to the structure, the scenographer should add context (historical and cultural backgrounds) to enhance the audience's perception. The artistic accuracy of the scenography entertains and provides the audience with information about the quality of the act. Scenography is a joint statement by the director and the scenographer of their take on the story presented to the audience. The scenographer's prime elements are the performers, as they bring life to the empty set; they weave the space with their movement, their dialogues and their expressions. The final element that completes the circle is the audience, who occupy the shared darkened space of the theatre.

Scenography in Immersive Environments

The concept of scenography also has a place in immersive environments. Though the factors defining scenography in cinema or theatre differ from those in immersive environments, virtual environments can also use scenographic elements for the immersive involvement of the audience, whether in games or art installations. For example, in the virtual reality project 'In the Eyes of the Animal' by Marshmallow Laser Feast, visitors were given globe-shaped virtual reality headsets through which they could explore the woods through the eyes of different animals in the forest. Wearing the VR helmets, viewers experienced the landscape through the eyes of one of three woodland creatures: a dragonfly, a frog and an owl. The visuals created the illusion of soaring over the treetops or wandering the forest floor. The 360-degree experience, combined with binaural sound, gives users the feeling of being the creature and experiencing the forest through its eyes. The user's sense of presence in virtual reality is defined by the strong visuals of the VR environment. The VR experience has a linear narrative in which the user is guided through different scenarios for each woodland creature.

For the purpose of understanding, we can call it an animated movie in a VR headset. The choice of colour palette, the picturing of the scene, the synchronization of movement and visuals, the expression of the peculiar character of each creature, and the way the camera moves in the case of the dragonfly are all aspects of scenography. Due to the complexities of the technology, the role of the scenographer is distributed over the various people involved in the production. This is a fine example of the role of scenography in new media art, questioning the main target of scenography, as there is no proper definition of the "actor" and the "audience". No special effort has to be made to catch the viewer's attention all the time. The role of the scenographer goes beyond what the audience sees to what it perceives. Spatial arrangement and visual imagery are the keys in a virtual environment; the paper theorizes on these aspects in the design project chapter.

2.1: Seven Elements of Scenography

A scenographer's exposition is viewed in terms of spatial interpretation, mise-en-scène, colours, light and shade, in addition to the drama sequence depicted in the space. Out of this assembly emerges an environment that grabs the viewer's attention and elicits emotion, inspiration and illusion. The visual elements must also have a philosophical, cultural and historical context; they must communicate a mood or setting to the viewers which not only acts as a frame for the characters but also influences the picture's dramatic action, as it inspires the action of the characters, or even the concept of the director. The following are the elements of scenography which influence and maintain a smooth communication between the audience and the act on the screen or the stage.

3: Design Work

The design approach was to carry out a series of experiments deconstructing the elements of scenography. The initial experiments related to the extraction of colour and three-dimensional volumetric spaces from cinema scenes. These experiments helped the project get a grip on various aspects of film making and an understanding of the seven elements of scenography in relation to cinema frames. The experiments that followed used machine learning techniques of generation and classification.

3.1 Pixel Sorting

The initial design approach looked at deconstructing cinema frames, and pixel sorting was the first technique explored. Cinema frames were extracted from video sequences of a movie and put into a pixel sorting application, which sorted the colour pixels of the image in a linear form according to their hue and saturation values. This was the first artist–machine collaboration test, where the team could control some aspects of the pixel sorting algorithm and the rest was left to the computer's whims. The experiment yielded several glitchy images of movie scenes. The algorithm calculated a mean of all the colours and arranged the colour pixels horizontally according to their hue value, from left to right; the image created was an averaged rendering of the colour tones of the cinematic frame. The visuals of the experiment display the importance of colour tone in a cinema frame: the pixel-sorted images still had the essence of the scene in the form of colour, still effectively communicating the mood and ambience of the scene, and the generated image still holds a visual association with the original image. This was the deconstruction of the colour and composition element of scenography on a two-dimensional level; both the input and the output of the processing algorithm were two-dimensional images.
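The sorting step described above can be sketched in a few lines. The sketch below is illustrative only and assumes the frame's pixels are already available as RGB tuples; the actual application used by the team is not specified, and a real implementation would read the frame with an imaging library and write the sorted pixels back row by row.

```python
import colorsys

def sort_pixels_by_hue(pixels):
    """Sort a flat list of (r, g, b) pixels (0-255) by hue, then saturation.

    This reproduces the core of the pixel-sorting idea: pixels are
    re-ordered along one axis by their hue/saturation values.
    """
    def hue_sat(p):
        r, g, b = (c / 255.0 for c in p)       # normalize to 0..1
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        return (h, s)
    return sorted(pixels, key=hue_sat)

# Example: a tiny 'frame' of four pixels (red, green, blue, yellow)
frame = [(200, 30, 30), (30, 200, 30), (30, 30, 200), (200, 200, 30)]
sorted_frame = sort_pixels_by_hue(frame)
# Red (hue 0) comes first, then yellow, green and blue
```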

3.2 Photogrammetry

This method explores the three-dimensional aspect of a cinematic frame. The cinematographer converges the elements of scenography of a cinema frame from a three-dimensional design set to a two-dimensional image. This technique looks at extracting back the three-dimensional aspects of a cinema frame in a digital medium.

Photogrammetry is the science of making measurements from photographs, especially for recovering the exact positions of surface points. This technique was used to extract the three-dimensional volume of a cinema scene: a set of movie frames was fed as input to the algorithm, and the output is the machine's three-dimensional point-cloud visualization. The interesting aspect is the machine's interpretation of a three-dimensional space from a sequence of two-dimensional images. In the following examples, one can observe which elements of scenography the machine can identify and how well it outputs them. The following are photogrammetry examples of a few movie scenes.
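The geometric core of photogrammetry, recovering a 3D point from its projections in two frames, can be sketched with linear triangulation (the direct linear transform). The two camera matrices below are simple hypothetical cameras chosen for illustration; a full photogrammetry pipeline would also estimate the cameras themselves from feature matches across many frames.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover a 3D point from its projections in two views (linear DLT).

    P1, P2: 3x4 camera projection matrices; x1, x2: (u, v) image points.
    Each image point contributes two linear constraints on the
    homogeneous 3D point X; the SVD null vector solves them jointly.
    """
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                 # back to inhomogeneous coordinates

# Two toy cameras: one at the origin, one translated along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]   # project into view 1
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]   # project into view 2
X_est = triangulate(P1, P2, x1, x2)
```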

These are the cues that show the machine's intelligence.

  • Movie: The Shining

This is the famous scene from the movie The Shining. In this example the machine is accurate in its interpretation of the scene: it has got the symmetry of the scene, the colours, the furniture and the overall composition correct. The machine has also captured the camera movement of the movie sequence accurately. If one remembers this scene in the movie, the camera is slowly panning out, which is efficiently captured by the machine, as the generated three-dimensional space converges towards the lift.

  • Movie: Stalker

The photogrammetry of this scene is a bit fuzzy and unclear. The machine has been successful in extracting the colour tone of the scene, but it lacks the details and the composition of the scene. The three-dimensional volume also lacks the depth of space that can be seen in the cinema frame, indicated by the positions of the characters. It might be that the machine was not able to differentiate between the shades of the scene's colour tone and hence was unsuccessful compared with the previous example, which has distinct colour contrast between the elements of the scene.

3.3 Machine Learning

The term machine learning (ML) refers to the automated detection of meaningful patterns in data. It involves programming computers to optimize a performance criterion using example data or past experience [9]. The basic idea is to build algorithms that can receive input data and use statistical analysis to predict an output value.

To solve a problem using a computational technique, one needs an algorithm: a sequence of instructions that the machine carries out to transform the input to an output. But there are tasks for which there is no algorithm. For example, telling spam emails from legitimate emails: the input is an email document, in the simplest case a list of characters, and the output should be a yes/no answer indicating whether the message is spam or not. But there are no standard guidelines for transforming this input into the output; what is considered spam changes over time and from individual to individual. What programmers lack is the knowledge of how to do it. However, one can easily compile thousands of example messages, most of which are spam, and the computer can "learn" from them what constitutes spam. The machine automatically extracts the algorithm for this task; this is called learning (in ML language).
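The spam example above can be made concrete with a tiny from-scratch naive Bayes classifier: rather than writing rules for what spam looks like, the program counts words in labelled examples and derives the decision rule from those counts. The four training messages and the word-level model are illustrative assumptions, not part of the text.

```python
import math
from collections import Counter

# Labelled examples: the machine "learns" what constitutes spam from these
train = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow maybe", "ham"),
]

counts = {"spam": Counter(), "ham": Counter()}   # word counts per class
docs = Counter()                                 # documents per class
for text, label in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = set(w for c in counts.values() for w in c)

def classify(text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    best_label, best_score = None, -math.inf
    for label in counts:
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))  # class prior
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With more examples the counts, and hence the extracted "algorithm", adapt automatically, which is exactly the point the paragraph makes.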

Machine learning models are built on neural networks. A neural network is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information [10]. A machine learning model requires a dataset; as explained before, the machine learns from examples. The dataset is what the machine feeds on for the learning process, and a good, elaborate dataset yields an efficient and accurate output.

The design project intends to create a new interpretation of cinema through the lens of a machine. The project expects the machine to understand and interpret a movie and construct a new iteration of it, thereby giving the machine the responsibility of producing a whole new movie sequence. But due to the limitations of current machine learning science, the machine cannot fulfil all these tasks single-handedly. Here comes the collaboration between the artist and the machine, similar to the process of making a film, which is also a collaborative effort of many artists. The design team aims to collaborate as artists with the machine as a tool to generate the output. The project explored three different machine learning models that lead to three different mediums of interpreting cinema. Without getting into the technicalities of the ML models, the paper discusses the techniques and their workings at large and investigates their future usage in film production. The paper briefly discusses two machine learning models and details the third, as the final design project proposal is based on it.

1: Generative Adversarial Networks

A generative adversarial network is composed of two neural networks, a generator and a discriminator, that constantly compete against each other. The generative model tries to produce fake representations of the input data, and the discriminator tries to distinguish generated data from training (input) data. The two networks play a continuous game in which the generator gets better at producing realistic samples and the discriminator gets better at distinguishing the generated data from real data. This continuous play culminates in generated samples that are indistinguishable from the real data. This model is capable of predicting the next scene of an input video dataset; the machine generates short GIFs that are predictions of a scene.
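The alternating generator/discriminator game can be illustrated with a toy one-dimensional GAN written in NumPy. The linear generator, logistic discriminator and the target distribution centred at 3.0 are illustrative assumptions; the project's video-prediction model is a deep network, but the training loop has the same two-step structure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def real_batch(n):
    """'Real' data: samples centred at 3.0."""
    return 3.0 + 0.1 * rng.standard_normal(n)

# Generator G(z) = a*z + b; discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr = 0.05

for step in range(2000):
    z = rng.standard_normal(32)
    x_real, x_fake = real_batch(32), a * z + b
    # Discriminator step: push D(real) -> 1 and D(fake) -> 0
    d_real, d_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    grad_w = np.mean((d_real - 1) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1) + np.mean(d_fake)
    w, c = w - lr * grad_w, c - lr * grad_c
    # Generator step: push D(fake) -> 1 (fool the discriminator)
    d_fake = sigmoid(w * (a * z + b) + c)
    g_common = (d_fake - 1) * w
    a, b = a - lr * np.mean(g_common * z), b - lr * np.mean(g_common)

# After training, generated samples cluster near the real data's centre
samples = a * rng.standard_normal(1000) + b
```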

Applying this algorithm to a dataset of movie video clips, the generator learns to predict the next frame of the videos while the discriminator tries to tell prediction from reality. A dataset of 20-to-30-second video clips from the movie The Shining was created, which led to the generation of the following GIFs. Through this model the design project looks at predicting a whole new movie based on an existing movie. Due to limitations in computational power and resources, the generated images from The Shining are quite blurry. In an ideal scenario, with a perfect dataset and high computational power, a machine would be able to predict a whole new movie. This would completely change the idea of film making, as there would be no authorship of the produced movie.

2: RNN – Recurrent Neural Networks

The fundamental feature of an RNN is that it contains a feedback loop connection. The feedback loop enables the neural network to do temporal processing and learn sequences, so that it can perform sequence recognition and prediction. The model takes a text file as input and trains an RNN that learns to predict the next character in the sequence. RNNs are commonly used in Google text translation, where the algorithm is able to translate a given text.
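The feedback loop can be sketched as a minimal character-level RNN forward pass in NumPy. The weights below are random, so the distribution it produces is untrained; training them (for example with backpropagation through time) is omitted. The point of the sketch is the hidden state `h`, which carries information from every character seen so far into the next prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
chars = sorted(set("hello world"))          # toy vocabulary
idx = {ch: i for i, ch in enumerate(chars)}
V, H = len(chars), 16                       # vocabulary and hidden sizes

Wxh = rng.standard_normal((H, V)) * 0.1     # input -> hidden
Whh = rng.standard_normal((H, H)) * 0.1     # hidden -> hidden (the feedback loop)
Why = rng.standard_normal((V, H)) * 0.1     # hidden -> output

def next_char_probs(text):
    """Run the RNN over `text`; return a distribution over the next char."""
    h = np.zeros(H)
    for ch in text:
        x = np.zeros(V)
        x[idx[ch]] = 1.0                    # one-hot input
        h = np.tanh(Wxh @ x + Whh @ h)      # recurrent update
    logits = Why @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()                      # softmax over the vocabulary

p = next_char_probs("hello ")
```

The same recurrence applies unchanged when the "characters" are replaced by camera co-ordinates, which is how the project uses the model in the next section.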

A cinematographer decides the camera work in a film: he studies the script and creates an efficient camera movement sequence which highlights the characters, strengthens the plot and communicates the film to the audience. The design project proposes handing the responsibility of moving the camera to the machine. A cinematographer directs the camera because he understands the story, but what kind of understanding can be developed in a machine? The action of camera motion is the common link between the machine and the cinematographer. The motion of the camera has x, y, z co-ordinates. Camera motion co-ordinates of the movie The Shining were extracted and fed as a dataset into the model. The machine generated new camera motion co-ordinates, and the team used them to refilm the movie.

Three-dimensional photogrammetry spaces were created from a small video sequence from the movie The Shining. This same video sequence was reshot with the new camera motion co-ordinates generated by the machine. The generated co-ordinates are a predicted set based on the camera motion co-ordinates of the whole film: the machine learns from the cinematographer of the film and predicts its own interpretation of the co-ordinates.

The generated video sequence communicates a whole new narrative for the same movie sequence. This example portrays the influence of the cinematographer on the film: the cinematographer, along with the scenographer, structures the scene through the placement of lights, the movement of characters and the focus of the scene. The only thing the machine understands is numbers: the x, y, z co-ordinates of the camera. The following are snapshots of the re-filmed video sequence. The machine-generated movie sequence depicts the movie as a horror film, with fast-moving and irregularly positioned shots contrary to the slow-moving, symmetrical shots of the original. This leads to a question: what kind of narratives will be interpreted from a machine-directed movie?

3: RIS – Reverse Image Search

The reverse image search algorithm is like the Google image search interface: when queried with an image, the machine outputs a set of images that are the nearest matches to it. Nearest means images that are similar in composition, colours and characters to the input image. The difference between the Google image search algorithm and the project's algorithm is the dataset, which is explained later in the paper.


The reverse image search algorithm is based on convolutional neural networks, which are very effective at image recognition and classification. Convnets are trained to identify the salient objects and features of an image and store their reference as a vector.
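Once every frame is reduced to a feature vector, the retrieval step is a nearest-neighbour search over those vectors, commonly by cosine similarity. In the sketch below, the 4-dimensional vectors and the frame names are stand-ins for real convnet features over the project's 100-thousand-frame dataset.

```python
import numpy as np

# Hypothetical feature vectors standing in for convnet outputs per frame
database = {
    "shining_corridor": np.array([0.9, 0.1, 0.0, 0.3]),
    "stalker_zone":     np.array([0.1, 0.8, 0.4, 0.0]),
    "shining_elevator": np.array([0.7, 0.3, 0.2, 0.2]),
}

def nearest(query, k=2):
    """Return the k database frames most similar to the query vector."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    ranked = sorted(database, key=lambda name: cos(query, database[name]),
                    reverse=True)
    return ranked[:k]

# A query resembling the 'corridor' frame retrieves the two Shining frames
results = nearest(np.array([0.85, 0.15, 0.05, 0.35]))
```

Precomputing and indexing the database vectors is what makes querying fast, which is why the dataset construction described below matters.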

The image is a screenshot of the convnet classifier application, which shows the learning process of the machine. The machine identifies edges, lines, colours and pixels, various aspects of an image; it breaks the image down into parts and learns their features.


Above is a screenshot of the reverse image search application. At the top left is the input window, and below it are the output images generated by the machine.


An ML model requires a huge dataset for efficient, accurate output. Movie video sequences were broken down into frames at around 24 frames per second per movie. The image dataset was created from the top 100 movies on the 2017 IMDb list, totalling around 100 thousand images, and the algorithm was trained on that dataset.


The large dataset enabled the machine to generate an efficient set of similar-looking images. From the snapshots shown in fig 1.23 one can see the machine's identification process. The input was a video sequence from the movie The Shining. The images generated by the machine were similar to the input video: the machine could read the red colour of the main protagonist, the floral pattern on the walls, and the appearance of the two girls. The machine can look through the scenographer's vision of colours, background space, furniture setup, camera angle and scene composition. In the scene with the appearance of the two girls, the machine classifies it very accurately and generates a similar scene from the movie The Great Escape. Looking through the generated images, the paper views the machine as a master scenographer, in that it is able to classify most of the elements of scenography. Like the previous ML models, however, the machine cannot understand the narrative of either the input or the output: the generated images score well on the visual aspect of scenography, but from a content or narrative aspect the machine fails completely. The following image is a diagram that measures the accuracy of the machine's output in relation to the scenographic elements.

The generated output of the RIS consisted of individual, distinct images, which had to be curated and compiled into one sequence. Images were selected intuitively from the generated set. As the dataset was of the top 100 movies, the generated images came from those movies, and each selected image led back to the movie from which it was extracted. A few seconds of that movie were clipped out, and all the small clips of the selected images were aligned sequentially; a change in the input video scene would be reflected as a change in the generated video scene. Thus the process led to the creation of a new movie composed of smaller video sequences, which could be called the computer's interpretation of a movie. When the original and generated movies are viewed simultaneously, the latter has some interesting moments in relation to the original, which can be seen in fig. 21. The final video is also an outcome of the collaboration of the machine and the artist: the limitation of the machine in delivering a meaningful, finished product was overcome by the artist, and the limitation of the artist in sorting a dataset of 100 thousand movie frames was overcome by the machine.

These models raise questions about content and visuals in films: a film, a narrative, a story created from clips of classic movies. Compared with the previous two models, the newly generated video has a narrative of its own; the RIS ML model can weave some narrative around the generated images. This leads to the idea of a new relation between visuals and narratives: a video sequence with multiple narratives stitched together, a digital algorithmic scenographer deciding the elements of a film.


Each of the machine learning models addresses one aspect of film-making. The generated images of the GAN are still at a nascent stage, where the algorithm needs further development for accurate predictions; these images do not communicate much.

The RNN model succeeds in achieving good visual aesthetics and in engaging the viewer. The video has immersive qualities: the viewer becomes absorbed in figuring out the narrative of the video based on the camera motion. It is an achievement for the ML model that it is able to hold the viewer's attention, which is also the main motive of a scenographer.

The RIS model is the one that most completely takes over the job of a scenographer; it effectively engages the viewer in deciphering the narrative of the new video sequence. The video has some interesting moments where one sees similar hand gestures between the generated and the original video, even though the two sequences individually carry different meanings. In the similarity of the scene with the girls and the scene with the two men, the machine identifies the character composition of the scene. If one knows which movie each scene belongs to, it is amusing to see them paired next to each other.

5.1: Immersive environments

A physical space is often thought of as a three-dimensional entity that exists between objects. One experiences sensation in three-dimensional space and defines one's perception of the world through one's relation to that space [11]. The space can be modelled, manipulated and deformed to create an aesthetic immersive experience. The team proposes to create a curated experiential space displaying the videos created through the machine learning models, expressing the central idea of a collaborative art installation involving machine learning as a tool and the artist (the team) as scenographer. The following examples serve as references for the proposal.


There are two speculative proposals for the project: an immersive navigable virtual reality environment, or an immersive multi-screen/projection video installation. In both proposals the aim is to present a new way of experiencing cinema and narratives.

Immersive navigable VR environment:

In this scheme the output generated from the RNN model and the RIS model will be placed in a virtual reality setup. Photogrammetry spaces of the video sequences will form the visual aesthetics of the VR environment. The user would be able to walk around these three-dimensional volumetric photogrammetry spaces.

Immersive multi-screen video installation:

In this scheme the final videos generated by three ML models would be projected simultaneously on a large canvas. The canvas would surround the viewer and expand their cone of vision.

The team intends to base both proposals on a central theme that feeds common data into the respective machine learning models. Considering the factors that produce efficient output from the ML models, Stanley Kubrick's movies would be a strong choice as the central theme. The director is known for his photography and his heavy focus on visual arts and perspective, exemplified by the famous one-point perspective shot, in which a scene's art direction, action and camera movement lead the viewer's focus to a very specific point. The cinematography in his films is innovative, stunning and aesthetically pleasing: the symmetry of the movie frames, the famous long tracking shots, and the prolonged sequences that slow down the rhythm of the film while building emotion and suspense in the plot. These aspects work well with all three ML models: the creation of photogrammetry volumes is efficient, the RIS image recognition is accurate due to the large number of elements in each frame, and the GANs function well thanks to the prolonged, slow-moving shots.

Virtual Environment –

The virtual environment is composed of three-dimensional cinema scenes, which are realistic replicas of the original scenes. The logic for the arrangement and placement of the photogrammetry scenes is yet to be developed; it would largely depend on the output video sequences of the RNN and RIS models. The team also considers the use of sound in the virtual environment.


The RIS-generated video has no defined plot line or fixed narrative. The video is neither diegetic, telling its story through narration, nor mimetic, with a story that unfolds in linear form. The virtual environment carries over these aspects of the video. In the VR environment, a participative mode of narration must be adopted, one in which the narrative is constantly being built from the visual and audio cues picked up by the viewer; the narrative emerges through the interaction of the viewer with the photogrammetry spaces.

Memory and Engram

The photogrammetry technique extracts most of the details of a scene, but some aspects are left blank. As the user starts manoeuvring through these cinematic compositions, he might be reminded of similar visual scenes that he has seen or perceived in the past, either in real life or in some film, triggering an associative retrieval or an automatic memory-reminding process in his mind. The retrieval process occurs when a visual cue automatically triggers the experience of remembering a past memory. Memories are stored in the mind as a type of mental image called engrams. Engrams are a complete recording, down to the last accurate detail, of every perception present in a moment of partial or full “unconsciousness.” The engram is the representation of a memory in the brain; the “memory” is a neural network model of a unique pattern reconstructed from the cue and the engram [12]. An engram has an emotion attached to it at the moment of its creation, and an emotion can likewise lead to the creation of an engram; when the pattern of a situation matches a stored engram, the same emotion is experienced by the viewer.

As the user starts to build a narrative, he would unconsciously begin filling the gaps in the photogrammetry spaces with his imagination, triggering an engram and eliciting the emotion attached to it. An episodic engram/memory in the human brain is not a fixed 2-D picture but a highly dynamic movie serial, integrating information in both the temporal and the spatial domains [12]. These episodic engrams elicit a multi-sensory experience in the person. A familiar visual cue, combined with the binaural sounds of the VR environment and physical movement, might trigger a specific engram, which might then be enhanced or subdued by subsequent scenes in the virtual space. This fills the empty spaces/gaps left by the machine in the cinematic scenes. The cinematic scenes familiar to the user trigger a narrative in the user's mind, which is the start of the story line of his experience; engram-based memory retrieval then weaves the story together as an experience for the user.

Based on the above discussion of engrams, the team expects the virtual environment and the videos generated by the ML models to elicit emotions in the viewer. In this situation the machine itself has no intent (to elicit emotions or reactions in viewers), yet the generated videos express a strong intent. The machine has no consciousness, but the whole process of classifying and sorting images to match the input gives the impression that the machine has a sense of empathy towards the input video — that it understands the emotions and the story of the input and generates an equally meaningful output. In truth, the machine will never know what it is to be emotional and expressive like its viewers. Similarly, the programmer thinks he understands the machine and the way it works, but he only understands it partially. The machine is trying to model how humans elicit emotions, and the programmer or viewer is trying to understand how the machine generates its output; beyond a certain point, each merely presumes to understand the other: a non-living machine is speculated to be showing empathy, while a living human being tries to find consciousness in a machine. I think this is the most interesting aspect of this topic: the relationship between machines, programmers and viewers.

6: Conclusion

The paper presents three machine learning models, which are three interpretations of cinema. The machine learning output does not hold strong aesthetic value as an individual piece; it cannot stand on its own. The aesthetics of the generated output can only be accomplished through human perception. The photogrammetry spaces in the virtual world have no defined aesthetics of their own, as each is a replica of an actual movie frame; but they exist as a narrative when actualised by the presence of a viewer in the virtual space.

Therefore, the aesthetics or creative ability of the machine can be judged only upon the realization of the programme. The beauty of the machine's generated outputs and of the photogrammetry spaces is conceptualized through the process of curation. Curating the data fed into the machine as a data set, and selecting images from the RIS model, is a kind of process-oriented machine art, where I would argue that the aesthetics and the charm lie in the process rather than in the final output. They lie in non-visual aspects such as narrativity, processuality, performativity, generativity, interactivity, or machinic qualities [13].

The speculative virtual reality proposal questions the relevance of the terms “actor” and “audience” and the role of the scenographer. In the proposal, the user of the VR environment plays the roles of both actor and audience. The proposal also questions the relevance of the seven elements of scenography, as the definition of each element changes with its application in virtual environments. In designing for virtual space, the scenographer is no longer bound by dramaturgical limitations, as there is no interaction between the audience and the actor.

As the proposal is still in its speculative stage, I assume the cinematic environment created in the virtual space will elicit emotions in the user. From the scenographic aesthetic approach described above, the photogrammetry environments score high on most of the elements of scenography. Viewed in terms of episodic engrams and the automatic memory-retrieval process, the presence of the user in the darkened virtual reality environment should induce a multisensory experience.

The design project experiments display the early stages of developing a movie with machine learning techniques. With future advancements in ML, could a machine really create a new movie that is visually appealing, has a narrative, and can express itself without artistic curation? In that scenario, it will be fascinating to see what kind of content such movies will carry. The content would be a mashup or recycled composition of old classic movies, since a machine learning algorithm works from examples and needs a huge database of movies to learn from before it can predict and generate content. It would be a dystopian condition in which mankind only recycles content and produces nothing new.
