Automatic Colorization of Digital Movies Using Decolorization Models and SSIM Index

Re-colorization of images or movies is a challenging problem due to the infinite number of RGB solutions for a monochrome object. In general, the process is assisted by humans, who provide either colorization hints or relevant training data for ML/AI algorithms. Our intention is to develop a mechanism for fully unguided colorization of movies that uses no training data. In other words, we aim to create acceptable colored counterparts of movies in domains where only monochrome visualizations physically exist (e.g., IR, UV, MRI data). Following our past approach to image colorization, the method assumes arbitrary rgb2gray models and utilizes a few probabilistic heuristics. Additionally, we maintain the temporal stability of colorization by locally using structural similarity (SSIM) between adjacent frames. The paper explains the details of the method, presents exemplary results, and compares them to state-of-the-art solutions. NOTE: All figures are best viewed in color and high resolution.


I. INTRODUCTION AND MOTIVATION
COLORIZATION of monochrome objects is an ill-posed problem due to the infinite number of RGB solutions for given grayscale data. Nonetheless, this topic holds notable practical and commercial significance, particularly in the restoration of historical photos and movies, e.g., [1], [2].
In general, the development of colorization techniques involves incorporating more and more human knowledge and expectations into the algorithms [1]. To that end, earlier methods involved providing reference color images [3], [4] or manually scribbling important fragments [5], [6]. Recently, AI-based techniques have dominated, with architectures designed to learn color patterns suitable for specific domains, semantics, and/or contents, such as [7], [8].
Some works consider recognizing/learning the image domain (or specific objects within images) to further improve the results, e.g., [9], [10].
Colorization methods for monochrome movies basically follow the same principles. In some works (for example, [2]), there is no distinction between the colorization of still images and movies. That is, each frame of a movie is considered a separate item and is re-colorized individually. However, such an approach may introduce temporal colorization discontinuities. Therefore, in recent works, including [14], [15] or [16], the issue of colorization continuity between adjacent frames is addressed, mainly by combining spatial and temporal consistencies of colors assigned to pixels with similar intensity levels.
Nevertheless, in all typical papers on re-colorization of monochrome objects, the algorithms are assisted by human knowledge/experience/expectations, even if the assistance is disguised as training datasets of relevant images/videos.
Therefore, the solution discussed in this paper may seem audacious. We propose a mechanism for fully automatic video colorization without additional metadata, manual assistance, learning processes, or domain identification. In other words, we aim to create plausible colored representations of gray worlds using only monochrome videos as the data source. By "plausible," we mean results that are visually attractive, statistically repeatable, and deliver convincingly rich sensations of colors.
It should be noted that we exclude simple pseudo-coloring techniques that use a limited number of colors corresponding to the number of intensity levels, resulting in a limited richness and diversity of coloristic effects.
The problem addressed appears to be relevant and practical because there are numerous domains, such as infrared, ultraviolet, ultrasound, MRI, X-ray, and others, where only single-channel visual data physically exist. Then, realistic-looking automatic colorizations can be synthesized for various reasons, even if it is just for aesthetic purposes (see examples in Fig. 1).
In our recent papers [17], [18], we overviewed the results obtained for unguided colorization of monochrome images, and the method for colorizing videos is a natural extension of those results. Therefore, Section II revisits a number of assumptions and techniques adopted in monochrome image colorization and repeated in video colorization.
Section III presents steps that are specific to the colorization of videos. A summary of the experimental results and corresponding comments is provided in Section IV. Finally, the concluding Section V discusses some supplementary details (mainly related to limiting the number of solutions) and outlines prospective directions for future research.

A. Significance of Decolorization Models
Generally, colorization methods assume that the intensity of grayscale objects defines the luminance channel of the colored outputs, so only two channels of chrominance need to be reconstructed. It seems like a matter of arbitrary choice which chrominance models are used (e.g., CIELab in [8], [9] or YUV in [2], [5]). Sometimes, the model is not even specified, and only the final outcomes in RGB format are discussed.
However, almost no papers on re-colorization address the opposite question: How was the original (real or hypothetical) RGB object decolorized to obtain a grayscale object?
Nevertheless, given that we assume that colored objects exist only hypothetically, any combination of non-negative coefficients $k_R$, $k_G$, and $k_B$ can formally be used in the linear rgb2gray model

$$I = k_R R + k_G G + k_B B, \tag{1}$$

subject to the straightforward condition $k_R + k_G + k_B = 1$.
Therefore, as shown in Fig. 2, a colored object can be converted into a variety of monochrome counterparts, which may differ significantly depending on the adopted rgb2gray model.
Fig. 2. A colored image and its three monochrome variants obtained using three different rgb2gray models, namely [0.299, 0.587, 0.114], [0.44, 0.14, 0.42] and [0.14, 0.11, 0.75].
Correspondingly, by assuming a certain rgb2gray model, we can restrict the re-colorization results in a specific way.
Any pixel with intensity $I$ can only be assigned colors that satisfy Eq. 1 with the adopted $[k_R, k_G, k_B]$ values. Fig. 3 shows examples of the same image re-colorized using the same algorithm, but with different rgb2gray models adopted.
Fig. 3. The same monochrome image colorized (by the method discussed in [17], [18]) with different rgb2gray models adopted.
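To make the dependence on the rgb2gray model concrete, the following minimal Python sketch applies the linear model of Eq. 1 with the three coefficient sets quoted for Fig. 2. The function name `rgb2gray` and the rounding to 8-bit levels are illustrative assumptions rather than part of the original description.

```python
def rgb2gray(pixel, k):
    """Intensity of an (R, G, B) pixel under the model k = (kR, kG, kB)."""
    k_r, k_g, k_b = k
    if min(k) < 0 or abs(k_r + k_g + k_b - 1.0) > 1e-9:
        raise ValueError("coefficients must be non-negative and sum to 1")
    r, g, b = pixel
    # Rounding to an integer level is an assumption of this sketch.
    return round(k_r * r + k_g * g + k_b * b)


# The three models listed in the Fig. 2 caption.
MODELS = [(0.299, 0.587, 0.114), (0.44, 0.14, 0.42), (0.14, 0.11, 0.75)]

# The same saturated blue maps to very different gray levels.
pure_blue = (0, 0, 255)
grays = [rgb2gray(pure_blue, k) for k in MODELS]   # [29, 107, 191]
```

Conversely, fixing the model fixes which colors are admissible for a given gray level, which is the restriction exploited in the following subsections.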

B. Colorization of Individual Pixels
In coloring monochrome objects depicting some nonexistent color worlds, we are not restricted by the properties of the YUV or YIQ models, which provide the best consistency between the brightness of a color and its monochrome counterpart, as perceived by human observers. Therefore, in the developed colorization scheme, we assume that monochrome visual objects are derived from (only hypothetically existing) color counterparts by an rgb2gray model with arbitrarily selected $k_R$, $k_G$, and $k_B$ coefficients.
If we assume that the coefficients are uniformly sampled into $n$ values each (for example, $n = 101$ for a 0.01 stepsize), it can be easily obtained that the total number of available rgb2gray models is:

$$N = \frac{n(n+1)}{2} \quad \text{or} \quad N = \frac{(n-2)(n-3)}{2}.$$

The first variant includes zero-coefficient models (e.g., [0.4, 0, 0.6] or [0, 1, 0]). These models are excluded in the second variant, where each primary color must have a nonzero contribution to the intensity levels.
In the experiments, we use stepsizes of 0.05, 0.01 or 0.002, so that the total numbers of rgb2gray models are 231 (171), 5151 (4851) or 125,751 (124,251), respectively. However, in the end, we only consider a limited number of 20-40 models (see Sections IV and V for more details), resulting in the same number of alternative colorization outputs.
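The model counts quoted above can be checked by brute-force enumeration. The sketch below counts coefficient triples on a uniform grid whose entries sum to 1; the function name `count_models` and its parameters are hypothetical, and the coefficients are represented as integer multiples of the stepsize to avoid floating-point issues.

```python
def count_models(steps, allow_zero=True):
    """Count triples (a, b, c) of grid units with a + b + c = steps.

    steps is 1 / stepsize (e.g. 100 for a 0.01 stepsize), so each coefficient
    is a / steps, b / steps, c / steps.
    """
    lowest = 0 if allow_zero else 1
    total = 0
    for a in range(lowest, steps + 1):
        for b in range(lowest, steps - a + 1):
            c = steps - a - b          # third coefficient is determined
            if c >= lowest:
                total += 1
    return total


# Stepsizes 0.05, 0.01 and 0.002 correspond to 20, 100 and 500 grid units.
counts = [(count_models(s), count_models(s, allow_zero=False))
          for s in (20, 100, 500)]
# counts == [(231, 171), (5151, 4851), (125751, 124251)]
```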
In rendering digital images, we adopt finite numbers of intensities and colors. Thus, a monochrome intensity level $I_p$ from the discrete range 0 to 255 can only be assigned colors from the pool of colors that satisfy Eq. 1 (subject to color discretization). The size of those pools varies, depending on the intensity level. As an example, the numbers of colors assigned to intensity levels in two arbitrarily selected rgb2gray models are shown in Fig. 4.
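The pools of admissible colors can be sketched as follows. This is only an illustration under stated assumptions: the RGB cube is sampled on a coarse grid (every 5th channel value) to keep the pure-Python loop fast, and a color is assigned to the intensity level obtained by rounding Eq. 1. The name `color_pools` is hypothetical.

```python
from collections import defaultdict


def color_pools(k, channel_step=5):
    """Group a coarse grid of RGB colors by their rounded intensity under model k."""
    levels = range(0, 256, channel_step)     # 0, 5, ..., 255 for the default step
    pools = defaultdict(list)
    for r in levels:
        for g in levels:
            for b in levels:
                intensity = round(k[0] * r + k[1] * g + k[2] * b)
                pools[intensity].append((r, g, b))
    return pools


pools = color_pools((0.299, 0.587, 0.114))
# Mid-range intensities have many candidate colors, the extremes only one.
sizes = {level: len(pools[level]) for level in (0, 128, 255)}
```

On this grid, black and white are the only colors available to levels 0 and 255, while the pool for level 128 contains many entries.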
Fig. 4 shows that the largest pools of colors are for mid-range intensities, with the number of colors gradually decreasing to a single choice for extremely dark or light intensities. (Monochrome white/black should remain white/black in color.) As an example, Fig. 5 displays the pool of colors for intensity level 208 under two rgb2gray models. Note that the first model is actually YUV, and all colors are perceived as having almost the same brightness. In contrast, the perceived brightness of colors varies significantly for the second model.
With no prior information provided, all colors available to the $I_p$ level can be assigned to a pixel of that intensity with the same probability. However, if the pixel has an adjacent pixel with an intensity $I_1$ and its already assigned color $C_1$, the probabilities of colors that could be assigned to $I_p$ should be influenced by the intensity and color of the neighbor.
Therefore, we propose a simple but (as shown later) surprisingly effective heuristic rule: The greater the difference in brightness between adjacent pixels, the higher the likelihood that their assigned colors will also differ significantly.
Under this rule, we prioritize colors from the pool available to the $I_p$ level that lie at distances from the $C_1$ color proportional to the difference in intensity levels $|I_p - I_1|$.
Let us assume a pixel with $I_p$ intensity, which has an already colored neighbor with $I_1$ value and $C_1$ color. Let $C = \{C_{p1}, ..., C_{pN}\}$ be the list of colors assigned to $I_p$ in the adopted rgb2gray model.
The neighbor (with $I_1$ value and $C_1$ color) contributes a color from the above $C$ list. First, the list is ordered by the distances of its colors from $C_1$, i.e., $C_{mod} = \{C_{pi_1}, ..., C_{pi_N}\}$, where

$$\|C_{pi_1} - C_1\| \le \|C_{pi_2} - C_1\| \le \dots \le \|C_{pi_N} - C_1\|.$$

It should be noted that inter-color distances are measured in the HSV space, as differences in this space are sufficiently close to represent perceived color similarities, e.g., [20].
Then, a uniform distribution is used to randomly select a color from a specific sub-range of the $C_{mod}$ list. The location of this sub-range depends on the difference $|I_p - I_1|$. In general, for smaller differences, the sub-range is narrower and shifted towards the beginning of $C_{mod}$, while for larger differences, it is wider and shifted towards the end of $C_{mod}$. A detailed description of this step can be found in [18].
In particular, if the neighboring pixel has the same intensity, i.e., $I_p = I_1$, the pixel of intensity $I_p$ would be (preliminarily) assigned the same color $C_1$.
Actually, images are colored incrementally (details in the following subsection), and it may happen that an uncolored pixel has several already-colored neighbors. Then, the color selection can be performed several times for that pixel, and the final choice is the mean of the colors obtained from all colored neighbors:

$$C_p = \frac{1}{K} \sum_{j=1}^{K} C_p^{(j)},$$

where $C_p^{(j)}$ is the color contributed by the $j$-th colored neighbor and $K$ = 1, 2, 3 or 4. This step allows us to generate a larger variety of colors than from the (possibly limited) pools of colors that are assigned to individual intensity levels.
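The neighbor-driven selection described in this subsection can be sketched as below. Several details are assumptions made only for illustration: the exact placement and width of the sub-range are given in [18] and are replaced here by a simple hypothetical rule, HSV distances are computed in Cartesian coordinates of the HSV cone, and all function names are invented for this sketch.

```python
import colorsys
import math
import random


def hsv_point(rgb):
    """Map an 8-bit RGB color to Cartesian coordinates of the HSV cone."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb))
    angle = 2.0 * math.pi * h
    return (s * v * math.cos(angle), s * v * math.sin(angle), v)


def hsv_distance(c1, c2):
    return math.dist(hsv_point(c1), hsv_point(c2))


def pick_color(pool, c1, i_p, i_1, rng=random):
    """Propose a color for a pixel of intensity i_p given a neighbor (i_1, c1)."""
    if i_p == i_1:
        return c1                                   # same intensity: copy the color
    ordered = sorted(pool, key=lambda c: hsv_distance(c, c1))   # the Cmod list
    frac = abs(i_p - i_1) / 255.0
    # Hypothetical sub-range rule: larger differences move the window towards
    # the end of the list and make it wider.
    start = int(frac * (len(ordered) - 1) * 0.5)
    width = max(1, int(frac * len(ordered)))
    return rng.choice(ordered[start:start + width])


def merge_colors(candidates):
    """Mean of the K colors proposed by the K colored neighbors (K = 1..4)."""
    k = len(candidates)
    return tuple(sum(c[j] for c in candidates) / k for j in range(3))
```

With a pool of three colors and a nearly identical neighbor intensity, `pick_color` returns the pool color closest to the neighbor's color, while a large intensity difference excludes that closest color from the window.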

C. Incremental Colorization
Colorization of monochrome images is performed incrementally, starting from the darkest and brightest pixels, for which the unique color choice exists (see Fig. 4).They are considered the initial list of already colorized pixels for the algorithm, which is actually a randomized variant of a popular flood-fill method.
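Pseudo-code of the colorization algorithm is provided below.

Algorithm 1 Monochrome image colorization
Require: Initial list L of colored pixels
Ensure: Colorized image
 1: remove from L pixels which do not have uncolored neighbors;
 2: copy L to queue Q;
 3: while Q is not empty do
 4:   move a random pixel from Q to the front of Q;
 5:   get the front pixel p of Q;
 6:   if ps (south neighbor of p) exists and is uncolored then
 7:     assign a color to ps (as described in Subsection II-B);
 8:     add ps to the end of Q;
 9:   end if
10:   repeat Steps 6-9 for pn, pe, pw (north, east, west neighbors of p);
11:   remove p from Q;
12: end while

Step 4 (which does not exist in the standard flood-fill method) is introduced to randomize the expansion of colorized patches, i.e., to avoid unnecessary regularities in the colorization process.
Despite the heavy presence of randomizing factors, the results produced by the outlined method are surprisingly repeatable, depending only (as expected) on the adopted rgb2gray model. Actually, all images shown in Figs 1 and 3 are generated by the method, and many more examples can be found in [17], [18].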
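A compact Python sketch of the randomized flood-fill of Algorithm 1 is given below. It is a hedged illustration, not the authors' implementation: the per-pixel color choice is delegated to a caller-supplied `colorize` callback (a hypothetical interface), and Steps 4, 5 and 11 are merged by taking a random pixel directly out of the queue, which is equivalent to moving it to the front, processing it, and removing it.

```python
import random
from collections import deque


def randomized_flood_fill(height, width, seeds, colorize, rng=random):
    """Grow colored patches from seed pixels.

    seeds:    dict {(row, col): color} of initially colored pixels.
    colorize: callback (pixel, colored_dict) -> color for an uncolored pixel.
    """
    colored = dict(seeds)

    def uncolored_neighbors(p):
        r, c = p
        candidates = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))  # S, N, E, W
        return [q for q in candidates
                if 0 <= q[0] < height and 0 <= q[1] < width and q not in colored]

    # Steps 1-2: keep only seeds that still have uncolored neighbors.
    queue = deque(p for p in seeds if uncolored_neighbors(p))
    while queue:                                   # Step 3
        j = rng.randrange(len(queue))              # Steps 4-5: random pixel
        p = queue[j]
        del queue[j]                               # Step 11
        for q in uncolored_neighbors(p):           # Steps 6-10
            colored[q] = colorize(q, colored)
            queue.append(q)
    return colored
```

For example, calling `randomized_flood_fill(3, 4, {(0, 0): (0, 0, 0)}, lambda q, done: (128, 128, 128))` colors all 12 pixels of a 3 x 4 grid; in a full implementation the callback would apply the neighbor-based color selection of Subsection II-B.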
D. Other Remarks
The visual plausibility of the results can be further improved by projecting the colors of each pixel $p$ onto the corresponding YUV plane $I_p = 0.299R + 0.587G + 0.114B$ (or $I_p = 0.2126R + 0.7152G + 0.0722B$). This is because colors assigned to the same intensity level can vary significantly in terms of their perceived brightness, see Fig. 5b. Projecting the colors on the YUV planes unifies the perceived brightness of colors with intensities of the original monochrome image, although it may alter the colors somewhat, as shown in Fig. 6.
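The projection step can be sketched as an orthogonal projection of an RGB color onto the plane defined by the YUV coefficients and the pixel intensity. The text does not state which kind of projection is used, so the orthogonal variant, the final clipping to the RGB cube, and the name `project_to_plane` are assumptions of this example.

```python
YUV = (0.299, 0.587, 0.114)


def project_to_plane(color, intensity, k=YUV):
    """Orthogonally project an RGB color onto the plane k . (R, G, B) = intensity."""
    dot = sum(ki * ci for ki, ci in zip(k, color))
    shift = (intensity - dot) / sum(ki * ki for ki in k)
    projected = [ci + shift * ki for ki, ci in zip(k, color)]
    # Clipping keeps the result inside the RGB cube; when it is active, the
    # clipped color may lie slightly off the target plane.
    return tuple(min(255.0, max(0.0, v)) for v in projected)


# A color whose YUV luminance (about 93.5) differs from the pixel intensity 120.
adjusted = project_to_plane((200, 40, 90), 120)
luminance = sum(ki * ci for ki, ci in zip(YUV, adjusted))   # approximately 120
```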

III. FROM IMAGES TO MOVIES
Formally, the colorization of monochrome movies does not differ from the colorization of images, and both operations are often considered to be almost equivalent, e.g., [21]. Each frame can be independently colorized to provide the corresponding frame of the color movie. Such a simplified approach may be used if image colorization performs nearly flawlessly, which can only be achieved in specific and well-defined domains. Nevertheless, this method has been successful in a number of works, including the commercial system described in [2].
In our application, we cannot use this approach due to randomizing factors in the image colorization scheme. In such cases, even if the same rgb2gray model is used, adjacent frames with nearly identical content may have noticeably different colorization. An example is given in Fig. 7. Thus, the temporal continuity/stabilization of rendered colors over sequences of similar frames is particularly important for our problem. Recently, more attention has been paid to the temporal continuity of colors in the colorized movies. Several papers, e.g., [14], [15], [16], consider the regularization or stabilization of colors in adjacent video frames. In [22], a more general problem of stabilizing any visual properties in processed video files is considered.
Nonetheless, in all these papers, the results are obtained by training dedicated neural networks on sufficiently representative ground-truth data.Therefore, these approaches are not applicable to the considered problem of colorizing monochrome videos for which the color counterparts never existed.
In the proposed solution, we assume that the colorization continuity should reflect not only intensity similarities between adjacent frames but also content similarities, particularly the local ones.
After evaluating several popular metrics of image quality and similarity, e.g., [23], [24], the structural similarity index measure (SSIM) has been identified as the top candidate [25]. SSIM is particularly suitable for monochrome images and is applicable to images of any content.
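PROCEEDINGS OF THE FEDCSIS. WARSAW, POLAND, 2023
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In general, the SSIM is defined by a weighted combination of three measures that broadly represent statistical similarities between the intensity, contrast, and structure of two image samples, $X$ and $Y$:

$$\mathrm{SSIM}(X, Y) = i(X, Y)^{\alpha} \cdot c(X, Y)^{\beta} \cdot s(X, Y)^{\gamma},$$

where

$$i(X, Y) = \frac{2\mu_X \mu_Y + c_1}{\mu_X^2 + \mu_Y^2 + c_1}, \quad c(X, Y) = \frac{2\sigma_X \sigma_Y + c_2}{\sigma_X^2 + \sigma_Y^2 + c_2}, \quad s(X, Y) = \frac{\sigma_{XY} + c_3}{\sigma_X \sigma_Y + c_3}.$$

Here $\mu$ and $\sigma$ denote means and standard deviations, $\sigma_{XY}$ is the covariance of the two samples, and $c_1$, $c_2$, $c_3$ are small stabilizing constants. Depending on the area over which means and standard deviations are computed, SSIM can indicate image similarities either globally (i.e., between whole images) or locally (i.e., between small neighborhoods of the same coordinates $(m, n)$ in both images).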
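As an illustration, a pure-Python sketch of global and local SSIM is given below. It assumes the commonly used simplification with unit exponents and $c_3 = c_2 / 2$, the customary constants $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$ for dynamic range $L$, and uniform square windows (reference implementations typically use Gaussian-weighted windows). The function names are hypothetical.

```python
def ssim(x, y, dynamic_range=255.0):
    """SSIM of two equally long lists of intensities (simplified two-factor form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def local_ssim_map(frame_a, frame_b, radius=1):
    """SSIM over (2*radius+1)^2 windows centred at each pixel (cropped at borders)."""
    height, width = len(frame_a), len(frame_a[0])
    result = []
    for m in range(height):
        row = []
        for n in range(width):
            rows = range(max(0, m - radius), min(height, m + radius + 1))
            cols = range(max(0, n - radius), min(width, n + radius + 1))
            xa = [frame_a[i][j] for i in rows for j in cols]
            xb = [frame_b[i][j] for i in rows for j in cols]
            row.append(ssim(xa, xb))
        result.append(row)
    return result
```

The global index is obtained by passing all pixels of both frames to `ssim`, while `local_ssim_map` returns one value per pixel coordinate $(m, n)$.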
When coloring the current frame in the context of the previous one, we actually utilize both aspects of SSIM.
First, we calculate the global SSIM measure between the current monochrome frame (to be colored) and its monochrome predecessor (already colored).If the value is too low (the recommended threshold is 0.5), we assume no perceptual similarity between the frames, and the current frame is colorized independently.
In practice, such situations occur infrequently, mainly when assembling longer movies from shorter unrelated fragments, and high values of global SSIM similarity between adjacent frames can typically be expected.For example, the SSIM values between the neighboring pairs of monochrome frames in Fig. 7 are as follows: 0.9511 (frames a,b), 0.9756 (frames b,c), and 0.9508 (frames c,d).
Therefore, we normally define colors of the colorized frame $I_k$ as weighted combinations of the independent colorization of $I_k$ and colors of the previous frame $I_{k-1}$, with the weights at given pixel coordinates $(m, n)$ determined by the local values of SSIM (Eq. 6). Additionally, the colors computed by Eq. 6 are projected onto the corresponding YUV planes (defined by pixel intensities in frame $I_k$) as explained in Section II-D.
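The temporal regularization can be sketched as follows. The exact form of Eq. 6 is not reproduced here; the sketch assumes one plausible reading, a per-pixel convex combination in which the local SSIM value weights the previous frame's color, with negative SSIM values clamped to zero. The global threshold of 0.5 follows the text, and the function name `blend_frames` is hypothetical.

```python
def blend_frames(prev_colors, new_colors, local_ssim, global_ssim, threshold=0.5):
    """Combine an independent colorization with the previous colored frame.

    prev_colors, new_colors: 2D lists of (R, G, B) tuples of equal size.
    local_ssim: 2D list of local SSIM values between the monochrome frames.
    """
    if global_ssim < threshold:
        # Frames are considered unrelated: keep the independent colorization.
        return [list(row) for row in new_colors]
    blended = []
    for prev_row, new_row, w_row in zip(prev_colors, new_colors, local_ssim):
        out_row = []
        for prev_c, new_c, w in zip(prev_row, new_row, w_row):
            w = min(1.0, max(0.0, w))          # assumed clamping of the weight
            out_row.append(tuple(w * p + (1.0 - w) * q
                                 for p, q in zip(prev_c, new_c)))
        blended.append(out_row)
    return blended


previous = [[(100, 0, 0)]]
independent = [[(0, 100, 0)]]
stable = blend_frames(previous, independent, [[0.75]], global_ssim=0.9)
# stable == [[(75.0, 25.0, 0.0)]]
```

Under this assumption, regions that barely change between frames inherit most of their color from the previous frame, while strongly changed regions follow the new independent colorization.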
Fig. 8 provides exemplary effects of the proposed color regularization over neighboring frames.First, we display the local SSIM indexes between the monochrome frames from Fig. 7 in Figs 8(a-c).Then, the lower row of Fig. 8 shows the colored frames after the regularization.

IV. EXPERIMENTS
The proposed method of monochrome movie colorization involves heuristic assumptions, arbitrary model selection, and probabilistic computational schemes. Furthermore, the obtained results cannot be objectively assessed since reference or ground-truth results are assumed to be non-existent.
Therefore, the performance of the method and quality of its outputs can only be evaluated through extensive experimentation. In particular, the final results are typically assessed using subjective criteria such as visual plausibility, coloristic attractiveness, aesthetic value, etc.
One of the main challenges in conducting such experiments is the large number of potential rgb2gray models, as discussed in Subsection II-B. The sheer quantity of alternative colorizations may be overwhelming for human evaluators. Therefore, it is necessary to reduce the number of effectively considered rgb2gray models.
For moderate numbers of adopted rgb2gray models, e.g., 231 (171) models with 0.05 stepsize, we found that the most plausible solutions are normally obtained from the 10-15% of results with the lowest value of colorfulness (details of this metric are provided in [1], [26]). Specifically, when the monochrome objects depict scenes from the real world, this subset typically includes solutions that vaguely align with human coloristic expectations (more information in [17]).
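The pre-selection by colorfulness can be sketched as below. The metric itself is defined in [1], [26]; this example assumes the widely used Hasler-Süsstrunk colorfulness measure, which may differ from the one actually applied, and the helper names are hypothetical.

```python
import math
from statistics import mean, pstdev


def colorfulness(pixels):
    """Hasler-Süsstrunk style colorfulness of a list of (R, G, B) pixels (assumed metric)."""
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    return (math.hypot(pstdev(rg), pstdev(yb))
            + 0.3 * math.hypot(mean(rg), mean(yb)))


def least_colorful(results, fraction=0.15):
    """Keep the given fraction of colorizations with the lowest colorfulness.

    results: dict {model: list of (R, G, B) pixels of the colorized image}.
    """
    ranked = sorted(results, key=lambda model: colorfulness(results[model]))
    return ranked[:max(1, round(fraction * len(ranked)))]
```

Here `fraction=0.15` corresponds to the upper end of the 10-15% range mentioned above.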
With a large number of models, e.g., 5151 (4851) or 125,751 (124,251), the models are preliminarily clustered into a recommended number of classes (20-40), and only the cluster medoids are used. Details of the clustering algorithm are outlined in Subsection V-A.
In any case, the users are presented with a limited number of suggested colorizations, from which they can select the most satisfactory option. Subject to the quality constraints of the original monochrome movies, the results always appear attractive and convincing, resembling scenes from 'fairytale lands'. Therefore, the preferred colorized version becomes a matter of personal choice.
In the experiments, the non-visual (IR) movies mainly come from the FLIR ADAS and CAMEL [27] datasets. The visual-frequency movies, which are used to better highlight the differences between our approach and the 'traditional' re-colorization expectations, primarily come from personal collections.
This section includes short representative frame sequences as illustrations. For example, Fig. 9 once again confirms that colorization supported by SSIM-based regularization provides a natural-looking continuity of colored frames, free of flickering and artifacts.
Fig. 10 showcases a rather unusual (but occasionally possible) result of re-colorization of a real-world movie. It can be observed that the selected colorization option appears even more natural than the original color movie! Similar effects can be observed in Fig. 11, where the re-colorized frames of an underwater movie appear more authentic than the original shots.
Nevertheless, in typical cases, even if the monochrome movie has ground-truth colors, there is no correspondence between the original colors and their re-colorized versions. In other words, there is no distinction between colorizing monochrome movies with or without existing color originals. The results appear visually plausible, but they may or may not meet the ground-truth coloristic expectations, with the latter being more typical (see examples in Fig. 12).
Finally, Fig. 13 provides exemplary colorization results (arbitrarily selected from a number of alternative results) for a sequence of frames extracted from an outdoor IR movie.
Overall, the experimental results confirm that plausible video colorization can be achieved fully automatically, without the need for learning from relevant training data, human assistance, or supplementary metadata. In other words, it appears possible to synthesize a realistic-looking colorful immersion into the 'gray worlds' of monochrome visual data. With the mechanisms provided for pre-selecting the 20-40 most promising rgb2gray models, users can choose their preferred colorization version from a limited yet sufficiently diverse number of alternative solutions.
However, considering that only subjective assessment criteria are currently utilized, presenting the actual video clips would be a more suitable approach to report the experimental results.

A. Comparing to SOTA
It is generally assumed that AI-based methods deliver state-of-the-art (SOTA) results for the colorization of monochrome objects. In particular, some works on video colorization, including [14], [16], use individual frame colorization by SOTA AI methods as benchmarks for the proposed algorithms.
Following the same approach, we colorized sequences of frames from the tested videos using the publicly available tool (at https://deepai.org/machine-learning-model/colorizer), which applies one of the most advanced re-colorization methods (outlined in [2]).
The results shown in Fig. 14 are utterly disappointing. While visual plausibility is basically unchanged compared to the monochrome images, the other subjective criteria, such as coloristic attractiveness or aesthetic value, fall far below expectations. The color outputs are almost direct replicas of the grayscale values from the monochrome images. Apparently, when facing unfamiliar contents, the algorithm decides to keep the original monochrome colorization. In other words, satisfactory and visually attractive AI-based colorization is not possible for monochrome objects for which no coloristic knowledge or experiences are available. Therefore, the practicality of the proposed approach is somewhat boosted.

A. Limiting the number of alternative solutions
The main practical obstacle in prospective applications of the proposed approach is (as highlighted in Subsection II-B and Section IV) the large number of available rgb2gray models. Human observers are unable to assess all possible colorization outputs, so reducing the models to a limited subset (which is still sufficiently diversified in terms of the produced results) is an important issue. The proposed remedy is outlined below.
As an alternative to the representations given in Figs 4 and 5, the colors assigned to the selected intensity $I$ can be visualized as the polygonal intersection of the $k_R R + k_G G + k_B B = I$ plane with the RGB color cube. An example is provided in Fig. 15 (note the locations of the centers of gravity of the depicted polygons).
Thus, the 256 × 3 matrix of gravity center coordinates of such polygons for $I$ = 0, ..., 255 can be considered a compact representation of the adopted rgb2gray model. The matrix can be nicely visualized by a (discrete) curve winding from black to white in the RGB cube (see examples in Fig. 16). Those curves will be referred to as mean-color curves.
Then, the rgb2gray models can be clustered by clustering their mean-color curves (i.e., 256-dimensional arrays of 3D coordinates). Eventually, medoids of the obtained clusters are identified, and only the models corresponding to those medoids are used for colorization.
Therefore, we adopt a limited number of models (for example, only 32 clusters are built regardless of the total number of models) which are as diversified as possible in terms of their statistical coloristic properties.
For the total number of 4851 rgb2gray models (i.e., 0.01 stepsize), the list of adopted models is given in Table I. It can be noted that one of the models, namely [0.26, 0.58, 0.16], is quite similar to the standard YUV model with [0.299, 0.587, 0.114] coefficients.
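The selection of representative models can be sketched as follows. This is a simplified illustration under several assumptions: the mean-color curve is approximated on a coarse RGB grid with 16 intensity bins instead of 256 levels, curves are compared by the sum of point-wise Euclidean distances, and a basic alternating k-medoids procedure stands in for the clustering algorithm, which is not specified in detail in the text. All names are hypothetical.

```python
import math
import random


def mean_color_curve(k, channel_step=17, bins=16):
    """Approximate mean-color curve of model k: mean RGB color per intensity bin."""
    sums = [[0.0, 0.0, 0.0, 0] for _ in range(bins)]
    levels = range(0, 256, channel_step)
    for r in levels:
        for g in levels:
            for b in levels:
                intensity = k[0] * r + k[1] * g + k[2] * b
                s = sums[min(bins - 1, int(intensity * bins / 256))]
                s[0] += r; s[1] += g; s[2] += b; s[3] += 1
    # An empty bin (possible on a coarse grid) falls back to the bin's neutral gray.
    return [(s[0] / s[3], s[1] / s[3], s[2] / s[3]) if s[3]
            else ((j + 0.5) * 256 / bins,) * 3 for j, s in enumerate(sums)]


def curve_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b))


def k_medoids(curves, k, iterations=20, rng=random):
    """Return indices of k medoid curves (simple alternating scheme)."""
    medoids = rng.sample(range(len(curves)), k)
    for _ in range(iterations):
        clusters = {m: [] for m in medoids}
        for i, curve in enumerate(curves):
            nearest = min(medoids, key=lambda m: curve_distance(curve, curves[m]))
            clusters[nearest].append(i)
        updated = [min(members, key=lambda c: sum(
                       curve_distance(curves[c], curves[j]) for j in members))
                   for members in clusters.values() if members]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return sorted(medoids)


# Toy example: all 15 models with a 0.25 stepsize, reduced to 4 representatives.
models = [(a / 4, b / 4, (4 - a - b) / 4) for a in range(5) for b in range(5 - a)]
curves = [mean_color_curve(m) for m in models]
chosen = [models[i] for i in k_medoids(curves, 4, rng=random.Random(1))]
```

In the setting described above, the same procedure would be applied to the full set of models with the number of clusters set to, for example, 32.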

B. Summary
In this paper, we propose a method for addressing the ill-posed problem of colorizing monochrome movies without any direct or indirect human assistance. Our method builds upon our recent results in colorization of monochrome images, where we assume the use of arbitrary decolorization (rgb2gray) models.
The movie colorization process involves two operations: image colorization and temporal stabilization of rendered colors. First, individual frames are colored using simple probabilistic heuristics and a randomized flood-fill technique, starting from the initial queue of darkest/brightest pixels with deterministic color choices. In the second operation, we utilize the SSIM similarity index to determine whether and to what extent color continuity should be maintained between adjacent frames.
While a large number of rgb2gray models can hypothetically be used, we can pre-select a limited number of sufficiently diverse variants. Users can then choose their preferred colorization from these options, typically based on personal preference.
The method is primarily designed for colorizing monochrome movies in domains where no actual color data exists, such as IR, UV, MRI, etc. In other words, our goal is to transform the monochrome data into convincingly realistic color versions of these gray worlds. This may be necessary for various reasons, including aesthetic considerations.
In future work, our intention is to focus on the following problems that have not yet been adequately addressed:
• Analysis of the mathematical properties of the method, which includes exploring alternative probability distributions used in the adopted heuristics, investigating the local (individual frames) and global (movies) convergence of colorization results, etc.
• Developing metrics for the objective evaluation of colorization results, including the selection of appropriate assessment criteria.
• Optimizing the code, including parallelization techniques, optimizing data structures, and other strategies. The ultimate objective may involve achieving real-time performance.
• Integrating the method with selected AI techniques to enhance its capabilities and explore potential synergies.

Fig. 1. Examples of monochrome images from non-visual domains (IR and X-ray), and their color-rich and visually plausible colorizations.

Fig. 4. Numbers of RGB colors assigned to intensities in two exemplary rgb2gray models.


Fig. 6. Original monochrome images (a) and their colorizations before (b) and after (c) projections on the YUV planes.

Fig. 7. Four subsequent monochrome frames individually colorized (using the same rgb2gray model) by the method outlined in Section II.

Fig. 8. SSIM maps (intensities proportional to the numerical values) for Fig. 7 pairs of monochrome frames: (a) for frames a,b, (b) for frames b,c and (c) for frames c,d. The bottom row displays the results after the color regularization.

Fig. 9. Monochrome frames of an IR movie (two top rows), frames colorized individually (two middle rows), and frames colorized with the color regularization (two bottom rows).

Fig. 10. The original color frames of a movie (two top rows), their decolorized variants (two middle rows), and one of the achieved re-colorization options (two bottom rows).

Fig. 11. The original color frames of an underwater movie (two top rows), their decolorized variants (two middle rows), and one of the achieved re-colorization options (two bottom rows).

Fig. 12. The original color frames (two top rows), and their exemplary re-colorization options (pairs of lower rows).

Fig. 13. An exemplary sequence of monochrome frames from an IR movie (top two rows) and its arbitrarily selected colorization (bottom two rows).
ANDRZEJ ŚLUZEK ET AL.: AUTOMATIC COLORIZATION OF DIGITAL MOVIES USING DECOLORIZATION MODELS AND SSIM INDEX

TABLE I. COEFFICIENTS $[k_R, k_G, k_B]$ OF 32 ADOPTED rgb2gray MODELS.