Ethics: Companies, Brands and Values
OpenAI vs StabilityAI
Design Studio 3 - DSO201
By Oliver Lavender
Generated in Stable Diffusion - Prompt: "A magazine editorial cover illustration that depicts bias in machine learning, in the style of New Yorker Magazine by Christoph Neimann".
The world of AI art (focusing on images and video in this case) is extremely experimental, emergent, and most importantly, current. At the time of writing, OpenAI's Dall.E 2 (DE2) has only been open to public access for a matter of weeks, with StabilityAI's Stable Diffusion (SD) not far behind, released only months prior.
The radical development that text-to-image generation (TTIG) has undergone in the past few months (and years) makes it hard to find concrete facts and figures from the sector as they simply don't exist yet. Finding reputable sources is also of concern, with most details being tucked away in subreddits and Discord servers, but in a way, these are the front lines of the new revolution in art and personal expression. Many writers, technologists and bloggers have speculated on the ethical and moral dilemmas that arise from TTIG and AI in general. Although insightful, these "predictions" of the future remain highly speculative and philosophical in nature.
Only time will tell as to how accurate this current conjecture really is.
It is an extremely poignant moment in history to be investigating the ethical structures that currently govern TTIG and the new wave of safeguards that are currently being implemented as a response to the incipient nature of the technology. The organisations featured in this paper (OpenAI and StabilityAI) have taken an (almost) diametrically opposed stance on the adoption of AI in the wider community. This makes for an interesting comparison between the two companies, but also affords a thought-provoking insight into the redistribution of power, freedom of speech, protection of privacy, copyright, ownership, and more generally open vs closed culture.
The true impact of these organisations and their associated technology is yet to be seen, but certain inferences can be drawn from the narrative so far. More mature advances in contentious technology (such as deepfakes) can be used as a yardstick for the potential direction that TTIG could take. For example, many critics warned of (or perhaps preached) the dangers of deepfake technology and the damage it would quickly unleash on individuals, institutions and governments across the world. War was predicted based on false flag attacks using "synthetic media". The "liars' dividend" was hailed as the end of trust in the media as we know it.
What has actually transpired in the years since the advent of GANs and deepfake (DF) technology has essentially been entertainment in the form of deepfake CGI content and an all-out assault on women's personal rights and privacy via the non-consensual fabrication of deepfake pornography (which now accounts for 96% of all DF video content online).
In other words, humanity is capable of doing cruel and despicable acts with such technology, but we are also capable (and perhaps more likely) of creating something meaningful and objectively helpful given the opportunity. This duality of the human mind, to create or destroy, will be the primary focus of this blog.
From StabilityAI's Website:
"Stability AI is building open AI tools to provide the foundation to awaken humanity’s potential.
Our values are lived by every team member and shown by everyone who excels at Stability AI. They are how we measure ourselves and our work."
"Our vibrant communities consist of experts, leaders and partners across the globe. They are developing cutting-edge open AI models for Image, Language, Audio, Video, 3D, and Biology. AI by the people, for the people."
From OpenAI's Website:
"OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity."
"We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome."
Generated in Stable Diffusion, more variations on the same prompt: "A magazine editorial cover illustration that depicts bias in machine learning, in the style of New Yorker Magazine by Christoph Neimann".
Text To Image Generation
In this paper, Text-To-Image Generation (TTIG) is a catch-all term relating to the output of natural language processing (NLP) machine learning models such as Stable Diffusion and Dall.E, which take text input prompts and convert them into images. Each of these models is trained on massive datasets of text-image pairs that are scraped from the internet. In the case of Stable Diffusion, open source datasets such as LAION-5B were also used (OpenAI have not revealed their training sets as of writing).
It is beyond the scope of this paper to explain how image generation through diffusion works, but it should be noted that while the focus has been on images up to this point, TTIG is now branching out into animation and video at an astonishing pace.
The use of diffusion models is by no means limited to StabilityAI or OpenAI, with many other tools such as Midjourney, Night Cafe and Disco Diffusion competing to create the most astonishing content possible, based on the same premise. StabilityAI and OpenAI are featured here due to their respective profiles and diametrically opposed viewpoints, which makes for an interesting comparison.
Technology is advancing so quickly that recent inventions like deepfakes may soon be a thing of the past, with some models capable of animating a still image, in concert with synthesised voices and even music. At the time of writing, Google's Imagen Video and Meta's Make-A-Video have just demonstrated the ability to generate 5-second video clips of anything, from nothing more than text prompts.
Image2image has also been recently introduced, which allows users to base their TTIG prompt on a pre-existing image and transfer the style of one to the other, opening up a whole new avenue of creative potential with its own set of ethical considerations surrounding ownership and intellectual property.
Training Models, Bias and Stereotypes
One of the most pressing ethical issues facing the development and adoption of large-scale, multi-modal training sets is the content that is contained within them. LAION-5B, for instance, contains over 5 billion text-image pairs. At such a scale, it becomes extremely difficult to filter out "ethically questionable" (or outright illegal) content such as child pornography, gore/blood, executions, violence, rape, murder, torture etc - especially without compromising normal functionality.
The other major issue with the training data these various models use stems from bias that is inherently built into Western culture. The English-speaking internet (where both DE2 and SD are based), for all its collective knowledge, is biased towards certain genders, racial groups, age demographics and so on.
One only has to look at the way we depict women in society, distorted further by the media's insatiable thirst for power and profit.
Perhaps the way these models reflect humanity is an accurate approximation of our true nature? Perhaps the truth is just too ugly, forcing us to collectively reject this reflection in a bid to reassure ourselves that we are "better than that".
There are seemingly infinite issues revolving around bias in western society, but one very simple and effective way of demonstrating the problem (or perhaps the answer?) is with a simple prompt such as the word "terrorist".
Generated in 1) Stable Diffusion and 2) Dall.E 2 - A terrorist, photo, realistic, detailed, accurate
As the saying goes, "You are what you eat".
How can we expect our models to be any different?
On a base level, Stability AI and OpenAI have taken radically different approaches to the danger posed by dataset bias in their respective products. Dall.E has employed an extremely proactive and protective stance, censoring and blocking certain words, phrases and concepts, along with limiting the creation of images with detailed faces and those that contain celebrities or famous people (living or dead).
Stable Diffusion has taken a completely different path by releasing its model in an open source format, allowing users to generate ANY type of content they desire. The advantage of an open source release is that it incentivises a massive community of developers and artists to create new products and improve the main "fork" of the app (a term relating to derivatives of an original model). Developers are also able to "look under the hood" and tinker with the code, gaining a better understanding of how the model works and how it can be optimised for future updates.
Recently, OpenAI has courted controversy once again by adding keywords to users' prompts in an effort to address bias within their training data. OpenAI has "openly" admitted to inserting racial, gender, and other modifiers at the beginning and end of the user's main prompt to increase diversity in users' results. For example, if a user entered "A nurse holding a stethoscope," OpenAI might add "A MALE, nurse holding a stethoscope" unbeknownst to the prompter. This has resulted in undesirable outcomes, such as female sumo wrestlers, who, while potentially interesting, are far from an ideal starting point for the prompt "sumo wrestler".
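The kind of hidden prompt modification described above can be pictured with a minimal sketch. OpenAI has not published its implementation, so everything here is an assumption made purely for illustration: the function name `augment_prompt`, the list of modifiers, the rule for when a modifier is added and where it is placed are all hypothetical.

```python
import random

# Hypothetical modifier list; the terms OpenAI actually uses are not public.
MODIFIERS = ["male", "female", "Black", "Asian", "Hispanic", "white"]


def augment_prompt(prompt, rng, probability=0.5):
    """Return the prompt, sometimes with a hidden modifier appended.

    This is an illustrative guess at the mechanism, not OpenAI's method.
    """
    if rng.random() < probability:
        return f"{prompt}, {rng.choice(MODIFIERS)}"
    return prompt


rng = random.Random(0)  # seeded so repeated runs give the same sequence
for _ in range(3):
    print(augment_prompt("A nurse holding a stethoscope", rng))
```

The point of the sketch is that the user only ever sees their own prompt, while the model may receive something slightly different.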
All of these images were generated in Dall.E 2 before the introduction of content modification. As you can see, diversity is lacking and stereotypes take precedence. Prompts such as builder, CEO and lawyer are filled with white-presenting men, flight attendant is tied to Asian-presenting women and nurse is dominated entirely by women. Bias is evident in images with prompts such as marriage, composed entirely of heteronormative imagery (OpenAI, 2022).
Perhaps it is not the company's, nor the model's, responsibility to perform the role of "ethical arbiter" to humanity. Perhaps it is humanity's own? One interesting thought comes from Twitter user Max Woolf, who states:
One counter argument to this comes from OpenAI who state:
"Moreover, this disparity in the level of specification and steering needed to produce certain concepts is, on its own, a performance disparity bias. It places the burden of careful specification and adaptation on marginalized users, while enabling other users to enjoy a tool that, by default, feels customized to them. In this sense, it is not dissimilar to users of a voice recognition system needing to alter their accents to ensure they are better understood" (OpenAI, 2022).
More detail will be provided in the following sections regarding the ethical implications (positive and negative) that arise from this type of model (stable diffusion) and the design thinking behind addressing them.
These images were generated from the prompt "CEO" using Dall.E 2 after content moderation had been implemented. As you can see, there are fewer white-presenting men and more people of colour, but still a distinct lack of women.
Open Source VS Black Box
In terms of concept, both platforms excel in their ability to generate painterly and photorealistic images in nearly any conceivable style. Both have graphical user interfaces that allow customers to keep track of their previously generated prompts and use more advanced features such as img2img (using a reference image to base the prompt off) and in/out painting, which allows a user to add or subtract the contents of an image via selection and further prompting.
The area in which Stable Diffusion excels (and outperforms) is in providing the user with a greater degree of fine-grained control over the model through an ever-expanding selection of sliders and check boxes. The sheer variety of plugins, add-ons, updates and modifications that can be (and already have been) made is simply astonishing. By opening up access to developers in the community, StabilityAI has massively advanced research in the field of AI TTIG, in a fraction of the time it would take in a closed source environment (such as OpenAI) to conceive and implement new ideas.
This video by YouTuber Yannic Kilcher demonstrates the new wave of developers who have sprung up around the open source model Stable Diffusion.
“There are so many ideas that one could pursue. It’s not that we’re running out of ideas, we’re mostly running out of time to follow up on them all. By open sourcing our models, there’s so many more people available to explore the space of possibilities” (Jennings, 2022).
Patrick Esser - Principal Research Scientist at Runway
Dall.E 2 could be considered "black box" software as users don't have access to the inner workings of the model. In the case of some black box models, even the researchers themselves are unable to explain (or observe) how their models work internally. This is not necessarily domain-specific, and many other AI models encounter the same lack of transparency.
An open stance is not always perceived as positive in the wider community (especially in the eyes of the media or the government), who have a habit of "overblowing" potential negative use cases. Problems also arise surrounding potential litigation against the company that released the model.
There is a small (but not insignificant) proportion of users out there who will definitely misuse this technology in the pursuit of creating synthesised media such as child pornography, fake news, deceptive and misleading content, deep fakes and so on. But for the most part, it is women who have been affected so far, through the creation and targeted use of non-consensual pornography, which amounts to a loss of privacy and agency for women worldwide.
On the eve of Stable Diffusion's release, during an interview with YouTuber Yannic Kilcher, Emad Mostaque, founder of StabilityAI, was asked (paraphrased): "Your model is capable of producing 'horrible' outputs. What do you say in reply?"
Emad (as he has come to be known in the community) gave an interesting and thought-provoking reply:
"Of course, I would say that humanity is horrible and they use technology in horrible ways and in good ways as well. But the reality is for this particular output, the vast majority of people are creatively constipated. We have been conditioned to consume constantly by social media and big tech giants and they want us to consume more, according to their parameters."
"We see a model like this.. We have had three year olds use it, in refugee camps all the way to 90 year olds. We're putting [it] in mental health settings. I think the benefits far outweigh any negativity and the reality is, people need to get used to these models because they’re coming one way or the other. Restricting them means you become the arbiter... What they are really saying (OpenAI) is, we don’t trust you, as humanity. Because we know better. I think that's wrong."
"[there are some strange people in the world]. At the same time, I think this is positive technology for humanity and it should diffuse, because then the pace of innovation to make it beneficial as well as to combat negative uses is far greater" (Kilcher, 2022).
An interview with StabilityAI founder Emad Mostaque containing the above quote.
OpenAI's take on creating a closed system is more complex. Mixed in with ethics and the fear of their own creation, OpenAI is motivated by financial gain and social responsibility.
In 2019, OpenAI shifted from a non-profit to a "capped-profit" business model due to mounting costs in the realisation of their newfound success and equally ambitious goals. Large investments from companies like Microsoft (1 billion dollars) and venture capital firms have also created a fear of litigious attack within OpenAI and reduced access for everyday users - in return for lucrative integrations with big-name products such as MS Office.
It should be noted that OpenAI was at the forefront of the AI revolution we are currently experiencing. They were (and still are, to a lesser degree) the canary in the coal mine for the ethical dilemmas we are only beginning to comprehend in 2022.
Starting with their Generative Pre-trained Transformer (GPT) autoregressive language model (currently in its 3rd release), OpenAI brought wonder, excitement (and panic) to the world stage in 2018. This was followed up in January 2021 by Dall.E (now in its second release and producing 2 million images daily), which demonstrated the future of AI-driven art and the potential for humanity to shift into a new era of creative freedom, unlocking the power for anybody to generate whatever image they can conceive of.
Alongside it, OpenAI released to the public a crucial component in its later success with Dall.E 2 named CLIP, which is short for "Contrastive Language-Image Pre-Training". This model, which in basic terms scores how well an image matches a text description and so helps to pick out the best images produced by the AI, can now be found integrated amongst a diverse range of models, including Stable Diffusion. This sharing of information has led to the explosion of research, rapid innovation and prototyping we are seeing in 2022.
In considering the ethical and moral implications AI has on society, it is important to understand what is deemed ethical and what is unethical. With the potential to revolutionise the way humans create, far into the future, it is important for companies to implement safeguards against the potential harm their products might inflict on society. With this in mind, it is interesting to compare the two companies' content policies.
From OpenAI's Content Policy:
In your usage, you must adhere to our Content Policy:
Do not attempt to create, upload, or share images that are not G-rated or that could cause harm.
Hate: hateful symbols, negative stereotypes, comparing certain groups to animals/objects, or otherwise expressing or promoting hate based on identity.
Harassment: mocking, threatening, or bullying an individual.
Violence: violent acts and the suffering or humiliation of others.
Self-harm: suicide, cutting, eating disorders, and other attempts at harming oneself.
Sexual: nudity, sexual acts, sexual services, or content otherwise meant to arouse sexual excitement.
Shocking: bodily fluids, obscene gestures, or other profane subjects that may shock or disgust.
Illegal activity: drug use, theft, vandalism, and other illegal activities.
Deception: major conspiracies or events related to major ongoing geopolitical events.
Political: politicians, ballot-boxes, protests, or other content that may be used to influence the political process or to campaign.
Public and personal health: the treatment, prevention, diagnosis, or transmission of diseases, or people experiencing health ailments.
Spam: unsolicited bulk content.
Don’t mislead your audience about AI involvement.
When sharing your work, we encourage you to proactively disclose AI involvement in your work.
You may remove the DALL·E signature if you wish, but you may not mislead others about the nature of the work. For example, you may not tell people that the work was entirely human generated or that the work is an unaltered photograph of a real event.
Respect the rights of others.
Do not upload images of people without their consent.
Do not upload images to which you do not hold appropriate usage rights.
Do not create images of public figures.
From Stable Diffusion's licence:
You agree not to use the Model or Derivatives of the Model:
In any way that violates any applicable national, federal, state, local or international law or regulation;
For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
To generate or disseminate verifiably false information and/or content with the purpose of harming others;
To generate or disseminate personal identifiable information that can be used to harm an individual;
To defame, disparage or otherwise harass others;
For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
To provide medical advice and medical results interpretation;
To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
It can be observed that StabilityAI is far more permissive in terms of usage than OpenAI, with the latter tending to be more explicit in the detail of their terms and conditions. Dall.E has been criticised for being heavy-handed in its deployment of aggressive content moderation, leaving many users feeling impaired in their creative process.
An example of what is currently possible in text-to-video generation (TTVG) with the use of tools such as img2img and out painting.
Ethical Issues Relating To:
In terms of back-end design, both Dall.E and Stable Diffusion provide Application Programming Interfaces (APIs) that allow developers to connect to their respective models. The user accesses the model via a graphical user interface (GUI) which also services img2img generation, in/out painting and a history of previously generated prompts.
The main difference between the two interfaces is the introduction of sliders in SD's Dream Studio, to further refine the results of user generations. This feature extends via API, far beyond Dream Studio and deep into the developer community's ever-growing population of open source and commercial applications.
Both platforms charge users for each prompt generated, with higher iteration counts costing more money. Dall.E currently stands at around $10 USD per 100 "spins" with Dream Studio coming in roughly 10x cheaper. Both platforms are able to generate up to 4 variations per prompt, which can be iterated upon.
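To put those prices in per-image terms, here is a rough back-of-the-envelope calculation. It assumes that one "spin" is one prompt returning the full 4 variations and that "roughly 10x cheaper" means exactly one tenth of the price; both are simplifications, and the names used are made up for this sketch.

```python
# Figures quoted in the text; the 10x ratio is only approximate.
DALLE_USD_PER_100_PROMPTS = 10.0
DREAM_STUDIO_DISCOUNT = 10  # "roughly 10x cheaper"
IMAGES_PER_PROMPT = 4       # up to 4 variations per prompt


def cost_per_image(usd_per_100_prompts, images_per_prompt=IMAGES_PER_PROMPT):
    """Price of a single generated image, in US dollars."""
    return usd_per_100_prompts / 100 / images_per_prompt


dalle = cost_per_image(DALLE_USD_PER_100_PROMPTS)
dream_studio = cost_per_image(DALLE_USD_PER_100_PROMPTS / DREAM_STUDIO_DISCOUNT)

print(f"Dall.E 2:     ${dalle:.4f} per image")
print(f"Dream Studio: ${dream_studio:.4f} per image")
```

Under these assumptions an image costs about two and a half cents on Dall.E 2 and a quarter of a cent on Dream Studio.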
The next logical step in easy access to these models would be to integrate all currently available features/forks such as img2img, in/out painting, etc., into one easily executable desktop application, enabling users to run renders locally on their own machines, or by accessing cloud GPUs "in App" for faster generation (at a small fee). Certain indie developers are already making headway in this space, such as Diffusion Bee, which has developed a small, enthusiastic community of early adopters.
As stated before, the open source design model enables the rapid prototyping and iteration of diffusion models at the expense of control. The closed model system allows for-profit ventures to maintain control of their intellectual property at the expense of rapid innovation. This trade-off is yet to be truly quantified on both sides.
A comparison of both interfaces from Dream Studio and Dall.E 2 featuring the main prompt window views, out painting and history sections.
The creation of diffusion models such as those made by OpenAI and StabilityAI, as well as the likes of Midjourney, Imagen and NightCafe, requires large datasets comprising billions of images and their accompanying text pairings.
The time and processing power required to A) scrape, compile and identify the images in the database and B) train and refine each model on said database are considerable and costly. According to Wikipedia, the model for Stable Diffusion was "trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000" (Stable Diffusion - Wikipedia 2022).
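The quoted figures can be unpacked with some simple arithmetic. The sketch below assumes all 256 GPUs ran in parallel for the whole job, which the source does not actually state, so the wall-clock estimate is a lower bound at best.

```python
# Figures from the Wikipedia quote above.
GPUS = 256
GPU_HOURS = 150_000
TOTAL_COST_USD = 600_000

# Assumption: every GPU was busy for the entire training run.
wall_clock_hours = GPU_HOURS / GPUS
wall_clock_days = wall_clock_hours / 24
usd_per_gpu_hour = TOTAL_COST_USD / GPU_HOURS

print(f"Wall-clock time: {wall_clock_hours:.1f} hours (about {wall_clock_days:.1f} days)")
print(f"Implied price:   ${usd_per_gpu_hour:.2f} per GPU-hour")
```

In other words, the numbers imply a little over three weeks of continuous training at around four dollars per GPU-hour.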
For the most part, OpenAI has been relatively tight-lipped about the source of their training data and has kept information relating to the creation of Dall.E relatively sparse, opting to release demos over papers. In a blog post, they state:
"...DALL·E 2 is trained on hundreds of millions of captioned images from the internet, and we remove and rewrite some of these images to change what the model learns"
(DALL·E 2 Pre-Training Mitigations 2022).
Being open source, SD has been extremely transparent in the design process of their model and release of training data (comprehensive details can be found here) allowing users to view and modify it as much as they see fit, as long as they conform to creative commons licensing.
The advantage of making one's training data public is that it allows independent researchers to address bias and other database-related challenges. Other advantages include the ability to map and analyse databases with user-created search engines such as https://lexica.art/ (which has become an AI TTIG database of its own) and https://haveibeentrained.com/, which serve prompters seeking to better understand how their renders (based on the images in the model's training set) are generated and how they can be optimised.
drawingthesun - at 3:45 AM
Crazy competition. it's so good really. Midge [Midjourney] are getting closer to releasing their next engine v4 too. I can't believe how quickly this space is moving. Discussion on hackernews I read someone mentioned how Emad releasing the SD model and not gatekeeping has allowed more play, more building, and now moves the space at lightning speed. Even closed models like Midjourney have to move fast to keep up. Imagine the progress if gpt3 was released in a similar way. 2 years of progress would have been 2 months by now we would have cured cancer and be on mars!
From the Discussions room on the Diffusion Bee Discord server.
Dall.E has limited accessibility via a website portal (cloud service), though its technology is currently being implemented into various Microsoft products along with GPT-3.
Thanks to open source, Stable Diffusion is arguably more widely distributed than other models (even in its infancy) due to its integration with popular, established systems such as Photoshop, Houdini, Figma, Blender and AI Dungeon, with more being developed as you read these very words.
Stable Diffusion is also capable of running locally on a user's graphics processing unit (GPU), because the trained model is compact enough to fit on consumer hardware and the training database is not required to run it.
Diffusion Bee takes advantage of this by giving users the option of running SD on a Mac M1 processor, allowing anyone to generate images on consumer grade hardware. Windows users have been able to access these models for the last few years by virtue of advances in GPU technology and increased investment in AI research from companies like Nvidia.
Generated on Diffusion Bee (running Stable Diffusion) on a Mac M1 laptop with 16 GB of RAM with the prompt: "Generating AI image prompts on a mac laptop". The shortcomings of AI are on display here, though higher fidelity could be achieved with a more detailed prompt.
"Free Spins" have become a major drawcard in the bid to attract new users to each platform. This serves two purposes. Firstly, users are able to test out the model before committing to a particular platform financially and secondly, users are likely to share their results on social media channels, further increasing brand awareness and attracting future customers.
All of this amounts to a huge surge of interest in AI TTIG over the last year and a half, with even more users coming online following the release of SD. This interest will only continue to grow as more and more artists, influencers and media outlets begin to create with these new tools, opening up the floor for newcomers who wish to experiment with this exciting new form of expression. StabilityAI's decision to release their model as open source has ensured that Stable Diffusion and machine learning in general remain accessible to the people, moving out of the hands of the few and into the hands of the many.
The test bed nature of platforms such as GitHub and Hugging Face creates the ideal environment to aggregate ideas and disseminate new releases, updates and patches to a massive community of developers and novices alike. The ease and speed with which code can be shared and iterated upon is truly mind-boggling. Feature requests can be implemented within hours, rather than the months or years it can take tech giants like Adobe to institute them.
"Ironically, the organization in the very best position to create such a powerful and integrated matrix of tools for Stable Diffusion, Adobe, has allied itself so strongly to the Content Authenticity Initiative that it might seem a retrograde PR misstep for the company – unless it were to hobble Stable Diffusion’s generative powers as thoroughly as OpenAI has done with DALL-E 2, and position it instead as a natural evolution of its considerable holdings in stock photography" (Anderson, 2022).
Up until recently, Google Colab was one of the most widely used cloud services in the world (if not the most widely used), capable of running GPU-hungry models via API for free, all from the comfort of your web browser. However, with the release of Stable Diffusion, Google appears to have withdrawn their free tier in response to the overwhelming surge of interest created by the open source model.
Considerable thought and concern have been given to the power consumption of blockchain systems and cryptocurrency mining, yet little mention has been made of the immense processing power and subsequent energy use that systems like Dall.E and Stable Diffusion must be consuming on a daily basis.
This means it is more important than ever to develop localised versions of models that ensure access to new users and reduce the bandwidth burden that will inevitably occur as more users come online and the technology matures. It is also important to create installers and apps that are easy to use, encouraging "no coders" (such as artists and designers) to engage with AI in effortless synchronicity. There is a danger that those with little knowledge of coding could be left behind in the AI revolution, limiting the valuable knowledge and contributions these "traditional" artists are able to bestow on the new guard.
One of the biggest differences between the delivery of each product is their approach to content filtering. Stable Diffusion allows users to generate anything, but applies NSFW filters to those images that the system flags. Unlike with Dall.E, this filtering can be turned off by the user.
It should be noted that not all flagged content is actually inappropriate. These filters can easily be tripped by certain phrases, words, or even sections of words, e.g., Hitchcock. This has led to much frustration at times in the AI art community, who are now inadvertently caught up in the crossfire of the never-ending battle between free speech and censorship.
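A small sketch shows how this kind of false positive can arise. Neither company has published how its filter works, so the blocklist, the function names and the matching rules below are all hypothetical stand-ins, chosen only to show the difference between matching anywhere inside the text and matching whole words.

```python
import re

# Hypothetical blocklist for illustration only; real filter terms are not public.
BLOCKED_TERMS = ["hit", "shoot"]


def flagged_by_substring(prompt):
    """Naive check: flags a blocked term anywhere, even inside a longer word."""
    text = prompt.lower()
    return any(term in text for term in BLOCKED_TERMS)


def flagged_by_whole_word(prompt):
    """Stricter check: only flags blocked terms that appear as whole words."""
    words = re.findall(r"[a-z]+", prompt.lower())
    return any(word in BLOCKED_TERMS for word in words)


prompt = "A portrait of Alfred Hitchcock at a photoshoot"
print(flagged_by_substring(prompt))   # True: "hit" sits inside "Hitchcock"
print(flagged_by_whole_word(prompt))  # False: no blocked term stands alone
```

Whole-word matching avoids this particular trap, though it is easier to dodge with creative spelling, which may be part of why blunt filters persist.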
GitHub has become the frontline for the deployment of AI models, with millions of users collaborating on open source projects worldwide. Hugging Face is another popular site which is more focused on machine learning.
Alpha and Beta testing have become the dominant forms of testing for both SD and DE2, with StabilityAI focusing their attention on Alpha testing before initially launching and releasing the beta as open source, encouraging the community to collectively test and prototype the model in a less constrained environment. OpenAI has spent considerable time "Red Teaming" their Alpha model, a process whereby small teams of experts are assembled to "misuse and abuse" the system, testing the limits of what is possible before offering a wider release.
The real testing ground for both DE2 and SD comes with their general release to the public. Both companies have varying degrees of "skin" in the game, though OpenAI arguably has more to lose after successfully raising 1 billion dollars in venture capital. If competing models that prove to be more efficient, and capable of running on consumer-grade hardware to boot, are made freely available, OpenAI will have to adjust their business model and adapt to a rapidly shifting market. OpenAI's for-profit model also increases the potential for litigation, which could further limit future releases.
Only time can tell how these models will adapt to users' demands and requirements, but robust communities have already sprung up around the various major camps of internet discussion, providing real-time feedback to developers as they use each model. This feedback can be implemented and further iterated upon, alongside traditional prototyping methods, leading to overnight innovation - the likes of which the world has never seen!
Here is an example of prompt weighting as incorporated in Automatic1111's fork of Stable Diffusion. By adding brackets to a word, the user is able to emphasise and deemphasise the importance or "weight" of that word in context with the rest of the prompt.
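The bracket syntax described in the caption above can be sketched in a few lines of Python. This is a simplified illustration, not the fork's actual parser: it assumes that each pair of round brackets multiplies a word's weight by 1.1 and each pair of square brackets divides it by 1.1, and it ignores any explicit numeric weights. The function name `parse_weights` and the `step` parameter are hypothetical.

```python
def parse_weights(prompt: str, step: float = 1.1) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) pairs.

    Round brackets raise the weight of the enclosed text and square brackets
    lower it. The factor of 1.1 per bracket level is an assumption.
    """
    result = []
    weight = 1.0
    buffer = ""

    def flush() -> None:
        """Store the text collected so far with the weight currently in force."""
        nonlocal buffer
        text = buffer.strip()
        if text:
            result.append((text, round(weight, 4)))
        buffer = ""

    for ch in prompt:
        if ch in "()[]":
            flush()
            if ch in "(]":
                weight *= step   # entering ( ) or leaving [ ] raises the weight
            else:
                weight /= step   # leaving ( ) or entering [ ] lowers the weight
        else:
            buffer += ch
    flush()
    return result


print(parse_weights("a (red) [blue] ((green)) cat"))
```

Nested brackets compound, so in this sketch "((green))" carries a weight of 1.1 x 1.1 = 1.21, while "[blue]" drops to roughly 0.91.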
Social media is by far and away the biggest driver of interest and uptake for both platforms. The ease of sharing images paired with niche communities makes for a veritable breeding ground of AI art and innovation. As a result, both companies have mainly been propelled by word of mouth.
Twitter has become a hotbed for developers to connect and share information such as research papers, progress reports or updates on their current work.
Reddit has championed discussion around ethics and morality in TTIG and has become the de facto place to vent about content filtering.
Discord has been pivotal for the dissemination of information, support and inspiration, with some platforms, such as Midjourney, basing their entire API access on Discord's back end.
More traditional channels such as Facebook and Instagram have played an integral role in increasing public awareness of TTIG art and AI in general. They also serve as repositories of AI art, created by an ever-growing community of artists, designers and creators worldwide.
YouTube has been popular with early adopters of TTIG, who use it to explain the varied and ingenious discoveries they have made through hours of trial and error. Awareness of (initially) OpenAI and, subsequently, StabilityAI is further increased by YouTube personalities and influencers creating a plethora of videos on everything from how Stable Diffusion works to the future of AI-assisted design.
Lastly, the media has also had a huge impact on the perception of AI and made far-ranging predictions for its future. There has been no shortage of sensationalist articles heralding the apocalypse. Art is dead, designers are no longer needed, and fake news will abound. The tech community is watching and commenting on each step of the journey with great interest and equal parts concern. One thing they all agree on: Pandora's Box has been opened and there's no going back.
An example of the helpful Discord community that has formed around Diffusion Bee, a fork of SD that is easy to install and can run on consumer grade hardware.
Evaluation of Companies' Value (beyond financial profit)
All of these platforms are ultimately driven by the desire to make money.
No matter how "open" a model wants or tries to be, companies still need to charge for the use of their models owing to the high computational demand and energy requirements to sustain them. In the case of DE2 vs SD, profit is one of the main delineators between the two organisations.
The concept of monetising art isn't new and in an age of cryptocurrency and NFTs, it is practically expected that digital art will continue to receive a larger slice of the pie. It could be said, though, that by opting to go the "closed route", OpenAI have disenfranchised some members of the community and reduced what could be seen as the dawn of a new era in human expression to an exercise in capitalism for those that can afford it.
This creates a classic case of the "haves and have nots", limiting access to users who aren't able to afford it, which further underlines the need for democratic access and distribution of TTIG tools. It also demonstrates the urgency and demand for users to be able to create on their own GPUs.
Beyond the incentive of financial profit for each brand lies the desire to advance humanity in the realms of creativity and knowledge, with the creation of artificial general intelligence (which OpenAI believes is inevitable) being the ultimate objective.
For the present, the larger issue at stake is the general population's access and meaningful connection to the coming AI revolution, not only in art but in every other field where machine learning and natural language processing can have a significant impact. If we continue to let large tech companies buy up the best talent and dominate the research space, we, as humanity, risk being locked out. Organisations like StabilityAI ensure the creative potential of humanity is not held to ransom, but instead, set free—no matter the consequence.
Chinese startup CogVideo is able to produce short videos from bare text prompts. A mature version of this technology is still some way off, but initial research is underway at many companies (such as Meta), which are racing to create the next Dall.E for the video space.
Conclusion & Future Predictions
The major problems we are currently facing in the AI/ML community (and as a species) are numerous and "wicked" in nature. They will continue to evolve as the technology becomes more widely adopted and relied upon by more and more traditional content producers.
One major aspect that hasn't even been touched on is that of the forced obsolescence (or unemployment) of traditional designers and artists as AI systems become more capable and readily available to everyday consumers, but this is a concept so large (and new) that it would require a whole PhD just to scratch the surface.
Aside from the issues surrounding access, the five main areas of concern are:
The ethical considerations of AI text-to-image generation and the potential for misuse of the technology.
AI text-to-image generation can be used to create fake or altered images that could be used to mislead or manipulate people.
The technology can also be used to create images that are biased or discriminatory.
There are concerns about the impact of AI text-to-image generation on privacy and data security.
There are also ethical considerations regarding the impact of AI text-to-image generation on employment and the economy.
As time passes and a greater number of traditional artists begin to incorporate AI into their work, the line between artist and engineer will become blurred. Artists will inevitably become better prompters and engineers will become better artists, leading to a democratisation of creativity the world over.
Video will be the next big thing in AI research, with companies such as Meta already making inroads into fully synthesised video, driven entirely by NLP. Other developers in the community are exploring the VR space with spectacular results.
In the near future, it is conceivable that people will no longer be passive consumers of content but instead create their own narratives and entertainment in worlds that are indistinguishable from reality, veering away from scripted content and into the realm of spontaneous improvisation.
Applications like AI Dungeon have demonstrated the power of AI-driven stories, allowing people to become co-creators of their own adventures, steering the narrative in any direction they see fit. This model or pipeline will surely make its way into the mainstream media as users' experiences inevitably become more personalised.
To close out, here are some questions that remain unanswered, but seek to inform the future of TTIG and TTVG.
Does AI art devalue an artist's skill if comparable images/videos/designs/art/code/music can be made by an unskilled person with the help of an algorithm in mere seconds?
Should artists be able to opt out of having their work included in models and databases? (StabilityAI are currently testing this feature out at a platform level in Stable Diffusion).
Will artists claim AI generated work as their own? (Should AI be credited?)
Who REALLY owns the copyright to the image?
One final question that remains to be answered: which is the more ethically responsible organisation, OpenAI or StabilityAI?
Is it better to protect humanity from itself or give them the tools they need to create and let the collective decide how to use them?
In attempting to answer that question, one has to consider whether humanity is truly ready for the cataclysmic potential artificial intelligence has unleashed on the world.
Well, one thing is for sure: ready or not, it's coming and life will never be the same again.
A Brief Glimpse Of The Future
If AI generated video is the future of natural language models, then these are the people that are taking us there. Here is a small selection of some extremely promising research:
“If we do not invent the future, someone else will do it for us - and it will be their future, not ours.” – Duncan D. Bruce & Dr. Geoff Crook, The Dream Café
Milanote Research Link:
Stability.Ai. (2022, August 22). Stability.Ai.
OpenAI. (2021, June 18). OpenAI.
Kilcher. (2022, August 13). The Man behind Stable Diffusion.
Openai. (2022, July 19). DALL·E 2 Preview - Risks and Limitations. GitHub.
DALL·E. (n.d.). DALL·E.
DALL·E 2 Pre-Training Mitigations. (2022, June 28). OpenAI.
Stable Diffusion - Wikipedia. (2022, September 5). Stable Diffusion - Wikipedia.
Jennings, S. (2022, October). The research origins of Stable Diffusion | Runway Research. The Research Origins of Stable Diffusion | Runway Research. https://research.runwayml.com/the-research-origins-of-stable-difussion
Anderson, M. (2022, September 15). How Stable Diffusion Could Develop as a Mainstream Consumer Product - Unite.AI.
CogVideo Demo Site. (n.d.). CogVideo Demo Site. Retrieved October 18, 2022, from
Letsglitchit. (2022, September). r/StableDiffusion - Fiber Collage Video. Reddit.