Armchair Engineering Some Drupal Talks about AI

I am not a Drupal developer and I never work with it. However, Bob works with Drupal all the time and I thought I'd cover some of the talks he didn't get a chance to see at MidCamp.

Midwest Drupal Camp (MidCamp), an annual event for the Drupal community, just happened. Drupal is an open-source content management system and framework for building websites.

Although I'm not a Drupal developer, some concepts transfer across languages and frameworks when it comes to the newer AI technologies that are becoming popular.

There were 4 talks that seemed to be AI related. I thought I'd go over each one and give the short version, explaining concepts rather than going into too much detail, while mixing in some related news.

Revolutionizing Conversations: AI Chat Integration on YaleSites | Drupal.tv

Randy Oest, Creative Director at Four Kitchens

Randy went over what Four Kitchens built for Yale University. It was hosted at ask.yale.edu and looks something like this:

It was built for https://hospitality.yale.edu and draws on that site's data.

Yale approached them with a proof of concept they had running on Azure OpenAI Studio. Yale's charge to Four Kitchens was to build something with AI, and Randy likened it to the space race, with JFK asking Congress to make going to space a priority.

All in all, they spent roughly 5 weeks on this project with 3 people from Four Kitchens and 2 people from Yale. He guesstimated 32 hours per person per week, which works out to roughly 5 × 5 × 32 = 800 person-hours.

They used:

GitHub - microsoft/azurechat: 🤖 💼 Azure Chat Solution Accelerator powered by Azure Open AI Service

It is a React app, which they dropped into a custom block within Drupal.

They defined a successful launch as:

A webpage showing AI search at a custom domain

A conversational tone for the chat interface, which should have a personality

Data from the Yale Hospitality Drupal website

A way to drop this into any Drupal site on YaleSites

Challenges:

First big AI project.

Prompts are not deterministic.

Poorly structured content resulted in wrong results.

Getting the tone right and conversational was a fun challenge.

They “red teamed” the chat interface, meaning they tried to exploit the AI to surface problems.
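The "prompts are not deterministic" challenge can be sketched in a few lines. This is a minimal illustration, not what Azure OpenAI does internally: it assumes the model scores candidate next words and then samples one, with a temperature setting controlling how much randomness is allowed. The words, scores, and function names are invented for the example.

```python
import math
import random


def softmax(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(tokens, logits, temperature, rng):
    """Pick the next token: greedy when temperature is 0, sampled otherwise."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    weights = softmax(logits, temperature)
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical candidate next words and scores, invented for illustration.
TOKENS = ["Welcome", "Hello", "Hi"]
LOGITS = [2.0, 1.5, 1.0]

rng = random.Random(0)
greedy = {pick_token(TOKENS, LOGITS, 0, rng) for _ in range(50)}
sampled = {pick_token(TOKENS, LOGITS, 1.0, rng) for _ in range(50)}

print("temperature 0:", greedy)
print("temperature 1:", sampled)
```

With the temperature at 0 the same word comes back every time, while sampling at a higher temperature gives different openings for the same prompt. Hosted models can vary for other reasons too, so treat this as intuition only.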

It seems to me that what they built was a RAG, which is short for Retrieval-Augmented Generation, although it was described as AI-powered search during the talk.

An interesting insight:

Things that make content more readable for a person, such as headings and lists, also make it easier for AI to use. Content strategy practice has started to treat AI as a user of content.

Randy compared the launch to a fashion photo shoot, where the clothing is made to look great for the camera rather than to fit the model well. Just as clips hold the clothing in place on the model, there were proverbial clips holding parts of their launch in place. Throughout the talk, Randy emphasized that this was an exploratory effort and that ancillary benefits could come to YaleSites from the understanding gained on this initial AI project.

GitHub - yalesites-org/ai_engine: Transform websites into AI-powered platforms.

One thing I noticed was that they relied heavily on Azure and OpenAI. It's worth adding that small and relatively new companies can apply for free credits on Microsoft's website:

Microsoft for Startups FoundersHub

One part of the talk that made my ears perk up was that they "red teamed" the chat interface and found the ethics "weirdly easy to solve".

So I thought it would be interesting to test out some prompts to circumvent their guardrails. You can check it out here in this separate post that explains it in more detail.

Vincenzo Gambino, Drupal Developer from London, UK with 14 years of experience

Vincenzo gives a good primer that is exhaustively detailed, in a good way. He gives very solid definitions and explanations of the basics, while also recapping how we got here over the last 14 months, after the explosion of ChatGPT made people see the potential of a new technology.

Vincenzo does a great job of building up to an example of a RAG, then breaks down what it looks like with an excellent block diagram.

You can think of RAG as a way to make a knowledge base part of your conversation with a non-deterministic LLM. Examples that come to mind when I think of a RAG app:

PDF.ai | Chat with your PDF documents

We built the ultimate ChatPDF app that allows you to chat with any PDF: ask questions, get summaries, find anything you need!

Chat with your PDF documents

Have a chat with your documents. This is a simple but incredibly powerful feature.

I also think about the upload button on ChatGPT, which accepts any kind of text-based document. It's likely doing something RAG-like.

It's worth breaking down some concepts and looking at some good visualizations to build an intuition for what is going on. I stumbled across this video online that explains it succinctly.

Short form video from this channel https://www.youtube.com/@3blue1brown

Embeddings

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.

Embeddings are commonly used for:

- Search (where results are ranked by relevance to a query string)

- Recommendations (where items with related text strings are recommended)

- Classification (where text strings are classified by their most similar label)
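To make the distance idea concrete, here is a minimal sketch using cosine distance, one common way to compare embeddings. The three-number vectors and the phrases attached to them are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Small distance suggests high relatedness, large distance suggests low."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical 3-dimensional embeddings, invented for this example.
EMBEDDINGS = {
    "dining hall hours": [0.9, 0.1, 0.0],
    "when does the cafeteria open": [0.8, 0.2, 0.1],
    "parking permit renewal": [0.0, 0.2, 0.9],
}

query = EMBEDDINGS["dining hall hours"]
for text, vector in EMBEDDINGS.items():
    print(f"{cosine_distance(query, vector):.3f}  {text}")
```

Sorting by this distance is essentially what the search use case does: the cafeteria question lands close to the dining hall query, and the parking one lands far away.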

Vincenzo uses OpenAI tools to convert files and text into embeddings that then get stored within a managed vector database service like Pinecone.

I believe you can also use pgvector, an open-source PostgreSQL extension, as an alternative.
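The retrieval half of a RAG can be sketched without any external service. This is a toy under loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `TinyVectorStore` is a hypothetical in-memory replacement for something like Pinecone or pgvector, and the documents are invented. It is not the code from the talk.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a managed vector database."""

    def __init__(self):
        self.rows = []  # list of (embedding, original text)

    def add(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.rows,
            key=lambda row: cosine_similarity(q, row[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


def build_prompt(question, store, k=2):
    """Retrieve relevant chunks and place them in the prompt as context."""
    context = "\n".join(f"- {chunk}" for chunk in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented knowledge-base entries, for illustration only.
store = TinyVectorStore()
store.add("The dining hall opens at 7 am on weekdays.")
store.add("Catering orders need three days of notice.")
store.add("Parking permits are renewed every September.")

prompt = build_prompt("When does the dining hall open?", store, k=1)
print(prompt)
```

The resulting prompt is what would be sent to the LLM. The model still writes the answer, but it is steered toward the retrieved text rather than whatever it remembers from training.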

Within the Drupal ecosystem, it seems that dependencies are referred to as modules. The talk covers submodules from this module:

OpenAI / ChatGPT Integration

The OpenAI module aims to provide a suite of modules and an API foundation for OpenAI integration in Drupal for generating text content, images, content analysis and more.

Drupal.org

He walks through how to set it up in Drupal 10:

composer require 'drupal/openai:^1.0@beta'

Then run:

drush en openai -y

Then create an OpenAI account and generate an API key.

CKEditor Submodule

This submodule provides a CKEditor button to generate, summarize, translate, and change the tone of text through a dialog.

To install: drush en openai_ckeditor -y

OpenAI Content Tools Submodule

This submodule provides functionality to change tone, summarise, and suggest taxonomy.

Extend OpenAI Module

This uses the openai-php/client package.

ChatGPT Content Assistance Module

It can create content, translate content, and create images.

Vincenzo shows quite a bit of what's possible and the last 15 minutes of his talk are worth a watch just to see the demos.

Jeff McWherter, Chief Operating Officer at Gravity Works

Jeff gives a good talk with insights I hadn't thought about before. Like most talks about the ethics of a new technology, it has a somewhat cynical and negative outlook. I appreciated his talk, and he points out important things to keep in mind about how machine learning models can go awry.

Bias

He showed a text-to-image prompt about “two developers at drupal midcamp looking at a computer”.

The result: beards, glasses, flannel, and a beanie on one head. Two Caucasian dudes.

Amazon ran into an ethical issue when an internal hiring tool discriminated against women, and it was quickly pulled. More men get jobs in IT, so what the data says is that men are a better fit.

Why Amazon’s Automated Hiring Tool Discriminated Against Women | ACLU

American Civil Liberties Union, Rachel Goodman
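The "that's what the data says" trap can be shown with a toy. This is not how Amazon's tool worked. It is a deliberately naive sketch with made-up hiring records, where the "model" simply learns the historical hire rate for each group and so repeats whatever skew the history contains.

```python
from collections import defaultdict

# Made-up historical hiring records: (group, was_hired). Not real data.
HISTORY = (
    [("man", True)] * 80
    + [("man", False)] * 20
    + [("woman", True)] * 5
    + [("woman", False)] * 15
)


def train_hire_rates(records):
    """'Learn' the past hire rate for each group from historical records."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for group, was_hired in records:
        seen[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / seen[group] for group in seen}


rates = train_hire_rates(HISTORY)
for group, rate in rates.items():
    print(f"{group}: predicted fit {rate:.2f}")
```

Nothing in the code mentions merit. The gap in predicted "fit" comes entirely from who was hired in the past, which is the point of the example.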

This wasn't mentioned in Jeff's talk, but I feel it's worth mentioning that one model was reported to exhibit phallocentricity:

The central node of ChatGPT’s signifying network is… the phallus.

Lacan remains unbeaten. Nobody tell Zizek about this. pic.twitter.com/7hBXfKdiih
— the future lasts forever (@pourfairelevide) February 24, 2024

This makes you wonder about why that is and where that comes from. What's going on here?

Transparency

If AI is making a decision about your life, you should get an explanation. Why didn’t you get the job? That gets tough with black box models. There should be transparency about the process and criteria.

Fairwashing

A concept first introduced in a paper around 2019: a company designs the brain of the computer, keeps it hidden, and claims it is fair.

Digital Epidermalization

This refers to information being projected onto the body. Facial recognition is optimized for Caucasian people. It’s the machine telling you that you’re strange. People assume that the machines are neutral, but the machines are not neutral.

The term "epidermalization" in the context of discrimination refers to a concept from Frantz Fanon, a French West Indian psychiatrist and political philosopher, particularly known for his works on the psychopathology of colonization and the human, social, and cultural consequences of decolonization. In his influential book, "Black Skin, White Masks" (1952), Fanon uses the term "epidermalization" to describe the way racial identity is ascribed to individuals based purely on skin color, ingraining social and psychological hierarchies based on race.

Fanon discusses how the social, political, and economic circumstances associated with colonialism imprint a racial identity on the colonized, reducing their self-perception and societal perception to their skin color. This "epidermalization of inferiority" means that racial distinctions are deeply embedded as inherent characteristics, affecting how individuals perceive themselves and are perceived by others. This concept helps to explain the deep-seated nature of racial biases and the profound psychological impacts of racism.

Data Flattening Problem

Spawning | Have I been Trained?

Search for your work in popular AI training datasets

Robots stealing from artists.

To the point that Jeff is making:

with all the Sora stuff happening right now,

I feel like I spent 20 years of my life swimming suddenly they let motorboats into the Olympics. pic.twitter.com/yChVRjcSgS
— Frog_Glasses (@Frog_spectacles) February 19, 2024

Distribution of Wealth

500 billion of new household wealth by 2045, but where is it going?

Jeff mentioned that Sam Altman, CEO of OpenAI, is a Georgist. He was paying people for their biometric data, and he pays them in crypto. It's a strange activity and doesn't exactly instill trust.

Sam Altman Is Bringing Worldcoin’s Controversial Eye-Scanning Orb to Reddit and Microsoft

World ID has added integrations with Shopify, Minecraft, and Reddit alongside a slew of developer-focused updates that could expand the OpenAI founder’s blockchain-based “proof-of-personhood” service to more users.

CoinDesk, Sam Kessler

I had to look up what a Georgist is exactly.

But he remains keenly interested in politics. Altman’s beliefs are shaped by the theories of late 19th century political economist Henry George, who combined a belief in the power of market incentives to deliver increasing prosperity with a disdain for those who speculate on scarce assets, like land, instead of investing their capital in human progress. Altman has advocated for a land-value tax—a classic Georgist policy—in recent meetings with world leaders, he says.

Sam Altman Is TIME’s 2023 CEO of the Year

Altman emerged as one of the most powerful executives in the world, the public face of a technological revolution.

Time, Naina Bajekal

Being a Georgist sounds noble, but so did being an Effective Altruist at one point.

Environmental Problem

There is the energy cost of training and the energy cost of inference. Basically, these are data center costs.

To Jeff's point, Amazon just recently bought a new data center that is nuclear-powered:

Amazon just bought a 100% nuclear-powered data center

One of the US’s largest nuclear power plants will directly power cloud service provider Amazon Web Services’ new data center.

Electrek, Michelle Lewis

The Alignment Problem of AI

“When people talk about ethical AI, instead of talking about Skynet; we should think about working conditions, climate change, and how to make the economy serve humans rather than the other way around.”

Martin Anderson-Clutz, Senior Solutions Engineer at Acquia

Martin gave a detailed talk as well, with good definitions and interesting opinions. To keep this concise, I won't repeat the topics already covered above and will highlight insights from his talk instead.

AI Will Transform the Global Economy. Let’s Make Sure It Benefits Humanity.

AI will affect almost 40 percent of jobs around the world, replacing some and complementing others. We need a careful balance of policies to tap its potential

IMF, Kristalina Georgieva

He casually asked the audience whether they felt worried about losing their jobs and then cited the following joke:

In order to replace developers, customers will need to clearly state their requirements.

To Martin's point, Devin AI came out with a very popular demo video showing off "the first AI Software Engineer" completing work on a freelancing site:

Introducing Devin, the first AI software engineer

but it was quickly debunked as more hype than reality:

Modules

Augmenter AI ← Works with other large language models. Augmenters are configuration entities. Works with Drupal 9 and 10.

AI Interpolator ← Designed to work in the back end, generating field values. No creator widgets. Lots of service integrations.

During the Q and A

Towards the end of his talk, Martin briefly mentioned the idea of using a RAG as a “source” engine for writing content. I thought that was a pretty good idea and worth highlighting, since many of the tools in these talks are about generating content for websites and they still require a human in the loop.