Image Generation

While GAI-created images are becoming more impressive almost every few months, as of May 2024 this is still fundamentally ‘art generation’, and we’re a very long way from AI being able to generate complex technical diagrams. That said, it is possible where diagrams follow a strict design schema that an LLM can generate code for via a prompt, e.g. simple process flows via Lucid GPT, or Mermaid diagrams using the Claude 3.5 Sonnet ‘Artifacts’ interface. Here’s an example of the latter, where the prompt was simply to generate a flow diagram for a social science research methods process of its choice:

GAIR_Mermaid_Diagram_Methods
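
To make the 'strict design schema' point concrete: a Mermaid diagram is just a few lines of text, which is why an LLM can produce one reliably. The sketch below builds that kind of text for a simple linear flow. The function name and the research steps are hypothetical illustrations, not the prompt output shown in the figure above.

```python
def to_mermaid_flowchart(steps, direction="TD"):
    """Return Mermaid flowchart source that links the given steps in order.

    `direction` is a Mermaid layout code, e.g. "TD" (top-down) or "LR" (left-right).
    """
    lines = [f"flowchart {direction}"]
    # Declare one node per step, with the step text as its label.
    for i, label in enumerate(steps):
        lines.append(f'    S{i}["{label}"]')
    # Connect each node to the next one with an arrow.
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


# Hypothetical steps for illustration only.
example_steps = [
    "Define research question",
    "Design survey",
    "Collect data",
    "Analyse results",
    "Report findings",
]

print(to_mermaid_flowchart(example_steps))
```

Pasting the printed text into any Mermaid renderer should draw the flow as a diagram. Because the layout is worked out by the renderer rather than by you, fine control over spacing is limited.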

Here's an example of a more conceptual diagram, illustrating the value and risks of using LLMs given pre-existing domain knowledge. This can be presented as a flow diagram of sorts, but it’s a bit of a hack and is harder to customise if you need to adjust relative spacing. For this example, though, the layout works perfectly well:

LLM value 1 High Value Zone

LLM value 2 Useful Zone

LLM value 3 Danger Zone

The Claude 3.5 Sonnet Artifacts feature can also generate live web views using modern web user interface controls. It can turn provided text into visually appealing infographics, which could be a useful way to create attractive presentation slides. Here's an example based on an uploaded paper, where the prompt was simply "Attached is an academic paper. Can you please generate a nice modern infographic using React elements in the artifacts interface that summarises the main findings from the research in a cool visual way?":

GAIR_Infographic

As far as academic research is concerned, though, art generation is generally of limited value outside dissemination (presentations, posters, blogs etc.), although there is potential for generating images designed to elicit responses in psychological tests. Currently, the best quality images are produced by Midjourney; however, it tends to be poor at following specific instructions and is notably terrible at rendering text. DALL-E 3 (which can be accessed via MS Copilot, formerly known as Bing Chat Enterprise, and is also built in to ChatGPT Plus and ChatGPT Team) is far better, though still far from perfect, at following instructions, but its results are less aesthetically pleasing.

Below is an example comparison. Both tools were asked for a suitable concept image for the title slide of a presentation at a work-in-progress research seminar on urban green space in London, for which the research combines GIS and econometrics.

Here's the result from DALL-E 3, which tried its best to tick all the keyword boxes, almost to the extent that it’s showing off! The result is arguably a less appealing image because it is so visually ‘busy’:

GAIR_Dall-E

Here’s the result from Midjourney for the same prompt. While still a little busy, it's simpler and more appropriate for a conceptual PowerPoint title slide:

GAIR_Midjourney

The inherent mismatch between text and image means it will always be difficult to create the perfect visual from a text prompt alone, but with some experimentation it’s possible to get decent quality results. Here’s a more frivolous example for a presentation on digital transformation at LSE, where the prompt asked for an existing photo of the LSE campus to be re-imagined in the year 2300 with a high-tech aesthetic. The result is absolutely fine as a fun artistic touch on a presentation or blog, but certainly nothing you would want to commission an architecture firm for:

GAIR_Midjourney_LSE_Campus