9 ChatGPT and All That

(Chapter under construction; last updated 23 July 2024)

ChatGPT was released on 30 November 2022; Bard, now called Gemini, followed on 21 March 2023. Both are Large Language Models (LLMs) designed to generate human-like text based on the input they receive. As this Chapter will show, their ability to generate coherent and contextually relevant text in a short time frame makes them invaluable for a variety of applications. Importantly, ChatGPT and Gemini have the advantage that they are free to use¹.

9.1 Introduction

Large language models (LLMs), such as ChatGPT and Gemini, have emerged as powerful tools with impressive capabilities, but like any technology, they come with their own set of advantages and disadvantages:

Advantages:

Improved language processing: LLMs excel at tasks like text generation, translation, question answering, and summarization. They can process vast amounts of information and produce human-quality outputs.
Efficiency and automation: LLMs can automate repetitive tasks involving language analysis, saving time and resources. They can also analyze large datasets to identify patterns and trends that humans might miss.
Personalization and customization: LLMs can be fine-tuned to specific domains or purposes, making them adaptable to various applications. This allows for personalized experiences and tailored solutions.
Enhanced creativity: LLMs can generate different creative text formats like poems, code, scripts, and musical pieces, fostering new ideas and artistic expressions.

Disadvantages:

Bias and fairness: LLMs inherit biases present in their training data, potentially leading to discriminatory or offensive outputs. Addressing bias and ensuring fairness requires careful monitoring and data selection.
Lack of interpretability: Understanding how LLMs arrive at their outputs can be challenging, making it difficult to assess their accuracy and reliability. This lack of transparency can raise concerns about accountability.
Misinformation and factual errors: LLMs can sometimes generate factually incorrect or misleading information, especially when dealing with complex or nuanced topics. Careful fact-checking and human oversight are crucial.
High computational cost: Training and running LLMs require significant computational resources, making them inaccessible to many individuals and organizations. This raises concerns about equity and access to this technology.

LLMs hold immense potential for revolutionizing various fields, but they are not without their challenges. It’s crucial to be aware of both the advantages and disadvantages and to use them responsibly. One way to address these limitations is to adopt the rule of thumb:

Don’t ask LLMs to do something that we couldn’t do ourselves.

9.2 LLMs and Reporting

Large language models can be valuable tools in writing reports by providing various forms of assistance. There are several ways in which they can help:

Content Generation: LLMs can assist in generating content for reports by providing information, summarizing key points, and suggesting ideas. They can help you articulate thoughts and structure sentences or paragraphs.
Research Assistance: By answering questions and providing information on a wide range of topics, ChatGPT can serve as a virtual research assistant. It can help you find relevant data and references to include in your report.
Grammar and Style Checks: Language models can assist in proofreading and improving the grammar, syntax, and overall writing style of your report. They can suggest corrections and improvements to enhance the clarity and coherence of your writing.
Idea Generation: If you’re stuck and need fresh ideas, LLMs can offer suggestions and prompt creative thinking. They can help you brainstorm and explore different angles for presenting your information.
Outlining and Structure: LLMs can help in creating outlines for your report, ensuring that your ideas are organized logically and coherently. They can provide guidance on structuring different sections and sub-sections.
Language Translation: If your report involves content in multiple languages, language models can assist in translation, ensuring accurate and clear communication.
Summarization: You can use LLMs to create concise summaries of lengthy passages or complex information, making it easier to present key points in your report.
Editing and Revision: Language models can help with revising and refining your report. They can suggest alternative phrasing, reword sentences, and provide constructive feedback to improve the overall quality of your writing.
Fact-Checking: While not infallible, language models can assist in fact-checking by providing information and cross-referencing details within their training data.

It’s important to note that while language models can be powerful tools, they are not perfect, and their suggestions should be carefully reviewed. Additionally, they may not have the most up-to-date information, so it’s advisable to verify critical details from reliable sources. Integrating the assistance of an LLM into your writing process can save time and improve the overall quality of your reports.

9.3 Prompt Engineering

Prompt engineering involves crafting effective input prompts to guide large language models like ChatGPT in generating desired outputs. Some key points and hints for prompt engineering are:

Clear and Specific Instructions:
- Provide explicit and clear instructions in your prompt.
- Clearly specify the desired format, structure, or type of response you want.
Use of Context and Examples:
- Include relevant context or examples to guide the model.
- Illustrate the expected style or content through examples within the prompt.
Incremental Information:
- Consider providing information incrementally rather than all at once.
- Start with a general prompt and add details progressively to refine the response.
Tone and Length Adjustment:
- Experiment with the tone (mood and feel) to shape the character of responses.
- Adjust the length (brevity or depth) to limit the amount of generated output.
Positive Reinforcement:
- Use positive reinforcement in the prompt to encourage specific behavior.
- Reward the model for correct or desired responses within the prompt.
Model constraints:
- Be mindful of model constraints.
- Keep prompts concise and/or incremental.
Task-Specific Keywords:
- Include task-specific keywords or cues to guide the model’s focus.
- Use terms that are likely to trigger the desired knowledge or behavior.
Experimentation and Iteration:
- Iteratively refine your prompts based on the model’s responses.
- Experiment with different phrasings and approaches to find the most effective prompt.
Clarification Queries:
- Include follow-up questions or clarification queries to guide the model’s understanding.
- Encourage the model to seek additional information if needed.
Bias and Fairness Consideration:
- Be cautious of unintentional biases in prompts.
- Regularly assess and adjust prompts to ensure fairness and impartiality.
Diverse Training Data Awareness:
- Understand the model’s training data diversity and potential limitations.
- Craft prompts that align with the model’s knowledge base.

By combining these elements, you can enhance the effectiveness of prompt engineering and guide large language models like ChatGPT and Gemini to produce more accurate and contextually appropriate responses.

Two acronyms, SALT and LARF, help with constructing the prompt and assessing the LLM response, respectively:

Prompt:

SALT can be used as a guide to constructing the prompt.

Style - define the framework
Audience - tailoring the content
Length - brevity or depth
Tone - mood and feel

LLM response:

LARF can be used to assess the response from the LLM.

Logical consistency
Accuracy
Relevance
Factual correctness
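
As an illustration, the four SALT components can be kept as separate fields and assembled into a prompt. The Python sketch below is a minimal example only: the `SaltPrompt` class, its field layout, and the sample values (audience, word count and so on) are hypothetical and not part of any LLM interface.

```python
from dataclasses import dataclass


@dataclass
class SaltPrompt:
    """Hypothetical helper: holds a task plus the four SALT components."""
    task: str
    style: str
    audience: str
    length: str
    tone: str

    def render(self) -> str:
        # Put the task first, then one labelled line per SALT component.
        return "\n".join([
            self.task,
            f"Style: {self.style}",
            f"Audience: {self.audience}",
            f"Length: {self.length}",
            f"Tone: {self.tone}",
        ])


# Illustrative values only; adjust them to your own report.
prompt = SaltPrompt(
    task="Outline the protected species legislation in England relevant to bats.",
    style="a paragraph in an ecological report",
    audience="planning officers without ecological training",
    length="about 150 words",
    tone="formal and neutral",
)
text = prompt.render()
print(text)
```

The printed text can be pasted into ChatGPT or Gemini; changing one field at a time makes it easier to see how each SALT component affects the response.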

9.3.1 Example Prompts and LLM Responses

As an illustration, a few example prompts and the unedited LLM responses are provided below. Have a go yourself! Entering the same prompt again is likely to generate a slightly different response (Gemini automatically generates three other drafts).

Tip

The LLM response is unlikely to be word perfect and will need some editing.

This gives you the opportunity to review the LLM response for logical consistency, accuracy, relevance and factual correctness (LARF).

Prompt:

Outline in a paragraph the protected species legislation in England relevant to the mammal bats.

ChatGPT:

In England, legislation specifically designed to protect mammal bats falls under the Wildlife and Countryside Act 1981, with subsequent amendments. Bats are designated as protected animals under Schedule 5 of the Act, making it illegal to intentionally harm, kill, or capture them. Additionally, damaging or obstructing access to bat roosts is prohibited. The Conservation of Habitats and Species Regulations 2017 further reinforces these protections by considering all bat species as European Protected Species. This legislation focuses on safeguarding bat habitats, making it an offense to damage or destroy their breeding sites or resting places. These legal provisions aim to ensure the conservation and sustainable management of bat populations in England, with violators subject to criminal charges and penalties. Collaborative efforts between wildlife authorities, conservation organizations, and the public play a crucial role in preserving the diverse bat species and their habitats.

Prompt:

In the style of William Shakespeare outline in a few verses the protected species legislation in England relevant to the mammal bats.

ChatGPT:

In fair England where law and nature meet,
A tale unfolds of bats, a winged fleet.
The Wildlife and Countryside Act, its decree,
In Schedule 5, bats find sanctuary.

No harm shall befall, no capture nor slay,
For bats are protected, by night and by day.
Their roosts and nests, a haven so dear,
Shall not be disturbed, as the law makes clear.

The Conservation of Habitats, a poetic guise,
In 2017, under regulatory skies.
Bats, of European Protected kin,
Their habitats shielded, a realm to win.

To damage or destroy, a forbidden endeavor,
Breeding sites and resting spots, now and forever.
Safeguarded by law, each fluttering sprite,
England’s legislation, in the bat’s protective light.

So, in the realm where Shakespeare’s quill once spun,
Bats find refuge ’neath the protective sun.
By legal verses, their wings unfurl,
A harmonious ode to England’s bat-filled world.

9.3.1.1 Structured LLM Prompts

Structuring LLM prompts with markdown offers several benefits for both the user and the model.

Tip

The LLM response is in Markdown, the same format that Quarto and RMarkdown documents use (see Section 8.2.2).

9.3.1.1.1 For the user

Clarity and organization: Markdown provides visual cues like headings, bullet points, and code blocks, making prompts easier to read and understand. This helps you structure your thoughts and instructions for the LLM.
- Headings: Use # and ## to create clear sections, making your prompt easier to scan and understand.
- Lists: Use * or - for bulleted lists and 1. for numbered lists to present information concisely.
- Code Blocks: Use triple backticks to format code snippets, improving clarity and preventing misinterpretations.
- Quotes: Use > for blockquotes to highlight specific instructions or important context.
- Bold and Italics: Use ** for bold and _ for italics to emphasize key points or terms within your prompt.
- Links: Use [text](link) to include relevant references or resources, enriching the context for the LLM.
- Tables: Use | and --- to create tables for structured data, improving the organization of complex information.
Consistency: Using a standardized format like markdown across different prompts ensures consistency and makes managing multiple prompts easier.
Emphasis and highlighting: Markdown allows you to emphasize specific parts of your prompt using bold, italics, and quotes. This helps guide the LLM towards certain aspects of your request.

9.3.1.1.2 For the LLM

Improved understanding: With clear structure and highlighted elements, the LLM can better grasp the context and intent of your prompt. This leads to more focused and accurate results.
Reduced ambiguity: Markdown reduces the ambiguity that can arise from plain text prompts, leading to fewer misinterpretations by the LLM.
Enhanced creativity: Clear structure can guide the LLM’s creative process, allowing it to explore relevant paths within your specified framework.

Overall, structuring LLM prompts with markdown streamlines communication between you and the model, ultimately leading to better results.
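
The Markdown elements listed above can also be assembled programmatically when the same prompt layout is reused. The Python sketch below is one possible arrangement, not a required format: the `markdown_prompt` function and its heading names are hypothetical, and the habitat list is sample input.

```python
def markdown_prompt(context: str, items: list[str], question: str) -> str:
    """Hypothetical helper: lay out a prompt with Markdown headings and a numbered list."""
    # Number the items 1., 2., 3. ... as a Markdown ordered list.
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return (
        f"# Context\n{context}\n\n"
        f"## Habitats\n{numbered}\n\n"
        f"## Question\n{question}"
    )


# Sample input only.
structured = markdown_prompt(
    context="A site in Devon, UK.",
    items=["semi-improved grassland", "large pond", "woodland", "derelict buildings"],
    question="What ecological constraints, particularly **protected species**, apply?",
)
print(structured)
```

Keeping the context, the list and the question under separate headings makes it straightforward to change one part of the prompt without rewriting the rest.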

Prompt:

A site in Devon UK has:
1. semi improved grassland
2. large pond
3. woodland
4. derelict buildings
What ecological constraints would be found in this type of habitat particularly the “protected species”.

ChatGPT:

A site in Devon, UK, with semi-improved grassland, a large pond, woodland, and derelict buildings may have specific ecological constraints, particularly concerning protected species. Here are potential ecological constraints and considerations related to protected species in each habitat:

Semi-Improved Grassland:
- Protected Species: Ground-nesting birds, such as skylarks or lapwings, may inhabit the grassland. Conservation measures should consider the breeding season to avoid disturbance.
Large Pond:
- Protected Species: Amphibians, including great crested newts, may inhabit the pond. These species are legally protected, and habitat disturbance should be minimized, especially during the breeding season.
- Waterfowl: Waterfowl may use the pond for breeding or as a stopover during migration. Nesting birds should be considered during habitat management.
Woodland:
- Protected Species: Bats often roost in trees within woodlands. Surveying for bat activity and roosting sites is essential. Some bat species are legally protected, and habitat disturbance should be avoided.
- Birds: Certain bird species, like woodpeckers or birds of prey, may inhabit the woodland. Breeding seasons should be considered during woodland management.
Derelict Buildings:
- Protected Species: Buildings may be used by bats for roosting. Surveying for bat activity and potential roosting sites before any renovation or demolition is crucial. Disturbing bat roosts is illegal.

Cross-Cutting Constraints:
- Connectivity: The site may be part of important ecological corridors. Any disturbance or development should consider maintaining or enhancing connectivity for wildlife.
- Invasive Species: Invasive plants and animals may threaten native species and disrupt ecosystems. Control measures should be considered to manage invasive species.

Conservation Considerations:
- Ecological Surveys: Conduct thorough ecological surveys to identify the presence of protected species. This includes bird nesting surveys, bat surveys, and amphibian surveys.
- Seasonal Restrictions: Implement seasonal restrictions for certain activities, such as construction or habitat management, to avoid disturbing protected species during critical life stages.
- Habitat Enhancement: Implement habitat enhancement measures, such as creating additional ponds or installing bat boxes, to contribute positively to local biodiversity.

Compliance with relevant wildlife and conservation legislation, including the Wildlife and Countryside Act and Habitats Regulations, is crucial when managing sites with protected species. Consultation with ecological professionals is advised to ensure legal and ethical considerations are met.

9.3.2 LLMs and Your Writing Style

LLMs (ChatGPT / Gemini) can learn your writing and mimic that style in new text. This is particularly useful when you want to maintain a consistent tone and voice across different pieces of writing. Here’s how you can use LLMs to match your writing style:

Prompt:

Analyze the text below for style, voice, and tone. Create a prompt to write a new paragraph in the same style, voice, and tone. [insert text here]

9.4 LLMs and Coding

LLMs are adept at generating snippets of computer code; for an example see Section 3.3.1.

Prompt:

Please produce a BibTeX reference for the Bat Conservation Trust’s “Bat Surveys for Professional Ecologists Good Practice Guidelines” 4th Edition 2023

ChatGPT:

```
@book{BCT-BatSurveys-4thEd-2023,
  title = {Bat Surveys for Professional Ecologists: Good Practice Guidelines},
  author = {{Bat Conservation Trust}},
  year = {2023},
  edition = {4th},
  publisher = {Bat Conservation Trust},
  address = {United Kingdom},
  url = {Link to the document},
  note = {Accessed: Date Accessed}
}
```

Prompt:

Write the SQL code for searching a GIS data set with “longitude” and “latitude” giving a defined “area”.

ChatGPT:

To search a GIS dataset with “longitude” and “latitude” within a defined area, you can use SQL with spatial functions. Assuming you have a table named YourTable containing spatial data with columns longitude and latitude, and you want to search within a specific bounding box (defined by minimum and maximum longitude and latitude values), you can use the following SQL code:

```
SELECT *
FROM YourTable
WHERE
    longitude BETWEEN min_longitude AND max_longitude
    AND latitude BETWEEN min_latitude AND max_latitude;
```

Replace YourTable with the actual name of your table, and replace min_longitude, max_longitude, min_latitude, and max_latitude with the specific values defining the bounding box of your search area.

If you are using a database that supports spatial types (like PostGIS for PostgreSQL, or SQL Server with spatial extensions), you may use the ST_Within or ST_Intersects functions for more complex spatial queries. Here is an example using PostGIS:

```
SELECT *
FROM YourTable
WHERE
    ST_Within(
        ST_MakePoint(longitude, latitude),
        ST_MakeEnvelope(min_longitude, min_latitude, max_longitude, max_latitude, 4326)
    );
```

This example assumes that the spatial reference system (SRID) is 4326, which is commonly used for latitude and longitude data. Adjust the SRID according to your dataset.

Make sure to check the documentation of your specific database management system for any additional spatial functions or syntax requirements related to GIS queries.
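
The plain bounding-box query in the response above can be tried out without a GIS database. The Python sketch below uses the standard-library `sqlite3` module with an in-memory table; the table name `records`, the record names and the coordinates are made up for illustration, and the spatial functions mentioned in the response are not used here.

```python
import sqlite3

# In-memory database with a small, made-up table of point records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, longitude REAL, latitude REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("inside_a", -3.50, 50.70),
        ("inside_b", -3.90, 50.40),
        ("outside", -1.20, 52.00),
    ],
)

# Hypothetical bounding box defining the search area (decimal degrees).
box = {"min_lon": -4.0, "max_lon": -3.0, "min_lat": 50.0, "max_lat": 51.0}

# Named parameters keep the box values out of the SQL string itself.
rows = conn.execute(
    """
    SELECT name FROM records
    WHERE longitude BETWEEN :min_lon AND :max_lon
      AND latitude BETWEEN :min_lat AND :max_lat
    ORDER BY name
    """,
    box,
).fetchall()
names = [row[0] for row in rows]
print(names)
conn.close()
```

Running LLM-generated SQL against a few records whose answer you already know is a quick way to check the logic before using it on real data.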

Prompt:

How do I read a .xlsx data file in RStudio with the package readxl?

ChatGPT:

To read an Excel file (.xlsx) in RStudio using the readxl package, you need to follow these steps:

Install and Load the readxl Package:

If you haven’t installed the package yet, you can do so using the following command:
```
install.packages("readxl")
```
Once the package is installed, load it into your R session:
```
library(readxl)
```

Read the Excel File:

Use the read_excel() function to read the Excel file. Specify the path to your Excel file within the function. For example:

```
# Replace "path/to/your/file.xlsx" with the actual path to your Excel file
data <- read_excel("path/to/your/file.xlsx")
```

If the Excel file contains multiple sheets and you want to read a specific sheet, you can use the sheet argument:

```
# Replace "Sheet1" with the name of the sheet you want to read
data <- read_excel("path/to/your/file.xlsx", sheet = "Sheet1")
```

Alternatively, if you want to read a specific range of cells, you can use the range argument:

```
# Replace "Sheet1" with the name of the sheet and "A1:C10" with the desired range
data <- read_excel("path/to/your/file.xlsx", sheet = "Sheet1", range = "A1:C10")
```

View or Inspect the Data:

After reading the data, you can view the structure of the data using functions like head(data) to display the first few rows or str(data) to get the structure of the data.
```
# Display the first few rows of the data
head(data)

# Get the structure of the data
str(data)
```

That’s it! You’ve successfully read an Excel file in RStudio using the readxl package. Adjust the file path, sheet name, or range according to your specific Excel file structure.

9.5 Summary

Large language models showcase remarkable strengths in natural language understanding and generation. They excel at contextual comprehension, capturing semantic nuances, and generating coherent and contextually relevant text. These models demonstrate versatility across a wide array of tasks, from language translation to code generation. Their ability to generalize from diverse training data contributes to their broad applicability.

However, these models also grapple with limitations. They may produce outputs that lack factual accuracy or exhibit sensitivity to input phrasing. Concerns about ethical usage, biases in training data, and potential promotion of misinformation highlight societal challenges. Additionally, their sheer computational demands and energy consumption raise environmental concerns. Balancing the strengths and limitations of large language models requires ongoing research, ethical considerations, and continuous refinement to harness their potential responsibly.

9.5.1 In a Nutshell

Large Language Models are good at incremental tasks solved sequentially:

Writing summaries
Answering questions
Imitating your writing style

Large Language Models are not effective at creative leaps:

Writing jokes
Developing scientific hypotheses
Creating new writing styles

9.5.2 Use LLMs in Your Workflow

Using LLMs is a skill achieved by integrating their application into your daily tasks:

Quickly generating written content (e.g. a paragraph in a report).
Drafting emails.
Summarising information.
Answering questions.
Generating snippets of computer code.

¹ ChatGPT requires you to sign up for a free account. Gemini (formerly Bard) can be used for free through your Google account.