Advent of Code [with AI]: Day 1

Advent of Code is an annual coding challenge in which a new puzzle is released each day from December 1st through Christmas. Though using AI is discouraged for those looking to formally “compete” on who can solve the puzzle the fastest, I think this is a great opportunity to try solving the challenges (more or less) by prompting GPT-4 alone.

If nothing else, this should be a fascinating time capsule of the state of AI, and of how well it performed, when we look back on this year. We’ll be using ChatGPT Plus, which is currently running GPT-4 Turbo (Nov 6).

Day 1: Part 1

You can read the full day 1 challenge here. Wanting to see how well GPT handled this puzzle with minimal prompt engineering, I passed in the following –

Create a Python function that solves the following puzzle - 

On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77.

*bolded text coming directly from the page itself.

GPT used the Data Analysis plugin and returned the following –

def find_calibration_values(lines):
    calibration_values = []

    for line in lines:
        # Extracting digits from the line
        digits = [char for char in line if char.isdigit()]

        # Combining the first and last digit if they exist
        if digits:
            calibration_value = int(digits[0] + digits[-1])
            calibration_values.append(calibration_value)
        else:
            # Append a default value (like 0) if no digits are found
            calibration_values.append(0)

    return calibration_values

# Testing the function with the provided example
test_lines = [
    "1abc2",
    "pqr3stu8vwx",
    "a1b2c3d4e5f",
    "treb7uchet"
]

find_calibration_values(test_lines)

Overall this approach looks good to me. Appending a default value of “0” might not be necessary, and might not even make sense, but we’ll continue on.
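
As a quick sanity check, here is a minimal sketch of the same idea applied to the four example lines. The helper name `calibration_value` is made up for this illustration and is not part of ChatGPT’s output. It also shows that a line with no digits contributes 0, which leaves the final sum unchanged.

```python
def calibration_value(line: str) -> int:
    """Combine the first and last digit of a line into a two-digit number."""
    digits = [ch for ch in line if ch.isdigit()]
    # A line without any digits falls back to 0, which does not change the sum.
    return int(digits[0] + digits[-1]) if digits else 0


example = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
values = [calibration_value(line) for line in example]

print(values)       # [12, 38, 15, 77]
print(sum(values))  # 142
```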

Originally I planned on using a Jupyter Notebook to run the Python code it returned, but with the Data Analysis plugin it might be able to handle this all for us! I saved the 1,000 input strings to a .txt file and uploaded it to ChatGPT with the following prompt –

Take the attached text file and get all of the values for all of the inputs. Then sum all the values and return the result.

The result?

# Using the previously defined function to find the calibration values for each line in the file
calibration_values = find_calibration_values(lines)

# Summing all the calibration values
total_sum = sum(calibration_values)
total_sum

# The sum of all calibration values from the provided text file is 54,940.

I enter the result into AoC and… success! (Note: everyone’s input strings are different, so my correct answer won’t be your correct answer).

Day 1: Part 2

Part 2 of this challenge essentially asks us to convert text representations of numbers to digits.

We now need to modify our "find_calibration_values" function.

It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76.

Please update the function and then sum the calibration values again.

The first line and final line above are my own; the rest comes directly from AoC. Here’s the code it gave me back to handle this new case –

# Re-importing the necessary library and redefining the updated function as the previous state was reset
def find_calibration_values_with_spelled_digits(lines):
    spelled_digit_map = {
        'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5',
        'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'
    }

    calibration_values = []

    for line in lines:
        # Replace spelled-out digits with their numeric equivalents
        for word, digit in spelled_digit_map.items():
            line = line.replace(word, digit)

        # Extract digits from the line
        digits = [char for char in line if char.isdigit()]

        # Combining the first and last digit if they exist
        if digits:
            calibration_value = int(digits[0] + digits[-1])
            calibration_values.append(calibration_value)
        else:
            # Append a default value (like 0) if no digits are found
            calibration_values.append(0)

    return calibration_values

# Since the state was reset, we need to read the file again
file_path = '/mnt/data/day1.txt'

with open(file_path, 'r') as file:
    lines = file.readlines()

# Re-calculating the calibration values with the updated function
new_calibration_values = find_calibration_values_with_spelled_digits(lines)

# Summing all the new calibration values
new_total_sum = sum(new_calibration_values)
new_total_sum

This looks pretty reasonable and returns “53,595”, but the answer isn’t correct!

What happened? Is the code bad? Does ChatGPT not understand the requirements?

The issue is actually that the AoC prompt and test cases aren’t clear enough. There’s a tricky edge case not captured in the examples they give us: an input like “oneight” should be translated to “18”. The “replace” method above can never produce that, because replacing “one” first turns the string into “1ight”, and the “e” that “eight” needs is gone.
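
The failure is easy to reproduce in isolation. The sketch below mirrors the replace loop from the function above (the helper name `replace_in_map_order` is only for illustration) and prints what two overlapping inputs turn into.

```python
SPELLED = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def replace_in_map_order(line: str) -> str:
    """Replace spelled-out digits one word at a time, in dictionary order."""
    for word, digit in SPELLED.items():
        line = line.replace(word, digit)
    return line


print(replace_in_map_order("oneight"))  # 1ight
print(replace_in_map_order("eightwo"))  # eigh2
```

In both cases an earlier replacement consumes a letter that “eight” shares with its neighbour, so “eight” is never found.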

I know this because it was a sticking point for me and many others, so I wouldn’t blame ChatGPT for this unclear requirement. Another tricky aspect of adding this logic is we need to account

Knowing where the misstep is coming from, I first want to see if ChatGPT will solve this if we give it a single sample input/output, so I sneak my “oneight” example into the sample inputs and “18” into the corresponding list of calibration values.

No dice: it uses the same approach and gives me back 53,595.

After many, many attempts with the prompt, I finally come to one that gets a correct result –

We now need to modify our "find_calibration_values" function.

It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
oneight
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
In this example, the calibration values are 29, 18, 83, 13, 24, 42, 14, and 76.

Note that letters can be shared between words, and that "oneight" becomes "18" and "eightwo" should become "82". I recommend we build a function that will parse the string in order left to right for numbers and spell-out numbers, then build a separate string of those results.

Be sure to not skip entire words once they are found, since the trailing character of those words may start new words. Check each string index individually. Also remember that values with a single digit are valid and the single digit should be used for both the first and last number we end up summing. So 6 becomes 66.

Think out loud about your approach to solve this problem. Take a deep breath. I will tip you $500 for the correct answer. Make sure to insert the digit into the string where you find a match, do not do a replace. Please update the function and then sum the calibration values again.

This was a pretty painful process and not very efficient. I had to use my known, working solution and compare its calibration values, line by line, against the values produced by the code ChatGPT was giving me in order to find the edge cases it kept getting hung up on. These were mostly –

Replacing text in the order of the word map instead of from left to right
Trying to skip ahead in the index once a word was found (skipping past “eight” means missing the “two” in “eightwo”)
Forgetting that when a line has a single digit, that same digit is used for both the first and last digit of our calibration value
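
The line-by-line comparison described above can be sketched roughly as follows. The known, working solution isn’t shown in this post, so `reference_value` is a stand-in that scans every index, and `candidate_value` is a stand-in for the replace-based attempt. All names here are hypothetical.

```python
WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def to_value(digits: str) -> int:
    """First and last digit as a two-digit number (a single digit is used twice)."""
    return int(digits[0] + digits[-1]) if digits else 0


def reference_value(line: str) -> int:
    """Stand-in for a known-good solution: check every index, never skip ahead."""
    found = ""
    for i, ch in enumerate(line):
        if ch.isdigit():
            found += ch
            continue
        for word, digit in WORDS.items():
            if line.startswith(word, i):
                found += digit
                break
    return to_value(found)


def candidate_value(line: str) -> int:
    """Stand-in for the code under test: replace words in map order."""
    for word, digit in WORDS.items():
        line = line.replace(word, digit)
    return to_value("".join(ch for ch in line if ch.isdigit()))


def find_mismatches(lines):
    """Return (line, expected, actual) for every line where the two disagree."""
    return [
        (line, reference_value(line), candidate_value(line))
        for line in lines
        if reference_value(line) != candidate_value(line)
    ]


mismatches = find_mismatches(["two1nine", "oneight", "eightwo", "6"])
print(mismatches)  # [('oneight', 18, 11), ('eightwo', 82, 22)]
```

Printing only the disagreeing lines makes it much quicker to spot which kind of input the generated code is mishandling.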

Eventually, it gave me code that produced the right answer of 54,208.

def find_calibration_values_updated(lines):
    # Mapping spelled-out digits to their numerical equivalents
    digit_map = {
        "one": "1", "two": "2", "three": "3", "four": "4", 
        "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"
    }

    calibration_values = []

    for line in lines:
        digit_sequence = ""
        i = 0
        while i < len(line):
            found_digit = False
            # Check for each spelled-out digit starting at this position
            for word, digit in digit_map.items():
                if line.startswith(word, i):
                    digit_sequence += digit
                    found_digit = True
                    break  # Break to avoid overlapping spelled-out digits

            if not found_digit and line[i].isdigit():
                digit_sequence += line[i]

            i += 1

        # Extracting the calibration value
        if len(digit_sequence) >= 1:
            first_digit = digit_sequence[0]
            last_digit = digit_sequence[-1] if len(digit_sequence) > 1 else first_digit
            calibration_value = int(first_digit + last_digit)
            calibration_values.append(calibration_value)
        else:
            calibration_values.append(0)

    return calibration_values

# Calculating the calibration values for the lines in the file
calibration_values = find_calibration_values_updated(lines)
total_sum = sum(calibration_values)
total_sum
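
For comparison, a more compact way to get the same overlapping behaviour is a regular expression with a lookahead. This is an alternative technique, not what ChatGPT produced, and the names `WORDS`, `TOKEN`, and `calibration_value` are assumptions made for this sketch. Because the lookahead consumes no characters, `finditer` reports a match at every position where a digit or a spelled-out digit starts, so “oneight” yields both “one” and “eight”.

```python
import re

WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Zero-width lookahead: the match itself is empty, the captured group holds the token.
TOKEN = re.compile(r"(?=([0-9]|" + "|".join(WORDS) + r"))")


def calibration_value(line: str) -> int:
    """First and last digit (numeric or spelled out) as a two-digit number."""
    tokens = [WORDS.get(m.group(1), m.group(1)) for m in TOKEN.finditer(line)]
    return int(tokens[0] + tokens[-1]) if tokens else 0


example = [
    "two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
    "4nineeightseven2", "zoneight234", "7pqrstsixteen",
]
values = [calibration_value(line) for line in example]

print(values)                        # [29, 83, 13, 24, 42, 14, 76]
print(sum(values))                   # 281
print(calibration_value("oneight"))  # 18
```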

Final Thoughts

ChatGPT easily handled part 1 of this task, but struggled mightily on part 2. It took quite a bit of debugging and re-prompting to figure out where it was failing. Overall score – 10/10 on part 1, 3/10 on part 2. It did eventually get there on part 2, but it made many bad assumptions along the way, and my prompt had to get so specific that I needed to already know the solution in order to write it.
