Initial Thoughts and Suggestions Regarding Implications of Large Language Models (LLMs) (e.g. ChatGPT) for Fall Final Assessments

November 14, 2023

The UC Berkeley Academic Senate’s Committee on Teaching (COT), in collaboration with the Center for Teaching and Learning (CTL),[1] offers the following initial observations, thoughts, and suggestions for how to account for ChatGPT and similar Large Language Model (LLM) tools in your final assessment planning this Fall.

This memorandum is not intended as a comprehensive report on the implications of “generative AI” for instruction. We hope to offer more guidance on the effects of these tools on day-to-day instruction, formative assessments, student learning through writing, and other topics in the Spring. The UC system will be convening a Congress at UCLA in February 2024 on the “Impact and Promise of Artificial Intelligence,” and COT and CTL will continue to engage these issues after the Congress. Given the variance in pedagogical approaches and goals between disciplines and instructors, guidance on generative AI or LLMs for instruction will necessarily be general. The limited goal of this memorandum is instead to address pressing concerns about academic integrity and accurate evaluation of student learning in final assessments this Fall.

[1] COT and CTL are grateful to Director of Digital Education Kara Ganter and Professor Chris Hoofnagle for valuable assistance with this memorandum.

I. What is ChatGPT?

ChatGPT is a “large language model” (LLM) created by OpenAI that debuted publicly in November 2022. There is a free version requiring creation of a free account (https://chat.openai.com/auth/login) and a paid version that costs $20/month (https://openai.com/blog/chatgpt-plus). Based on numerous anecdotal reports, we assume many Berkeley students have access to these tools, have experimented with them, and may be using them in their personal or academic work in some way. The UC system does not currently offer institutional support for ChatGPT use, so instructors and students who want to use it in their teaching and learning are on their own in terms of technical support.[2] The UC system is also not liable for any accessibility limitations or privacy concerns that may be associated with usage of ChatGPT.

Once a user creates an account, the user can type in any question or request, and ChatGPT will offer an answer. Users can even ask it to critique its previous answer. There is no limit on the length or style of question.

ChatGPT also can summarize texts, search for concepts in texts, perform statistical analysis, clean data, translate text, and perform an increasing number of other useful tasks. Yet, concerns have been raised that its use in some contexts might undermine pedagogical goals both because it can shortcut the process of learning and because ChatGPT itself is not bound by traditional academic values, such as truthfulness of statements, reproducibility, tracing of precedent, and proper attribution of works and ideas. For example, ChatGPT’s training materials, which are undisclosed but thought to include a large scrape of the web along with a corpus of books, may contain out of date information.

Even without a ChatGPT or other LLM account, students might use LLMs without realizing it because the technology is being incorporated into other general purpose tools such as Microsoft Word, Google Docs (via auto-suggestion), and Google Search. See https://blog.google/products/search/generative-ai-search/.

Some instructors may wish to create a ChatGPT account to become familiar with how it works, to better understand its power and limits, and to get a sense of the format and tone of its responses. Some instructors might decide that doing so will help them make an informed decision about whether, and how, to (1) use ChatGPT and other LLMs in some way in their teaching and assessments and (2) adapt their teaching and assessment methods in light of students’ potential authorized or unauthorized use of ChatGPT and other LLMs. Instructors choosing to create an account should be aware that students might be using a different version, and that the University does not currently offer financial support for instructors and GSIs hoping to access the paid ($20/month) version.

[2] Some campus groups have offered preliminary guidance on use of ChatGPT. For example, the UC Berkeley Office of Ethics has posted guidance on “Appropriate Use of ChatGPT and Similar AI Tools,” https://ethics.berkeley.edu/privacy/appropriate-use-chatgpt-and-similar-ai-tools; the Center for Teaching & Learning has posted “Understanding AI Writing Tools and Their Uses for Teaching and Learning at UC Berkeley,” https://teaching.berkeley.edu/understanding-ai-writing-tools-and-their-uses-teaching-and-learning-uc-berkeley; and the Law Library has posted “Generative AI Resources for Berkeley Law Faculty and Staff,” https://www.law.berkeley.edu/library/legal-research/chatgpt/. UC Berkeley’s Research, Teaching, & Learning Services (RTL) has also been offering a Fall Workshop Series on “Exploring ChatGPT,” https://rtl.berkeley.edu/events/upcoming.

II. How might students' access to ChatGPT and other LLMs affect final assessments in the Fall?

At least four concerns have been raised recently related to student use of ChatGPT and similar tools during final assessments:

● Academic dishonesty: The first concern is that students might use LLMs in a way that violates campus academic integrity policies. For example, whether or not an instructor indicates to students that they are not allowed to access an LLM like ChatGPT during an examination, students might access one anyway, place examination questions into the LLM, and use the resulting answer in a way that the instructor would consider to be the equivalent of plagiarism or otherwise academically dishonest (passing off the answers as the students’ own original analysis, or answering multiple-choice questions through LLMs in a way that, even if not plagiarism or its equivalent, is dishonest).

● Loss of diagnostic value of assessments: The second concern is that students’ use of LLMs to generate examination answers, whether or not it violates academic integrity policies, will destroy the examination’s value as a tool of assessing the student’s own knowledge, understanding, and insight.

● Equity concerns: The students’ ability to access ChatGPT for these purposes during an examination also raises potential equity concerns, especially if some students use the paid ($20/month) version of ChatGPT, a tool that would not be accessible to all students and that, by all reports, offers significantly more sophistication and accuracy than the unpaid version.

● Other issues that may or may not be raised by use of LLMs in certain contexts: Because ChatGPT is not bound by traditional academic values related to transparency, reliability of data, and the like, some instructors and other commentators have voiced concerns about whether using LLMs in certain contexts to generate content for research and learning purposes might be inappropriate more broadly. Handling these concerns will presumably be highly discipline- and context-specific.

Some types of assessment, such as in-person laboratory work, are not likely to be affected by ChatGPT in terms of these issues. Other types, such as take-home final examinations, might be significantly affected.

III. How will ChatGPT and other LLMs affect instruction beyond final assessments this Fall?

We hope to offer more guidance on this topic in the Spring, after the UC convening on AI in February. Some instructors on campus have chosen to incorporate ChatGPT and other LLMs into instruction, writing assignments, and formative assessments. For example, some instructors have encouraged or required students to use ChatGPT to answer a question, and then asked students to critique the answer given by ChatGPT in terms of accuracy, nuance, and completeness. The Center for Teaching and Learning’s page on “Understanding AI Writing Tools” also includes some examples of suggested writing prompts and activities (under the header “Suggested Writing Prompts and Activities”). Berkeley’s Center for Research, Teaching, and Learning (RTL) and Graduate Division are hosting a faculty-led workshop on this topic today, November 14, 1:30-2:30pm, as part of RTL’s “Exploring ChatGPT” Fall Series: https://rtl.berkeley.edu/events/exploring-chatgpt-practical-approaches-teaching-generative-ai/2023-11-14. This memorandum’s more limited purpose is to address ChatGPT’s potential effects on final examinations this Fall.

IV. What can I do if I'm concerned about students using ChatGPT and other LLMs on final examinations this Fall?

Our overall suggestions for instructors whose Fall assessments might be affected by student access to ChatGPT are to:

● Have a policy on ChatGPT and similar tools that clearly indicates to students the rules of access, and consider devoting class time or additional written language explaining to students why you chose your particular policy;

● Realistically consider how your chosen assessment type might be exposed to risks of academic dishonesty or loss of an assessment’s diagnostic value because of students’ ability to access ChatGPT, whether authorized or not;

● Consider thoughtful changes that reduce these risks and maintain your preexisting assessment goals, while avoiding changes to your assessment type that are not appropriate to the discipline and being aware of any equity and inclusion issues raised by changes to assessment type; and

● Refrain from using AI detection tools; these tools are currently understudied and carry significant risks both in terms of false positives and privacy of student data.

We offer more detail on each of these suggestions below.

IV. A. Have a policy on ChatGPT and similar tools that clearly indicates to students the rules of access.

Instructors whose assessments might be affected by ChatGPT should make clear in class before examinations, and in examination instructions, whether, and how, use of ChatGPT and similar tools is allowed. Here are examples of language one could use:

If not allowing any access: “Use of generative AI software or Large Language Models (LLMs), such as ChatGPT, is not allowed for any purpose during this examination. Any use of such software during this examination will be considered a violation of the honor code.”

If allowing limited access or use of ChatGPT on certain questions, e.g., asking students to critique a ChatGPT answer: “Use of generative AI software or Large Language Models (LLMs), such as ChatGPT, is only allowed during this examination where a question expressly directs you to use it for a particular purpose. Any other use of the software at any other time or for any other purpose is prohibited and will be considered a violation of the honor code.”

If allowing full access, but wanting to know how the student used ChatGPT: “Use of generative AI software or Large Language Models (LLMs), such as ChatGPT, is permitted during this examination. You may use it in helping you to formulate your own thoughts and ideas, just as you might use Google or the materials from class. If you do so, you must note that you used such a tool, note which tool, and note what content was derived from that tool. Any use of such software that would constitute plagiarism if the generative AI source were a human or organizational author will be considered a violation of the honor code.”

Departments might additionally consider having an explicit policy on ChatGPT and similar tools. Berkeley Law, for example, has added LLMs like ChatGPT to the formal academic integrity policy as follows, and sent reminders to instructors and students of the policy:

Berkeley Law Policy on the Use of Generative AI Software

Generative AI is software, for example, ChatGPT, that can perform advanced processing of text at skill levels that at least appear similar to a human’s. Generative AI software is quickly being adopted in legal practice, and many internet services and ordinary programs will soon include generative AI software. At the same time, Generative AI presents risks to our shared pedagogical mission. For this reason, we adopt the following default rule, which enables some uses of Generative AI but also bans uses of Generative AI that would be plagiaristic if Generative AI’s output had been composed by a human author.

The class of generative AI software:

● May be used to perform research in ways similar to search engines such as Google, for correction of grammar, and for other functions attendant to completing an assignment. The software may not be used to compose any part of the submitted assignment.

● May not be used for any purpose in any exam situation.

● Never may be employed for a use that would constitute plagiarism if the generative AI source were a human or organizational author. For discussion of plagiarism, see https://gsi.berkeley.edu/gsi-guide-contents/academic-misconduct-intro/plagiarism/

Instructors have discretion to deviate from the default rule, provided that they do so in writing and with appropriate notice.

IV. B. Realistically consider how your chosen assessment type might be exposed to risks of academic dishonesty or loss of diagnostic value.

As noted above, some in-person assessment types, such as laboratory work, musical performances, or oral examinations, are unlikely to be affected by ChatGPT and other LLMs. Others are much more likely to be affected. The most obvious formats to be affected might be take-home final examinations, whether they be short-answer or multiple-choice questions assessing knowledge, or more complex questions asking students to analyze a problem, offer an opinion on something, or suggest a solution. Even in-class final examinations might be prone to students’ unauthorized use of generative AI, depending on the proctoring situation and students’ ability to access the Internet during the examination; ChatGPT, for example, now has an iPhone app with audio input.

Some instructors have chosen to submit their examination questions into ChatGPT as a way of exploring whether the questions are prone to risks of academic dishonesty or are easily answerable by ChatGPT. Instructors have reported that this method has allowed them to get a sense of the language, coverage, tone, and accuracy of the answers. One COT member fed a short policy essay question into ChatGPT ten times (ChatGPT’s answer is different each time, even if sometimes only slightly different), kept track of recurring phrasing and arguments, and then was mindful of them while grading student work. For faculty grading with the help of GSIs, the GSI team could potentially do the same.

IV. C. Consider thoughtful changes that reduce these risks while maintaining your preexisting assessment goals.

In light of the risks of student ChatGPT and other LLM use mentioned above, instructors might consider thoughtful changes to their assessment methods that are discipline-appropriate and that do not unnecessarily create equity and inclusion problems. We explore a few possibilities below, as well as potential issues raised by these changes.

1. Adding explicit instructions to the examination prohibiting use of ChatGPT and other LLMs and making clear that such use is a violation of the honor code.

This possibility, discussed earlier with suggested language, may discourage students from unauthorized use of these tools. Additionally, instructors might choose to speak with their students before the examination about the reasons underlying the policy, helping to set a tone that further discourages student misconduct.

Instructors might also include explicit language restricting access to the Internet, to allow more effective enforcement of prohibitions on use of LLMs. For some examinations, a strict Internet policy might raise issues for students needing access to online dictionaries (especially for Multilingual Learners, or MLLs), requiring additional explanatory language.

Some departments and course design teams have added language to assessment instructions giving students notice that they might be asked to complete additional assessments if questions arise as to academic integrity, such as: "Students may be asked to complete a short oral examination consisting of questions similar to ones that they completed correctly on the original examination." CS161, Remote proctoring policy, Academic Integrity, https://su23.cs161.org/exam/. Given the issues raised by this language (including jeopardizing anonymity and requiring additional DSP accommodations), instructors hoping to adopt such language should consult with their departments first.

2. Switching examination formats, such as changing to an in-class and/or handwritten examination.

Some instructors have explored switching from a take-home examination to a timed, proctored in-class examination. In-class examinations, in some ways, allow easier and more effective enforcement of academic integrity and Internet access rules. Other instructors have considered changing their examinations from being typed to being handwritten (in Bluebooks) to minimize student computer use during examinations.

Instructors should be aware that any in-class assignment or exercise that is timed and graded, whether or not formally labeled an examination or final assessment, will require that DSP accommodations be granted, such as additional time, a specific testing environment, or, for handwritten examinations, exceptions for students who must be allowed to use a computer. Instructors who are considering such changes should ensure that they have met internal department deadlines for changing examination formats and should inform students as soon as possible to register for DSP proctoring as needed in light of the changes.

Instructors should avoid changes to examination formats that might artificially reduce the risk of unauthorized LLM use but that would be inappropriate to the discipline or goals of the course. While more general guidance on how to rethink instruction in light of ChatGPT is beyond the scope of this memorandum, some guides already exist that instructors might find useful in generating ideas for assessments and assignments in light of ChatGPT. COT takes no position as to whether these materials are appropriate for use in any particular discipline or context, but is noting them so instructors know they exist.[3]

3. Adding language to examination prompts or questions that might make unauthorized or undetected use of ChatGPT and other tools more difficult or less valuable.

Some instructors have added language to examination questions in an attempt to make LLM output less helpful to students considering unauthorized use, such as adding phrases like “based on what we have read and discussed this semester, why do you think that…” or “using the assigned materials from this semester, make the case that…” Some instructors have found, when submitting these questions into ChatGPT, that the output is much more easily detected as LLM output rather than the original thinking of a student who actually attended class and took the course. Such methods are, of course, at most an incremental improvement in terms of deterrence and detection of unauthorized use.

4. Submitting examination questions to ChatGPT in advance to be aware of the tone and content of LLM output on the question.

Some instructors have submitted their examination questions to ChatGPT in advance, not only to determine whether the questions are prone to be answerable by an LLM, but to become familiar with the output to better detect potential unauthorized student use. Because ChatGPT’s answers change each time, some instructors have submitted the question multiple times, keeping track of recurring phrasing or other patterns in tone or content. Potential downsides to this technique include that it requires the instructor to choose to create a ChatGPT account and use it; that it could result in false positives, in terms of unfounded suspicion of original student work; and that it could result in false negatives, in terms of undetected LLM use because the instructor is using a less sophisticated version of ChatGPT (say, the free version rather than the $20/month version) than the student uses.

[3] See, e.g., An Introduction to Teaching with Text Generation Technologies, https://wac.colostate.edu/repository/collections/textgened/; AI Text Generators and Teaching Writing: Starting Points for Inquiry, https://wac.colostate.edu/repository/collections/ai-text-generators-and-teaching-writing-starting-points-for-inquiry/; ChatGPT Assignments To Use in Your Classroom Today, https://stars.library.ucf.edu/cgi/viewcontent.cgi?article=1097&context=oer.

IV. D. Refrain from using AI Detection tools.

At this time, COT strongly discourages instructors from attempting to use AI detection tools to try to detect unauthorized student use of LLMs on examinations. These tools are understudied, often carry significant false positive risks (at least at this time), and raise issues related to the submission of student data or work without student or campus authorization. OpenAI (the creator of ChatGPT) discontinued its detection service after it erroneously labeled works such as the Declaration of Independence as LLM-generated and appeared to mark Multilingual Learner (MLL) students as using LLMs, because their writing may appear more formalistic. Berkeley’s Research, Teaching, and Learning (RTL) is currently conducting a small-scale pilot of TurnItIn’s AI detection feature and will report its findings.

Instead, COT encourages instructors who suspect unauthorized student LLM use during examinations to use existing campus processes and resources for dealing with suspected student misconduct. Some existing resources with helpful advice on academic integrity and approaches to responding to student cheating include:

● https://studentconduct.berkeley.edu/

● https://www.chronicle.com/article/nobody-wins-in-an-academic-integrity-arms-race?cid=gen_sign_in

● https://www.facultyfocus.com/articles/effective-classroom-management/memo-students-cheating/

● https://citl.illinois.edu/citl-101/teaching-learning/resources/classroom-environment/dealing-with-cheating

● https://academicintegrity.org/resources/blog/99-2022/may-2022/369-cheating-academic-integrity

● https://www.depts.ttu.edu/tlpdc/Resources/Academic_Integrity/files/academicintegrity-magnawhitepaper.pdf