AI Evaluations Research Scientist (San Francisco) job at RAND Corporation San Francisco, CA, US

Job Description

Job Type: Regular

Overview

RAND's Technology and Security Policy Center (TASP) is seeking mission-driven AI Evaluations Research Scientists to develop and execute research projects and engineering efforts within our AI Capability Evaluations (ACE) team.

RAND's reputation for excellence is built on our commitment to high-quality, rigorous analysis and objectivity. TASP is at the forefront of research and implementation regarding the impact of high-consequence, dual-use technologies, such as artificial intelligence and biotechnology, on global competition and security. Our research has been used by the White House, government departments, the EU and UK governments, and industry leaders, among others. Our alumni have gone on to important roles at the NSC, Commerce, DOD, Congress, Google DeepMind, OpenAI, the EU AI Office, UK AISI, and other key think tanks, and have founded mission-driven tech initiatives.

ACE develops and conducts evaluations of national security-relevant capabilities of frontier AI systems, with a current focus on the intersection of large language models (LLMs) and AI agents with biological risk. We're hiring people with research science and/or research engineering skills to play a key role in work that assists public policymakers at all levels in strengthening national security and mitigating catastrophic risks enabled by AI systems. They will work on complex problems at the intersection of AI and national security where technical details matter, and will contribute to multidisciplinary project teams that include biosecurity experts, machine learning engineers, and policy researchers.

This position is initially structured as a focused 1-year appointment to create the urgency needed to drive ambitious change in this rapidly evolving field. Every day of your tenure will count toward that goal. The appointment may be renewed for up to a total of 3 years, with options for longer-term employment at RAND thereafter. Full-time and part-time (at least 20 hours per week) schedules will be considered, but with a strong preference for full-time.

Responsibilities

Given the breadth of valuable work our team could do, there is some flexibility to align responsibilities with an individual's skills, interests, and career goals, including the balance of research scientist- versus research engineer-style responsibilities. Responsibilities may include, but are not limited to:

Contribute to developing concrete threat models for high-consequence AI risks, working with internal and external partners

Design and execute rigorous, objective evaluations of AI capabilities relevant to key bottlenecks within those threat models

Develop and maintain the technical infrastructure required to support this research, working with relevant internal and external IT stakeholders

Develop and maintain code for fundamental evaluation components that can be used across research efforts (e.g., prompting, automated grading, statistical analysis)

Keep up to date with the latest advances in AI evaluation engineering and the science of evaluations to continually improve the rigor and efficiency of our evaluations

Contribute to setting strategic and research priorities, with an emphasis on the policy impact of evaluations

Communicate research results to policymakers and other key stakeholders at all levels through written products and oral presentations

A successful candidate could grow into leading a team and/or mentoring more junior staff.

Qualifications
All research positions at RAND require excellent analytic skills; the ability to communicate clearly and effectively in English, both orally and in writing; the ability to work effectively as a member of a multi-disciplinary team; and a strong commitment to RAND's core values of quality and objectivity.

Other required qualifications:

Strong interest in understanding and addressing potential national security risks related to autonomy or high-consequence misuse of LLMs and AI agents, and in AI capability evaluations as a route to impact

Proficiency in Python

Familiarity with technical aspects of AI systems and related technologies, such as machine learning, computational infrastructure, or information security

Preferred but not required:

Experience with evaluations and evaluation frameworks for LLMs and AI agents (e.g., Inspect)

Experience with LLM elicitation techniques (e.g., fine-tuning, retrieval-augmented generation, tool-use integration, agent scaffolding)

Experience working on ML model development/deployment or working at/with leading AI companies

Experience with cloud computing, in particular Azure and AWS, including government cloud environments

Familiarity with common LLM frameworks (e.g., LangChain, LlamaIndex)

Aptitude for project management and/or mentorship

Strong communication skills, both written and verbal, tailored to technical and non-technical audiences, or the ability to develop them rapidly

Experience in government, intelligence community, other relevant decision-making offices, or policy analysis roles

Education Requirements
RAND is hiring for this role at associate, specialist, and expert levels of experience. Minimum education requirements at the associate level include:

A PhD in a relevant field. This can include Artificial Intelligence, Machine Learning, Computer Science, Cybersecurity, Electrical Engineering, Physics, Mathematics, Engineering and Public Policy, Security Studies, or similar.

A Master's degree in the fields listed above with at least 3 years of relevant professional experience.

A Bachelor's degree in the fields listed above with at least 5 years of relevant professional experience.

Security Clearance

Job Tags

Full time, Part time, Work at office,
