Constitutional AI: Developing Responsible Solutions and Formal Techniques for Ethical AI Integration

This PhD proposal aligns with the primary objectives of the STAI CDT by focusing on the development of responsible AI solutions and introducing new formal techniques for Constitutional AI.

The proposal aims to develop novel methods for modeling and fine-tuning Large Language Models (LLMs) under Constitutional AI, an approach in which AI systems are trained with crowdsourced rules or principles that act as a ‘constitution.’ This approach allows AI systems to operate within societally accepted norms and align with human values. The proposal addresses gaps in previous work, particularly in: (1) the methods for crowdsourcing rules and principles from people with diverse backgrounds [1], (2) the process of identifying areas of consensus and dissent within these rules, and (3) the formal approach to refining LLMs using insights gained from such crowdsourcing [2].


Project ID