Methodology
Repsheet uses AI summarization to create unbiased summaries of Canadian MPs, based on their voting records.
This site and the prompts used to generate the summaries are open source and available for you to see. We want these summaries to be as transparent and unbiased as possible, and we welcome your help and suggestions in accomplishing that.
View the source code
Provide feedback
Overview
Generating summaries of MPs is a five-step process:
- Data collection: We collect the voting records for each MP and bill records from the House of Commons.
- Bill summarization: We use AI to summarize the bill records (1st prompt).
- MP summarization stage 1: We split the bills into batches, and ask the AI to summarize the MP for each batch separately (2nd prompt).
- MP summarization stage 2: Next, we take the MP summaries from each batch and ask the AI to merge them into a final combined summary of the MP (3rd prompt).
- MP short summary creation: We condense this MP summary into a single-sentence summary (4th prompt).
1. Data collection
- Current MPs are collected from ourcommons.ca.
- Bill records are collected from parl.ca. We only fetch bills from the last 4 parliaments.
- Votes are collected from ourcommons.ca.
You can view our data collection code here.
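For illustration, the collection step boils down to downloading the public XML/JSON exports and parsing them. A minimal sketch in Python, with placeholder URLs rather than the exact endpoints the collector uses, looks like this:

```python
import requests

# Placeholder endpoints; the real collector (linked above) targets the actual
# XML/JSON exports published by ourcommons.ca and parl.ca.
MEMBERS_URL = "https://www.ourcommons.ca/members/en/search/xml"  # current MPs (assumed)
VOTES_URL = "https://www.ourcommons.ca/members/en/votes/xml"     # House votes (assumed)
BILLS_URL = "https://www.parl.ca/legisinfo/en/bills/json"        # LEGISinfo bills (assumed)

def fetch(url: str) -> bytes:
    """Download one export, failing loudly on HTTP errors."""
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return response.content

members_xml = fetch(MEMBERS_URL)
votes_xml = fetch(VOTES_URL)
bills_json = fetch(BILLS_URL)
```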
2. Bill summarization
Model: Gemini 2.0 Flash
For each bill voted on by a current MP over the previous four Parliaments, we use AI to create a summary.
Each bill is summarized in terms of the issues it addresses, which allows us to highlight how MPs vote on those issues.
Before prompting the AI, we clean up the bill XML by removing tags that do not contain bill content, which decreases the size of the prompt.
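A rough sketch of that cleanup, assuming lxml and hypothetical tag names (the real list of stripped tags is in the code linked below):

```python
from lxml import etree

# Hypothetical non-content tags; the actual list lives in the cleanup code.
TAGS_TO_STRIP = ["Identification", "FormatRef", "PageBreak"]

def clean_bill_xml(raw_xml: bytes) -> str:
    """Remove tags that carry no bill content, shrinking the prompt."""
    root = etree.fromstring(raw_xml)
    # strip_elements removes the named elements and their children in place.
    etree.strip_elements(root, *TAGS_TO_STRIP, with_tail=False)
    return etree.tostring(root, encoding="unicode")
```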
You can view the prompt used for this step here.
You can view the code used to perform this step here.
We use Gemini 2.0 Flash because it can process a large amount of input text (1M tokens, roughly 4M characters) and some bills are quite large.
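The summarization call itself is a single request per bill. A minimal sketch using Google's google-generativeai Python SDK, with a placeholder instruction standing in for the real prompt linked above:

```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

def summarize_bill(cleaned_bill_xml: str) -> str:
    # Placeholder instruction; the real prompt asks for a structured summary
    # organized by the issues the bill addresses.
    prompt = "Summarize this bill and the issues it addresses:\n\n" + cleaned_bill_xml
    return model.generate_content(prompt).text
```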
3. MP summarization stage 1
Model: Claude Haiku or Claude Sonnet
For each member, we take the summaries of the bills voted on during their time as an MP and combine them with:
- The bill title
- The bill number
- The MP’s vote (Yea/Nay/Abstain)
- The voting statistics of the MP’s party (yeas/nays/abstains, party name not included)
- Whether this bill was a private member’s bill sponsored by this MP
We then split these combined records into batches and ask the AI to summarize the MP for each batch separately.
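In outline, the input to this stage is a list of per-bill records mirroring the fields above, chunked into batches. The field names and even-split batching below are illustrative assumptions rather than the exact structures used in the code linked below:

```python
from typing import TypedDict

class BillRecord(TypedDict):
    # Field names are illustrative; they mirror the list above.
    title: str
    number: str
    bill_summary: str
    mp_vote: str            # "Yea", "Nay", or "Abstain"
    party_votes: dict       # e.g. {"yeas": 120, "nays": 30, "abstains": 2}
    sponsored_by_mp: bool   # private member's bill sponsored by this MP

def into_batches(records: list[BillRecord], n_batches: int = 23) -> list[list[BillRecord]]:
    """Split the MP's bill records into roughly even batches
    (23 in production; see "Why two stages?" below)."""
    if not records:
        return []
    size = -(-len(records) // n_batches)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]
```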
You can view the prompt used for this step here.
You can view the code used to perform this step here.
We use Claude Haiku first for each batch because it produces high-quality summaries quickly and at a reasonable price, which matters given the large number of summaries we need to run. However, Haiku sometimes produces invalid JSON or broken bill links; in those cases we re-run the summary with Claude Sonnet, which is more expensive but produces higher-quality summaries and is less prone to these errors.
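A sketch of that fallback, using the Anthropic Python SDK. The model aliases are placeholders, and the validation shown (JSON parsing only) simplifies the real checks, which also verify bill links:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder aliases; the exact model versions are set in the code linked above.
MODELS = ["claude-3-5-haiku-latest", "claude-3-5-sonnet-latest"]

def summarize_batch(prompt: str) -> dict:
    """Try Haiku first; re-run with Sonnet if the output is not valid JSON."""
    for model_name in MODELS:
        response = client.messages.create(
            model=model_name,
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}],
        )
        try:
            return json.loads(response.content[0].text)
        except json.JSONDecodeError:
            continue  # fall back to the more capable, more expensive model
    raise RuntimeError("Both models returned invalid JSON")
```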
4. MP summarization stage 2
Model: Claude Sonnet
We then merge the separate batch summaries of the MP’s voting record into one unified summary of the MP.
You can view the prompt used for this step here.
You can view the code used to perform this step here.
We use Claude Sonnet because it is more capable than Claude Haiku and produces higher-quality summaries, making it the better choice for the step that produces the final output.
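Structurally, this stage concatenates the batch summaries into the merge prompt and sends a single Sonnet request (the same Messages API call shape as in the stage 1 sketch). The separator and template placeholder below are assumptions:

```python
def build_merge_prompt(batch_summaries: list[str], template: str) -> str:
    """Assemble the stage 2 prompt from the stage 1 batch summaries.

    `template` stands in for the real merge prompt (linked above) and is
    assumed to contain a {{BATCH_SUMMARIES}} placeholder.
    """
    joined = "\n\n=== BATCH SUMMARY ===\n\n".join(batch_summaries)
    return template.replace("{{BATCH_SUMMARIES}}", joined)
```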
Why two stages?
We found experimentally that Anthropic’s Claude models gave the most informative summaries, but they are limited to a maximum of 200000 input “tokens” (one token is roughly 4 characters). This meant we could not provide an MP’s entire voting history in one prompt. Splitting the voting history into batches of bills worked well and let us get around the token limit.
At the moment we use 23 batches, which is the maximum number of batch summaries we can pass to the AI that merges them. The merging step has the same maximum input of 200000 tokens, and each batch summary has a maximum output of 8192 tokens. 200000 / 8192 is 24 (rounded down), and we subtract 1 to leave room for the wrapper prompt that comes before the summaries. So 23 batches give 23 * 8192 = 188416 tokens of summaries, leaving some space for the prompt that comes beforehand.
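The same budget arithmetic, spelled out:

```python
MAX_INPUT_TOKENS = 200000   # input limit of the merging step
MAX_BATCH_OUTPUT = 8192     # maximum output tokens per batch summary

# One batch-sized slot is reserved for the wrapper prompt that precedes the summaries.
batches = MAX_INPUT_TOKENS // MAX_BATCH_OUTPUT - 1
print(batches, batches * MAX_BATCH_OUTPUT)  # 23 188416
```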
5. MP short summary creation
Model: Claude Sonnet
To create a short summary of the MP for the top of the page and for the Open Graph description, we use a final prompt to condense the larger MP summary generated in the previous steps.
You can view the prompt used for this step here.
You can view the code used to perform this step here.
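As with stage 2, this is a single Claude Sonnet call. A minimal sketch, with a placeholder prompt and model alias:

```python
import anthropic

client = anthropic.Anthropic()

def condense(full_summary: str) -> str:
    """Condense the stage 2 summary into one sentence for the page header and
    the Open Graph description. The prompt wording here is a placeholder."""
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": "Condense the following MP summary into a single, "
                       "neutral sentence:\n\n" + full_summary,
        }],
    )
    return response.content[0].text.strip()
```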