Adding first stab at a first draft of a proposal for campaign finance… #61
Thanks for starting on this! Paging @boblannon, @evz, @palewire, @gordonje.
When I look at a filing document, I want to know two things
Knowing the business rules is critical to interpreting the information in a filing, but I think those business rules should be represented separately. Been thinking about a
Do we even want to represent the business rules at all? That seems like a lot of extraneous work, in particular since, as you said, the rules change all the time. I was thinking that we want to have a really loose/barebones model for a filing itself, and then some FilingType that describes that filing, but not encode the rules at all. The rules would be implicit based on the data contained in the filing - this Filing, of type ILContributionReport, contains Contributions. That's what we're getting from the Regulator, so that's kind of all that matters - if the rules say "this has to include all contribs over $250", we can't enforce those rules or even (in many cases) know if they're being violated, and keeping up to date with all the legislative changes would be quite a pain.
@aepton I think that you are right that a Filing object need not be dependent on the existence of
That said, it still might be useful to think about these as distinct models (even if we don't get around to implementing FilingTypes), because it might help us avoid putting business rules into Filing objects.
I'm thrilled to see this ball rolling. @aepton, when you're ready for comments on your early submission please let me know.
Ok, I think this captures where I'd like to start the conversation. Please have at it with any and all types of suggestions/tweaks/fixes/jaw-droppingly-obvious omissions/subtle whatevers/I should probably just end this sentence, you get it.
I've mainly noted things that do not need to be part of this proposal because they already exist in OCD.
**optional**
Date (and possibly time) when filing period of coverage ends.
filing_regulator
This probably should be from_organization, to keep consistency with the existing model: https://github.com/opencivicdata/docs.opencivicdata.org/blob/master/proposals/0006.rst#implementation
SGTM
filing_committee
Committee
filing_date
Submissions by committees, publication by regulators, and amendments should be actions, like those on bills.
Ok, so should it use the OCD model directly, import and extend it here, or should there be a FilingActivity type that covers submission and amendments?
Just suggesting you use same pattern as bills.
actions
A list of objects representing individual actions that take place on a bill, comprising the legislative history of the proposal in question. Actions consist of the following properties:
organization, organization_id
The organization that this action took place within.
description
Description of the action.
date
The date the action occurred in YYYY-MM-DD format. (can be partial by omitting -MM-DD or -DD component).
classification
A list of classifications for this action; suggested values would be things like 'passage', 'introduction', etc.
related_entities
A list of all related entities (such as legislators mentioned by name in the action). Each entity has the following fields:
name
The upstream-given name of this related entity.
entity_type
'organization' or 'person' - the type of entity that is related
organization, organization_id
If the entity_type is 'organization' and the entity is resolved, will be the organization that is related.
person, person_id
If the entity_type is 'person' and the entity is resolved, will be the person that is related.
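If filing submission, publication and amendment were modelled with the same pattern as bill actions, a filing action might look roughly like the sketch below. It is only an illustration under assumptions: the class names `FilingAction` and `RelatedEntity`, the classification values `submission` and `publication`, and the identifiers in the usage list are all hypothetical; the fields and the partial-date rule are copied from the bill action description above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Partial dates as allowed for bill actions: YYYY, YYYY-MM or YYYY-MM-DD.
PARTIAL_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


@dataclass
class RelatedEntity:
    name: str                              # upstream-given name
    entity_type: str                       # 'organization' or 'person'
    organization_id: Optional[str] = None  # set only once resolved
    person_id: Optional[str] = None        # set only once resolved


@dataclass
class FilingAction:
    """Hypothetical filing action that reuses the bill action fields."""
    organization_id: str
    description: str
    date: str
    classification: List[str] = field(default_factory=list)
    related_entities: List[RelatedEntity] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not a (possibly partial) ISO date.
        if not PARTIAL_DATE.fullmatch(self.date):
            raise ValueError(f"bad date {self.date!r}; use YYYY[-MM[-DD]]")
        for entity in self.related_entities:
            if entity.entity_type not in ("organization", "person"):
                raise ValueError(f"unknown entity_type {entity.entity_type!r}")


# A filing's history as a list of actions (all identifiers are made up).
history = [
    FilingAction("ocd-organization/example-committee", "Report submitted",
                 "2016-11-08", ["submission"]),
    FilingAction("ocd-organization/example-regulator", "Report published",
                 "2016-11", ["publication"]),
]
```

Treating submission, reception and publication as separate actions would let each carry its own date and organization, rather than forcing a single `filing_date`.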
filing_regulator
Regulator
filing_url
What does this indicate?
In some cases, like California, there are PDF or HTML documents online that can be linked via the database record's unique identifier.
For instance, the most recent filing by our governor's campaign committee contains a set of contributions made to ballot measure committees this past election day.
In the database, the filing's unique identifier is 2106282, which can be combined with a common URL pattern to return the PDF version of a paper filing. (A small side note: the filing's amendment identifier is a crucial second component needed to guarantee uniqueness for all California records.)
http://cal-access.ss.ca.gov/PDFGen/pdfgen.prg?filingid=2106282&amendid=0
Including this link in the OCD schema may not be mandatory, but from a practical point of view I can vouch for the fact that reporters and data journalists are constantly referring back to PDF records like these when analyzing campaign finance to verify and further scrutinize their findings.
Okay, so it looks like you want this to indicate a source. This is what we have for bills:
sources
List of sources used in assembling this object. Has the following properties:
url
URL of the resource.
note
optional Description of what this source was used for.
Ok, cool. Adopting this here.
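Combining the CAL-ACCESS URL pattern quoted earlier in this thread with the bill-style `sources` entry could look like the sketch below. The helper names are hypothetical, the `note` text is illustrative, and it assumes the `pdfgen.prg` URL pattern remains stable; only the base URL and the `filingid`/`amendid` parameters come from the example link.

```python
from urllib.parse import urlencode

# Base URL taken from the example CAL-ACCESS link; its stability is an assumption.
CAL_ACCESS_PDF_BASE = "http://cal-access.ss.ca.gov/PDFGen/pdfgen.prg"


def cal_access_pdf_url(filing_id: int, amend_id: int = 0) -> str:
    """Build the PDF link for one version of a CAL-ACCESS filing.

    Both ids are required, because a filing id alone is not unique
    once the filing has been amended.
    """
    query = urlencode({"filingid": filing_id, "amendid": amend_id})
    return f"{CAL_ACCESS_PDF_BASE}?{query}"


def as_source(filing_id: int, amend_id: int = 0) -> dict:
    """Wrap the link in the {url, note} shape used by OCD `sources`."""
    return {
        "url": cal_access_pdf_url(filing_id, amend_id),
        "note": "PDF rendering of the original filing",
    }


print(cal_access_pdf_url(2106282))
# http://cal-access.ss.ca.gov/PDFGen/pdfgen.prg?filingid=2106282&amendid=0
```

Keeping the link in `sources` rather than in a dedicated `filing_url` field means a filing can list several documents (for example, one per amendment) without a schema change.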
filing_relevant_election_date
Date of (nearest? next?) relevant election.
filing_person
Can you give an example of this?
I think generally, if not exclusively, this will be the date of the upcoming election, but I guess sometimes filings could be made relative to just-concluded elections.
So like, a declaration of candidacy for a specific upcoming election; then a contestation of results after the election has concluded (but clearly still referring to that specific election).
Sorry, I meant "filing person" not "filing_relevant_election_date"
Oh, like the treasurer or whoever signs off on a given campaign disclosure.
Okay. Give some examples of who this type of person could be in the proposal.
**optional**
Person responsible for the filing.
Committee
Everything here can be done with a normal OCD organization, with the possible exception of purpose and candidates (but see my notes about candidates).
Yeah, I think that's fine. What's the model for repurposing existing OCD types? We need a Committee model, so whatever's the best way to get Officers, Purpose and Status into an OCD Organization I'm fine with.
We don't need to do anything special for:
- officers (already handled in the OCD Post model).
- status: the OCD Organization model has start and end dates. Is this sufficient?
- purpose: this one is tricky. Let's create a section in this PEP for miscellaneous questions and stick that one in there for now.
No, I don't think OCD start/end dates quite cover what we need here, I'll add this to the questions.
Please feel free to ping me on the PEP for "purpose". Toronto committees tend to have a short "focus" that the city likes to use, so I'd be interested in tracking it :)
Also, what might be the options for "status"?
Status is primarily "active"/"inactive" but I think in some states, some active Committees still have to file to announce whether they're contesting anything in a particular election. That seems like a status.
I'm not sure if we can infer the current status in 100% of cases, but the "active vs. inactive" distinction definitely exists in the California data, where committees can file "termination reports" that put themselves out of business.
It seems like Popolo's "date of founding" and "date of dissolution" are sufficient here. If the committee has to file a notice of intent to contest, that seems like it should be handled by a filing.
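The date-based approach suggested here could derive a simple active/inactive status along the lines of the sketch below. The `Committee` class is a hypothetical stand-in for an OCD/Popolo organization, and two rules are assumptions: a committee counts as inactive from its dissolution date onward, and a missing founding date is treated as "active since unknown". It deliberately does not model the election-specific filing windows raised in the reply that follows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Committee:
    """Minimal stand-in for an OCD/Popolo organization record."""
    name: str
    founding_date: Optional[date] = None
    dissolution_date: Optional[date] = None  # e.g. taken from a termination report


def status_on(committee: Committee, day: date) -> str:
    """Derive 'active' or 'inactive' for a given day from the two Popolo dates."""
    if committee.founding_date and day < committee.founding_date:
        return "inactive"  # not yet formed
    if committee.dissolution_date and day >= committee.dissolution_date:
        return "inactive"  # assumed dissolved as of the dissolution date
    return "active"
```

For example, a committee founded on 2013-01-01 that filed a termination report effective 2015-03-01 would be "active" on 2014-06-01 and "inactive" on 2015-03-01 under these rules.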
I think there's a bit more nuance here, which you got at with the filings that indicate notice of contestation - there are a series of windows that apply to the status of a committee. Updating PR to reflect this.
name
Name of the Committee.
candidate
I'm not sure this is the right place for it. There can be a many-to-many relation between committees and candidates. I think "Candidate Support" should maybe be a separate model.
Yeah, that's a good point. Candidate Support/Opposition, really, so I guess Candidate Orientation. I'll add this.
A good example from California would be ballot measure committees which can form to support one-to-many propositions, and can evolve over subsequent elections to support a variety of measures over time.
on a given election day. I suppose it's possible some Candidates won't have
Regulators (God help us all).
Jurisdiction
Cool.
**repeated**
Government Level with...jurisdiction over this Jurisdiction.
Office
Cool.
name
Name of the Party.
Regulator
This can just be an OCD organization.
Just Party, or both Regulator and Party?
Both.
**optional**
If this is a primary, each Party involved in this Election.
GovernmentLevel
What is the difference between a GovernmentLevel and a Jurisdiction?
I was thinking of GLs as not specific instances of governmental bodies but as the levels themselves - federal, state, municipal, county, tribal, etc.
What's an example of a case where you know the jurisdiction but still want to know the "government level"?
Guess I can't think of one. Removing it.
@aepton thanks for this great start. I have three general comments at this point.
Basically, this comes down to representing a filing as a denormalized row versus as the relation between modelled entities. I would strongly prefer the denormalized representation. I think that we can attach an optional ocd-person-id and ocd-organization-id to the denormalized representations to make downstream processing much easier.
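A denormalized contribution row with optional OCD links might be shaped like the sketch below. All field names and the `attach_person_id` helper are hypothetical, and the name-to-id lookup is a deliberately naive stand-in for real entity resolution; the point is only that the reported values stay as claimed while the OCD ids are added, or left empty, afterwards.

```python
from dataclasses import dataclass, replace
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class ContributionRow:
    """One denormalized row holding what the filer reported."""
    filing_id: str
    contributor_name: str
    recipient_name: str
    amount: Decimal
    date: str
    # Optional OCD links filled in by later entity resolution; None = unresolved.
    contributor_person_id: Optional[str] = None
    contributor_organization_id: Optional[str] = None


def attach_person_id(row: ContributionRow, known: Dict[str, str]) -> ContributionRow:
    """Return a copy with contributor_person_id set if the reported name is known."""
    person_id = known.get(row.contributor_name)
    if person_id is None:
        return row  # leave unresolved rows untouched
    return replace(row, contributor_person_id=person_id)
```

Because the rows are frozen and resolution returns a new copy, the original claim is never overwritten by an inference.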
Thanks, Forrest, this is really awesome and helpful. Updating this PR now and I'll pull out Election and Candidate into a separate proposal. I'm with you on modeling contributions and expenditures - as a data utility, we want to do nothing beyond providing what other folks are claiming. Then as journalists or whomever, we can use this data to model things and make assertions and inferences - and I want to make the latter as easy as possible without compromising the design of the former.
Implementation
==============
Filing
If I'm reading the Filing schema right, I see there are four date fields currently listed:
| field | definition |
|---|---|
| filing_date | Date (and possibly time) when filing was submitted. |
| filing_coverage_begin_date | Date (and possibly time) when filing period of coverage begins. |
| filing_coverage_end_date | Date (and possibly time) when filing period of coverage ends. |
| filing_relevant_election_date | Date of (nearest? next?) relevant election. |
Here are some thoughts I have on those:
- So far they perfectly match the four date fields we currently have drafted on our early attempts at cleaning up California's filings, though we've given them slightly different names: date_filed, from_date, thru_date and election_date. I like your longer ones better for being more specific, but I wonder if there is a common date naming convention in other OCD schemas we should be emulating. Is there?
- Pedantic: There might be some cases where the state systems record a slight variation between when a filing is "submitted" by the filer and when it is "received" by the government. Do we need to worry about that distinction?
- More serious: While I expect the "from" and "thru" fields will be common to most periodic campaign disclosures (like quarterly committee filings), I can imagine some other common campaign disclosures like late contribution reports, statements of intention (to run for office) and statements of committee organization or termination that do not have them. Is the aim of this schema to serve as a subclass for these other campaign-related forms, or only for periodic financial disclosures?
To your second question, I think whatever system is ingesting these reports and preparing them in this format should be responsible for deciding what corresponds best to "filing submitted". If this system is handling claims political entities are making about the world, then "time they submitted their claims to a regulator" seems like the most useful approximation of "submission time".
To your final question, I think this schema should be able to handle both types, and any other coverage window information should be optional (as are the filing_coverage begin and end dates). A (perhaps) separate (and much harder) question is how to reconcile multiple reports that describe the same contribution/expenditure/event.
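Making the coverage window optional, as suggested here, could be sketched as follows. The field names are the ones from the proposal table; the `is_periodic` property and the check that a window does not end before it begins are assumptions added for illustration, not part of the proposal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Filing:
    filing_date: date
    # Both optional: late contribution reports or statements of organization
    # may have no period of coverage at all.
    filing_coverage_begin_date: Optional[date] = None
    filing_coverage_end_date: Optional[date] = None

    def __post_init__(self) -> None:
        begin, end = self.filing_coverage_begin_date, self.filing_coverage_end_date
        if begin and end and begin > end:
            raise ValueError("coverage period begins after it ends")

    @property
    def is_periodic(self) -> bool:
        """True when the filing declares a full coverage window."""
        return (self.filing_coverage_begin_date is not None
                and self.filing_coverage_end_date is not None)
```

A quarterly report would set both dates, while a statement of organization would simply leave them as `None`.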
- If we treat submission, reception, and publication by the regulator as actions, that resolves many difficulties. Adding first stab at a first draft of a proposal for campaign finance… #61 (comment)
- I'm a little wary about putting the nearest election date as an object on the filing, since that's typically not something that appears in the actual filing documents I've seen. I think that we can definitely make convenient queries, though, to surface the same content.
It would be nice if this description were extensible. For US federal data, there are two main characteristics that determine which rules the filer has to abide by. For example, you need the committee type and committee designation to determine if a committee is a Super PAC. In addition to this, you can also see the org type to find organizations like labor unions, corporations or individuals.
Here is the FEC API's basic schema for committees, if that is helpful reference. We include all filers and not just those that are technically a committee.
Sometimes it makes sense to have an election date if that is something you want to represent from a line on a form, like the Form 3, but there are a lot of situations where this wouldn't apply. Especially in the primary season, one filing can apply to more than one election, held on different days. Additionally, election dates change, so if you add a date to each filing you will have to amend the election date on past filings or accept that the same election can be represented by different dates. We generally track the election type (primary, general, special or runoff) and the office that the election is for. You can then cross-reference that election with a date from the date endpoint.
I think that the election is generally more useful on the transaction level. That is where you can see if a donation is attributed to the primary or general etc., which is important for keeping track of donation limits per donor, since those are per-election. Again knowing the election type and office is better than having the date.
For reference here is the FEC API's basic schema for filings, though I would like to note, we would like to move toward a schema that separates the summary financial information from the filings information.
There are also some practical examples of how other smart people approach US federal data with the fetch project.
I'm not nearly as knowledgeable about state or international campaigns, so feel free to ignore anything I say that isn't as widely applicable. (Also I am just commenting in my personal capacity.)
FWIW, a quarterly filing by a candidate committee in the state of California will include the "date of election." If you look closely, or click here on a 2014 filing by our governor's reelection campaign, you can see it at the top center of the first page.
As @LindsayYoung points out, this is not structured data. It is only a date string, which of course could be fallible. @gordonje is in the process of vetting the California data right now to get a grip on the links between elections, filers and committees, but I don't think we have a comprehensive answer on how reliable that information has been at the filing level.
@LindsayYoung Would you mind expanding a little more on why you think summary information should be separated from its filing? Potentially including that here was one thing @gordonje and I had pondered, but we lack your depth of experience wrestling with these issues. So I'm curious to hear your thoughts.
Not sure if this is what Lindsay had in mind, but definitely agree about the utility of breaking out summary information with federal data for analysis. For example, presidential committees file on F3P; candidates on F3, pacs on F3X. Some filers report fundraising and spending, while others only report spending. But most folks don't really want financial details for only one of these forms, but for all of them. So it's really useful to have a standardized form of common elements (because there are so many that can't be standardized, the federal summary forms are really wide). Whether that sorta thing is within the scope of this doc, or how it fits into OCD is kinda over my head.
Ok, so it sounds like we should have a repeated, optional field indicating which election(s) this filing applies to, and since we're also introducing the notion of an Election object in a separate PR, this seems straightforward.
@LindsayYoung does that capture what you were looking for in terms of flexibility? I'd absolutely love to be able to use this to model federal elections/data.
@jsfenfen I think what you're describing is what I have in mind for how this system would work for an end user - we do the work of translating federal/state/local filings into this standard scheme, and each state/federal/local parser is responsible for doing The Right Thing for that jurisdiction, such that it's easy for people to do cross-jurisdictional comparisons.
For an end user, there should be some interface to this data, and since the jurisdictional parsers have done the heavy lifting of saying that "F3P gets processed this way, and F3X gets processed that way" then the end-user-interface system can make the comparisons users have in mind in a straightforward way.
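A repeated, optional elections field that points at separate Election objects could be sketched as below. Following the federal practice described above, each reference identifies an election by type and office instead of by date; `ElectionRef`, its `cycle` field and the `filings_for` helper are hypothetical, since the Election object is still being drafted in the separate proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ElectionRef:
    """Hypothetical pointer to a separate Election object."""
    election_type: str  # e.g. 'primary', 'general', 'special', 'runoff'
    office: str
    cycle: int          # assumed way to tell repeat elections apart


@dataclass
class Filing:
    filing_id: str
    # Repeated and optional: one filing may cover a primary and a general.
    elections: List[ElectionRef] = field(default_factory=list)


def filings_for(filings: List[Filing], election: ElectionRef) -> List[str]:
    """Ids of the filings that declare they apply to the given election."""
    return [f.filing_id for f in filings if election in f.elections]
```

Since the dates live on the Election object, a rescheduled election would need to be corrected in one place rather than on every past filing.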
Having the election as an optional, separate object sounds good.
@palewire The summary information can get pretty long and varies from filing to filing. The time frame for summary information can also vary depending on the type of filer. The totals for the coverage period are pretty straightforward, but there are also cumulative totals: by calendar year for PACs and parties on Form 3X, but over longer periods for candidate committees, to match their respective election cycles, on Form 3 and Form 3P. Also, the financial information isn't applicable to all the forms, and we have seen that cause confusion that outweighs the convenience of having those financials there.
**optional**
Person responsible for the filing.
Committee
How would different committee types be handled?
Here is a short list of the different types of committees from California I can pull from the top of my head:
- Candidate committees that support the election of a candidate, which take in money to run a focused campaign for a particular office and can be remade for different election cycles. An example would be Brown for Governor 2014.
- Independent recipient committees that take in money from supporters and disburse it to candidate committees and independent political spending. They can exist for decades. These are sometimes known as "PACs" in federal parlance and can include corporate committees like Exxon Mobil, unions like Sheet Metal Workers Local 206 and what in some venues might be called "Super PACs" or "independent expenditure committees."
- Ballot measure committees that can support or oppose the passage of one to many propositions in one to many elections over time. An example would be Yes on 62, No on 66. Replace the costly, failed death penalty system.
- Candidate-controlled ballot measure committees or leadership PACs that allow candidates to raise money for their favored causes besides their own election. An example would be Brown's Ballot Measure Committee.
- State and local political party committees that raise money on behalf of political parties and move money into key and favored races as well as supporting general activities for the party. An example would be the California Republican Party.
California has a couple other ones as well, like Slate Mailer Committees and Small-Contributor Committees, but I'm not sure how common those are in other jurisdictions. All of the above I expect to be common across the country.
I had been thinking they'd be imputed based on the orientation(s) a Committee takes toward one or more Candidate(s). But this is a good question and one I've added to the list. It seems not obvious how best to handle this, since every jurisdiction will have different types.
One thing to keep in mind is that a committee might be connected with a ballot measure rather than a candidate. We should probably consider a schema for those as well to go along with the Election objects.
I almost forgot my favorite committee type: The legal defense fund!
I think we do need a committee type because different types of committees have very different rules on who they can contribute to, how, and when they need to disclose.
Popolo has a classification attribute, but within OCD it has typically been used for pretty high-level classes, like legislature, executive, community board.
Maybe we should have a type in addition to a classification.
Do you have thoughts on this question, @jpmckinney?
Are there states with hybrid super PACs yet (aka Carey committees): two separate accounts, but only one committee? Not sure how many of these matter (esp. at the state level), but do you care about multi-candidate fundraising committees? Inaugural committees? Campaign cost committees? Convention committees? Dedicated accounts, like building, legal?
If I'm reading the FEC committee spec provided by @LindsayYoung properly, I believe the issue you raise @jsfenfen is addressed there by first classifying a committee with its "committee type" (e.g. Independent expenditure, Candidate) and then recording details about its relationship to candidates via its "designation" (e.g. Belonging to candidate, Authorized by candidate, Joint fundraising, etc.). Maybe that's something we should do here as well?
This is what @aepton is planning on doing with CandidateOrientation.
I think that's the right idea, but the name isn't quite right yet.
Ok, sounds like the name for CandidateOrientation should be CandidateDesignation, but otherwise the basic idea seems like it handles these use cases.
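The two-level scheme discussed here (a committee type on the committee, plus a designation on each committee/candidate link) might be sketched as follows. The enum members are illustrative only: the committee types echo the California list earlier in this thread and the designation labels echo the FEC examples quoted above; none of them is an official code list, and every jurisdiction would supply its own values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class CommitteeType(Enum):
    # Illustrative values based on the California examples in this thread.
    CANDIDATE = "candidate"
    INDEPENDENT = "independent"
    BALLOT_MEASURE = "ballot measure"
    PARTY = "political party"


class Designation(Enum):
    # Labels borrowed from the FEC designations mentioned above.
    BELONGING_TO_CANDIDATE = "belonging to candidate"
    AUTHORIZED_BY_CANDIDATE = "authorized by candidate"
    JOINT_FUNDRAISING = "joint fundraising"


@dataclass(frozen=True)
class Committee:
    committee_id: str
    committee_type: CommitteeType


@dataclass(frozen=True)
class CandidateDesignation:
    """One edge in the many-to-many committee/candidate relation."""
    committee_id: str
    candidate_id: str
    designation: Designation


def committees_for(candidate_id: str, links: Iterable[CandidateDesignation]) -> List[str]:
    """Sorted ids of every committee linked to the candidate, by any designation."""
    return sorted({link.committee_id for link in links if link.candidate_id == candidate_id})
```

Keeping the designation on the link rather than on the committee lets a joint fundraising committee relate to several candidates, and a candidate to several committees.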
memo
String (may simply need repeated "notes" fields for items of this type).
I think for both `Contribution` and `Expenditure` it would be cleaner to come up with something that mirrors what the FEC calls a "transaction_type", which describes the kind of thing it is: in-kind, transfer to an affiliated committee, refund, etc. Otherwise we face having to define the various `is_{some type}` fields, which seems more complicated.
Something else to consider: at the federal level, at least, there is a distinction between "receipt" and "contribution", with the latter being an intentional donation. All contributions are receipts, but there are receipts (offsets, investment income) that are not contributions.
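To make the `transaction_type` idea concrete, here is a minimal Python sketch, assuming a single controlled vocabulary per object rather than a set of `is_*` flags. The names `TransactionType`, `CONTRIBUTION_TYPES` and `Receipt` are hypothetical, and the values are only the examples mentioned above. The point is that "is this a contribution?" becomes a derived property, since every contribution is a receipt but not every receipt is a contribution.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class TransactionType(Enum):
    """Hypothetical controlled vocabulary; real values would come from each regulator."""
    CONTRIBUTION = "contribution"
    IN_KIND_CONTRIBUTION = "in-kind-contribution"
    OFFSET = "offset"
    INVESTMENT_INCOME = "investment-income"


# Subset of receipt types that count as intentional donations.
CONTRIBUTION_TYPES = frozenset(
    {TransactionType.CONTRIBUTION, TransactionType.IN_KIND_CONTRIBUTION}
)


@dataclass(frozen=True)
class Receipt:
    amount: Decimal
    transaction_type: TransactionType

    @property
    def is_contribution(self) -> bool:
        # Derived from the single type field instead of separate is_* flags.
        return self.transaction_type in CONTRIBUTION_TYPES


# Usage: total only the receipts that are contributions.
receipts = [
    Receipt(Decimal("250.00"), TransactionType.CONTRIBUTION),
    Receipt(Decimal("12.34"), TransactionType.INVESTMENT_INCOME),
]
total_contributions = sum(r.amount for r in receipts if r.is_contribution)
```

Adding a new kind of receipt then means adding one enum member (and, if appropriate, one entry in `CONTRIBUTION_TYPES`) rather than a new boolean column.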
Strongly agree that receipt is the better abstraction than contribution. Also like the 'type' idea.
There are several other transaction types not captured here in the California data, like loans, debts and "miscellaneous" transfers between committees. That last one is a great place to find millions from Soros, George.
There is also, as I know @dwillis knows well, the key distinction commonly made between monetary and non-monetary contributions, which gets at a secondary level of classification that may exist within any of the "receipt" types.
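A rough sketch of that two-level classification, assuming a primary receipt type plus an independent monetary/non-monetary attribute. `ReceiptType`, `Medium`, `Receipt` and `totals_by_type_and_medium` are hypothetical names, and the type list only covers some of the examples raised in this thread.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class ReceiptType(Enum):
    # Hypothetical primary categories, taken from examples in this thread.
    CONTRIBUTION = "contribution"
    LOAN = "loan"
    MISC_TRANSFER = "miscellaneous-transfer"


class Medium(Enum):
    # Secondary classification that can apply to any primary type.
    MONETARY = "monetary"
    NON_MONETARY = "non-monetary"


@dataclass(frozen=True)
class Receipt:
    amount: Decimal
    receipt_type: ReceiptType
    medium: Medium = Medium.MONETARY


def totals_by_type_and_medium(receipts):
    """Sum amounts keyed by (primary type, secondary medium)."""
    totals = {}
    for r in receipts:
        key = (r.receipt_type, r.medium)
        totals[key] = totals.get(key, Decimal("0")) + r.amount
    return totals
```

Keeping the medium orthogonal to the type means a non-monetary loan or transfer needs no extra type values.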
This seems like a really good change; updating.
Amendment (Section)
I'm not sure of the proper verbiage to use here, but I doubt that an amendment that fully replaces a previous version of a filing can rightfully be called a "section" of that same filing.
Here's a suggestion:
- filings have something like a "version_count", indicating how many different versions of the filing are known to exist (in CA these count up from zero, not sure if that works for everywhere).
- each section has a "filing_version" attribute, indicating on which version of the filing all the truth claims contained in the section were made.
"Section" may not be the right abstraction for an amendment. I like @gordonje's notion of versions, though I don't know if we need version counts - we primarily care about whatever the current version is. Secondarily, for a given claim we want to be able to say which version(s) it comes from.
Maybe we should think of amendments as a linked list, allowing you to get back to previous versions and superseding versions of the same filing from wherever you are.
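One way the linked-list idea could look, as a minimal sketch: each version keeps a pointer to the version it amends and to the version that supersedes it, so you can walk backward or forward from wherever you are. `FilingVersion`, `amend`, `current` and `history` are hypothetical names, and this assumes a strictly linear chain (no version amended twice).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# eq=False and repr=False on the links avoid infinite recursion between neighbours.
@dataclass(eq=False)
class FilingVersion:
    """One version of a filing, linked to its neighbours (hypothetical model)."""
    identifier: str
    previous: Optional[FilingVersion] = field(default=None, repr=False)
    superseded_by: Optional[FilingVersion] = field(default=None, repr=False)

    def amend(self, identifier: str) -> FilingVersion:
        """Create a new version that supersedes this one and link both ways."""
        if self.superseded_by is not None:
            raise ValueError(f"{self.identifier} has already been amended")
        new = FilingVersion(identifier, previous=self)
        self.superseded_by = new
        return new

    def current(self) -> FilingVersion:
        """Walk forward to the latest version."""
        node = self
        while node.superseded_by is not None:
            node = node.superseded_by
        return node

    def history(self) -> list[str]:
        """Identifiers from the original version up to this one."""
        ids = []
        node: Optional[FilingVersion] = self
        while node is not None:
            ids.append(node.identifier)
            node = node.previous
        return list(reversed(ids))
```

For example, after `v1 = original.amend("v1")`, calling `original.current()` returns `v1`, and `v1.history()` lists both versions oldest first.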
Should amendments (in our system) contain all the previous data from prior versions of a filing, or just a diff? I'm inclined toward the former, but could be talked out of it - seems like it's conceptually more straightforward and would be easier to use, primarily coming at the cost of (cheap and ever-cheaper) storage.
I'll add a question about how to handle amendments, and leave a stub for them in the current PR.
@aepton I also definitely prefer having amendments contain all previous data from prior versions instead of just the diff.
I was thinking "version_count", or whatever you would call it, would facilitate a simple sanity check about each contribution, expenditure or other claim, like they are not being attributed to a version of the filing that isn't known to exist. But that might be overkill for a lot of people.
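A sketch of that sanity check, assuming CA-style numbering where versions count up from 0, so a filing with `version_count` n has valid versions 0 through n - 1. `Filing`, `Claim` and `invalid_claims` are hypothetical names for illustration; `version_count` and `filing_version` are the attribute names suggested above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Filing:
    filing_id: str
    version_count: int  # number of versions known to exist, numbered from 0


@dataclass(frozen=True)
class Claim:
    transaction_id: str
    filing_id: str
    filing_version: int  # version of the filing that made this claim


def invalid_claims(filing: Filing, claims: Iterable[Claim]) -> List[Claim]:
    """Return claims on `filing` that cite a version not known to exist."""
    return [
        c
        for c in claims
        if c.filing_id == filing.filing_id
        and not 0 <= c.filing_version < filing.version_count
    ]
```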
`id`: Open Civic Data-style id in the format ``ocd-cf-filing/{{uuid}}``
In California, the unique identifier of a filing is a combination of its "filing id" and its "amendment id." The first version of a filing has 0 as its amendment id. That number increments up one with each new version while all versions share the same filing id.
How would that sort of system be standardized to OCD in this schema? Would we combine those two numbers into this field to create a composite id?
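If the composite approach were used, a minimal sketch might look like this. It assumes a hypothetical `jurisdiction:filing_id:amend_id` string format (nothing in this proposal defines one), with CAL-ACCESS semantics: all versions share `filing_id`, and `amend_id` starts at 0 and increments with each amendment. `SourceFilingKey` is a hypothetical name.

```python
from typing import NamedTuple


class SourceFilingKey(NamedTuple):
    """Hypothetical composite key: versions share filing_id; amend_id starts at 0."""
    jurisdiction: str
    filing_id: int
    amend_id: int

    def to_identifier(self) -> str:
        # Namespacing by jurisdiction keeps keys unique across regulators.
        return f"{self.jurisdiction}:{self.filing_id}:{self.amend_id}"

    @classmethod
    def from_identifier(cls, value: str) -> "SourceFilingKey":
        jurisdiction, filing_id, amend_id = value.split(":")
        return cls(jurisdiction, int(filing_id), int(amend_id))

    def is_original(self) -> bool:
        return self.amend_id == 0
```

Splitting on `:` assumes none of the parts can contain a colon; a real format would need to guarantee that.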
My understanding of the proposal (and I would really like @aepton to check me on this) is that this id would be what CAL-ACCESS calls the filing_id, sans amend_id. Then each amendment would be a different section to the filing, with the amend_id serving as that section's id.
First of all, am I reading this right?
If so, I'm confused about how/if amendment sections relate to contribution sections and expenditure sections. I think I get how this proposal would have us represent the fact that a given filing was amended and how many times it was amended, but I don't quite understand how this proposal would have us represent, for example, all of the contributions that were included on the first version of a filing separate from all the contributions that were included on the second version of a filing.
In practice for us in California, the contributions on the second version of the filing are mostly duplicates of the contributions found on the first version. There's even a transaction id that is unique within the different versions of a given filing. The typical differences include modifications to the amount or the contributor's name/info. Maybe the second version has additional contributions. There are surely at least a few cases where a contribution is removed.
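Since the transaction id is unique within each version, comparing two versions can be keyed on it. A minimal sketch, assuming each version is represented as a dict mapping transaction id to that contribution's details; `diff_versions` is a hypothetical helper, not part of the proposal.

```python
from typing import Any, Dict, Set, Tuple

Version = Dict[str, Dict[str, Any]]  # transaction_id -> contribution details


def diff_versions(old: Version, new: Version) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, removed, modified) transaction ids between two versions."""
    old_ids, new_ids = set(old), set(new)
    added = new_ids - old_ids
    removed = old_ids - new_ids
    # Present in both versions but with a different amount, name, etc.
    modified = {tid for tid in old_ids & new_ids if old[tid] != new[tid]}
    return added, removed, modified
```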
First, I think this system is agnostic as to how the filing IDs are generated - it just assumes them to be unique. I think it's fine for each state/municipality/whatever to be responsible for creating its own filing_ids for precisely the reason @palewire outlined - namely, each jurisdiction has a different, logically-consistent system (or at least, many do). As long as the IDs are unique (maybe give each jurisdiction a namespace) it doesn't matter how they're generated.
@gordonje to your first question, I envisioned each version of a filing as a separate Filing object, each with an Amendment section indicating it was overwriting the previous Filing. Each Filing would have all the Contributions and Expenditures associated with that version.
I added Amendments to the Questions section because I think how this should work is still fairly unclear to me. For instance, presumably each Contribution has a transaction ID (at least in some states). So with each version of a Filing, the same ContributionID will be present on a Contribution, and that Contribution's details might change, so how do we version the Contribution object without creating a really cumbersome system?
OCD has typically strongly preferred OCD ids that are not human readable (with the notable exception of OCD division ids). But it is important to preserve the source identifiers, and that has typically been done with an `identifier` attribute. Notice this distinction between the `id` and `identifier` in voteevents.
I did not know about this distinction between `id` and `identifier`. Is there any system by which the uuid for the `id` field is chosen or generated?
It is not specified in any OCD spec (AFAIK). The de facto reference implementation of OCD right now is pupa, and it uses `uuid.uuid1()`.
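For illustration, a minimal sketch of the id/identifier split: the OCD `id` is an opaque value minted with `uuid.uuid1()`, as described above for pupa, while the source system's id is kept verbatim in `identifier`. `make_ocd_id` and this `Filing` class are hypothetical; the ``ocd-cf-filing/`` prefix follows the format in the draft.

```python
import uuid
from dataclasses import dataclass, field


def make_ocd_id(object_type: str) -> str:
    """Mint an opaque OCD-style id such as 'ocd-cf-filing/<uuid1>'."""
    return f"ocd-{object_type}/{uuid.uuid1()}"


@dataclass
class Filing:
    # Source system's id, preserved as-is (e.g. a CAL-ACCESS filing_id).
    identifier: str
    # Opaque, not human readable; generated when the object is created.
    id: str = field(default_factory=lambda: make_ocd_id("cf-filing"))
```

Because the `id` carries no meaning, the same source filing ingested twice gets two different ids; deduplication has to key on `identifier` (plus jurisdiction) instead.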
Hey, @aepton I totally agree amendments are complicated... Federal senate paper filings only contain lines that overwrite the originals (there's a lotta good reasons to not think about paper filings, but I bet this kinda partial overwriting thing isn't as standard as I want it to be). Also, there's sometimes inconsistency in how filing-level amendment is reported. Is it A amends B amends C, or A amends B, and then C subsequently amends B? One can fallback on whatever is reported, but it's potentially messier than one might hope.
@fgregg ok, cool - added an original_id optional field to Filing (and a few other objects) to capture the state-generated ID in cases where we care about it, and otherwise I'm happy to assume the IDs of every object are not human-readable, beyond being namespaced.
@jsfenfen My current proposal is to have a field called `invalidates_prior_versions` that allows us to have amendment actions on filings which either wipe out everything previously disclosed, or don't. I'm thinking amendments should be handled using a combination of that field (if we even need it) and lists of all transactions, etc. contained in that filing.
So if you have filing A with transactions B, C and D; and then you have amendment E with transactions B and D; then you've got a lot of duplication but you can also just look at the most recent filing to see all the current versions of the currently-disclosed transactions.
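A minimal sketch of how the current set of transactions could be resolved under that proposal, assuming versions are supplied oldest to newest and keyed by transaction id. The behavior for `invalidates_prior_versions=False` (layering the amendment's transactions on top of what came before, like the partial-overwrite paper filings mentioned above) is an assumption; `current_transactions` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class Filing:
    filing_id: str
    transactions: Dict[str, Dict[str, Any]]  # transaction_id -> details
    invalidates_prior_versions: bool = True


def current_transactions(filings: Iterable[Filing]) -> Dict[str, Dict[str, Any]]:
    """Resolve currently disclosed transactions for one chain, oldest first."""
    current: Dict[str, Dict[str, Any]] = {}
    for filing in filings:
        if filing.invalidates_prior_versions:
            # Full replacement: forget everything disclosed earlier.
            current = {}
        # Later versions of a transaction override earlier ones.
        current.update(filing.transactions)
    return current
```

With filing A (B, C, D) followed by a full amendment E (B, D), only B and D remain, matching the example above.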
As @jsfenfen alluded to, amendments are messy, and doing useful things like computing totals and comparisons is more complicated than is ideal. It is often helpful to separate that logic out: doing so makes it easier to offer a wide array of summary information and to build additional filters that help guide people away from double-counting mistakes.
The FEC API is rolling out some improvements to our filing schema, and there are a few things that are pretty useful and might be helpful concepts here. We are adding a `latest_filing_id` and a `most_recent` boolean, which is a useful shortcut to make sure you are looking at the right filing. We are also adding an array for the amendment chain of a filing. It is straightforward for electronic filers, and we are adding logic to infer it for paper filings.
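Deriving those fields from per-filing amendment links might look roughly like this. It is a sketch, not the FEC's implementation: it assumes each filing reports the single filing it directly amends, and it refuses to guess when two filings claim to amend the same one (the "A amends B, then C amends B" ambiguity raised above). `amendment_chains` is a hypothetical helper; `amendment_chain`, `latest_filing_id` and `most_recent` mirror the field names mentioned here.

```python
from typing import Dict, List, Optional, Union

Summary = Dict[str, Union[List[str], str, bool]]


def amendment_chains(amends: Dict[str, Optional[str]]) -> Dict[str, Summary]:
    """Compute amendment_chain, latest_filing_id and most_recent per filing.

    `amends` maps each filing_id to the filing_id it directly amends
    (None for an original). Filings whose original is missing are skipped.
    """
    amended_by: Dict[str, str] = {}
    for fid, target in amends.items():
        if target is None:
            continue
        if target in amended_by:
            # Two filings amend the same one: the chain is ambiguous.
            raise ValueError(f"{target} is amended by both {amended_by[target]} and {fid}")
        amended_by[target] = fid

    result: Dict[str, Summary] = {}
    for root, target in amends.items():
        if target is not None:
            continue  # only start walking from originals
        chain = [root]
        while chain[-1] in amended_by:
            chain.append(amended_by[chain[-1]])
        for fid in chain:
            result[fid] = {
                "amendment_chain": list(chain),
                "latest_filing_id": chain[-1],
                "most_recent": fid == chain[-1],
            }
    return result
```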
Also, I would love to hear any specific suggestions any of you have to improve FEC efiling schemas, API schemas or even forms!
**optional**: Date (and possibly time) when the filing period of coverage ends.

`from_organization`
I was initially confused by "from_organization" as a name for this. It sounded like the place where we represent the originator of the filing. I recognize that it's consistent with how OCD models the chamber from which a bill originates, but I wonder if the analog is really that strong.
Is there a reason we can't call it "regulator" or "regulatory_organization"?
That's above my pay grade :) I agree "regulator" or something like that would be more intuitive, but I'm sensitive to the need to integrate this into the larger OCD world too.
I can see it both ways. Let's go with regulator.
Open Civic Data-style id in the format ``ocd-cf-amendment/{{uuid}}``

`filing_to_amend`: Filing
Why would this section require a Filing attribute while the others would not?
That's an oversight; I should have removed it during a refactor. Nice catch :)
I've added a few of my own comments (apologies for taking my sweet time!). Overall, I really like the direction in which we are headed. I also wanted to touch on FilingTypes: are we imagining this as a means of modeling what we at CCDC call the [Filing Forms](http://calaccess.californiacivicdata.org/documentation/calaccess-forms/)? These include:
If so, I wonder if "FilingForm" or "FilingFormat" might be a name that more specifically describes this object but is still general enough to cover all cases. Don't mean to quibble too much about names, but the business rules we are talking about (and their changes) are often most clearly described in reference to these forms. To that point: the instructions for completing and submitting the filings are often communicated directly on the forms. At least that has been my experience in CA. Maybe there are other examples of FilingTypes I'm not thinking about. The raw CAL-ACCESS data also has a concept of "Statement Type", which is meant to represent a real mish-mash of categorizations, including
But maybe a lot of this stuff is accounted for elsewhere, though maybe not as directly as we would like. For example, one can infer the length of the filing period from the filing_coverage_begin and end_date attributes.
…ove away from specific transaction objects to a general model
@gordonje yeah, I think the Filing Type object is meant to represent what you describe. I think Type is a better name than Form because many of these "forms" aren't really paper forms anymore; a lot is disclosed electronically. I don't want to tie us too closely to the notion of specific forms in particular jurisdictions; it's definitely meaningful to talk about what's in a specific campaign's report, or a last-minute filing report, or a quarterly report, or something like that, so it's worth modeling those in the DB. But they're essentially just bundles of claims responding to a given rule/requirement, so I prefer Type to Form.
Curious what the next step should be. I can't merge this PR, but beyond that, should I start trying to implement a version of this spec, or move on to the campaign entities thing in PR 62? I'm new both to meaningful contributions to open source projects in general, and certainly to how y'all want to move forward on this particular project.
@aepton yeah, come to think of it, the specific filing forms are probably way further into the weeds than most folks care to be (maybe I just want someone to come find me!), especially if you're doing analysis across states/jurisdictions. Categories like "quarterly filing" and "semi-annual filing" are plenty meaningful, and the forms are more like a means of satisfying legal requirements that say something like "you have to submit this specific information every quarter".
@aepton I think there are a number of questions still to be resolved, but we are at a point where progress will be furthered by an attempted implementation.
@fgregg We are currently working through the process of refining our raw data into humanized models in the django-calaccess-processed-data repository. Is there a particular piece we should try to implement as a first pass? We're currently coming at the problem around the edges and are closest to an "Election" model like the one discussed in #62.
The current reference implementations for OCD models live at https://github.com/opencivicdata/python-opencivicdata-django/tree/master/opencivicdata/models It would be great to do a couple of things as you work on the calaccess data.
👍 Really like that you stuck close to the @votinginfoproject specification on that proposal. You may already know this, but, in turn, @votinginfoproject is collaborating with NIST and their public working groups. Hopefully, all these different-but-related lines of work stay in sync.
Update campaign_finance_filings.rst
I'm happy to start work on a reference implementation of this proposal for Washington state. I have some work to do on my platform before I'm ready to start, but I should be able to get on it soon. Does anything else need to happen for this PR to be merged?
I think it's ready to be merged as a proposal, but I don't have the permission bits for that. attn @jamesturk @jpmckinney
Note: I haven't read the full thread. Just reading the document and searching through the comments:

Filing

Committee
This should be a subclass of Popolo's Organization. From what I can tell, only I'm not sure why committee type is its own object. Perhaps in terms of the code implementation it makes sense to have a code list as an object, but in terms of the schema, a controlled vocabulary can be used for a committee's With respect to a committee type's

Candidate Designation
I don't see any property on other classes that has designations as its range (possible value). How do other classes connect to this class?

Person
Person in Popolo is a real person, so you can't use it for corporations...

Filing Type
See comments about committee types.

Transaction
@LindsayYoung Where can I see the FEC's schemas?
Re: new elections models, see my comment popolo-project/popolo-spec#104 (comment). Anyway, let's not have an Elections discussion in this already-long issue! Please create a new issue.
Great question, @jpmckinney. Here are the API schemas:
Click through to the metadata for the other FEC schemas: http://www.fec.gov/data/DataCatalog.do
Filing

Committee
Changed to start_date, note and statuses. Added a note about making this a subclass of Organization; should we just provide the fields that are different here, then? I think committee_type should be its own object because any given jurisdiction will have several different types that don't necessarily translate cleanly across jurisdictions. And in cases where they do, the rules will nevertheless be different: candidate committees in WA have different rules applied to them than candidate committees in IL, for instance. Registration filings should be captured by the Filing object; the jurisdiction field here is meant to reflect which locality(ies) a committee belongs to, and hence, which laws apply to it (among other things).

Candidate Designation
That was an oversight; added a field for that to Committee.

Person
What should we use here, then? A subclass of Popolo Person for "campaign finance persons" who, thanks to our Supreme Court, may in fact be corporations? This is an ambiguity not easily resolved; most of the time, from what I've seen, looking at a given transaction it's impossible to tell if it's a person or a corporation unless you're a human using human heuristics that I'm uncomfortable emulating in this system.

Filing Type
These are useful to model the actual filings committees submit, which have meaning in various contexts, and may help us construct the is_current_filing chain (certain types get superseded by other types, in certain states, at certain times of day, with Venus in the appropriate phase, etc.). And these filings vary titanically from state to state, so I think they're worth modeling as first-class objects.

Transaction
@jpmckinney The specs for the actual forms that filers submit are detailed here: http://www.fec.gov/elecfil/vendors.shtml, though it helps to know a bit about the rules for submitting them.
@aepton @jpmckinney +1 for filer rather than committee, because in some jurisdictions folks who have to file campaign finance reports are explicitly not committees, and do not have to register as such (and there are a number of ongoing lawsuits arguing that some filers really should be committees subject to committee rules, etc.)
Is this spec targeting only the FEC? My understanding was the goal was broader. Otherwise I can do one more look-over and merge.
@jpmckinney This pull request was started by @aepton after we discussed common challenges dealing with Washington state and California campaign finance data. Our goal is for this schema to work with statehouses as well as the federal data as much as possible.
@jpmckinney Yeah, +1 to what @palewire said. Ideally I'd love it to work with any campaign finance situation; the Toronto civic data folks seemed interested, for instance.
Does anything else need to be done for this, or can it be accepted?
@aepton I was going to do one more read-through, ideally this weekend.
Just pinging this :)
Merging the draft 🎉 Going to follow up in new issues/PRs.
Who are the primary contacts among the contributors to this thread for future modification of this OCDEP?
@jpmckinney I'm not sure what you are asking?
I just want to know whom to keep in the loop. I don't want to
… filing models
Just wanted to make sure I was on something like the right track before I filled in more details.