Final analyses in Generalization of reimbursements using DeepLearning Keras #238
silviodc
commented
May 18, 2017
- Converting PDF files to PNG and then applying SIFT descriptors
- Deep learning with Keras to detect generalization in reimbursements
Hi @silviodc, thank you very much for the contribution! 🎉 @jtemporal, @cabral and I took a closer look at it, and discussing it in the public group helped us find the best way to understand what you have done! :) You were looking for some answers, so here they are:
No, it is not allowed; it should be described on the receipt.
No, altering the receipt is not allowed (as the law says), and it looks very suspicious. It is important to note that the receipt itself was not changed; extra information was added by hand outside the receipt while scanning it, and the CEAP clearly states that this cannot happen either. We have added an image below that highlights the part of the law that proves it. Taking a closer look at the place, that is the price of a meal for at least two people. Can we bring the discussion here?
Hey @silviodc, I have some questions about the effectiveness of your model! First things first, thanks for sending serenata a PR that brings CNNs and DNNs to this project. It's a great way to contribute to Brazilian politics! Keep it up, man! Let me ask you some questions (they should help you see whether your work is really doing what it is meant to do):
Hi everyone. On to the method:
I didn't try, but it is my plan for the future. I will use the 2483 suspicious reimbursements I got in my run to build a proper training set and out dataset.
So, checking the first results and the 2483 files, the model is suffering from overfitting. I thought that generating many images with The main point is: I used little data (500 reimbursements; 250 per class). I was really lazy about building the corpus, haha. So far I have classified 1400 of the 2483 reimbursements by hand. Maybe by August I will have the training dataset prepared. I guess with this dataset we will have a better view of whether it's a good approach to include. PS: thank you guys so much for this project! I'm very proud of you 👍 and happy to participate.
Hi @silviodc, according to the law:
The congressperson has to hand in the original document, and the Chamber will scan it. Knowing that, they cannot bring an already scanned document to the Chamber.
The responsibility lies with the deputy, according to this part of the law: "Art. 4. The reimbursement request shall be made through a standard form, signed by the congressperson, who, in doing so, shall declare that they assume full responsibility for the settlement of the expense, attesting that:"
So you are thinking about classifying all those reimbursements to create a training dataset, and plan to use it to evaluate whether the approach should be included? Is there anything we can do to help, besides testing and giving our opinion on it?
Continuing the discussion. Take a look:
Nota deputado Marcon sobre refeição de 130 reais ("Deputy Marcon's note on a 130-reais meal"). Did you find an answer to it?
PS: How long do we have to play the "I'm not guilty..." game? So, as for the method....
Best,
We’ve found an answer and we wrote an article in Portuguese about it. Creating news around it and building social pressure should bring good results and educate people. We’re aligning the next steps with lawyers.
We are not playing; they are :)
We had an amazing experience with crowdsourcing before, and we've got plans to do the same here. Soon we’ll release a spreadsheet and drop a line through FB and Twitter. Thank you very much! 🎉
Hi @silviodc and everyone! Just an update here: today we are releasing the file for the crowdsourcing ;) Pretty soon we'll have the files identified for testing the method 🎉
Hi @jtemporal,
Why not run the model on all ~160k meal receipts and make the spreadsheet available, ordered by the probability generated by the model? It would make the process faster, and afterwards a better model could be built. More specifically, we could classify by hand only the receipts that have a probability between 0.35 and 0.75.
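The suggestion above could be sketched roughly as follows. This is only an illustration: the `triage` function and the receipt IDs are hypothetical and not part of the PR; only the 0.35 and 0.75 bounds come from the comment.

```python
from typing import Dict, List, Tuple

# Bounds quoted in the comment above; everything else here is hypothetical.
LOW, HIGH = 0.35, 0.75


def triage(
    scores: Dict[str, float], low: float = LOW, high: float = HIGH
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank receipts by model probability and pick the uncertain ones.

    `scores` maps a receipt ID to the probability produced by the model.
    Returns (all receipts sorted by descending probability,
             the subset with low <= probability <= high).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    needs_review = [(rid, p) for rid, p in ranked if low <= p <= high]
    return ranked, needs_review


if __name__ == "__main__":
    # Made-up probabilities, for illustration only.
    example = {"receipt-1": 0.92, "receipt-2": 0.48, "receipt-3": 0.10}
    ranked, needs_review = triage(example)
    for receipt_id, prob in needs_review:
        print(f"{receipt_id}: {prob:.2f} -> classify by hand")
```

Receipts outside the band keep the model's label, so manual effort goes only to the cases where the model is least certain.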
Hi everyone! I finally ran the code on the new dataset! Take a look at the results, they are amazing :D
Thanks so much for following this PR too. During construction, we achieved 86% in training and 94% in evaluation. Therefore, the built model can also be generalized to other data.
I think the model is now good enough to be used. We reach 91% on an external test dataset. Regarding the probability value, mentioned here:
Well, this is the probability value according to the model. It doesn't confirm anything about the reimbursement. In the end we will have a lot of reimbursements which must be validated by hand.
Thanks so much for your comments and interest.
Thanks for the response, but I think a small test dataset does not capture all the variability of the data, and applying the model to the whole dataset will not generate too many cases. With this accuracy, only a few cases will remain in doubt. There will never be a model that eliminates manual classification, so the questionable classifications can be corrected iteratively, as Google does in its CAPTCHA system. Can you post the confusion matrix, sensitivity, and specificity of the model?
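For reference, the metrics requested above can be computed from true and predicted labels as in this minimal sketch. It assumes binary labels where 1 means "suspicious" and 0 means "regular"; the function names and the sample labels are illustrative and do not come from the PR.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def sensitivity(c: Dict[str, int]) -> float:
    """True positive rate: TP / (TP + FN)."""
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else float("nan")


def specificity(c: Dict[str, int]) -> float:
    """True negative rate: TN / (TN + FP)."""
    negatives = c["tn"] + c["fp"]
    return c["tn"] / negatives if negatives else float("nan")


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    c = confusion_counts(y_true, y_pred)
    print(c, sensitivity(c), specificity(c))
```

Reporting both rates matters here because a single accuracy figure can look high even when one class is mostly misclassified.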
…-amor # Conflicts: # conda_requirements.txt # research/Dockerfile # research/requirements.txt
# Including pdf > png # png > sift descriptors # png > keras classifier
PDF to PNG ok PNG to SIFT (error in opencv)
Change the workflow for png references
Download files OK Split Files ok Testing training ...
…-amor # Conflicts: # research/Dockerfile
# Building Reference Dataset ok # Building Keras model and evaluation OK # PDF-> PNG OK
# Using dhash to detect near duplications.
# Inclusion of Fourier transformation to detect rotation, zoom, and filters.
…tecture-refactor Split logic: public admin and dashboard
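One of the commits listed above mentions using dhash to detect near duplicates. The PR's actual implementation is not shown here, so the following is only a generic sketch of a difference hash. It assumes each image has already been converted to grayscale and resized to 9x8 pixels (for example with an imaging library); the `max_distance` threshold of 5 is a hypothetical value.

```python
from typing import Sequence


def dhash(pixels: Sequence[Sequence[int]]) -> int:
    """Difference hash of a small grayscale grid.

    `pixels` is assumed to be 8 rows of 9 grayscale values each.
    Each bit records whether a pixel is brighter than its right neighbour,
    giving 8 * 8 = 64 bits.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Treat two images as near duplicates if their hashes differ in few bits.

    The default threshold is a hypothetical choice, not taken from the PR.
    """
    return hamming(hash_a, hash_b) <= max_distance


if __name__ == "__main__":
    # Synthetic 9x8 gradient and a uniformly brighter copy of it.
    image = [[col * 10 + row for col in range(9)] for row in range(8)]
    brighter = [[value + 5 for value in line] for line in image]
    print(is_near_duplicate(dhash(image), dhash(brighter)))
```

Because the hash only encodes relative brightness between neighbouring pixels, a uniform brightness shift leaves it unchanged, which is why it is commonly used for near-duplicate detection.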