
100days

100 days of AI/ML. The requirement of the challenge is to study or apply AI/ML each day for 100 days. I have done this twice before, as documented in Round 1 and Round 2. The third round started on September 3rd.

Day 201 [2019-09-02]

Today I worked on another way of outputting probability distributions from the neural network. The approach I have been taking so far is creating a grid in the 1D output space and predicting the probability in each grid cell. The output is then constrained using a cross-entropy loss. Instead I experimented with using a Gaussian Mixture Model (GMM) as the last layer in the network. This is not provided by PyTorch, so I needed to partially implement something myself. Below is an example of an output:

GMM probability distribution

The important part here is to not automatically assume this is actually a probability distribution. It could simply be a continuous line peaking somewhere around the correct answer, but without a proper probabilistic interpretation. After quite some back and forth, it looks like it works when fitting straight lines. I also ran a test on some simulated galaxies, comparing to the more traditional way of outputting probability distributions. So far it seems promising, but I need to test using real data.
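One way to check the concern above is to integrate the predicted mixture numerically and see whether it sums to one. The following is a minimal sketch in plain Python, not the PyTorch layer used here. The raw values are made up, and mapping the outputs through a softmax (weights) and an exponential (widths) is an assumption about how the last layer is constrained.

```python
import math


def softmax(logits):
    """Turn unconstrained numbers into mixture weights that sum to one."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_pdf(y, weights, means, sigmas):
    """Density of a 1D Gaussian mixture evaluated at y."""
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        density += w * norm * math.exp(-0.5 * ((y - mu) / sigma) ** 2)
    return density


def gmm_nll(y, weights, means, sigmas):
    """Negative log-likelihood of one target value, usable as a loss."""
    return -math.log(gmm_pdf(y, weights, means, sigmas))


# Hypothetical raw outputs of the last layer for a single object.
raw_weights = [0.2, -1.0, 0.5]
means = [0.3, 0.8, 0.55]
raw_log_sigmas = [-3.0, -2.0, -2.5]

weights = softmax(raw_weights)
sigmas = [math.exp(v) for v in raw_log_sigmas]

# Riemann sum over an interval much wider than the mixture.
step = 1e-3
grid = [-1.0 + step * i for i in range(3001)]
integral = sum(gmm_pdf(y, weights, means, sigmas) for y in grid) * step

print("sum of weights:", sum(weights))
print("integral of the density:", integral)
print("loss for a target at 0.5:", gmm_nll(0.5, weights, means, sigmas))
```

If the integral differs clearly from one, the output is only a peaked curve and not a density.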

Day 202 [2019-09-03]

Read some further blog posts about mixture density networks (MDN) and large parts of the original paper, Bishop 1994.

Day 203 [2019-09-04]

Experimented more with these types of networks. The Bishop1994 paper had one very simple example, generating 1000 data points with two variables. For certain input values, two output values are equally likely. There is simply not enough information to fully determine the output. Here the posterior should be multimodal. I have polished up the notebook. Below is a reproduction of Fig. 7 in the paper.

Bishop1994 Fig.7
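For reference, a toy inverse problem of this kind can be generated in a few lines. This is a sketch only: the functional form and the noise range are written from my recollection of the paper and should be treated as assumptions, and `make_inverse_data` is a hypothetical helper name.

```python
import math
import random


def make_inverse_data(n_points=1000, seed=0):
    """Draw t uniformly and compute x from it; the network then learns t given x."""
    rng = random.Random(seed)
    xs, ts = [], []
    for _ in range(n_points):
        t = rng.random()
        noise = rng.uniform(-0.1, 0.1)
        x = t + 0.3 * math.sin(2.0 * math.pi * t) + noise
        xs.append(x)
        ts.append(t)
    return xs, ts


xs, ts = make_inverse_data()

# Targets observed for inputs close to x = 0.5. The forward relation is not
# invertible here, so very different t values share almost the same x.
near_half = sorted(t for x, t in zip(xs, ts) if abs(x - 0.5) < 0.02)
print("number of points near x = 0.5:", len(near_half))
print("range of targets:", near_half[0], "to", near_half[-1])
```

A network trained with a mean squared error would predict the average of these targets, which is why a mixture output is needed.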

Day 204 [2019-09-05]

The mixture density network used an exponential to keep the width of the distribution positive. Alternatively, one can use 1+elu, which I saw elsewhere. Here elu is the exponential linear unit. Above zero it equals ReLU (the identity function), but it has a smooth transition around zero and approaches -1 when going to minus infinity. I tested training the network with both approaches five times each. The result is shown below.

exp versus elu

For this case, the elu version converges faster. Further, I was reading up on Machine learning in astronomy while waiting.
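The two ways of keeping the width positive can be compared directly. Below is a small sketch in plain Python, assuming the standard ELU definition with alpha = 1; `sigma_exp` and `sigma_elu` are hypothetical names for the two mappings.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity above zero, alpha*(exp(x)-1) below."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigma_exp(raw):
    """Width from an exponential of the raw network output."""
    return math.exp(raw)


def sigma_elu(raw):
    """Width from 1 + elu of the raw network output."""
    return 1.0 + elu(raw)


for raw in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"raw={raw:5.1f}  exp={sigma_exp(raw):10.4f}  1+elu={sigma_elu(raw):8.4f}")
```

For negative raw values the two mappings coincide, since 1 + (exp(x) - 1) = exp(x). They only differ for positive values, where 1+elu grows linearly instead of exponentially, which may make large widths easier to reach without exploding gradients. For very negative inputs both underflow towards zero, so a small constant is often added in practice.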

Day 205 [2019-09-06]

Implemented a mixture density network in the distance determination pipeline, making it possible to compare with the traditional results. Some of the results do not make sense at all, with the network not training properly.

Day 206 [2019-09-07]

Today I managed to get it working. In the end, I am not fully sure what made the difference. The results started to make more sense when adding a non-linear layer (ReLU) directly before the MDN. This network is quite deep and I had skipped this single ReLU. Moreover, there was a significant performance problem when evaluating the MDN on the test set. Outputting the values on a grid was extremely fast. For some reason, evaluating the MDN was rather slow. This ended up becoming a serious bottleneck, since earlier I evaluated the test set metrics after each epoch. Only doing this every 50th epoch led to a significant speedup. This allowed for creating a sufficiently large run and evaluating the performance. The MDN now gives sensible results for a simplified test, which has removed some of the results of pretraining on simulations.

Day 206 [2019-09-08]

Modified the part of the distance estimation which pretrains on simulations to also be able to use an MDN. I managed to make some different runs. While it works, I did not achieve better results than constructing the posterior distribution by combining many different output classes.

Day 207 [2019-09-09]

Wanted to look at how to input errors to the network, but without much success. Read through a blog post and some other articles.

Day 208 [2019-09-10]

Read through the noise2noise paper.

Day 209 [2019-09-11]

Read the noise2void paper. In the noise2noise paper, one does not need clean examples to denoise the image, but multiple noisy realizations of the same underlying image. This is not always possible to get. Quite interesting for one of our applications.

Day 210 [2019-09-12]

Managed to find some references on how to treat uncertainties in the neural network. The Lightweight probabilistic deep learning paper explains how to let the activations be probabilistic. In other approaches the weights are interpreted as Gaussians. This approach is supposed to be significantly faster and also avoid repeated forward passes, which many probabilistic methods rely on. I also attempted making another implementation of the MDN network. Evaluating the probability function is now extremely fast, but the training seems to be affected. Not sure why.

Day 211 [2019-09-13]

Found out that the amount of dropout used is critical for getting a good performance. For classification tasks, you can easily use 20-50% dropout for good results. I ended up using 2% in the early layers. More dropout degraded the results.

Day 212 [2019-09-14]

Watched an interview with Jeremy Howard.

Day 213 [2019-09-15]

Continued tweaking of the model, experimenting with the results. The results are looking quite close to the previous ones when considering all objects. However, it looks like the result is improving compared to the previous results when only considering the best 50%.

Day 214 [2019-09-16]

Worked on writing an application today, which will contain a significant amount about deep learning. Is deep learning hype? I found this article and the Howard interview quite constructive. Also read about transfer learning.

Quote from the paper: "As an example relevant to ICF, researchers at the National Ignition Facility (NIF) [18] have used transfer learning to classify images of different types of damage that occur on the optics at NIF. There are not enough labeled optics images to train a network from scratch, but transfer learning with a network pre-trained on ImageNet [13] produces models which classify optics damage with over 98% accuracy."

That is quite a stretch of domain.

Day 214 [2019-09-17]

Watched the Yann LeCun interview.

Day 215 [2019-09-18]

Read about transfer learning in astronomy and some other papers.

Day 216 [2019-09-19]

Looked at a video and presentation about transfer learning and multi-task learning.

Day 217 [2019-09-20]

Looked at a panel debate about probabilistic networks.

Day 218 [2019-09-21]

One idea I explored earlier was using neural networks for density estimation. The earlier attempts, some semesters ago, were not working very well. Attempting again, I read through a paper on density estimation and made an implementation in the notebook.

Day 219 [2019-09-22]

Read through a paper from Google AI on unsupervised learning. It was voted the best paper at ICML2019. The most interesting part was actually the style. Many papers focus on achieving a minor improvement on some benchmark. Here the authors were giving some proofs and a lot of tests that a general unsupervised separation was possible. They have a ridiculous number of pages in the appendix.

Also read the res2next paper

Day 220 [2019-09-23]

Continued with the notebook from day 218, attempting to get it working. It still did not function after attempting different things for an hour. However, later I realized at least one problem. Hopefully that will solve the issue.

Day 221 [2019-09-24]

Watched the Regina Barzilay: Deep Learning for Cancer Diagnosis and Treatment interview. There were some interesting points, like her thinking about medical problems and the potential for using deep learning for early cancer diagnosis.

Day 222 [2019-09-25]

Continued working on the reweighting, deriving some expressions on paper and implementing them. As shown below, I have no success yet.

Reweighting failure

Day 223 [2019-09-26]

Read through two papers (paper 1, paper 2) on using recursive neural networks for denoising gravitational wave observations.

Day 224 [2019-09-27]

Watched the Francois Chollet interview, at least partially.

Day 225 [2019-09-28]

Looked at the Turing Lecture talk.

Day 226 [2019-09-29]

Next video with Susskind in the AGI podcast series. He is a well-known person from the physics community. A bit too many videos in a row by now.

Day 227 [2019-09-30]

First test of actually using the PAUS spectra. Here I used the simulations to predict the distance from a subset of the observations. Without noise the results were better than expected from what I had previously read in the literature. A bit unexpected.

Distance test

Day 228 [2019-10-01]

Read through a paper which used denoising autoencoders for unsupervised feature extraction from galaxy data. This is relevant for what I am doing.

Day 229 [2019-10-02]

Worked on writing a proposal based on deep learning methods. I have been working on this application for a while, and today I spent 6 hours on the main project part.

Day 230 [2019-10-03]

Wanted to look more into multi-task learning. I read an overview paper from Sebastian Ruder.

Day 231 [2019-10-04]

One problem I have looked at many times is using neural networks for speeding up the calculations of simulations. Basically, you have a large training sample with noiseless examples and would like to train a network to be able to produce a billion new examples, conditioned on some parameter. Training a network kind of works, but the error on the output was 2-3% at best. This is not sufficiently good for our application. Finding relevant literature was not easy. In the end, I figured out the magic search word was "interpolation".

Day 232 [2019-10-05]

Watched the Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI interview. It was quite interesting hearing from an insider talking about what he considered some limitations of current deep learning systems. Abstract concepts seem to be missing in deep learning networks and it is unclear how to build them in.

Day 233 [2019-10-06]

Continued looking at the interpolation network. Instead of only outputting the value, I also returned the error. The hope was to use this to downweight problematic points. In the image below, the three columns are the training loss, test loss and the relative error. It did not work very well. I also tried various other tweaks.

Not converging
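One common way to let a predicted error downweight problematic points is a Gaussian negative log-likelihood with a predicted variance. The text does not say which loss was actually used, so the sketch below is only an illustration of that idea, with `gaussian_nll` as a hypothetical helper.

```python
import math


def gaussian_nll(y, mu, log_var):
    """Per-point loss when the network predicts a mean and a log-variance.

    The squared residual is divided by the predicted variance, while the
    log-variance term stops the network from inflating the error everywhere.
    """
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var))


# A point with a large residual costs less if a large error is predicted ...
print(gaussian_nll(3.0, 0.0, log_var=0.0), gaussian_nll(3.0, 0.0, log_var=2.0))

# ... but claiming a large error on a well-fitted point is penalised.
print(gaussian_nll(0.1, 0.0, log_var=0.0), gaussian_nll(0.1, 0.0, log_var=2.0))
```

A known weakness of this loss is that the network can partly trade a worse mean for a larger predicted error, which might be related to the poor convergence seen here.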

Day 234 [2019-10-07]

Last full day of writing a proposal which has a strong machine learning component. Looked up references on multi-task learning, including Multi-task learning book

Day 235 [2019-10-08]

Continuing to obsess about interpolating measurements using neural networks, looking at this [paper](https://arxiv.org/abs/1906.05661).

Day 236 [2019-10-09]

Also played around with the interpolation, reading up some papers.

Day 237 [2019-10-10]

Continuing with trying to predict the fluxes, both using ELU and trying to reduce the prediction to a single band.

Relative error

This was after reading a paper which argued that a smooth non-linear transition would work better than a simple ReLU. That did not seem to be the case.

Day 238 [2019-10-11]

Looked into various sources on transfer learning, including the paper A survey on transfer learning.

Day 239 [2019-10-12]

Not very effective. I was searching for new trends, looking at various articles.

Day 240 [2019-10-13]

Read through quite some papers, being away from the laptop today.

In Shapley framework the authors introduce a way to determine if improved performance comes from a change in the algorithms or the data. Quite technical.

Foggy scenes tested the improvements when degrading scenes with artificial fog. This is possible when having a 3D model and a simple model of the fog, where the transparency is distance dependent.

Hardware acceleration. Not the most interesting paper. They compared running on CPUs or GPUs. The paper did not give a good impression.

Day 241 [2019-10-14]

Watched the Watson interview on how they constructed Watson and beat the best human player in Jeopardy. He had an interesting perspective on how the project was run. With a difficult task, it is easy to assume achieving the goal would require inventing something completely new. Instead they mostly used existing technology and let different groups innovate on separate parts.

Day 242 [2019-10-15]

I wanted to test a specific transfer learning technique and needed a simple example. For this I constructed a CNN which could determine the frequency of a wave. Testing this simple example, it did not work at all, which should not be the case. At the end of 1.5 hours I had removed all complications, but the network was still not working.

Sinus recovery

Update: In the end, the problem was located in two matrices in the loss function being broadcast differently than expected. Not too easy to detect, since I directly afterwards took the mean of the results.
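This kind of bug is easy to reproduce. The sketch below uses NumPy instead of PyTorch (the broadcasting rules are the same) with made-up arrays: a prediction of shape (N, 1) and a target of shape (N,) silently broadcast to an (N, N) matrix, and the mean hides the wrong shape.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)          # shape (8,)
prediction = target.reshape(-1, 1)   # shape (8, 1), a perfect prediction

# Bug: (8, 1) minus (8,) broadcasts to (8, 8), comparing every prediction
# with every target. The mean is still a scalar, so nothing looks wrong.
wrong = (prediction - target) ** 2
print(wrong.shape, wrong.mean())

# Fix: make the shapes agree before computing the loss.
right = (prediction.squeeze(-1) - target) ** 2
print(right.shape, right.mean())
```

With a perfect prediction the correct loss is exactly zero, while the broadcast version stays positive, so the network can never reach a good minimum. An explicit shape assertion before the loss is a cheap safeguard.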

Day 243 [2019-10-16]

Worked on adapting a new set of simulations and then pretrained the network with these. It was looking good so far.

Explanation of the missing days.

In the middle of the third round, between 200 and 300 days, I started losing interest. Around the same time I worked on finishing up a research paper using deep learning techniques. Focusing on finishing this paper felt more productive at the time. This resulted in a very long time where I did not follow this good habit of working on deep learning each day. One pandemic later, I am finally back again.

Day 244 [2021-04-30]

Consider having a distribution and wanting to create a density estimator (like the KDE). Is it possible to do this with a neural network? My previous attempts at this failed. By now I tested creating a neural network which gets a single constant input. The output is given by using a mixture density network. Below is a plot showing how this fits.

Model probability

Day 245 [2021-05-01]

Another problem with the MDN is when having a multi-variate distribution. One can in a simple way return multiple independent predictions. The problem is when there are correlations between the various predictions, which is very often the case. Below is an example of two correlated Gaussians.

Multi variate

In general there was not a lot of useful literature on the topic. The technical note Training Mixture Density Networks with full covariance matrices had some useful tips. Some of them (Cholesky factorization) were along the lines of what I thought of doing, but it also included some other ideas (eq. 10). Tomorrow I hope to make an implementation.
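The Cholesky idea can be sketched for a single 2D Gaussian component. The network outputs three unconstrained numbers that fill a lower-triangular matrix L, and the covariance is L Lᵀ. Using an exponential on the diagonal is my assumption here, not necessarily what the technical note does, and the function names are hypothetical.

```python
import math


def cholesky_factor(a, b, c):
    """Lower-triangular factor with a strictly positive diagonal."""
    return [[math.exp(a), 0.0], [b, math.exp(c)]]


def covariance(a, b, c):
    """Covariance L @ L.T, positive definite for any real a, b, c."""
    (l00, _), (l10, l11) = cholesky_factor(a, b, c)
    return [[l00 * l00, l00 * l10], [l00 * l10, l10 * l10 + l11 * l11]]


def log_pdf(y, mu, a, b, c):
    """Log-density of a 2D Gaussian without ever inverting the covariance."""
    (l00, _), (l10, l11) = cholesky_factor(a, b, c)
    d0, d1 = y[0] - mu[0], y[1] - mu[1]
    # Solve L z = d by forward substitution.
    z0 = d0 / l00
    z1 = (d1 - l10 * z0) / l11
    # log det(Sigma) = 2 * (a + c), since det(L) = exp(a) * exp(c).
    return -math.log(2.0 * math.pi) - (a + c) - 0.5 * (z0 * z0 + z1 * z1)


cov = covariance(0.1, 0.7, -0.3)
correlation = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
print("covariance:", cov)
print("correlation:", correlation)
print("log-density at the mean:", log_pdf((0.0, 0.0), (0.0, 0.0), 0.1, 0.7, -0.3))
```

The off-diagonal output b controls the correlation, while the log-determinant comes for free from the diagonal, which keeps the loss cheap and numerically stable.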

Day 246 [2021-05-02]

Worked on actually making the implementation, fitting the MDN to a 2D Gaussian with a correlation between the variables. The result

First try on 2D MDN

shows some more work is needed.

Day 247 [2021-05-03]

Worked on installing tensorboard. Should be simple, but for some reason it was not willing to connect.

Day 248 [2021-05-04]

Continued working on tensorboard. After some beating I had it running. Uploading to tensorboard.dev is a neat feature.

Day 249 [2021-05-05]

[VIDEO GTC intro]

Day 250 [2021-05-06]

Experimenting to find the problems with the GMM with covariance. The code looks fine. When reducing to one component, I manage to get the result below

Somewhat working 2D MDN

which visually looks quite similar (did not test further). What is going on will be the topic for another day.

Day 251 [2021-05-07]

Day 252 [2021-05-08]

Day 253 [2021-05-09]

Worked on reading through the webpage that Christian sent.

Day 253 [2021-05-10]

Got the multi-dimensional MDN working and also worked on a calibration network for astronomical images.

Day 254 [2021-05-11]

Worked on preparing the input when systematically removing individual images.

Day 255 [2021-05-12]

Trained an autoencoder to remove noise in the zero-points.

Day 256 [2021-05-13]

Listened to an interview on Spotify on ML/DL at Heineken.

Day 257 [2021-05-14]

Started testing the zero-point auto-encoder by downloading the data to be corrected and writing the code for managing joining the data. Unfortunately I ended up having a problem where some data was not exactly what was expected, which took some time to figure out.

Day 258 [2021-05-15]

Continued with the auto-encoder. It only gives a quite small improvement on the final numbers.

Day 259 [2021-05-16]

Listened to an interview on how NSF is investing in deep learning.

Day 260 [2021-05-17]

Created a conference poster on the photometric redshift with deep learning paper I published last year.

Day 261 [2021-05-18]

Extended the zero-point auto-encoder to work with multiple bands. In the end one only finds a tiny improvement. I presented these results to the PAUS collaboration.

Day 262 [2022-05-19]

Watched "Using Deep Learning and Simulation to Teach Robots Manipulation in Complex Environments" from GTC2021 and the start of another video.

Day 263 [2022-05-20]

Experimented more with tensorboard. Among other things, I had a problem with using tensorboard inline in the notebook, which did not get fixed before deleting the content of the log directory.

Tensorboard inline

Day 264 [2022-11-04]

Experimented with trying to map out instrumental zero-points by directly predicting the zero-point per star. Below is a pattern

ZP trend

which shows up for all image IDs. This trend is probably because the network has not been trained enough/correctly and so far it focuses on getting the correct zero-point per image.

Day 265 [2022-11-05]

The problem yesterday seems to come from how the training set is sampled. We have 12096085 observations divided over 204920 images, so about 59 per image. When randomly selecting 1000 stars for a batch, each of the stars most likely belongs to a different image. The simplest way for the network to get a good fit is adjusting the zero-point for each image. This means it will never learn the pattern as a function of the position on the CCD.

Instead I have switched to loading all stars (~200) in a mosaic at once. Writing this custom dataloader was what ended up taking the most time, since this needs to be computationally efficient. By now the network is training again.
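The grouping idea can be sketched without any framework. The function names and the tiny list of IDs below are hypothetical, and the real dataloader will differ, in particular in how it is made fast for millions of rows.

```python
import random
from collections import defaultdict


def group_rows(group_ids):
    """Map each group ID (for example a mosaic) to the row numbers of its stars."""
    groups = defaultdict(list)
    for row, group_id in enumerate(group_ids):
        groups[group_id].append(row)
    return dict(groups)


def grouped_batches(group_ids, groups_per_batch, seed=0):
    """Yield batches made of complete groups instead of random single rows."""
    groups = group_rows(group_ids)
    order = list(groups)
    random.Random(seed).shuffle(order)  # shuffle groups, not individual stars
    for start in range(0, len(order), groups_per_batch):
        chunk = order[start:start + groups_per_batch]
        yield [groups[group_id] for group_id in chunk]


# Hypothetical group ID for each of eight observations.
group_ids = [7, 7, 3, 3, 3, 9, 7, 9]
batches = list(grouped_batches(group_ids, groups_per_batch=2))
for batch in batches:
    print(batch)
```

Because every batch now contains all stars of a group, the network sees several positions on the detector sharing the same zero-point, so a per-image offset alone cannot explain the residuals.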

Day 266 [2022-11-06]

Continued looking at the AI for medicine course from coursera. Currently in the first week. When getting to the weighting of underrepresented classes I had a look at an astronomical dataset with exactly this problem.
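A common heuristic for weighting underrepresented classes is to make the weight inversely proportional to the class frequency. The sketch below is not taken from the course, and the 90/10 label split is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset with a rare positive class.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights each class contributes the same total amount to a weighted loss, regardless of how many examples it has.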

Day 267 [2022-11-07]

First spent some time attempting to get the visual debugger in JupyterLab working. I have seen it being installed by default, but never found out how it worked. It turns out you need to install xeus-python. Installing the binary worked, but I did not figure out how to get one of the conda environments listed in JupyterLab. There did however exist an "xpython" option in the list, pointing to an anaconda installation. I could play around with this. Also, the mamba package manager is a faster drop-in replacement for conda.

More importantly, I continued working on mapping out the zero-point variations across the CCD. Creating a custom "collate_fn" function, I can now process multiple mosaics at once. Running the code with 10 mosaics was for some reason much faster than expected. By now the CCD pattern is different for different images.
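Since mosaics contain different numbers of stars, a collate function has to bring them to a common shape. Below is a framework-free sketch of padding with a mask. The name `collate_mosaics` and the three features per star are assumptions. In PyTorch the same logic would return tensors and be passed as the `collate_fn` argument of `DataLoader`.

```python
def collate_mosaics(batch, pad_value=0.0):
    """Pad mosaics with different star counts to a common length.

    `batch` is a list of mosaics, each a list of per-star feature lists.
    Returns the padded batch and a mask with 1 for real stars, 0 for padding.
    """
    longest = max(len(mosaic) for mosaic in batch)
    n_features = len(batch[0][0])
    padded, mask = [], []
    for mosaic in batch:
        n_pad = longest - len(mosaic)
        stars = [list(star) for star in mosaic]
        stars += [[pad_value] * n_features for _ in range(n_pad)]
        padded.append(stars)
        mask.append([1] * len(mosaic) + [0] * n_pad)
    return padded, mask


# Two hypothetical mosaics with [x, y, magnitude] per star.
batch = [
    [[0.1, 0.2, 18.0], [0.5, 0.9, 17.2], [0.7, 0.3, 19.1]],
    [[0.4, 0.4, 16.5]],
]
padded, mask = collate_mosaics(batch)
print(mask)
```

The mask has to be applied in the loss, otherwise the padded entries are treated as real stars.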

Day 268 [2022-11-08]

Continued working on predicting the zero-points. While one should predict this on the position of the galaxies, I tested predicting a 50x50 grid and then averaged over all positions to get an image level prediction. This could then easily be compared with a classical algorithm. Plotting the histogram of image zero-point, I find that

Zero-point histogram

where the neural network prediction tends to center around the median. That seems to be a problem with using an L1 loss. Tomorrow I will attempt using a mixture density network.
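The link between the L1 loss and the median can be checked with a small experiment. For a constant prediction, the L1 loss is minimised by the median of the targets and the L2 loss by the mean. The values below are made up, and a brute-force scan over candidates stands in for training.

```python
import statistics


def total_loss(constant, values, power):
    """Sum of |value - constant|**power, i.e. L1 for power=1 and L2 for power=2."""
    return sum(abs(v - constant) ** power for v in values)


# Hypothetical skewed targets with one outlier.
values = [0.0, 0.1, 0.2, 0.3, 5.0]
candidates = [i / 100 for i in range(501)]

best_l1 = min(candidates, key=lambda c: total_loss(c, values, 1))
best_l2 = min(candidates, key=lambda c: total_loss(c, values, 2))

print("L1 optimum:", best_l1, "median:", statistics.median(values))
print("L2 optimum:", best_l2, "mean:", statistics.mean(values))
```

A network that cannot distinguish the images well enough will therefore fall back to the median under an L1 loss, which matches the narrow histogram of predictions.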

Day 269 [2022-11-09]

Setting up some broken environment again and testing converting the output to a mixture density network. By now it at least trains. Later I will try to optimize this further.

Day 270 [2022-11-10]

Read through a deblending paper before tomorrow's journal club. Deblending paper

Day 270 [2022-11-11]

Watched the GTC 2021 keynote.

Day 271 [2022-11-12]

Finished week 1 of AI for medical diagnosis on coursera.

Day 272 [2022-11-13]

And then week 2. It is quite simple.

Day 273 [2022-11-14]

Kind of finished week 3 and the first course. Because of some technical issue coursera does not let me submit the coursework right now. A bit annoying.

Day 274 [2022-11-15]

Worked through week 1 and 2 of the second course.

Day 275 [2022-11-16]

Trained the MDN network for zero-points. While training I ended up going through some blog posts about MLOps.

Day 276 [2022-11-17]

Watched one of the GTC talks.

Day 277 [2022-11-18]

Worked on the zero-point prediction with MDN. By now it gives a reasonable distribution. Comparing the predictions directly, there is a lot of scatter, but it is uncertain what level is expected from the errors.

Zero-point prediction