Collection of material used during the IA Summer Internship 2022

pedro-acunha/Summer_internship_IA_2022

Summer_internship_IA_2022

Collection of material used during the IA Summer Internship 2022 for the project: Automatic classification of galaxies using the Galaxy Zoo data and supervised learning.

Supervisors: Pedro Cunha & Ana Paulino-Afonso

Relevant files:

  • data_preparation.py: Script for data preparation;
  • dl_model_GZ2.py: Script with data pre-processing and deep learning models. This should be viewed as a starting point.

Detailed Plan:

Task 1: Exploring Galaxy Zoo data

The Galaxy Zoo project provides its classification results on the following website: https://data.galaxyzoo.org/

The first thing to do is download the data. You will get the Galaxy Zoo 2 dataset from here: (https://zenodo.org/record/3565489#.YsglD9JByV6). Read the description page carefully alongside this paper: https://doi.org/10.1093/mnras/stt1458. It is important to cross-reference the images with the classifications from Galaxy Zoo. You can do this using the ObjID. The label you will use for the classification is “gz2_class”. My recommendation is to identify the classes in the dataset and select a random sample of sources for each label (e.g. 2,000 galaxies classified as Er, etc.). You are free to choose the number of classes you want to use (e.g. 2 for a binary classification, or all of them). Remember that the more classes you choose, the larger the data set and the longer the processing time. At the end, you should have a main folder with one subfolder per galaxy class. This will be helpful for later!
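The cross-referencing and sampling steps described above could be sketched as follows. This is only an illustration, not the contents of data_preparation.py: the function names, the dictionary-based inputs, and the assumption that each image is stored as `<image id>.jpg` are hypothetical, so check the Zenodo description page for the actual table columns and file naming before adapting it.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def select_sample(labels, mapping, classes, n_per_class, seed=0):
    """Pick up to n_per_class image ids for each requested class.

    labels:  dict of object id -> gz2_class (built from the classification table)
    mapping: dict of object id -> image id (built from the image mapping table)
    Both dicts are assumed inputs; how you load them depends on the files you use.
    """
    by_class = defaultdict(list)
    for obj_id, gz2_class in labels.items():
        # Keep only the requested classes, and only objects that have an image.
        if gz2_class in classes and obj_id in mapping:
            by_class[gz2_class].append(mapping[obj_id])
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        cls: rng.sample(sorted(ids), min(n_per_class, len(ids)))
        for cls, ids in by_class.items()
    }


def copy_into_class_folders(sample, image_dir, out_dir):
    """Copy <image_dir>/<image id>.jpg into <out_dir>/<class>/ and count the copies."""
    copied = 0
    for cls, image_ids in sample.items():
        target = Path(out_dir) / cls
        target.mkdir(parents=True, exist_ok=True)
        for image_id in image_ids:
            src = Path(image_dir) / f"{image_id}.jpg"  # assumed naming scheme
            if src.exists():  # skip catalogue entries without an image file
                shutil.copy(src, target / src.name)
                copied += 1
    return copied
```

Calling `select_sample(labels, mapping, {"Er"}, 2000)` and passing the result to `copy_into_class_folders` would leave a main folder with one subfolder per class, which is the layout requested above.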

Task 2: Preparing the pipeline

After the data processing task, you need to start building the pipeline. I suggest checking the following examples:
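Separately from those examples, one early pipeline step that the folder-per-class layout from Task 1 makes easy is splitting the images into training and validation sets. The sketch below is an assumption about how this could be done, not part of dl_model_GZ2.py: the function name, the `.jpg` extension, and the 20% validation fraction are all illustrative choices.

```python
import random
from pathlib import Path


def stratified_split(root, val_fraction=0.2, seed=0):
    """Split <root>/<class>/*.jpg into train and validation lists.

    Each list holds (path, class name) pairs. The split is done class by
    class, so every class keeps roughly the same validation fraction.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        val += [(f, class_dir.name) for f in files[:n_val]]
        train += [(f, class_dir.name) for f in files[n_val:]]
    return train, val
```

Splitting per class rather than over the whole folder keeps rare classes from ending up entirely in one of the two sets.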

Task 3: Testing your model

Once your model is ready, it is time for evaluation. Remember, this is a supervised task. This task is closely linked to the previous one. You should build at least 2 models: (1) Baseline: a simple model, used for comparison and to understand how added complexity helps with the problem at hand; (2) CNN: building on model (1), you can add more layers to the deep learning model, in particular 2D convolutional layers. You are encouraged to test multiple models and achieve the best result possible.
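To compare the two models on the same held-out galaxies, a few simple metrics are enough to get started. The helpers below are a minimal sketch with hypothetical names, written in plain Python so they work with predictions from any framework. Note that `majority_class_accuracy` is an extra reference point added here for illustration (the score of always predicting the most common class); it is not the baseline model from item (1).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of galaxies whose predicted class matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))


def majority_class_accuracy(y_true):
    """Accuracy obtained by always predicting the most common class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)
```

If a model's accuracy is not clearly above `majority_class_accuracy` on the same labels, it has probably learned little beyond the class frequencies, and the confusion counts show which classes are being mixed up.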

Have fun!
