Author : Hrishikesh Shetgaokar(hrishikesh036@gmail.com)
Guide : Prof. Avinash Deshpande
Co-Guide : Dr. Kaushar Vaidya
Institute : BITS Pilani, Pilani Campus & Raman Research Institue
Reviewers : Akhil Jaini, Aleena Baby
This package was developed during my time at Raman Research Institute while working with Prof Avinash Deshpande ('desh'). Currently, refactoring the old code to make it more readable, usable and efficient. Also adding new features
The objective of the 4 month thesis project was to study the radio emissions of a drifter pulsar at different bands to better understand the magnetosphere. More information can be found in my Thesis Report (Drifting_Subpulse_Thesis.pdf)
The idea here is to use OOPs to build a simple data processing tool for pulsar data.
- MBR data
- Packet Level Synchronization
- Pulsar Information Utility
- MBR to Dynamic Spectrum
- Folded Dynamic Spectrum
- De-dispersion
- Pulse Mask
- Time Series
The multi frequency data were recorded using RRI-GBT Multi-Band Receiver (MBR). The time varying voltage data from the MBR along with a header are saved in '.mbr' files. These '.mbr' data files are to be placed in MBRData directory channel wise (as shown below) for processing.
MBRData
|--> ch01
|--> ch02
|--> ch03
|--> ch04
...
The '.mbr' file name should be of the form "ch01_B0834+06_20090725_114903_000.mbr" where:
- First 4 characters (ch01) tell the channel number
- Next 8 characters (B0834+06) tell the pulsar name
- Next 15 characters (20090725_114903) tell the date and time of observation (2009/07/25 11:49:03)
- Last characters tell the sequence number
Each mbr data file is made up of 2,027,520 mbr packets. The mbr packet is 1056 bytes long. The first 32 bytes in mbr packet is the header which stores information like Observation ID, Source Name, Local Oscillator (LO) frequency, GPS count, packet count, etc. and the structure of its header is shown in Figure 1.1. The 1024 bytes after the header contain raw voltage data sampled at 33 MHz. Of these 1024, 512 bytes are for X polarization and Y polarization each.
Figure 1.1: MBR packet header
Each observation band is 16 MHz wide and the central frequency of this band can be calculated from LO frequency by using the formula RF = LO ± IF. Of these two values select the one which lies in the band. The LO, IF and resulting RF (central frequency) values obtained for 8 channels is given in Table 1.1
Table 1.1: Central Frequency for channels
The recorded data also suffer from missing packets. These missing packets can be considered as bad data and are flagged (given a value np.nan in the dynamic spectrum) so that they can be avoided while calculating time sequence. The missing packets were detected by keeping a count of number of packets read and comparing that to the packet number present in the header of current packet being read.
For analysing signals received across all 10 frequency bands, it is extremely important that we synchronize the data from each band. This would ensure that we compare and study pulse signals from different channels having the same origin at the pulsar.
For this, the header of each packet is analysed and the GPS count, packet number and GPS pulse for each packet is recorded. GPS count is the number of seconds after 00:00hrs on the day of observation. The count is updated every second. GPS pulse is a very short (nano seconds) pulse. If this pulse is ‘high’ while writing the header, 1 is recorded in the GPS bit. Else zero is recorded. Plot the GPS count vs Packet Number for all packets and fit a line through those with ‘high’ GPS bit. Let us call this line ‘A’. Refer Fig 1.2
Figure 1.2: GPS counts vs Packet Number
The packet with ‘high’ GPS pulse in header is marked in red
Such analysis is done for all 9 channels. Now taking the first occurrence of GPS count as a reference start for our observation, note the corresponding packet number on the line ‘A’ for all channels. Treat this packet number as the first packet for our observations in all corresponding channels.
The synchronization code used is present in synchronization_all.py. This code calculates first packet of synchronization for all channels and populate the data in ChannelVsFirstPacket_*PSRDetails*.txt. This code internally calls synchronization_single.py which calculates the same for only one channel.
For the code to run properly, MBR data files should be present channel wise in MBRData directory. It accepts mbr file name details without the sequence number as input command line argument. Example command to run from project root for mbr data B0834+06_20090725_114903:
python3 -m AnalysisPackages.synchronization.synchronization_all B0834+06_20090725_114903
Output: Channel number and first packet of synchronization for all channels is saved in the path: ChannelVsFirstPacket_*PSRDetails*.txt (AnalysisPackages/resources/ChannelVsFirstPacket_B0834+06_20090725_114903.txt)
Idea behind this utility is to have a class which can be used to hold all configuration parameters and pulsar properties at a single place. This class also has methods to populate the following band specific observation properties in config.txt:
- Central frequency
- Sampling frequency
- First packet for synchronization
- Dispersion delay(in terms of number of packets) for synchronization
Usage: Make sure that the config.txt file is present in correct resources directory((AnalysisPackages/resources/)). It accepts mbr file name details without the sequence number as input command line argument. Example command to run from project root for mbr data B0834+06_20090725_114903:
python3 -m AnalysisPackages.utilities.pulsar_information_utility B0834+06_20090725_114903
There'll be a prompt to choose a option to populate config.txt.
1. Populate central frequency for all bands in config.txt
2. Populate sampling frequency and first packet for all bands in config.txt
3. Populate packets to skip for dispersion delay synchronization across bands in config.txt
4. Do all
5. Exit(enter Q or 5)
Choose a desired option by entering corresponding option value(1,2,3,4,5,Q or q)
A Dynamic spectrum is simply intensity as a function of frequency and time. It can be constructed by calculating the spectrum (intensity as a function of frequency) at different times. To calculate the spectrum, we take Fast Fourier Transform (FFT) of 512 samples each in X polarization and Y polarization. Let X and Y be the resulting spectrum for x and y polarization. Intensity spectrum for x polarization can be calculated as (X.X'), where X' denotes complex conjugate of X. Similarly, intensity spectrum for y polarization can be calculated as (Y.Y'). Note that the intensity spectrum has 256 channels. The resulting intensity spectrum (Figure 4.1) forms a column of the dynamic spectrum (Figure 4.2) with each column (i.e., spectrum) taken at successive time intervals. Also note that the entire bandwidth of 16MHz is divided into 256 channels. So, each channel is 6.25 KHz wide. We also calculate cross spectrum as (X.Y').
Figure 4.1: Intensity spectrum
Figure 4.1: Dynamic Spectrum
Obtaining dynamic spectrum for a particular sequence of .mbr files is done in get_dynamicspectra.py. Performs the following tasks:
- Reads properties from config.txt
- Divides single mbr file into n parts(specified in config.txt) for processing
- Computes the intensity dynamic spectra for X.conjugate(X), Y.conjugate(Y), real(X.conjugate(Y)) and imaginary(X.conjugate(Y)) where X and Y correspond to spectra for X polarization and Y polarization respectively. More clarity can be found in section 3.3 in Drifting_Subpulse_Thesis.pdf
- Saves these 4 dynamic spectra as .spec files in output directory channel wise. For example, the .spec data for mbr sequence "ch03_B0834+06_20090725_114903" will be saved in path: OutputData/B0834+06_20090725_114903/DynamicSpectrum/ch03/
Usage: Run the following command to view help on all optional command line arguments
python3 -m AnalysisPackages.mbr.get_dynamicspectra -h
Output describes how to use the command:
usage: get_dynamicspectra.py [-h] [-s] [-plotXX] [-plotYY] [-plotRealXY] [-plotImagXY] [-psrUtil] [-t] input_file_name
positional arguments:
input_file_name The mbr filename without the sequence number(eg. ch03_B0834+06_20090725_114903)
optional arguments:
-h, --help show this help message and exit
-s, --packetSynch Do packet level synchronization across bands
-plotXX, --plotXX plot dynamic spectrum for X polarization after processing each part of mbr file
-plotYY, --plotYY plot dynamic spectrum for Y polarization after processing each part of mbr file
-plotRealXY, --plotRealXY
plot dynamic spectrum for real part of cross (X.Conjugate(Y)) after processing each part of mbr file
-plotImagXY, --plotImagXY
plot dynamic spectrum for imaginary part of cross (X.Conjugate(Y)) after processing each part of mbr file
-psrUtil, --psrUtil run pulsar_information_utility for populating config file before computing the dynamic spectra
-t, --timer print time taken to process each part of mbr file
Not passing the optional argument will simply not execute the corresponding functionality.
In most cases, we will run:
python3 -m AnalysisPackages.mbr.get_dynamicspectra ch03_B0834+06_20090725_114903
Figure 4.2: Sample output of mbr.get_dynamicspectra
For obtaining the dynamic spectrum with packet level synchronization across all bands we use the following command. Apart from the synchronization mentioned in Section 2, this also includes synchronization to compensate for dispersion delay across bands:
python3 -m AnalysisPackages.mbr.get_dynamicspectra ch03_B0834+06_20090725_114903 --packetSynch
All optional parameters set to true:
python3 -m AnalysisPackages.mbr.get_dynamicspectra ch03_B0834+06_20090725_114903 --packetSynch -plotXX -plotYY -plotRealXY -plotImagXY -runPsrUtility -psrUtil
Note:
- Setting the optional plotting parameters (-plotXX, -plotYY, -plotRealXY or -plotImagXY) will plot the selected dynamic spectra after processing each part of mbr file. After this, the program will ask during runtime whether to continue plotting for subsequent parts(Y) or not(n). The default is 'n'. That is, if a 'Y' is not provided, plotting will be suspended for subsequent parts.
Individual pulse signals of a pulsar are generally very weak. As a result, they have a low Signal to Noise Ratio (SNR) and are hence buried in noise. In order to detect the pulse, we need a to increase the SNR. For repeating signals, this can be achieved by simply dividing the data into slices where the duration of each slice is exactly the Period of repetition. These slices are then stacked and averaged. The noise in the signal varies randomly near a mean(baseline). Hence, when the slices are averaged, the random noise cancels out and the pulse signal which occur in every slice survive. Therby increasing the SNR. This process is commonly called "folding".
Usage: Run the following command to view help on all optional command line arguments
python3 -m AnalysisPackages.spec.get_averagepulse -h
The output describes how to use the command:
get_averagepulse.py [-h] input_file_name ch_number polarization spec_chunk_size
positional arguments:
input_file_name The mbr filename without the sequence number(eg. ch03_B0834+06_20090725_114903)
ch_number band number (eg. ch03 for band 3)
polarization polarization for which average pulse profile is to be obtained ('XX' or 'YY')
spec_chunk_size number of rows to be picked from .spec file at once (default value is 5000)
optional arguments:
-h, --help show this help message and exit
Example:
python3 -m AnalysisPackages.spec.get_averagepulse B0834+06_20090725_114903 ch03 XX
Figure 5.1: Folded Spectrum for B0836+06
As seen in Figure 5.1, the folded pulse spectrum has a high SNR and hence we see a sharp dispersed pulse. Also note that the off-pulse region is "smooth". This is because over large time periods, the random noise has canceled out.
Note: .spec file created in mbr.get_dynamicspectra is the input file for this step. Hence, make sure that .spec files are generated before this step.
As the pulsar signal propagates through Interstellar Medium to reach us, it interacts with intervening charged particles and suffers from dispersion. Dispersion occurs because the refractive index of intervening medium is frequency dependent. Lower frequency radiations travel slower than higher frequency radiation. Hence pulses at lower frequency are delayed. As a result, even if the pulsar produces signals at all frequencies at the same time, we would receive higher frequency pulse earlier than the lower frequency pulse. For more information, you can be refer to section 3.6 in Drifting_Subpulse_Thesis.pdf
It is important that we correct for this delay before proceeding with our analysis. The correct DM is found when the SNR is maximum:
Figure 6.1: DM Estimation for B0836+06
Note the increased Signal to Noise Ratio (SNR) as we approach the correct Dispersion Measure value. Also, the depression on both sides of pulse is due to compression. More on compression in section 5 of Drifting_Subpulse_Thesis.pdf.
Usage:
python3 -m AnalysisPackages.spec.get_dm B0834+06_20090725_114903 ch03 XX
positional arguments:
input_file_name The mbr filename without the sequence number(eg. ch03_B0834+06_20090725_114903)
ch_number band number (eg. ch03 for band 3)
polarization polarization for which average pulse profile is to be obtained ('XX' or 'YY')
Note: Average dynamic spectrum file created in spec.get_averagepulse is the input file for this step. Hence, make sure that the average spectrum file is generated first.
The folded dynamic spectrum is also used to get a pulse mask. The pulse mask is used to flag data for decompression. More on compression and techniques to get rid of it in section 5 of Drifting_Subpulse_Thesis.pdf.
Figure 7.1: Pulse Mask for B0836+06
Usage:
python3 -m AnalysisPackages.spec.get_pulsemask -h ✔ │ pulsar-analysis-package Py │ at 10:49:39
usage: get_pulsemask.py [-h] input_file_name ch_number polarization
positional arguments:
input_file_name The mbr filename without the sequence number(eg. ch03_B0834+06_20090725_114903)
ch_number band number (eg. ch03 for band 3)
polarization polarization for which average pulse profile is to be obtained ('XX' or 'YY')
optional arguments:
-h, --help show this help message and exit
Example:
python3 -m AnalysisPackages.spec.get_pulsemask B0834+06_20090725_114903 ch03 XX
This is also used for getting average spectrum(.avs files) in x and y polarizations which will be used for gain correction in these two polarizations while calculating Stokes I.
A utility was also created to draw pulse mask on average pulse spectrum. This improved pulse mask is used to tackle more difficult cases of decompression(like in ch04 of B0809+74 due to low DM and wide pulse)
Figure 7.2: Draw Pulse Mask Utility
Note: Average dynamic spectrum file created in spec.get_averagepulse is the input file for this step. Hence, make sure that the average spectrum file is generated first.
Time series is obtained by integrating the dynamic spectrum for all frequencies in the band. Prior to this integration, we need to de-compress and de-disperse.The command has optional parameters to set/unset decompression. By default, decompression will be done.
Usage:
python3 -m AnalysisPackages.spec.get_timeseries -h ✔ │ took 18s │ pulsar-analysis-package Py │ at 11:47:31
usage: get_timeseries.py [-h] [-chunk <int>] [-decomp1 <bool>] [-decomp2 <bool>] [-plot <bool>] input_file_name ch_number polarization [decomp_1_width]
positional arguments:
input_file_name The mbr filename without the sequence number(eg. ch03_B0834+06_20090725_114903)
ch_number band number (eg. ch03 for band 3)
polarization polarization for which average pulse profile is to be obtained ('XX', 'YY' or 'I' for stokes I)
decomp_1_width percentage of signals to be flagged around max signal for decompression by method 1. For details, refer documentation (default value is 40)
optional arguments:
-h, --help show this help message and exit
-chunk <int>, --spec_chunk_size <int>
number of rows to be picked from .spec file at once (default value is 5000)
-decomp1 <bool>, --decompression_method1 <bool>
setting this to False can disable decompression by method 1 (usage: '-decomp1 False' default=False)
-decomp2 <bool>, --decompression_method2 <bool>
setting this to False can disable decompression by method 2 (usage: '-decomp2 False' default=False)
-plot <bool>, --plot_ds_ts <bool>
plot dynamic spectrum and corresponding time series after processing each chunk (usage: '-plot True' default=False)
Example:
python3 -m AnalysisPackages.spec.get_timeseries B0834+06_20090725_114903 ch03 XX 20
Figure 7.2: Time series showing the effect of de-compression
- decompression method 1 -> For each spectrum in the dynamic spectrum, we flag the pulse data by flagging data nearby the point with max intensity. For details refer to section 5.4 "De-compression Method 2 – Detecting spectrum peak and rejecting nearby channels" in Drifting_Subpulse_Thesis.pdf.
- decompression method 2 -> We flag the pulse data by using the pulse mask. For details refer to section 5.5 "De-compression Method 3 – Pulse Position" in Drifting_Subpulse_Thesis.pdf.