Searching for an FRB in SDM Data

A transient search is composed of a series of steps to read, prepare, and process data. These are wrapped in functions in the pipeline module.

rfpipe.pipeline.pipeline_scan(st, segments=None, cfile=None, vys_timeout=10)[source]

Given rfpipe state run search pipline on all segments in a scan. state/preference has fftmode that will determine functions used here.

rfpipe.pipeline.pipeline_seg(st, segment, cfile=None, vys_timeout=10)[source]

Submit pipeline processing of a single segment on a single node. state/preference has fftmode that will determine functions used here.

The pipeline_scan function simply iterates over calls to pipeline_segment for each segment. Candcollections from segments can be added together to produce new candcollections with an array that is the concatenation of the inputs.

As an example, we demonstrate a search of a small SDM of VLA data that contains a bright burst from FRB 121102 (see this ApJ paper for details). The SDM data and calibration tables are available publicly thanks to the Harvard dataverse. To reproduce these steps, download the SDM and calibration files, “16A-459_TEST_1hr_000.57633.66130137732.scan7.cut” and “16A-459_TEST_1hr_000.57633.66130137732.GN”, respectively.

To start simply, set up the state by reading metadata from the SDM and defining preferences for a modest search:

> import rfpipe
> st = rfpipe.state.State(sdmfile='16A-459_TEST_1hr_000.57633.66130137732.scan7.cut', sdmscan=7, inprefs={'dmarr': [0, 565], 'dtarr': [1,2,4], 'npix_max': 512, 'gainfile': '16A-459_TEST_1hr_000.57633.66130137732.GN', 'savecands': True})

This should produce a summary of the metadata and pipeline state, including lines such as:

Reading metadata from 16A-459_TEST_1hr_000.57633.66130137732.scan7.cut, scan 7
Metadata summary:
   Freq range: 2.488 -- 3.508
   Scan has 800 ints (4.0 s) and inttime 0.005 s
Pipeline summary:
   Using 1 segment of 800 ints (4.0 s) with overlap of 0.2 s
   Found telcal file /lustre/evla/test/realfast/repeater/16A-459_TEST_1hr_000.57633.66130137732.GN
   Using fftw for image1 search at 7 sigma using 1 thread.
   Using 2 DMs from 0 to 565 and dts [1, 2, 4].
   Visibility/image memory usage is 1.1501568/1.6777216000000001 GB/segment when using fftw imaging.

Be sure that the telcal file is available locally and is found here. It is also good to confirm that the memory required for the search is available locally.

Given that the SDM is small (only 800 integrations and about 1 GB in size), it is easy to search the whole scan with the rfpipe.pipeline.pipeline.scan function. Simply run:

> cc = rfpipe.pipeline.pipeline_scan(st)

This will log the data reading, preparation, and searching:

Reading scan 7, segment 0/0, times 16:18:58.220 to 16:19:02.220
Read telcalfile /lustre/evla/test/realfast/repeater/16A-459_TEST_1hr_000.57633.66130137732.GN with 1 sources, 3 times, 16 IFIDs, and 27 antennas
Selecting 432 solutions from calibrator J0555+3948 separated by 3.1963333406019956 min.
flag by badchtslide: 0/1600 pol-times and 0/512 pol-chans flagged.
Correcting by delay/resampling 37/1 ints in single mode
Imaging 763 ints (0-762) in seg 0 at DM/dt 560.0/1 with image 512x512 (uvres 104) with fftw
Got one! SNR 24.5 candidate at (0, 400, 0, 0, 0) and (l,m) = (-0.0003943810096153846,0.0005446213942307692)

Congratulations, you just found FRB 121102 with the VLA!

Note that the flagging algorithm is tunable via the Preferences and can affect the quality of the search. Also, the typical search will set the preference for timesub='mean', which subtracts the mean visibility calculated over the segment (after flagging). In the present case, we do not use mean subtraction because the FRB is very bright and biases the mean in this short segment of data.

Looking at the CandCollection:

> print(cc)
CandCollection for 16A-459_TEST_1hr_000.57633.66130137732.scan7.cut, scan 7 with 4 candidates

> cc.array
array([(0, 400, 0, 0, 0, 22.40985 , 4611.461, -0.00039438, 0.00054462),
       (0, 400, 1, 0, 0, 22.1478  , 4529.001, -0.00039438, 0.00054462),
       (0, 399, 2, 0, 0,  8.587416, 1359.625, -0.00039438, 0.00054462),
       (0, 400, 2, 0, 0, 21.640526, 4386.424, -0.00039438, 0.00054462)],
      dtype=[('segment', '<i4'), ('integration', '<i4'), ('dmind', '<i4'), ('dtind', '<i4'), ('beamnum', '<i4'), ('snr1', '<f4'), ('immax1', '<f4'), ('l1', '<f4'), ('m1', '<f4')])

> cc.candmjd
array([57633.67986366, 57633.67986366, 57633.6798636 , 57633.67986366])

> cc.prefs
Preferences(rfpipe_version='0.9.6', chans=None, spw=None, excludeants=(), selectpol='auto', fileroot=None, read_tdownsample=1, read_fdownsample=1, l0=0.0, m0=0.0, timesub=None, flaglist=[('badchtslide', 4.0, 10), ('blstd', 3.0, 0.05)], flagantsol=True, badspwpol=2.0, applyonlineflags=True, gainfile='16A-459_TEST_1hr_000.57633.66130137732.GN', simulated_transient=None, nthread=1, segmenttimes=None, memory_limit=16, maximmem=16, fftmode='fftw', dmarr=[560, 565, 570], dtarr=None, dm_maxloss=0.05, mindm=0, maxdm=0, dm_pulsewidth=3000, searchtype='image1', sigma_image1=7, sigma_image2=None, nfalse=None, uvres=0, npixx=0, npixy=0, npix_max=512, uvoversample=1.0, savenoise=False, savecands=False, candsfile=None, workdir='/lustre/evla/test/realfast/repeater', timewindow=30, loglevel='INFO')

You can see how the CandCollection has columns that are integers (segment, integration, dmind, dtind, beamnum). Those are indices that define the location of the transient in the data. For any given set of these five values, one can use the state to uniquely find (and reproduce) the state of the data associated with the candidate.

The other columns in this array are features of the data used to make the detection. The basic set of features are the SNR and location (l1, m1), however others are supported, include a range of statistical measurements of the image and spectra. The features are calculated at detection time, so they must be set with the state. You can see them like this:

> st.features
('snr1', 'immax1', 'l1', 'm1')