ASSESSMENT METHODS

Metrics

Runtime is a crucial parameter with regard to clinical applicability and shall be provided together with hardware requirements for all submissions.

The other parameters depend on the algorithm class:

Task 1: Aneurysm Detection

For the aneurysm detection task, participants are expected to submit, for each aneurysm in each case, a point coordinate x_cA in world coordinates representing the location of the aneurysm (ideally the center of its bounding box). Additionally, two vectors v_Ab1 and v_Ab2 may be provided that describe the orientation and extent of the bounding box.

Given a list of points X_c = {x_c1, …, x_cn} submitted for a case c, an aneurysm cA is considered detected if at least one point coordinate corresponds to a voxel inside its mask M_cA: ∃ x_ci ∈ X_c: x_ci ∈ M_cA. A point hitting an aneurysm mask is counted as a true positive; any additional point hitting that same mask is ignored. A point that does not hit any aneurysm mask is counted as a false positive. An aneurysm that is not detected is counted as a false negative.
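
The detection rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the organizers' evaluation code. The function name match_detections, the label encoding (0 for background, one distinct positive integer per aneurysm), and the assumption that the submitted points have already been converted from world coordinates to integer voxel indices are all hypothetical.

```python
import numpy as np


def match_detections(points_vox, label_mask):
    """Count true positives, false positives and false negatives for one case.

    points_vox: iterable of integer voxel indices (i, j, k); assumed to be
                already converted from world coordinates.
    label_mask: integer array; 0 = background, k > 0 = aneurysm k
                (assumed encoding, not specified in the text).
    """
    aneurysm_ids = set(np.unique(label_mask).tolist()) - {0}
    detected = set()
    fp = 0
    for point in points_vox:
        idx = tuple(int(v) for v in point)
        # Points outside the image volume cannot hit any mask.
        inside = all(0 <= i < n for i, n in zip(idx, label_mask.shape))
        label = int(label_mask[idx]) if inside else 0
        if label == 0:
            fp += 1              # point hits no aneurysm mask
        else:
            detected.add(label)  # repeated hits on the same aneurysm are ignored
    tp = len(detected)
    fn = len(aneurysm_ids - detected)
    return tp, fp, fn


# Toy case: two single-voxel aneurysms, four submitted points.
mask = np.zeros((5, 5, 5), dtype=int)
mask[1, 1, 1] = 1
mask[3, 3, 3] = 2
points = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (9, 9, 9)]
tp, fp, fn = match_detections(points, mask)
```

In this toy case, aneurysm 1 is hit twice but counted once, two points miss every mask, and aneurysm 2 is missed, so the counts are one true positive, two false positives, and one false negative.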

We intend to calculate the following metrics:

        a. Recall R (true positive rate, sensitivity):

            R = TP / (TP + FN), where TP and FN are the numbers of true positives and false negatives

        b. Precision P (positive predictive value):

            P = TP / (TP + FP), where TP and FP are the numbers of true positives and false positives

        c. Coverage C_cA of aneurysms cA by bounding boxes BB_cA

         

        d. Bounding box fit F_cA (maximum distance of the bounding box from the mask along the main axes of the bounding box)

            

The major goal in detection is to make sure that aneurysms, which may pose a stroke risk, are not overlooked, so sensitivity is an important measure. On the other hand, if the whole image were marked, the aneurysms would be included but the information would be meaningless, so precision is important as well. A bounding box is helpful if it supports visualization and postprocessing. To this end, it should contain the aneurysm but be as small as possible.

The ranking will be based on the F_2-score, which combines recall R and precision P, considering recall twice as important as precision:

            F_2 = (1 + 2^2) · P · R / (2^2 · P + R) = 5 · P · R / (4 · P + R)

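As a worked illustration of the ranking score, the following Python sketch computes recall, precision, and the F_2-score from detection counts. The function names and the convention of returning 0.0 when a denominator is zero are assumptions made for this example; the text does not specify how such cases are handled.

```python
def recall(tp, fn):
    """R = TP / (TP + FN); returns 0.0 if there are no aneurysms (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def precision(tp, fp):
    """P = TP / (TP + FP); returns 0.0 if no points count (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def f_beta(tp, fp, fn, beta=2.0):
    """F_beta score; beta = 2 weights recall twice as important as precision."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    b2 = beta ** 2
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom > 0 else 0.0


# Example counts: 8 detected aneurysms, 4 false alarms, 2 missed aneurysms.
r = recall(8, 2)        # 8 / 10
p = precision(8, 4)     # 8 / 12
f2 = f_beta(8, 4, 2)    # 5 * P * R / (4 * P + R) = 40 / 52
```

With these counts, recall is 0.8, precision is about 0.67, and the F_2-score is 40/52, or about 0.77. The score lies closer to the recall than to the precision, as intended by the weighting.
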
The bounding box metrics will only be used to break ties in the F_2-score. Tied results will be ranked further by the aneurysm coverage C_cA and, if this is not decisive, by the bounding box fit F_cA.

The ranking score is chosen such that sensitivity is weighted more strongly than precision because missing a risk structure is considered worse than reporting a false positive. Coverage and bounding box fit serve as second- and third-level ranking measures for results with equal F_2-score because they are considered less important.

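The three-level ranking can be sketched as a sort with a composite key. This is a hedged illustration only. It assumes that each submission has already been reduced to one aggregated value per metric, that higher coverage is better, that a smaller bounding box fit distance is better, and that ties are determined by exact equality; none of these details are fixed by the text. The function name rank_submissions and the team labels are hypothetical.

```python
def rank_submissions(results):
    """Order teams by F_2-score, then coverage, then bounding box fit.

    results: dict mapping team name -> (f2, coverage, fit), where each value
             is an aggregated per-team score (assumed aggregation).
    Returns the team names from best to worst.
    """
    return sorted(
        results,
        key=lambda team: (
            -results[team][0],  # higher F_2-score first
            -results[team][1],  # then higher coverage (assumed direction)
            results[team][2],   # then smaller fit distance (assumed direction)
        ),
    )


# Hypothetical scores: three teams tie on the F_2-score.
scores = {
    "A": (0.8, 0.90, 2.0),
    "B": (0.8, 0.95, 3.0),
    "C": (0.9, 0.50, 5.0),
    "D": (0.8, 0.90, 1.0),
}
ranking = rank_submissions(scores)
```

In this example, team C ranks first on the F_2-score alone. Among the tied teams, B wins on coverage, and D precedes A because the two have equal coverage and D has the smaller fit distance.
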
The authors are expected to perform cross-validation on the training dataset themselves.