New Paper: M. Liu, R.A. Calvo, V. Rus (2014) “Automatic Generation and Ranking of Questions for Critical Review”

M. Liu, R.A. Calvo, V. Rus (2014) “Automatic Generation and Ranking of Questions for Critical Review”. Educational Technology & Society, Volume 17, Issue 2, 2014.
Critical review is an important skill in academic writing. Generic trigger questions have been widely used to support this activity, but when students have a concrete topic in mind, such questions are less effective if they are too general. This article presents a learning-to-rank based system which automatically generates specific trigger questions from citations to support critical review. The performance of the proposed question ranking models was evaluated, and the quality of the generated questions is reported. Experimental results showed an accuracy of 75.8% on the top 25% of ranked questions. These top-ranked questions are as useful for self-reflection as questions generated by human tutors and supervisors. A qualitative analysis of the human-generated questions was also conducted using an information-seeking question taxonomy. The analysis revealed that explanation and association questions are the most frequent question types and that explanation questions are considered the most valuable by student writers.

Automatic generation of natural questions is a challenging task. This article addressed this challenge and presented a novel automatic question generation (AQG) system for supporting academic writing by applying an overgeneration-and-ranking approach. The results indicate that this approach is effective, since the ranking model improved the acceptability of the top 25% of questions by 10%. In particular, RankSVM slightly outperformed Logistic Regression, and the experiments revealed that the semantic features are important. However, if a document is poorly written and its citation sentences contain grammatical errors, the system may generate no questions, or poor questions with grammatical errors. This is because question generation is a pipeline process, and an error at one stage propagates to the following stages. A citation sentence containing grammatical errors can prevent the parser from correctly extracting the predicate verb needed for sentence classification and transformation, which can cause the sentence to be misclassified or incorrectly transformed into a question.
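To make the overgeneration-and-ranking idea concrete, here is a minimal, hedged sketch of such a pipeline. The templates, feature, and scoring function are illustrative placeholders only; the paper's actual system trains RankSVM and Logistic Regression rankers over richer lexical and semantic features extracted from citation sentences.

```python
# Minimal sketch of an overgenerate-and-rank question generation pipeline.
# The templates and the scorer below are illustrative placeholders, not the
# features or trained ranker used in the paper.

from dataclasses import dataclass

@dataclass
class Candidate:
    source: str    # citation sentence the question was generated from
    question: str  # candidate trigger question
    score: float = 0.0

def overgenerate(citation_sentences):
    """Stage 1: overgenerate candidate questions from citation sentences."""
    templates = [
        "Why is the work described in '{s}' relevant to your own study?",
        "What evidence supports the claim that {s}?",
    ]
    return [Candidate(s, t.format(s=s.rstrip('.')))
            for s in citation_sentences for t in templates]

def score(candidate):
    """Stage 2: score a candidate (stand-in for a trained RankSVM or
    logistic-regression model); here, a simple length heuristic."""
    length_penalty = abs(len(candidate.question.split()) - 15) / 15
    return 1.0 - length_penalty

def rank_and_filter(candidates, keep_fraction=0.25):
    """Stage 3: rank candidates and keep the top fraction (top 25% in the paper)."""
    for c in candidates:
        c.score = score(c)
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

if __name__ == "__main__":
    sentences = ["Smith et al. (2010) showed that feedback improves revision quality."]
    for c in rank_and_filter(overgenerate(sentences)):
        print(f"{c.score:.2f}  {c.question}")
```

Because every stage consumes the previous stage's output, a parsing or classification error on a malformed citation sentence would surface here as a missing or malformed candidate, which is exactly the error propagation described above.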

As expected, the top-ranked questions generated by the system outperformed the generic questions and are as useful as human-generated ones once questions with surface errors are excluded. One reason the system-generated questions are as good as the human-generated ones is that they are specific and address critical-thinking aspects.

Explanation and association questions are the types most commonly used by humans. In particular, explanation questions are more useful than other question types because they typically trigger deep reflection and invoke critical thinking. This finding can inform the design of effective question templates for the AQG system. However, association questions are also frequently used by human supervisors and peers. These questions are still valuable in helping students understand the key concepts described in a document. Our future work will focus on generating association questions from key concepts using information extraction techniques.
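As a rough sketch of that future direction (not code from the paper), key concepts could be extracted from the document and slotted into an association-question template. The noun-phrase heuristic and the template below are assumptions made purely for illustration.

```python
# Illustrative sketch: extract candidate key concepts and fill an
# association-question template. The regex heuristic and the template are
# assumptions, not the information-extraction method proposed in the paper.

import re
from collections import Counter

def extract_key_concepts(text, top_n=2):
    """Naive key-concept extraction: most frequent capitalised multi-word phrases."""
    phrases = re.findall(r"\b(?:[A-Z][a-z]+\s?){2,}", text)
    return [p.strip() for p, _ in Counter(phrases).most_common(top_n)]

def association_questions(text):
    """Pair extracted concepts into an association-style trigger question."""
    concepts = extract_key_concepts(text)
    if len(concepts) < 2:
        return []
    a, b = concepts[0], concepts[1]
    return [f"How does {a} relate to {b} in your argument?"]

if __name__ == "__main__":
    doc = ("Cognitive Load Theory is often contrasted with Constructivist Learning. "
           "Cognitive Load Theory emphasises working-memory limits.")
    print(association_questions(doc))
```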

However, the question ranking model may not transfer directly to other question generation approaches, since some of the defined features relate only to citations. To generalize the ranking model, more generic question generation approaches and fine-grained generic features are needed. Despite these shortcomings, we believe that this AQG approach is effective and the evaluation meaningful, because real academic writing was used.

 

Read the full paper