Augmented reality in the educational setting: content and language integrated learning of English and Science

Kristina Cergol, Valentina Gučec

University of Zagreb, Faculty of Teacher Education

Foreign languages education and research

Number of the paper: 74

Original scientific paper


Augmented reality (AR) is a type of immersive technology by means of which digital items are added to a live view of the physical world to yield improved user experience. AR has found its place in education as various virtual elements can be generated by means of ready-made and custom-programmed applications for widely used devices (e.g. Android mobile phones). Virtual elements are then integrated with the real-world environment, creating a close-to-authentic learning situation that is in line with the situated learning framework. One precondition for this is access to visual images that encompass a number of attributes required for AR presentation. In this paper we focus on the five most prominent visual attributes: richness in detail, good contrast, absence of repetitive patterns, inclusion of letters, and no gloss paper. Concentrating on these five attributes, we use the content analysis methodology to investigate the possibility of using AR with the existing visual images in the English and Science textbooks in Croatian primary schools, grades 1-4, and find that the existing visuals already possess a high level of integration of the necessary attributes. Further, we propose a list of content and language integrated learning (CLIL) topics, integrating English and Science in the teaching and learning process in grades 1-4, and analyse the selected CLIL topics in terms of the presentability of the related existing textbook images by means of AR. Suggestions for the improvement of textbook images so that they may be presented using AR are given. Finally, some guidelines for the creation of the CLIL AR textbooks are provided as a suggestion for further endeavours in the field.

Key words

AR attributes; CLIL topics; CLIL AR textbooks; primary school; situated learning framework


This study tackles the possibilities of using cutting-edge technologies on the basis of the existing teaching materials. The theoretical basis used as the framework for this paper is twofold. We first draw on the postulates of the situated learning theory as the overarching theoretical framework. Moreover, we look at the possibilities of using augmented reality (AR), a now widely accessible technology, combined with the textbook as the primary English as a foreign language (EFL) teaching material. For these purposes we analyse the pictures contained in the selected textbooks to establish whether they meet the criteria necessary for presentation by means of the AR technology. Even though the textbook is not necessarily a teaching material that has been designed for situated learning, we believe that the path we have chosen presents the first step towards a more inclusive usage of AR as a cognitively more easily accessible means of presenting the teaching material. In line with this, AR presentations can be used to envisage and create situations eligible for situated learning, taking into consideration the benefits of the physical or virtual context and based on the task-based approach to foreign language learning. Finally, to create a setting for our study, we investigate the usage of augmented reality technology for EFL in the context of the recommended content and language integrated learning (CLIL) approach, merging the topics from the English language and Science subjects and focusing on early English language learning (grades 1-4; ages 6-10).

Situated learning   

The theory of situated learning focuses on the “situated character of human understanding and communication” (Lave & Wenger, 1991, p. 14), stressing that the best form of learning and instruction is realized in the situational context (or, we add, a simulation thereof) in which the learned materials and skills are to be further utilized, as well as the sociocultural context as a milieu particularly relevant for foreign language learning. We stress simulation as a link that can be realized by means of modern technologies, such as the augmented reality technologies, to recreate the desired real-life context in the situation of formal instruction. In this paper we focus on English language learning integrated with Science in primary education. Within the situated learning theory, the physical context presents itself as the decisive factor in learning English as a foreign language in the institutional setting as research has shown that the presence of physical context induces foreign language learning (Cergol, 2007). One approach that is in line with the situated learning theory in foreign language learning is the task-based approach. Tasks are defined as “classroom activities in which learners use language ‘pragmatically’, that is, ‘to do things’ with the overriding aim of learning language” (Bygate, 2015, p. 1). Along those lines task-based language teaching is defined as the approach to second language acquisition and learning in which tasks are viewed as critical activities yielding language acquisition and learning (Ellis, 2003). It is believed that this approach induces incidental language acquisition as the students focus on the task they are required to perform while they use the target language as a tool rather than the target itself (Ellis, Skehan, Shaofeng, Natsuko, & Lambert, 2020). 
Task-based activities realized by means of AR technology in which virtual elements are superposed onto the real-life context may be the future of technologically enhanced classroom learning.

Augmented reality in education

In this day and age, situated learning can be enhanced by means of augmented reality technology. Augmented reality (AR) is a type of immersive technology by means of which digital items are added to a live view of the physical world and can be attached to real locations and objects. Immersive technologies are those that intertwine the elements of digitally simulated and physical worlds to yield improved user experience. Virtual reality (VR) is an immersive technology that refers to a digitally generated setting into which a user is fully immersed in such a way that they feel no connection to the real world. Together with mixed reality, which is a combination of AR and VR, these technologies fall under the umbrella concept of extended reality (ER). Milgram, Takemura, Utsumi & Kishino (1994) propose a reality-virtuality continuum to provide a framework for the mixed reality concept. Mixed reality, being a mixture of augmented and virtual realities, can fall onto any point on this continuum, depending on whether it belongs more to the virtual environment, in which case it is classified as virtual reality, or to the real environment, in which case it is classified as augmented reality. Augmented reality technologies are naturally connected to situated learning as they allow for the positioning of virtual objects in real space. Moreover, they provide learners with the possibility of 3D visualisation by means of user-friendly technology supported by widely accessible devices (e.g. the students’ mobile phones). In fact, it is now becoming more common for schools to allow for “bring your own device” days when students can bring a personal device, such as a smart phone, to use in classroom activities (e.g. Annetta, Peters Burton, Frazier, Cheng, & Chmiel, 2012).
Multiple benefits of AR for educational purposes have been reported. Parmaxi and Demetriou (2020) provide a systematic overview of studies on augmented reality and language learning in the period between 2014 and 2019 and conclude that the usage of AR in education can improve learning in terms of educational outcomes, have an effect on the affective and collaborative elements of the learning process, improve focus, and be used to support students with special needs (for an overview see Parmaxi and Demetriou, 2020, p. 1). In terms of the learning process, students can get a 3D view of an object from different angles. An example is the view of an object (e.g. a bird) denoted by a linguistic sign in foreign language vocabulary acquisition activities. Virtual simulations can also be carried out using AR methodology. Finally, AR methodology allows for the superposition of an additional layer of information on top of the information presented in traditional textbooks (Casteleiro-Pitrez, 2021). In terms of the benefits AR books offer, Gopalan et al. (2016) stress their potential for enhancing students’ interest in learning about science by means of attractive material presentation, especially in their understanding of the complex scientific processes and phenomena that are difficult to visualize and present in the classroom. In terms of early EFL, Mahadzir and Phung (2013) find that pop-up AR books provide users with a feeling of satisfaction and enhance their attention and confidence. This is why AR should be considered an improvement on the traditional textbook.

EFL lexis presented by means of AR in textbooks: material requirements

Clearly, EFL textbooks enriched with AR presentations can be viewed as a welcome and fresh means of presenting linguistic material in the study process. An obvious usage is the presentation of new lexis. In De Saussure’s terms, the AR presentation, due to its 3D nature, allows for the creation of connections between the signifier in a foreign language and the signified as a concept, as well as the virtually presented real-world referent. This enriches the students’ experience, which is normally achieved by means of 2D visual images in the textbook. In a nutshell, a user has a device that supports AR presentations, e.g. a smart phone. The camera is directed towards a visual item in the textbook (e.g. a picture of a bird). A 3D model of the bird then “rises out of” the textbook picture.
In order for the AR technology to be utilized, the visual material needs to be adequately prepared. This involves creating applications for (mobile) devices targeted at the relevant lexis. There are various tools that can be used to develop AR applications for mobile devices, such as Vuforia, Wikitude, ARKit, ARCore and ARToolKit, to mention just a few commonly used ones. The Vuforia Software Development Kit is a free tool that is commonly used for the development of AR applications. Moreover, Vuforia provides a useful list of attributes that need to be met to allow for the AR presentation of the chosen visual image (Vuforia™ Developer Library¹). Vuforia has already been used by Croatian programmers for the purposes of designing an English language learning AR application (Balja, 2019). Drawing on Balja’s propositions and experience, we focus on the attributes of visual images deemed necessary by Vuforia for the successful design of AR applications for EFL.
These attributes concern the visual properties of the image targets and need to be met, at least to a certain degree. The first one is richness in detail. Examples provided by the Vuforia™ Developer Library are a street scene, a picture of a group of people, collages and mixtures of items. The second visual attribute is good contrast, which is exemplified by images that have bright and dark regions. Vuforia also recommends well-lit regions. The third is the absence of repetitive patterns, which refers to the avoidance of symmetry, repeated patterns and featureless areas (Vuforia™ Developer Library¹). Additionally, experts recommend the inclusion of letters into the image (Lea Skorin-Kapov, personal communication, February 10, 2022), which is the fourth attribute we shall use in this paper, and the possible inclusion of QR codes (ibid.), which we shall not focus on. A matte surface rather than gloss paper should be used (Vuforia™ Developer Library²), and this is the fifth attribute we shall look into in the present paper. We shall investigate whether the visual images contained in the textbooks for English and Science comprise these five attributes to establish whether they can be used for representation by means of AR technology. We have chosen English and Science as we find that they encompass many overlapping topics available for use within the content and language integrated educational approach. This approach is known to be beneficial for pupils’ development of English language skills as well as their knowledge of other subject areas.

Content and language integrated learning

As an educational approach, content and language integrated learning (CLIL) has a twofold and equally represented focus as it is directed to achieving both specific subject (e.g. Science) and foreign language (e.g. English) learning outcomes. As an umbrella term, CLIL was first adopted by the European Network of Administrators, Researchers and Practitioners in the mid-1990s (Coyle, 2007). CLIL is carried out through thirty core features that can be summarized into six areas: multiple focus of pupils on various themes; providing a safe and enriching learning environment; providing authenticity in terms of the learning materials and learning experience; enabling active learning where pupils are in the focus of the learning process; scaffolding, where various skills and learning styles are used and developed in the search for knowledge; and co-operation between the partakers in the process of education, such as teachers, parents, local authorities, etc. (Mehisto et al., 2008, pp. 29-30). The authors go on to point out that CLIL practice is based on the following four principles: cognition, community, content and communication (ibid., p. 31). The authors of this paper feel that a combination of subjects taught via the CLIL principles and presented by means of AR applied to the existing visuals in the current textbooks may yield an enhanced learning experience.

Aim and research questions

The aim of the study was to investigate the possibility of using AR applications with the existing visuals in the textbooks for English and Science that are approved for use in Croatian primary schools, grades 1-4, and that are eligible for the CLIL approach. To achieve this aim, we focused on the following research questions:
RQ1: Which common English and Science topics are to be recommended for content and language integrated learning (CLIL) topics?
RQ2: Are the visuals integrated into the textbooks covering English Language and Science, as well as the common CLIL topics, in line with the recommended visual attributes of AR-friendly image targets (specifically richness in detail, good contrast, absence of repetitive patterns, inclusion of letters, no gloss paper)?
RQ3: Are there differences between the English and Science textbooks vs. the chosen CLIL topics in terms of the AR attributes?


Textbook content analysis was used, focusing on the visual images comprised in all of the textbooks for English and Science approved for use in Croatian primary schools, grades 1-4 (MZO, 2021). The analysed corpus thus encompassed 28 textbooks approved and currently in use for English and 24 textbooks approved for Science. Six publishers are represented. A total of 5404 visual images were analysed: 3333 for English and 2071 for Science. The details of the corpus are presented in Table 1. Additionally, the overlapping topics between English and Science were highlighted for their CLIL potential. The attributes of the visual images covering the selected CLIL topics were then analysed.
Only textbooks were analysed as not all the additional materials are available in colour, thus making it difficult to provide a uniform analysis of the visual images comprised in all the teaching materials. Scores were assigned for each visual image on a scale from 1 to 5 where 1 signifies the absence of a certain attribute (0% of the attribute), score 2 marks 25% of the attribute, score 3 50% of the attribute, score 4 75% and 5 absolute provision of the investigated attribute (100% of the attribute) that makes the image readily available for presentation by means of AR technology. 
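The scoring scale described above amounts to a linear mapping from a 1-5 score to the share of the attribute present. The following is a minimal sketch of that mapping, not the tooling used in the study; the function names `score_to_percent` and `mean_score` and the sample scores are hypothetical and serve only as an illustration.

```python
from statistics import mean
from typing import Sequence


def score_to_percent(score: int) -> int:
    """Map a 1-5 score to attribute provision in percent.

    1 -> 0%, 2 -> 25%, 3 -> 50%, 4 -> 75%, 5 -> 100%.
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer from 1 to 5")
    return (score - 1) * 25


def mean_score(scores: Sequence[int]) -> float:
    """Average the scores of several images for one attribute (the M values)."""
    for score in scores:
        score_to_percent(score)  # reuse the range check above
    return round(mean(scores), 2)


# Invented scores for five images on a single attribute (not data from the study).
sample_scores = [5, 4, 4, 3, 5]
sample_mean = mean_score(sample_scores)
```

Under this reading, a mean close to 5 for an attribute indicates that most images provide it almost fully, while a mean close to 1 indicates that it is largely absent.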

Table 1: The details of the analysed corpus (English and Science textbooks): numbers of analysed visual images and textbooks per publisher and grade (G 1-4).

Columns: Publisher 1 to Publisher 6, each giving Nr. of images / Nr. of textbooks.

Results and discussion

CLIL topics
The overlapping topics between the English language and Science were chosen for the possibilities of content and language integrated learning. The selection of CLIL topics was created on the basis of the national curricula for the given subjects (MZO, 2019a, 2019b, 2021a, 2021b, 2021c). The list of the selected CLIL topics is given below.
Grade 1: Family/Friends, Holidays, School, Health/Body, Food, Animals
Grade 2: Family, Holidays, School, Health/Body, Food, Animals, Weather
Grade 3: Family, Animals
Grade 4: Living beings, Health/Body
English and Science textbooks differ in the manner in which they introduce the topics they cover. In English, topics are introduced in such a way that they first cover a narrower scope of the required lexis, such as vocabulary for family members. The same semantic category is widened in further grades to encompass vocabulary for friends and other people pupils are in contact with. Thus, the same topics are repeated over the years and the previously acquired knowledge is built upon. As the nature of the learning outcomes is different, Science textbooks take a sort of bootstrapping approach, covering topics that change from the more basic ones, such as crossing the road to get to school, to the more complex ones dealing with the geographical features of the country’s terrain. Correspondingly, topics that are covered in Science are later dropped, while English builds upon the ones introduced early on. Thus, the number of topics that are interrelated between the two subjects drops as the grades get higher.
Already in the third grade, and intensively in the fourth grade, one notices that the topics in Science and English branch into subtopics related to geography, history and natural sciences in preparation for the corresponding separate subjects pupils are to take in the fifth grade. This reduces the number of potential CLIL topics as it becomes more difficult to integrate the language-related learning outcomes and subject knowledge, especially such that could be applied to the AR presentation of the textbook visuals. However, additional AR materials can be prepared to be taught as task-based units in which children use AR to solve situational tasks. In fact, this could present welcome additional AR material that stimulates situated learning (Wen & Looi, 2019).
Cindrić and Hanžić Deda (in review) show that pre-service teachers studying to become both generalist teachers and teachers of English believe that Science has good potential to be integrated with English. In fact, Science has been estimated as the course most amenable, alongside Music, to being presented in terms of CLIL. The further the students progress in their studies, the more competent they feel to use the propositions of CLIL in the classroom. They further express that the selection of topics that can be taught using CLIL may present a problem, which is why in this paper we try to provide some possible answers to this query.

AR attributes of the analysed visual images: English language, Science and CLIL 
The results show that the visuals in the analysed English language and Science textbooks approved for use in Croatian primary schools are moderately appropriate for use with AR technology. Some attributes, such as the absence of repetitive patterns, avoidance of gloss paper and good contrast between the elements comprising the images, are more present than others, such as richness in detail, while the inclusion of letters into the images is almost entirely absent. The representation of the attributes is rather steady across grades, making it possible to generalize across the textbook materials.
The best represented attribute was found to be the absence of repetitive patterns (M=4.81 for English and Science). This is important as repetitive patterns hinder the detection of elements in the visual image, preventing its augmented reality representation because the application cannot “read” the image in order to enable a projection thereof (Vuforia™ Developer Library³). Avoidance of gloss paper is the second best represented feature (M=4.33). Science textbooks (M=4.59) score somewhat better than the English ones (M=4.07) on the no gloss paper feature; however, both subjects use material that is easy to read and use by means of AR technology. One should add that the flexibility of the printed paper could present a problem as the paper is easily bent. Therefore, one needs to make sure that the paper surface is flat. Books are found to be a good AR target in these terms (Vuforia™ Developer Library²).
The contrast between the elements in the images is somewhat better for English (M=3.92) than for Science (M=3.34) textbooks, the overall result being M=3.63. In order to improve the contrast of the images, some changes may be suggested for better detection and tracking performance by the AR technology. First, in images with several layers that may have poorer contrast between them, one of the layers (e.g. the background) may be changed to a lighter colour so that the foreground stands out more clearly; the foreground can also be adjusted by lowering its brightness, thus making the contrast with the background more intense. To increase the contrast further, where possible, the background can be changed from a lighter hue to white. This can be accompanied by increasing contrast along the edges of the darker details in the image and avoiding even the slightest blur (Vuforia™ Developer Library³).
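The effect of the adjustments suggested above can be illustrated numerically. The sketch below uses Michelson contrast, which is our choice of measure for the purpose of illustration and is not prescribed by Vuforia; the function name and the greyscale luminance values (0-255) are invented.

```python
def michelson_contrast(lum_a: float, lum_b: float) -> float:
    """Michelson contrast between two luminance values: (max - min) / (max + min)."""
    brightest, darkest = max(lum_a, lum_b), min(lum_a, lum_b)
    if brightest + darkest == 0:
        return 0.0  # both regions are black, so there is no contrast
    return (brightest - darkest) / (brightest + darkest)


# Invented example: a dark foreground (60) on a mid-grey background (140).
original = michelson_contrast(60, 140)

# Step 1: change the background to white (255).
white_background = michelson_contrast(60, 255)

# Step 2: additionally lower the brightness of the foreground (60 -> 40).
darker_foreground = michelson_contrast(40, 255)
```

In this toy example each adjustment increases the computed contrast, which mirrors the direction of the changes recommended for the textbook images. Actual detection performance would still need to be checked with the AR tool itself.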
The images scored somewhat lower on the richness in detail attribute both for English (M=3.11) and Science (M=2.91), the overall score being M=3.02. Visual targets with rich detail are more readily detected by AR technology. This is a feature that may require some adjustment in the current textbook visuals. Finally, as for the physical properties of the materials, most textbooks do not use gloss paper, which is known to hinder target reading by the AR technology.
The inclusion of letters attribute is the least represented one (English M=1.81; Science M=1.46 and CLIL M=1.62). However, this is not to be forced upon the textbook creators as this feature can only be added if it serves the purpose of material presentation enhancement. 
Table 2 provides examples of pictures from the analysed corpus that were assigned lowest and highest scores for the investigated attributes on the scale 1-5. 

Table 2: The examples of pictures from the analysed corpus that were assigned lowest and highest scores for the investigated attributes.        



richness in detail: 0% attribute provision / 100% attribute provision
good contrast between the elements in the images: 0% attribute provision / 100% attribute provision
absence of repetitive patterns: 0% attribute provision / 75% attribute provision
inclusion of letters: 0% attribute provision / 75% attribute provision

The analyses of the AR attributes of the visual images comprised in the analysed textbooks by grade are presented in Figures 1 (for English), 2 (for Science) and 3 (for CLIL). The results for the visuals chosen for the CLIL topics are in line with the trend of the English and Science textbook visuals (absence of repetitive patterns (M=4.90), no gloss paper (M=4.4), good contrast (M=3.60), richness in detail (M=2.85), and the inclusion of letters (M=1.62)).
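The claim that the CLIL visuals follow the same trend as the English and Science visuals can be checked by comparing the rank order of the mean scores reported in this section. The sketch below is only an illustration: the dictionary layout and function name are ours, and we assume that the reported M=4.81 for the absence of repetitive patterns applies to English and Science alike.

```python
# Mean scores as reported in the text (scale 1-5).
# Assumption: M=4.81 for "absence of repetitive patterns" holds for both subjects.
reported_means = {
    "English": {
        "absence of repetitive patterns": 4.81,
        "no gloss paper": 4.07,
        "good contrast": 3.92,
        "richness in detail": 3.11,
        "inclusion of letters": 1.81,
    },
    "Science": {
        "absence of repetitive patterns": 4.81,
        "no gloss paper": 4.59,
        "good contrast": 3.34,
        "richness in detail": 2.91,
        "inclusion of letters": 1.46,
    },
    "CLIL": {
        "absence of repetitive patterns": 4.90,
        "no gloss paper": 4.4,
        "good contrast": 3.60,
        "richness in detail": 2.85,
        "inclusion of letters": 1.62,
    },
}


def rank_attributes(means: dict) -> list:
    """Return attribute names ordered from the highest to the lowest mean."""
    return sorted(means, key=means.get, reverse=True)


rankings = {source: rank_attributes(means) for source, means in reported_means.items()}

# True when English, Science and CLIL all rank the five attributes identically.
same_order = len({tuple(order) for order in rankings.values()}) == 1
```

With the values above, the three sources share one ordering, from the absence of repetitive patterns at the top to the inclusion of letters at the bottom.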

Figure 1: AR attributes of the analysed visual images for English textbooks per grade (G 1-4): richness in detail, good contrast, absence of repetitive patterns, inclusion of letters, no gloss paper, and the overall score averaging the individual attributes’ scores. Scores are provided on a scale from 1 to 5 where 1 signifies absence of a certain attribute and 5 absolute provision of the investigated attribute that makes the image readily available for presentation by means of AR technology

Figure 2: AR attributes of the analysed visual images for Science textbooks per grade.

Figure 3: AR attributes of the analysed visual images for CLIL in the analysed textbooks per grade.

The number of visual images used in the English language textbooks decreases as the grades become higher (see Table 1 for details). The same pattern was observed in the Science textbooks as the topics become more complex. Visuals help pupils learn regardless of age and should be supported by the classroom materials, especially now that extended reality offers a new angle on the presentation of visuals in the teaching content.

CLIL materials and AR technology: textbook creation

A still rather recent area of interest in the content and language integrated learning approach to teaching is the development of CLIL materials (Mehisto, 2012). In the Croatian context the precondition for the creation of such materials would be the selection of topics between the subject areas that best allow for presentation using the CLIL approach and that are covered by the current curricular documents defined by the Ministry of Science and Education of the Republic of Croatia (MZO). This paper brings a proposal of those topics covered by all of the English and Science textbooks that have been approved for usage in the first four grades of primary education in the Republic of Croatia. Moreover, the English and Science textbook visuals have mostly been found to allow for the AR presentation of the taught concepts in these subjects and, by extension, in the proposed CLIL topics. The next obvious step would be to lay the foundations for the creation of AR CLIL textbooks and additional materials which would implement the possibilities provided by the AR technology in such a way as to present the studied materials in their 3D form (for an example, see Figure 4).

Figure 4: An example of the technical implementation of AR in the AR CLIL textbooks.

 We draw on the “tentative checklist for CLIL textbooks” proposed by Lóp