LingTera, 9 (2), 2022, 1 – 12

Corpus linguistics allows researchers to discover the nature of language use through lexical bundles across genres, registers, and language varieties. With the rapid development of the internet, the language of websites has become a fascinating variety to investigate. In the context of German language studies, the Goethe-Institut is a worldwide German cultural institution dedicated to teaching German and promoting German culture, and its website has become a well-known source for German-related studies. Under these considerations, this research analyzes the German language patterns on the Goethe-Institut website by examining the frequency and structure of its lexical bundles. Using a mixed-method approach, the study found that the corpus is dominated by three- and four-word lexical bundles, with five-word lexical bundles being the least frequent. The majority of the four-word lexical bundles on this site fell into the noun, preposition, or verb groups, while the adverb, conjunction, and adjective groups appeared least often. The language of the Goethe-Institut website was shown to contain semi-formal expressions, as indicated by the frequency of prepositions, nouns, verbs, and active sentence constructions. The use of standard German and basic vocabulary indicates that this website is designed to be accessible to everyone, including German language learners. The language usage also demonstrates that the Goethe-Institut website is markedly user-oriented, with expressions that evoke a ‘sense of belonging’.


INTRODUCTION
Technological developments in the late 20th century accelerated further with the development of the internet in the 1970s, which resulted in cultural changes that shaped today's world. According to Jimenez-Crespo (2021, p. 375), technological developments have had a significant impact on modern cultures, marked by the development of computer technology in the 1980s, the invention of the World Wide Web (WWW) in the 1990s, and the social networking revolution in the early 2000s. Digital technology enables new types of text such as websites. The web is a medium written in HTML (HyperText Markup Language), which can display various types of content such as text, audio, video, images, animation, and multimedia data. Websites are easily accessible via the internet on devices such as mobile phones and laptops. Websites contain large amounts of data, such as texts, pictures, and videos. The verbal component has its own characteristics, which can form language patterns shaped by underlying factors such as cultures, registers, and genres. Websites come in various genres with their respective functions, such as portals, academic and business websites, blogs, and informational websites (Shelly, 2010, pp. 67-70).
The language of a website can be an interesting object of research. The large amount of textual data on a website can be investigated in terms of its structures, linguistic forms, and functions by employing a corpus study. Corpus linguistics can be defined by its parts, corpus and linguistics. The term 'corpus' means data. In German, das Korpus is defined as eine Sammlung schriftlicher oder gesprochener Äußerungen, or 'a collection of written or spoken utterances' (Duden, 2022). Generally, the data in a corpus have gone through a digitization process so that they can be read by computers and software. Corpus linguistics can be defined as the study of natural language utterances, their elements and structures, and the building of theory based on the analysis of authentic texts, which are compiled in the corpus (Lemnitzer & Zinsmeister, 2015). Similarly, Cheng (2011) defines corpus linguistics as a field of linguistics that aims to describe the use of language for specific purposes and that can produce a theory about a language based on its actual use. To conclude, corpus linguistics focuses on describing language use in a discourse by means of digitized data.
The phenomenon of language use in various contexts can be described scientifically using a corpus approach. Linguistic studies that focus on various micro- and macro-fields such as semantics, pragmatics, syntax, and discourse analysis can use a corpus approach to gain a broader view when processing language data. Corpus linguistics can be used to find out how a language is used in a particular situation and to find patterns in it. Studies of lexical collections typically involve a corpus linguistic approach that draws on large amounts of electronic data and computer tools to investigate the consistent use of language in research articles.
To find language patterns in a discourse, researchers can examine the use of multi-word sequences identified according to certain criteria. In contrast to the traditional perspective, which regards individual words as the basic building blocks in language learning and processing, recent empirical evidence shows that multi-word sequences can be considered an integral building block of a language (Yuliawati et al., 2021). The frequency of multi-word sequences can reveal the characteristics of the language in a discourse because it shows the extent to which a sequence of words is stored and used as a prefabricated chunk (Biber et al., 2004, p. 37, in Fuster-Márquez, 2017). Several terms that have a similar concept to multi-word sequences are lexicalized sentence stems (Pawley & Syder, 1983), clusters (Scott, 2001), phraseological sequences, phrasing, chunks, lexical bundles (Biber et al., 1999, p. 989), and N-grams (Stubbs, 2005).
Formulaic language is a multi-word sequence found in a particular discourse with a certain frequency that has meaning and function, while lexical bundles are multi-word units that occur often in a register and are defined as bundles of words that show a statistical tendency to co-occur (Biber et al., 1999, p. 989; Biber et al., 2004). Generally, studies on the use of lexical bundles are carried out on English language discourses. However, in recent years, research has been conducted on lexical bundles in other languages such as Bahasa Indonesia (Febriesti, 2022; Budiwiyanto & Suhardijanto, 2020; Yuliawati et al., 2021; Haq, 2021); Korean (Kim, 2009); Spanish (Butler, 1998; Cortes, 2007; Tracy-Ventura et al., 2007); and German (Krekeler, 2020).
Pattern analysis of lexical bundle usage is generally carried out to determine a writer's style in academic writing or academic research, because it can reveal the patterns of language used in teaching and those produced by students, teachers, or even researchers. Examples of lexical bundle analysis can be found for academic writing in Yuliawati et al. (2022) and for classroom discourse in Biber et al. (2004). Even so, researchers can analyze patterns of language use in any discourse, including regulations (i.e., laws) and legal letters (Putri & Suhardijanto, 2022); hotel websites (Fuster-Márquez, 2014); European Union legal texts and websites (Biel et al., 2019); and academic institution websites (Bernardini et al., 2010). The use of websites as corpora has become a frequent practice in data collection for corpus linguistic studies. This approach is called 'web-as-corpus', that is, the creation and utilization of a corpus that can serve a variety of research goals, including terminology extraction (Castagnoli, 2006) and register analysis (Biber & Kurjian, 2007). The dynamic nature of a website and the ease with which it can be customized to its users' needs make it a suitable research object for the study of language patterns.
To date, no research has focused on lexical bundles of German websites, especially academic websites. Therefore, the current research seeks to address this gap by analyzing the usage frequencies and structural forms of lexical bundles in one of the academic websites of Germany in order to identify underlying similarities and variations in the frequency and structures of the multi-word phrases used in the discourse. The German version of the Goethe-Institut website was used as the research object in this study. The Goethe-Institut is one of Germany's cultural institutions that uses a website as its informational resource.
The Goethe-Institut is a non-profit multinational organization and German language educational institution with representations in 98 countries. To expand its impact and achieve higher accessibility, the Goethe-Institut has localized its website into numerous languages, notably French, Turkish, Korean, and Bahasa Indonesia. As a cultural institution, the Goethe-Institut focuses on teaching the German language across the world by providing courses, certifications, and so on. The Goethe-Institut's website is suitable as a corpus because the Goethe-Institut is one of the German institutions that has strong influence and is popular around the world, including in Indonesia. This website can reflect how the German language is used by website owners to communicate with and inform their visitors, who are mainly German language learners. The researcher examined how lexical bundles are used on this website based on their frequency and morphosyntactic forms according to a theory from Brommer (2018).

RESEARCH METHOD
This research used a mixed-method approach, which began with statistical data analysis of the corpus followed by qualitative interpretation. According to Greene et al. (1989, as cited in Creswell, 2011), mixed-method research consists of at least one quantitative component (in this case word collection) and one qualitative component (in this case data interpretation). The approach is used to examine statistical data from the corpus, which are then qualitatively interpreted (Kwary & Arum, 2011, p. 213; Yuliawati, 2014, p. 112). As a result, statistical findings can help researchers to analyze the data. This research used a corpus-based (inductive) approach to address the research objectives on the basis of existing theories. The data were collected through the process of corpus building using Sketch Engine on September 26-27, 2022. Besides its function as a tool for analyzing corpus linguistic data, Sketch Engine can also be used to build a corpus by deploying WebBootCaT (a web-service version of the BootCaT tools). Therefore, researchers can create a corpus and analyze their data in a single piece of software. The resulting corpus of the German-language version of the Goethe-Institut website contains 780,435 tokens and 659,371 words.
The N-gram feature of Sketch Engine was used to extract lexical bundles in the ranges of 3 to 6 words. The researcher set a minimum frequency to exclude lexical bundles that merely reflect the idiosyncrasies of particular writers. For written corpora, the minimum frequency to consider can range from 10 to 40 per million words (Biber et al., 2004; Hyland, 2008). Nevertheless, since Sketch Engine could only extract data with integer frequencies, the researcher rounded the normalized frequencies up or down to the nearest whole number, as was done by Chen and Baker (2010). The researcher compared lexical bundles in the range of 3 and 4 words after the data had been successfully extracted by Sketch Engine. To analyze the structure, the researcher placed emphasis on four-word lexical bundles and took a sample by selecting the 50 lexical bundles with the highest frequency. The structure of lexical bundles in the Goethe-Institut website corpus was examined using a theory by Brommer (2018), who divided the morphosyntactic forms of patterns into seven groups, namely Nominalgruppe (NGr), Verbalgruppe (VGr), Adjektivgruppe (AdjGr), Partizipgruppe (PtGr), Adverbgruppe (AdvGr), Konjunktionalgruppe (KonGr), and Präpositionalgruppe (PrGr). Then, the researcher evaluated the data qualitatively to discover the patterns of the lexical bundles.
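The extraction step described above can be illustrated with a minimal Python sketch. It is not how Sketch Engine works internally; it only shows the general idea of counting contiguous word sequences and keeping those that reach a minimum frequency. The function names `count_ngrams` and `extract_bundles` and the toy sentence are assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_bundles(tokens, n, min_freq):
    """Return the n-grams that occur at least min_freq times, with their counts."""
    counts = count_ngrams(tokens, n)
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Toy text invented for illustration; it is not taken from the corpus.
sample = ("kurse für deutsch als fremdsprache und "
          "material für deutsch als fremdsprache").split()

bundles = extract_bundles(sample, n=4, min_freq=2)
print(bundles)
```

In a real study the token list would come from the whole corpus and `min_freq` from the normalized frequency threshold, and a dispersion criterion (occurrence in several different texts) would usually be checked as well.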

FINDINGS AND DISCUSSION
In this study, the lexical bundles were examined based on their frequency, following the method of Biber et al. (1999), which identifies lexical bundles solely on the basis of frequency. As a research object, the Goethe-Institut website corpus contains 916 documents, 780,435 tokens, and 659,371 words. This research employed a minimum frequency of occurrence, expressed as a normalized frequency, to identify lexical bundles in the corpus. Since a conservative cut-off is customary in corpus linguistic research, lexical bundles are required to occur from 10 or 20 up to 40 times per million words in at least five distinct texts. The calculation for a cut-off of 20 occurrences per million words is illustrated below:
\[ \frac{20}{1{,}000{,}000} \times 780{,}435 = 15.6087 \]
Since Sketch Engine can only extract data with integer frequencies, the normalized frequency is set to the closest whole number by rounding, in this case sixteen (16). The rounded frequency is known as the normalized frequency (Chen & Baker, 2010). The total number of lexical bundles in the corpus that meet the minimum normalized frequency can be seen in Table 1. Three-word lexical bundles occur most frequently and make up the majority of the data (516). Their number is approximately four times that of the four-word lexical bundles, of which there are only 120. The five- and six-word lexical bundles follow, with a modest percentage difference of 2.37% between them. According to a prior study by Putri and Suhardijanto (2022), the fewer the words that make up a lexical bundle, the higher its level of usage in a corpus. For further detail on the lexical bundles in the corpus, see the table in Appendix 2, which displays the top 50 lexical bundles in the three- and four-word ranges.
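The threshold calculation above can be reproduced with a short Python sketch. The function names `raw_cutoff` and `integer_cutoff` are hypothetical helpers introduced only for illustration; the token count and the per-million rates are the ones reported in the text.

```python
def raw_cutoff(per_million, corpus_tokens):
    """Scale a per-million-words threshold to the actual size of the corpus."""
    return per_million * corpus_tokens / 1_000_000


def integer_cutoff(per_million, corpus_tokens):
    """Round the scaled threshold to the nearest whole number of occurrences."""
    return round(raw_cutoff(per_million, corpus_tokens))


CORPUS_TOKENS = 780_435  # token count reported for the Goethe-Institut corpus

for rate in (10, 20, 40):
    print(rate, raw_cutoff(rate, CORPUS_TOKENS), integer_cutoff(rate, CORPUS_TOKENS))
```

For the rate of 20 per million words this gives a raw value of about 15.6087 and an integer cut-off of 16, matching the figure used in the study. Note that Python's `round` sends exact halves to the nearest even number, which does not matter for these values but could for other corpus sizes.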
The table in Appendix 2 shows that the three-word lexical bundles consist of strings such as "Kinder und Jugendliche" (children and teenagers), "auch in der…" (also in the…), and "Leben in Deutschland" (life in Germany). These findings are similar to those of a study of lexical bundles in academic English and Bahasa Indonesia by Yuliawati et al. (2022). When the 50 most frequent bundles from each word range are compared, it can be seen that the most frequently used words in the three- and four-word lexical bundles are 'Deutsch', 'Fremdsprache', and 'Welt'. Overlapping lexical bundles such as "für Deutsch als Fremdsprache" (for German as a foreign language), "und Deutsch als Fremdsprache" (and German as a foreign language), and "Bereich Deutsch als Fremdsprache" (area of German as a foreign language) also occur several times in the corpus. These findings indicate that the Goethe-Institut is focused on the topic of German as a foreign language.
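The overlap between bundles noted above can be made explicit with a small Python sketch that groups bundles by the shorter word sequences they share. The function name `shared_subsequences` is a hypothetical helper, and the approach is only one possible way to detect overlap; the three input bundles are the ones quoted in the text.

```python
from collections import defaultdict


def shared_subsequences(bundles, n):
    """Map each n-word sequence to the bundles containing it, keeping only shared ones."""
    index = defaultdict(list)
    for bundle in bundles:
        words = bundle.split()
        seen = set()  # avoid listing a bundle twice for a repeated sub-sequence
        for i in range(len(words) - n + 1):
            sub = " ".join(words[i:i + n])
            if sub not in seen:
                seen.add(sub)
                index[sub].append(bundle)
    return {sub: owners for sub, owners in index.items() if len(owners) > 1}


four_word_bundles = [
    "für Deutsch als Fremdsprache",
    "und Deutsch als Fremdsprache",
    "Bereich Deutsch als Fremdsprache",
]

overlaps = shared_subsequences(four_word_bundles, n=3)
print(overlaps)
```

Here the only shared three-word sequence is "Deutsch als Fremdsprache", which appears in all three four-word bundles.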
The lexical bundles on the Goethe-Institut website can also be examined by their structure. Brommer's (2018) morphosyntactic forms of patterns were originally adopted from the theory of syntactic categories by Eisenberg (2014). This theory can be found in the book Sprachliche Muster. Brommer made some adjustments. In particular, the theory only takes complex constituent categories such as Nominalgruppe (NGr), Verbalgruppe (VGr), and Präpositionalgruppe (PrGr), and does not include word class or part-of-speech categories. Brommer divided the morphosyntactic or structural forms of patterns into seven types, namely Nominalgruppe (NGr) or 'noun group', Verbalgruppe (VGr) or 'verbal group', Adjektivgruppe (AdjGr) or 'adjective group', Partizipgruppe (PtGr) or 'participle group', Adverbgruppe (AdvGr) or 'adverb group', Konjunktionalgruppe (KonGr) or 'conjunctional group', and Präpositionalgruppe (PrGr) or 'prepositional group'. Compared to lexical bundles with a span of three or five words, four-word bundles allow a more precise and efficient interpretation of structure and function (Hyland, 2008). Therefore, the analysis of lexical bundle structure in the Goethe-Institut corpus focused only on the four-word lexical bundles.
Based on the table in Appendix 2, which contains the lexical bundles and their structural forms, it can be seen that Nominalgruppen (38%) and Präpositionalgruppen (30%) were the most commonly found categories in the corpus. Meanwhile, the conjunction, adverb, and adjective groups are the structural forms with the least usage. In addition, the researcher did not find any lexical bundles with the Partizipgruppe structure. To understand the structure of lexical bundles in the corpus of the Goethe-Institut website, the researcher interpreted the structure of selected lexical bundles in each structural form.
Most of the lexical bundles that fall into the Präpositionalgruppen category begin with the prepositions für, von, in, auf, and aus. Generally, the preposition is followed by a noun or adverb. The lexical bundle "für Deutsch als Fremdsprache" (for German as a foreign language) is the four-word lexical bundle with the highest frequency in the corpus. This bundle begins with the preposition für (for) and continues with the noun Deutsch, the conjunction als (as), and the noun Fremdsprache (foreign language). Prepositions can be defined as words belonging to a word class called Funktionswörter (function words) (Helbig & Buscha, 1996, p. 401, in Khairani, 2013). The structure of this lexical bundle is classified as a Präpositionalgruppe because the preposition acts as the core of the bundle. The sample text shows that the bundle "für Deutsch als Fremdsprache" is generally located in the middle of the sentence to explain the previous statement.
In the lexical bundle "im Bereich Deutsch als" (in the field of German as), the word im is a contraction of the preposition in and the definite article of the noun der Bereich in the dative case (dem). The word im links the bundle to the preceding statement, which deals with 'the field of German as' something. In the sample texts, all instances of this lexical bundle are followed by the word Fremdsprache (foreign language). Thus, 'the field of German' in this bundle can be interpreted as the field of German as a foreign language. Therefore, this bundle can be categorized as a prepositional group that is followed by a noun.
Furthermore, the Nominalgruppen category in the corpus mostly involves the institution name Goethe-Institut, Deutschland (Germany), and Sprache (language), as in the word cluster "der Zentrale des Goethe-Instituts". It means "the headquarters of the Goethe-Institut" and comes from the full text "Bei allen anderen Fragen steht Ihnen der Fachbereich Bildende Kunst in der Zentrale des Goethe-Instituts gerne zur Verfügung", which means "If you have any other questions, the Department of Fine Arts at the Goethe-Institut headquarters will be happy to help." It can be seen that the bundle can be preceded or followed by the preposition in. The bundle is used to refer to an entity, namely the Goethe-Institut. Another bundle in the Nominalgruppen category, "Mein Weg nach Deutschland" (My Way to Germany), is the name of a rubric or section on the website that provides information about life in Germany. As such, this bundle has a fairly high frequency. The structure of the lexical bundle "Mein Weg nach Deutschland" consists of the possessive pronoun mein, the noun Weg, the preposition nach, and the noun Deutschland.
The conjunction group or Konjunktionalgruppe can be found in the lexical bundle "und kann jederzeit widerrufen" (and can be revoked at any time), as shown in the sample concordance "Die Anmeldung ist freiwillig und kann jederzeit widerrufen werden" (Registration is voluntary and can be revoked at any time). This lexical bundle can be regarded as 'boilerplate', that is, a standard text string that is often found on websites. An example of boilerplate text is a legal document on a website that contains privacy settings for the website visitor. The conjunction und (and) is followed by the modal verb kann (can), the adverb jederzeit (at any time), and the verb widerrufen (revoke). The conjunction und belongs to the type of conjunctions that act as connectors, namely koordinierende Konjunktionen (coordinating conjunctions). The Konjunktionalgruppe can also be found in the lexical bundle "dass meine E-Mail-Adresse für" (that my email address for).
Similarly, the lexical bundle "dass meine E-Mail-Adresse für" is also boilerplate text, as in "Mit meiner Anmeldung stimme ich zu, dass meine E-Mail-Adresse für die Zusendung des abonnierten E-Mail-Newsletter verwendet wird" (With my registration, I agree that my email address will be used to send the subscribed email newsletter). The conjunction dass usually connects subordinate clauses with main clauses containing verbs such as behaupten (assert), bestätigen (confirm), denken (think), glauben (believe), and hoffen (hope) (Duden, 2022). In contrast to und in the previous lexical bundle of the Konjunktionalgruppe, the conjunction dass is a subordinierende Konjunktion (subordinating conjunction). In line with this, the lexical bundle connects the subordinate clause "meine E-Mail-Adresse für…" with the main clause "Mit meiner Anmeldung stimme ich…".
The Verbalgruppe bundles in the top 50 lexical bundles from the Goethe-Institut website mostly begin with a verb followed by a pronoun. In the lexical bundles "Haben Sie gut aufgepasst" (have you paid attention) and "können Sie anschließend selbst" (you can then do it yourself), the writer used a verb followed by the pronoun Sie (the formal form of address for one or more persons). The difference between these two lexical bundles is that the first has a structure of verb + pronoun + adjective + verb in the Perfekt form, while the second is formed with modal verb + pronoun + adverb + adverb.
Another form of the Verbalgruppe can be found in the lexical bundles "kann jederzeit widerrufen werden" (can be revoked at any time) and "ist freiwillig und kann" (is voluntary and can). The lexical bundle "kann jederzeit widerrufen werden" has a structure of modal verb + adverb + verb + verb. In contrast to the other bundles, the lexical bundle "ist freiwillig und kann" consists of verb + adjective + conjunction + modal verb. This bundle places its modal verb at the end.
Finally, the least used structural form of lexical bundles, namely the adjective group, can be seen in the lexical bundle "freiwillig und kann jederzeit" (voluntary and can at any time). This lexical bundle is also boilerplate text used on the login page of the Goethe-Institut website. The full sentence containing this lexical bundle is "Die Anmeldung ist freiwillig und kann jederzeit widerrufen werden" (Registration is voluntary and can be revoked at any time). The structure of this lexical bundle is adjective + conjunction + modal verb + adverb.
On the Goethe-Institut website, the researcher rarely found passive expressions. Passive constructions in the tenses Perfekt (perfect), Plusquamperfekt (past perfect), and Futur (future), such as "wird davon ausgegangen dass" (it is assumed that) and "wie in Bild dargestellt" (as shown in picture), were not found in this corpus. The verbs used by the writer(s) were mainly Vollverben (full verbs), which can form a predicate on their own and are conjugated in agreement with the subject or pronoun in the sentence. There were only ordinary prepositions that function as positional markers or connectors, such as auf, über, in, and für. There were no other types of adpositions such as Postpositionen (halber, hinaus), Zirkumpositionen (um… herum), and Ambipositionen (wegen, nach). Prepositions are very often found in groups of words other than prepositional groups, such as noun groups and conjunction groups. Lastly, the pronoun meine (my) was used in several lexical bundles; it is a possessive pronoun, which indicates possession. In this way, the website aims to give its users 'a sense of belonging' and to encourage them to feel comfortable as members of the Goethe-Institut.
There are several differences and similarities compared to the previous research by Krekeler (2018), who investigated lexical bundles in dissertations by mechanical engineering majors in Germany. In that study, he combined the noun group with the preposition group into a new type, namely Nominal- und Präpositionalgruppen. These two types of bundles were combined because, in German, prepositions are frequently used with noun phrases. As a result, nouns can be found in prepositional groups in both the middle and final positions, and vice versa. In contrast to the previous research, the researcher in this study decided not to merge the two groups in order to learn about the features and structure of the noun and prepositional groups separately. Similar to that study, as the percentages in Table 3 show, the Nominalgruppen and Präpositionalgruppen are the most common structure types on the Goethe-Institut website.
This phenomenon can be explained by the writers' tendency to use certain lexical bundles as connectors between one explanation and another or as an explanation of the previous statement. If the lexical bundles are positioned at the beginning of a sentence, they are usually used to clarify the previous sentences. This is shown in the concordance lines "Begriffe und Redewendungen parat haben und ins Langzeitgedächtnis abspeichern können. Haben Sie gut aufgepasst? Hier finden Sie ein abwechslungsreiches Arbeitsblatt zu unserem Podcast." (… have terms and phrases at hand and be able to save them in long-term memory. Have you paid close attention? Here is a varied worksheet for our podcast.) The lexical bundle "hier finden sie ein" (here you find a) still connects to the previous explanation about self-directed language learning using a podcast from the Goethe-Institut.
This function can also be found in the lexical bundle with the highest number of occurrences in the mechanical engineering corpus, namely "in bezug auf die" (in relation to the). This lexical bundle is also classified as a Präpositionalgruppe and can be used to refer to the previous statement. Another example of a lexical bundle that has a similar function is "für Deutsch als Fremdsprache" (for German as a foreign language).
In the mechanical engineering dissertation corpus, the use of impersonal expressions is generally accompanied by the use of technical language. In German, this is called Fachsprache, the language used in a particular field. The dissertations also use more passive sentences, which makes their writing more objective and formal. Meanwhile, the Goethe-Institut website usually uses semi-formal expressions in standard German. The use of casual language can be seen in the tendency to use active expressions and pronouns such as meine, which point to the subject of the sentence. Since the vocabulary used in the top 50 lexical bundles is based on standard German, it can be argued that it is easy for German language learners to understand. The majority of the words in the top 50 lexical bundles are related to education and are found on pages that provide information and news about language learning. Reflecting the organization's work as a cultural and language institution, the language on the Goethe-Institut website is designed to be user-centered. German language learners, or anybody who wants to know more about Germany and has basic German proficiency, will most likely be able to understand and explore this website while also learning more German through its contents.

CONCLUSION
Research on lexical bundles can help the development of language studies by revealing patterns of language used in the community. This research, which explored the Goethe-Institut website, shows that this cultural institution (which is also a center for learning German as a foreign language, or Deutsch als Fremdsprache) uses standard German that can be understood by its users. Several frequently occurring lexical bundles are used in similar discourse. Therefore, the use of these repeated bundles can make it easier for users (learners) to familiarize themselves with the language. The frequency of lexical bundles in the corpus is dominated by three- and four-word lexical bundles. Regarding the structure of four-word lexical bundles, the top 50 lexical bundles in the corpus mostly fell into noun groups, prepositional groups, and verb groups and mainly function to clarify previous statements.
The standard German and the vocabulary used on this website can potentially be understood by everyone, especially German language learners who want to know more about the Goethe-Institut. Compared to prior studies of a different research object, this study shows notable differences, particularly in the expressions employed, such as passive versus active forms, impersonal versus semi-formal expressions, and the usage of technical language. As a contribution to corpus linguistic studies, this research can serve as a recommendation for website content writers, particularly for German-language websites in academic genres such as that of the Goethe-Institut. This study could also be used by translators of websites from German into Bahasa Indonesia to design user-focused language styles, as is done by the Goethe-Institut. Furthermore, since this research is limited to a sample of 50 lexical bundles, it can be continued and further developed.