
지역산업연구, Vol. 42, No. 4, pp. 235~256

The Effect of the Introduction of Text Mining Analysis on the Accounting and Finance Research and Proposals for Graduate School Education

Lee, Kun Chang* · Na, Hyung Jong**


ABSTRACT

This paper addresses the impact of text mining techniques on research in accounting and finance and proposes appropriate teaching methods accordingly. Traditional teaching and research in accounting and finance have been limited to analyzing quantified data samples, and have therefore focused on empirical analyses while ignoring AI methods such as text mining and sentiment analysis. AI methods must be taken seriously as new topics of education and research because the many kinds of text found in audit reports, financial statements, and financial news are flooding decision makers in the accounting and financial fields. This paper therefore strongly proposes that text mining, an AI method actively used in the MIS field, be taught and adopted in these fields, and that it be built into the future curriculum of graduate courses in accounting and finance. The contributions of this paper are as follows. First, it emphasizes the need to introduce text mining techniques in order to analyze the huge amount of unstructured text available in accounting and financial documents. Second, it describes in detail the fundamental concepts and procedures of text mining. Third, it argues that the outdated curriculum of current graduate courses in accounting and finance should be updated to reflect the AI-driven changes occurring in the modern business world.

|Keywords| text mining, accounting, finance, graduate education

Ⅰ. Introduction

This study examines the impact on accounting and finance research of text mining, an AI (Artificial Intelligence) technology for analyzing unstructured text big data, and proposes an appropriate graduate school curriculum.

* (First Author) Professor, SKKU Business School, Sungkyunkwan University, Seoul 03063, Republic of Korea, [email protected]

** (Corresponding author) Research Professor, SKKU Business School, Sungkyunkwan University, Seoul 03063, Republic of Korea, [email protected]


236 지역산업연구|제42권 제4호|2019.11

Research in accounting and finance is conducted primarily through empirical analysis. Most samples consist only of quantified data, and results are derived mainly from regression analyses. Some studies do propose improvements to taxation from the standpoint of tax law, or new policies and regulations in auditing, without empirical analysis. This paper, however, discusses the need for text mining techniques and a matching graduate school curriculum for research that studies accounting and finance through empirical analysis.

With the support of big data technology, information stored in various semi-structured and unstructured data sources can now be collected (Pezić et al., 2019). The limitation of research methodology in accounting and finance is that only quantified data has been used in empirical analyses, and this limitation in turn narrows the scope of accounting and finance research. More diverse and novel research requires convergence, that is, adopting research methodologies from other academic fields. The field of Management Information Systems (MIS) has long worked with text mining techniques. In accounting and finance, however, there is still too little research that introduces and applies these artificial intelligence technologies.

Accounting and finance research therefore needs to take up more diverse topics by introducing text mining techniques to analyze unstructured text data. The rest of this paper looks at how text mining techniques can be applied in accounting and finance research and how they can be used to advance it.

Among artificial intelligence technologies, text mining makes non-numeric data such as text available for empirical research. In other words, unstructured data such as text can be quantified and turned into variables through text mining techniques.

A great deal of information about firms cannot be expressed in quantitative terms. A firm's financial statements present quantitative information about its financial condition and performance, and accounting and finance studies use only this quantitative data in empirical analysis. Materials such as business reports and audit reports, by contrast, express corporate information as qualitative data such as text. This qualitative data also contains much important information about companies, but so far it has rarely been used in empirical analyses.

If text mining techniques are introduced, the text in business reports or audit reports can be quantified and used for empirical analysis. In other words, the new methodology expands the research data available, which will help open up areas that have not been studied so far.

However, there are real barriers to using text mining in accounting and finance research. Researchers in accounting and finance are currently not trained in these artificial intelligence technologies, so it is practically impossible for them to use text mining techniques in research. The fundamental solution to this problem lies in reforming the curriculum: text mining should be taught in master's and Ph.D. programs by adding courses on it to the accounting and finance curriculum.

Current professors of accounting and finance in business schools belong to a generation that was not trained in artificial intelligence technologies such as text mining, so they cannot teach these technologies to prospective researchers such as master's and Ph.D. students. To break out of this situation, outside text mining experts need to be recruited to provide ongoing training in master's and doctoral courses. Then, as time goes by, the share of accounting and finance professors who have acquired text mining skills will increase.

This study introduces a graduate school curriculum for text mining techniques that can help accounting and finance research. Accounting and finance research can take a step forward when it accepts innovation without fear of change. This essentially requires educating future accounting and finance researchers in text mining, one of the artificial intelligence technologies.

The contributions of this paper are as follows. First, we point out the changes in research trends and explain the need to introduce text mining techniques for unstructured text data analysis in accounting and finance research as a way to respond to them. Second, we introduce in detail the concepts and application methods of text mining techniques that are substantially helpful for accounting and finance research. Third, as a fundamental solution for the use of text mining in accounting and finance research, we present specific proposals for the education curriculum.

This study is organized as follows. Chapter 2 points out the limitations of research in accounting and finance and describes the changing research environment and trends. Chapter 3 introduces prior literature that uses text mining in accounting and finance research. Chapter 4 explains the concepts of text mining and describes specific procedures for applying it to accounting and finance research. Chapter 5 presents education measures for artificial intelligence technologies. Finally, Chapter 6 concludes.

Ⅱ. Limitations of research methods and changing research flows in accounting and finance

Research in accounting can be broadly divided into financial accounting, management accounting, tax accounting, and auditing. In tax accounting, it is possible to point out problems of the current taxation system in terms of tax law and suggest improvements without conducting an empirical analysis. It is likewise possible to propose new policies or regulations in auditing by pointing out problems in the existing audit system. However, most accounting studies test hypotheses through empirical analysis.

Research in finance studies stock prices, firms' capital structure, derivatives and other financial instruments, capital increases, and so on. Like accounting studies, these studies use data about companies and mostly test hypotheses through empirical analysis. This paper discusses the limitations of such empirical studies and presents countermeasures.

The limitations of empirical studies in accounting and finance are as follows. The scope of study is limited because the samples used in empirical accounting research consist only of quantitative data. The data used in accounting and finance are primarily firm-year financial data obtained from KIS-Value (https://www.kisvalue.com), TS-2000 (http://www.kocoinfo.com), and Fn-Guide (http://www.fnguide.com).

A firm's financial data is downloaded from these sources, and other data is collected by hand. Both kinds of data are in common use, but only quantified data enters the statistical analysis, which in turn limits the scope of study.

When a company discloses information externally, it does so not only quantitatively but also qualitatively. It discloses financial information quantitatively through its financial statements and qualitatively through its business reports, audit reports, management diagnosis statements, and so on. Until now, accounting and finance studies could not use qualitatively expressed information (e.g., the text in a business report or an audit report) in empirical analysis, which limits the research topics available in these fields. To overcome this limitation, this study proposes the introduction of text mining, one of the artificial intelligence technologies.

Artificial intelligence technology has long been studied, and there have been attempts to apply it to accounting and finance research. Such attempts have recently been booming overseas. <Table 1> below compiles the number of Scopus papers in accounting and finance that analyze data using artificial intelligence technology (Cockcroft et al., 2018).


<Table 1> The number of Scopus papers in accounting and finance using big data analysis techniques among artificial intelligence technologies

Year | The number of Scopus papers in accounting and finance using big data analysis techniques
2007 | 8
2008 | 21
2009 | 33
2010 | 31
2011 | 89
2012 | 660
2013 | 2,429
2014 | 4,763
2015 | 8,412
2016 | 10,715

Sources: Cockcroft et al. (2018)

<Table 1> above covers research in finance as well as in accounting, and the counts are based on Scopus papers. The number of studies analyzing big data has increased since 2012, and research on big data analysis using artificial intelligence technology has increased rapidly since 2013.

Indeed, the number of overseas accounting and finance papers applying these technologies is on the rise. Such research is also being published in Korea, but it is not yet pursued as actively as abroad. Studies using text mining in accounting and finance are introduced below.

Ⅲ. Prior literature in accounting and finance using text mining technology

This chapter introduces accounting and finance studies that quantify unstructured text data with text mining techniques for use in empirical analysis. Overseas, studies that analyze textual data, a form of unstructured big data, either conduct empirical analyses directly or, in many cases, discuss the importance of such analysis or explain its procedures (Gupta and Lehal, 2009; Zhang et al., 2011; Al-Maimani et al., 2014; Noh and Lee, 2015; Ravi and Ravi, 2015; Wu and Ester, 2015; Költringer and Dickinger, 2015; Anand and Naorem, 2016; Campos et al., 2018).

The major overseas accounting and finance papers related to text mining are as follows. Fisher et al. (2016) note that textual documents are often used to convey information such as management's assessment of a firm's financial performance and its current and future corporate performance. To analyze these texts, they emphasize the importance of collecting and analyzing information through Natural Language Processing (NLP) and point to the need for research in this field. Gaultz and Mayo (2017) predict that big data analysis will further improve the audit environment: it helps identify associations among various pieces of information, analyze patterns in information, and recognize large amounts of information. Rezaee and Wang (2019) describe the importance of a system that uses text mining to predict risks in advance. They argue that a more effective management system can be established through risk assessment that analyzes documents such as audit reports and corporate analysis reports as well as financial statements. Janvrin and Watson (2017) explain that big data analysis makes the provision of enterprise information to internal and external decision makers more effective and efficient, and they introduce software, analysis methods, and a variety of examples of big data analytics. Byrnes et al. (2018) predict that, because text mining makes unstructured big data analysis possible, the human share of the audit process will gradually shrink and the role of artificial intelligence systems will grow. Boskou et al. (2018) analyze the text of financial reports by text mining: they extract the key words in the documents and analyze the associations between similar words to develop new indicators for internal control functions. Wu et al. (2019) use text mining to analyze how useful news is for predicting stock returns in the Taiwan stock market. They show that news variables provide useful information for predicting Taiwan stock market returns, and they find an asymmetric effect: predictive accuracy is higher when the stock market is in recession than when it is booming. Yang et al. (2018) analyze firms' text reports with text mining and extract the firms' risk factors. Cheng et al. (2019) use text mining to study factors that improve the accuracy of stock valuations, building a factor relationship model from the correlations among key words found in the text mining analysis.

Domestic accounting and finance papers using text mining are as follows. Lee and Kwon (2015) used text mining to extract news articles about the social responsibility of medium-sized enterprises and analyzed their association with corporate performance. Choi et al. (2015) used text mining to analyze news about companies that had gone bankrupt, and thereby studied the possibility of predicting bankruptcy. They found that words such as 'recovery', 'disclosure', 'funding', 'workout', 'embezzlement', 'capital increase', and 'creditors' appeared heavily in the news about two months before bankruptcy. Yuk (2018) used text mining to analyze CEO messages in sustainability reports and studied their relationship with corporate performance. The study reported that companies with lower performance tend to use more positive expressions in their sustainability reports, write less readable text, and emphasize future performance. Yoo et al. (2018) selected bus-related complaints from 10,421 electronic civil complaint records for 2015-2017 and extracted key words through text mining. They then analyzed the degree of connection and the network among the major words through association analysis. The study provides data for countermeasures to reduce the number of bus complaints. Kim and Kim (2017) analyzed text posted on a stock market analysis site to derive information on the sentiment of stock market investors. Jang et al. (2016) used text mining on the titles of analysts' investment reports to examine the accuracy and achievement of analysts' forecasts. The results show that the more clearly a buy or sell opinion is reflected in the title, the more accurate the analyst's forecast and the more often it is achieved. Na et al. (2018) examined, through text mining, postings on the homepages of U.S. S&P 500 companies and found that certain key words are systematically linked to a firm's present and future financial performance. Kim et al. (2012) analyzed news content through text mining and examined its relationship with stock prices. They found a significant relationship between positive and negative news content and the rise and fall of the stock index; the accuracy rate was 70.0% when the stock price fell and 78.8% when it rose. Na et al. (2019) argued that companies receiving an unqualified opinion in the audit report do not all carry the same audit risk, and demonstrated this through an empirical analysis. They analyzed the content of audit reports through text mining to calculate a quantified sentiment value according to the degree of positivity or negativity, and then analyzed the relationship between that sentiment value and audit fees and audit hours. The results show that the sentiment value of the audit report has a significant negative relationship with audit fees and audit hours. This means that not all unqualified opinions represent the same level of audit risk. Mo and Seo (2019) used text mining to examine and quantify the notes to firms' business reports in order to study the variability of the notes and the stock market's response. According to their results, the content of the notes has a positive effect in that it increases the usefulness of the information conveyed, reducing the level of information asymmetry as reflected in the cost of equity capital, stock trading volume, and earnings response coefficients.

Ⅳ. Text mining technology for unstructured text big data analytics

(1) The concept of big data

Rather than treating big data as a simple compound of 'big' and 'data', Gandomi and Haider (2015) explain that it includes data whose structure has made it difficult to use in the business environment so far, such as semi-structured and unstructured data (e.g., photographs, images, and text). In other words, big data means not only that the amount of information is large but also that its forms vary: much more information is available than before, and types of information that were previously unusable have become available.

Lee (2017) describes the characteristics of big data in terms of volume, variety, and velocity. First, volume means the physical size of the data; big data refers to data that has grown to the petabyte (PB)1) scale. Second, variety means the form of the data: whether it is stored in a relational database (RDB)2) used in a traditional business data environment, is web log data, or is unstructured data such as video, image, or text. Third, velocity means the speed of data processing: acquiring, processing, and analyzing data should be handled in real time.

In other words, big data does not simply mean large-capacity data itself; the term places more weight on the technologies that can effectively process and analyze varied data. Big data analytics technologies are designed to extract economically valuable information from large-capacity data of various forms, such as M2M (Machine to Machine) sensor data, social data, and corporate customer relationship data. It is important to analyze effectively not only structured data but also unstructured big data such as web documents on the Internet, text on social media, e-mail, and videos on YouTube. The volume of big data was estimated at around 2.8 zettabytes in 2012 and is expected to grow rapidly to around 40 zettabytes3) in 2020, with 20% of that expected to be structured data and the remaining 80% unstructured data (Yoo, 2013).
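The volume estimates quoted above imply simple figures that are worth making explicit. The following is a minimal sketch of that arithmetic only; the function name `split_volume` is hypothetical, and the inputs are the estimates cited from Yoo (2013), not independently verified values.

```python
ZETTABYTE_IN_BYTES = 10 ** 21  # decimal prefix "zetta", as defined in footnote 3


def split_volume(total_zb, structured_share):
    """Return (structured, unstructured) volumes in zettabytes."""
    structured = total_zb * structured_share
    return structured, total_zb - structured


# Estimates quoted in the text: 2.8 ZB in 2012, 40 ZB in 2020, 20% structured.
structured_zb, unstructured_zb = split_volume(40.0, 0.20)
growth_factor = 40.0 / 2.8  # 2020 estimate relative to the 2012 estimate

print(f"structured:   {structured_zb:.1f} ZB")
print(f"unstructured: {unstructured_zb:.1f} ZB")
print(f"growth 2012-2020: about {growth_factor:.1f}x")
```

On these estimates, unstructured data would account for four times the volume of structured data, which is the gap that text mining is meant to address.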

<Table 2> below compares past and present data concepts. In the past, data followed a particular format; now there is no specific format and forms vary. In terms of velocity, data used to be collected in batches at regular intervals, but is now processed continuously as it occurs in real time. Processing data used to be relatively expensive but is now relatively inexpensive. The purpose of processing used to be analyzing the results of past data, but the focus is now on system optimization or on developing indicators for future prediction through data analysis.

1) 1 PB equals 1,024 TB (terabytes), a unit of the throughput or capacity of a digital signal.

2) A database that manages data using two-dimensional tables, that is, tables of rows and columns. The advantage of an RDB is that data can be managed in a form that is easy for people to understand. In addition, data can be controlled without using a separate programming language. In other words, the ability to manipulate data without being an expert has led to an increase in database users.

3) A combination of the prefix "zetta", meaning 10 to the 21st power, and "byte", a unit of the amount of computer data. The number equals 10 hae (垓), that is, 1 followed by 21 zeros.


<Table 2> Changes of data concepts

Category | Data concepts in the past | Data concepts at present
Data form | Fixed, specific format | No specific format; varied
Data velocity | Batch level: data collected at regular intervals | Real-time level: continuous data handled right away
Data processing cost | Relatively high cost | Relatively low cost
Data processing purpose | Result analysis | System optimization and future forecasts

Sources: Sivarajah et al. (2017)

At this point in time, text mining is considered the artificial intelligence technology that can be used most effectively in accounting and finance research. The concepts and specific procedures of text mining are introduced below.

(2) Procedure and application of text mining techniques

Text mining is a technique for structuring semi-structured or unstructured text materials based on natural language processing (NLP)4). It is also an analytical method for finding useful information in features extracted from text. Text mining makes it possible to analyze the text data within unstructured big data and extract keywords that are repeated significantly. The technique also identifies associations between key words and groups words with common characteristics (Xiao et al., 2018). Text mining can list the words in a document in order of importance, or visualize them so that more important words appear larger and darker. <Figure 1> below is an example of such a word cloud. The word cloud, presented by Na et al. (2019), shows the most important words in the 2017 audit reports of KOSPI companies in larger, bolder type and less important words in smaller, fainter type.

4) A technology that recognizes and processes the language people use. That is, it allows human language and the language used by computers to be mutually recognized.


Sources: Na et al. (2019)

<Figure 1> 2017 KOSPI companies audit report word cloud
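The sizing rule behind a word cloud such as <Figure 1> can be sketched in a few lines. This is an illustration only: it assumes that importance is measured by raw frequency and that font size grows linearly between two arbitrary bounds (`min_pt`, `max_pt`), neither of which is stated by Na et al. (2019). The function name and the sample tokens are hypothetical.

```python
from collections import Counter


def font_sizes(tokens, min_pt=10, max_pt=48):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(tokens)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = max(high - low, 1)  # avoid division by zero when all counts are equal
    return {
        word: min_pt + (count - low) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }


# Hypothetical tokens, not taken from any real audit report.
tokens = "audit opinion audit risk audit opinion statement".split()
sizes = font_sizes(tokens)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word:<10} {size:.0f} pt")
```

A drawing library would then place each word at its computed size; only the frequency-to-size mapping is shown here.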

The text mining procedure is described in detail below. First, the necessary text is extracted from the document using the Bag-of-Words (BoW)5) method. In this step, function words (such as grammatical particles) and other unnecessary vocabulary are removed.
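The first step can be sketched as follows. This is a minimal illustration for English text using only the Python standard library; the stop-word list and the sample sentence are invented for the example, and Korean documents would additionally need a morphological analyzer to strip particles, which this sketch does not attempt.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real study would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are", "with"}


def bag_of_words(text):
    """Count word occurrences, ignoring word order, case, and stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)


# Hypothetical sentence, not quoted from a real audit report.
sentence = "The audit of the financial statements is in accordance with the audit standards."
bow = bag_of_words(sentence)
print(bow.most_common(3))
```

The resulting counts keep no information about word order, which is exactly the simplification the Bag-of-Words representation makes.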

Second, a TF-IDF (Term Frequency-Inverse Document Frequency) matrix is created, which shows how significantly the words extracted with the Bag-of-Words method are repeated in a document. TF-IDF forms an adjacency matrix between documents and words and is used primarily as a vectorizing6) method in text mining. TF-IDF is a relative value: it does not simply count how many times a word is repeated within the document but indicates how significantly the word is repeated. It is a kind of weight used in text mining, a statistical value expressing how important a word is in a document (Ramos, 2003; Paltoglou and Thelwall, 2010).
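The weighting described above can be sketched as follows. The paper does not state which TF-IDF variant it uses, so this example assumes one common form, weight = (count of the word in the document / document length) × log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name and the tokenized sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized documents.
docs = [
    ["audit", "opinion", "unqualified"],
    ["audit", "going", "concern", "doubt"],
    ["audit", "opinion", "qualified"],
]
weights = tf_idf(docs)

# Rank the words of the first document from most to least distinctive.
ranked = sorted(weights[0], key=weights[0].get, reverse=True)
print(ranked)
```

A word such as "audit" that occurs in every document receives a weight of zero under this variant, while a word unique to one document ranks highest; this is the sense in which TF-IDF measures significance rather than raw repetition.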

Third, having followed these basic steps of text mining, we can extract the significantly repeated keywords from the document. Keywords make it possible to grasp the key content and meaning of the document. <Figure 2> below schematizes the text mining procedure.

5) BoW treats the text used in a document as a set of words. It is a commonly used vector representation that counts how often each word occurs, regardless of the order of the words in the document. BoW helps to characterize a document by extracting important words from unstructured text sources (Eo and Lee, 2019).

6) With text mining techniques, the words in a document are represented as vectors, and a word that appears in the document takes a vector value. These values are the weights of the words, and a typical method of obtaining them is the TF-IDF method (Kågebäck et al., 2014).


<Figure 2> Text mining procedure

Sources: Gaikwad et al. (2014).

Opinion mining, or sentiment mining, is a kind of text mining that also provides information about the opinions and feelings implied within the text. To analyze the opinions or feelings expressed in a document, follow the steps below. First, specify the scale on which the degree of positive, neutral, and negative is to be expressed. For example, if the maximum value is +2 and the minimum value is -2, then the more positive a word is, the closer its score is to +2; the more negative it is, the closer its score is to -2; and the more neutral it is, the closer its score is to zero. <Table 3> below is the emotional vocabulary dictionary for audit reports presented by Na et al. (2019), which scores the words in audit reports according to how positive, neutral, or negative they are in context.

<Table 3> Audit report emotional vocabulary dictionary

| Degree of positive, neutral, or negative | Score | Words in the audit report |
|---|---|---|
| Negative | -3 | Deficit, Legal action, Damage, Bad money, Difficulty, Abort |
| Negative | -2 | Reduction, Loan-loss reserves, Expiration, Assurance, Delay |
| Negative | -1 | Expense, Income tax, Compensation, Division, Normalization, Politics |


Source: Na et al. (2019)
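A dictionary of this kind can be applied to a text with a simple lookup. The sketch below uses a handful of entries from <Table 3>; the way scores are aggregated (a plain average over matched words), the `sentiment_score` helper, and the sample sentences are assumptions for illustration and are not taken from Na et al. (2019).

```python
import re

# A few entries from <Table 3> (Na et al., 2019), lower-cased.
LEXICON = {
    "deficit": -3, "damage": -3, "difficulty": -3,
    "reduction": -2, "delay": -2,
    "expense": -1,
    "result": 0, "budget": 0,
    "development": 1, "increase": 1,
    "sales": 2, "surplus": 2,
    "improvement": 3, "stability": 3,
}


def sentiment_score(text, lexicon=LEXICON):
    """Average the lexicon scores of the words found in `text`.

    Averaging is an assumption made for this sketch; words that are not
    in the lexicon are ignored, and a text with no matches scores 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Invented sentences, not quoted from any audit report.
positive = sentiment_score("Sales improvement led to a surplus.")
negative = sentiment_score("The deficit and the delay caused damage.")
print(round(positive, 2), round(negative, 2))
```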

If an emotional vocabulary dictionary already exists for the type of document, it can be used for sentiment analysis. However, although many English-language sentiment dictionaries are currently available, most Korean-language sentiment dictionaries have yet to be established. When using opinion mining or sentiment mining techniques without an existing dictionary, researchers should consult several experts in the field to build a dictionary and verify it before using the data (Yoo et al., 2013). For example, Na et al. (2019) collected opinions from five experts in the field of auditing to build an emotional vocabulary dictionary for the audit reports of Korean KOSPI companies, and verified its consistency with the ICC (Intraclass Correlation Coefficient) test. Generally, the respondents' opinions are considered consistent when agreement exceeds 80%.
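To show what such a consistency check involves, the sketch below computes one form of the intraclass correlation coefficient from a table of expert scores. The paper does not state which ICC variant was used, so the one-way random-effects form ICC(1,1) is an assumption, as are the `icc_oneway` helper and the invented scores.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for an items x raters table.

    `ratings[i][j]` is the score rater j gave to item i. This form is an
    assumption for illustration; the source does not state which ICC
    variant was used.
    """
    n = len(ratings)        # number of rated items (words)
    k = len(ratings[0])     # number of raters (experts)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-item and within-item mean squares from one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Invented scores from five hypothetical experts for four words.
scores = [
    [-3, -3, -2, -3, -3],
    [-1, 0, -1, -1, 0],
    [1, 1, 2, 1, 1],
    [3, 3, 3, 2, 3],
]
icc = icc_oneway(scores)
print(f"ICC(1,1) = {icc:.3f}")
```

The resulting value would then be compared with whatever consistency threshold the study adopts.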

Text mining comprises a variety of techniques. Information extraction, summarization, categorization, clustering, and information visualization are all used in the text mining process, and they can be useful tools for financial and accounting research. <Table 4> below describes the techniques that can be used in text mining.

<Table 3> (continued)

| Degree of positive, neutral, or negative | Score | Words in the audit report |
|---|---|---|
| Neutral | 0 | Individual, Result, Classification, Impact, Budget, Measurement, Efficiency |
| Positive | 1 | Development, Issuing, New building, Take over, Normal, Increase |
| Positive | 2 | Sales, New, Surplus, Early, Propulsion |
| Positive | 3 | Improvement, Stability, Closing, Dismissue |


<Table 4> Techniques in text mining

| Technique | Explanation |
|---|---|
| Information extraction | Information extraction is the initial step for a computer to analyze unstructured text, identifying key phrases and relationships within the text. |
| Categorization | Categorization automatically assigns one or more categories to a free-text document. It is a supervised learning method because it relies on input-output examples to classify new documents. |
| Clustering | Clustering can be used to find groups of documents with similar content. |
| Visualization | In text mining, visualization methods can improve and simplify the discovery of relevant information. |
| Summarization | Text summarization reduces the length and detail of a document while retaining its most important points and general meaning. It helps the user figure out whether a lengthy document meets their needs and is worth reading further; hence a summary can stand in for the set of documents. |

Sources: Gaikwad et al. (2014)
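Clustering, as listed in <Table 4>, depends on some measure of how similar two documents are. The sketch below uses cosine similarity between bag-of-words vectors, which is a common choice but an assumption here; Gaikwad et al. (2014) is not cited as prescribing it, and the document names and contents are invented.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented token lists standing in for three short documents.
docs = {
    "report_a": Counter(["deficit", "delay", "risk", "deficit"]),
    "report_b": Counter(["deficit", "risk", "damage"]),
    "report_c": Counter(["sales", "surplus", "improvement"]),
}
sim_ab = cosine_similarity(docs["report_a"], docs["report_b"])
sim_ac = cosine_similarity(docs["report_a"], docs["report_c"])
print(f"a-b: {sim_ab:.3f}, a-c: {sim_ac:.3f}")
```

A clustering algorithm would group report_a with report_b, since they share vocabulary, and keep report_c apart.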

Ⅴ. Proposals for an education curriculum on text mining technology in the accounting and finance field

Accounting and finance research studies firms rather than individuals. Information about companies is very diverse and extensive. Such firm information may be disclosed as quantitative data, such as financial statements, but may also be disclosed as qualitative data, such as business reports and audit reports. Until now, empirical studies in the accounting and financial sectors have used only quantitative data, since statistical analysis must be performed. This limits the scope of research.

Text mining techniques help to quantify unstructured text big data for empirical analysis. They make it possible to study qualitative data that has not previously been used as research material, and can thereby play an important role in expanding the scope of research. Therefore, we think the use of text mining techniques should become more common for the development of accounting and financial research.

To realize this, the fundamental solution is to provide training in master's and Ph.D. programs at graduate schools. The following are plans for text mining education in these programs.

First, classes on the use of text mining techniques should be added to the graduate education curriculum. This requires both theoretical and practical classes. As a theoretical class, a class that explains the "utilization of unstructured text big data" is a basic necessity. A theoretical explanation of why textual data is needed and where it can be used has to come first. As for hands-on classes, while accounting and financial research primarily uses the statistical programs SAS or STATA, text mining requires a "class on R or Python". Researchers first need to collect the necessary text material by crawling, which is done with R or Python, and it is also more efficient to use R or Python to process the collected text data and derive the TF-IDF values. Statistical programs such as SAS and STATA can then be used for empirical analyses, such as regression, after the text data has been quantified.
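The hand-off from Python to a statistical package can be as simple as writing the quantified text measures to a CSV file. The sketch below is one possible way to do this; the column names, the values, and the file name in the comment are all hypothetical.

```python
import csv
import io

# Hypothetical firm-level text measures produced by an earlier
# text mining step (all values invented for illustration).
rows = [
    {"firm_id": "A001", "year": 2017, "sentiment": -1.25, "n_keywords": 14},
    {"firm_id": "B002", "year": 2017, "sentiment": 0.40, "n_keywords": 9},
    {"firm_id": "C003", "year": 2017, "sentiment": 2.10, "n_keywords": 11},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
# To hand the data to SAS or STATA, write csv_text to a file, e.g.:
# with open("text_measures.csv", "w", newline="") as f:
#     f.write(csv_text)
```

The resulting file can be imported into SAS or STATA and merged with financial statement data for regression analysis.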

Second, incumbent professors in the accounting and finance fields who teach master's and Ph.D. students were not themselves trained in text mining, so in reality it is difficult for them to teach these courses. Therefore, at this point, the problem needs to be solved by hiring outside professors who can teach classes on text mining. Over time, this approach will allow students trained in text mining to become professors and nurture future students themselves.

Just as the business environment is changing with artificial intelligence technology, so is the research environment. The accounting and finance sectors should also detect these changes in the research environment and come up with appropriate responses. Text mining, a method for analyzing unstructured text big data, is considered the most useful artificial intelligence technology for accounting and financial research.

<Table 5> below is the artificial intelligence management curriculum at OO graduate school. By analyzing this curriculum, we present a curriculum that can practically be introduced in the accounting and finance field. In this curriculum, the first semester opens with a theory class called artificial intelligence business, which explains the changing business environment and teaches students why artificial intelligence technology needs to be introduced. In the second semester, a basic statistical program class teaches how to use SAS. In the third semester, practical training in R and Python begins in earnest with hands-on classes on text mining, and at the same time a class on artificial intelligence, accounting and finance teaches how text materials can be applied and used in accounting and finance research. In the fourth semester, more in-depth R and Python training helps students learn the skills to quantify text. In the fifth semester, practical training on text mining is completed by teaching opinion mining and sentiment mining techniques in the class on text big data analysis and practice. At the same time, the curriculum as a whole is completed with the class on the 4th Industrial Revolution and social responsibility, which covers the social problems that may arise from artificial intelligence, countermeasures to them, and ethics.


<Table 5> Example of a graduate course in artificial intelligence management

| Semester | Subject | Description of the subject |
|---|---|---|
| First semester | Artificial intelligence business | Research the impact of artificial intelligence on business as a whole, and learn about future responses to it. |
| Second semester | Practice of management statistics (SAS) | Learn and practise SAS as a statistical program suited to handling big data. |
| Third semester | AI programming 1 (R, Python) | Practice programming text mining techniques using R and Python. |
| Third semester | Artificial intelligence, accounting and finance | Learn examples and techniques of applying artificial intelligence technology to the accounting and finance sectors. |
| Fourth semester | AI programming 2 (R, Python) | Practice programming machine learning techniques using R and Python. |
| Fourth semester | Artificial intelligence and marketing | Learn cases and techniques of applying artificial intelligence technology in the marketing field. |
| Fifth semester | Text big data analysis and practice | Learn how to extract the key words from text big data and how to obtain the constant value. |
| Fifth semester | The 4th Industrial Revolution and Social Responsibility | Learn about the problems that may arise from the introduction of artificial intelligence technology in the 4th Industrial Revolution and the need for ethical regulation and social responsibility. |

It may not be possible to allocate many classes to text mining education in graduate programs in accounting and finance, but we think the curriculum should be reorganized so that students can take at least one theoretical class and one practical class. Classes such as 'artificial intelligence business' and 'artificial intelligence, accounting and finance' are suitable as theoretical classes, and classes such as 'AI programming 1 (R, Python)' and 'text big data analysis and practice' are suitable as practical classes. In addition to the existing basic statistics classes, students need to be trained to use text mining through additional classes using R and Python.

Overseas studies in the accounting and finance sectors, as shown in <Table 1> of this study, show a growing trend in big data analytics. To keep up with this global research trend, we need to reform the graduate school curriculum to improve the research skills of the next generation.

VI. Conclusion

This paper explains the impact of text mining techniques, which are methods for analyzing unstructured text big data, on research in the accounting and finance sectors, and puts forward appropriate teaching methods for introducing them in these sectors.

The paper pointed out that, as a limitation of empirical studies in the accounting and financial sectors, the scope of research has been restricted because only quantified data has been used. It argued that, for research in these sectors to develop, convergence should be attempted by introducing useful research methodologies from other academic fields. In particular, text mining, a technique actively used in the field of management information systems, should be adopted, and a new curriculum covering it should be implemented in master's and Ph.D. programs in order to develop accounting and financial research.

The need to introduce text mining techniques claimed in this paper can be summarized as follows. Among artificial intelligence technologies, text mining makes it possible to use non-quantitative data such as text in empirical research.

A large amount of information about an enterprise cannot be expressed in quantitative terms. A firm's financial statements describe its financial position and performance as quantitative information, and empirical analysis in accounting and finance studies has in fact relied almost entirely on this quantitative data. Business reports, audit reports, and similar documents, however, express important information about the enterprise as qualitative data such as text.

If text mining techniques are used, qualitative data such as text can be quantified and can therefore be used for empirical analysis. This means an expansion of the available research data and of the range of topics that can be studied. In other words, text mining can be an important means of studying many areas that have not yet been studied.

Currently, professors in the accounting and finance fields belong to a generation that was not trained in artificial intelligence technologies such as text mining, so it is practically impossible for them to apply this method to research or to teach it directly to graduate students. However, for the development of research in accounting and finance and of the next generation of researchers, training in text mining, a technique for analyzing unstructured text materials, is required during graduate education.

For now, graduate schools should provide education on this in their master's and Ph.D. courses by recruiting outside text mining experts. Over time, the proportion of professors majoring in accounting and finance who have acquired text mining skills will gradually increase. At least one theoretical class and one practical class on text mining should be opened in the graduate curriculum for accounting and finance majors to help keep up with changing global research trends. When innovative new things are accepted without fear of change, accounting and financial research can advance further. To this end, graduate students, who are the future accounting and finance researchers, should be educated in text mining technology, one of the artificial intelligence technologies.

The contributions of this paper are as follows. First, it explained the need to introduce text mining techniques for analyzing unstructured text big data as a way to respond to the need for change in accounting and financial research. Second, it introduced the concepts and procedures of big data and text mining techniques in detail. Third, as a fundamental solution for the use of text mining technology in accounting and finance research, it argued for a reform of graduate school education and proposed concrete measures for it.

■ Submitted: 2019. 10. 11 ■ Final review: 2019. 10. 30 ■ Accepted: 2019. 11. 08


References

김유신·김남규·정승렬(2012), “뉴스와 주가: 빅데이터 감성분석을 통한 지능형 투자의사결

정모형. 지능정보연구,” 한국지능정보시스템학회, 18(2), 143-168.

김재봉·김형중(2017), “주가지수 방향성 예측을 위한 도메인 맞춤형 감성사전 구축방안,”

한국디지털콘텐츠학회 논문지, 18(3), 585-592.

나형종·최석재·권오병(2018), “The Association of Institutional Information on Websites with

Present and Future Financial Performance,”한국전자거래학회지, 23(4), 63-85.

나형종·이건창·최승욱·김성태(2019), “감사보고서의 비정형 내용분석과 감사보수 및 시간

을 이용한 감사의견의 적정성 연구: 텍스트 마이닝과 감성분석 기법 적용을 중심

으로,”회계학연구, 44(4), 175-214.

모예린·서윤석(2019), “주석 내용의 변동과 주식시장: 주석 내용의 변동이 자기자본비용과

주식거래량 및 이익반응계수에 미치는 영향,” 회계학연구, 44(4), 215-249.

어균선·이건창(2019), “효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구,” 디지털

융복합연구, 17(3), 107-117.

유선희(2013), “빅데이터 기반의 산업시장 정보분석,” 한국과학기술정보연구원 보고서

유승의·홍순구·이태헌·김나랑(2018), “텍스트 네트워크 분석을 통한 부산시 버스민원 패턴

분석,” 전산회계연구, 16(2), 19-43.

유은지·김유신·김남규·정승렬(2013), “주가지수 방향성 예측을 위한 주제지향감성사전 구

축 방안,” 지능정보연구, 19(1), 95-110.

육근효(2018), “CEO 의 사회적 책임 메시지와 지속가능성 성과의 관계: Text Mining 접근법

의 활용,” 회계저널, 27(1), 253-279.

이희승·권오병(2015), “텍스트마이닝을 활용한 기업의 CSR 요인 추출과 기업 성과와의 관

계 분석,” 한국 IT 서비스학회 학술대회 논문집, 577-580.

장준규·이규현·이준기(2016), “투자전략 보고서의 제목이 주가 예측에 미치는 영향: 텍스트

마이닝 중심으로,” 한국빅데이터학회지, 1(2), 21-34.

최정원·한호선·이미영·안준모(2015), “텍스트마이닝 방법론을 활용한 기업 부도 예측 연

구,”생산성논집 (구 생산성연구), 29(1), 201-228.

Al-Maimani, M., N. Salim, and A. M. Al-Naamany(2014), “Semantic and fuzzy aspects of

opinion mining,” Journal of Theoretical and Applied Information Technology, 63(2),

330-342.


Anand, D., and D. Naorem(2016), “Semi-supervised aspect based sentiment analysis for movies using

review filtering,” Procedia Computer Science, 84, 86-93.

Boskou, G., E. Kirkos, and C. Spathis.(2018), “Assessing internal audit with text mining,” Journal of

Information and Knowledge Management, 17(2), 1-22.

Byrnes, P. E., A. Al-Awadhi, B. Gullvist, H. Brown-Liburd, R. Teeter, J. D. Warren Jr, and M.

Vasarhelyi(2018), “Evolution of auditing: from the traditional approach to the future audit,”

In Continuous Auditing: Theory and Application Emerald Publishing Limited, 285-297.

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A. M., Nunes, C., and Jatowt, A. (2018), “YAKE!

collection-independent automatic keyword extractor,” In European Conference on Information

Retrieval, Springer, Cham, 806-810.

Cheng, X., Huang, D., Chen, J., Meng, X., and Li, C.(2019), “An Investigation on Factors Affecting

Stock Valuation Using Text Mining for Automated Trading,” Sustainability, 11(7), 1938.

Cockcroft, S., and Russell, M.(2018), “Big data opportunities for accounting and finance practice and

research,” Australian Accounting Review, 28(3), 323-333.

Fisher, I. E., Garnsey, M. R., and Hughes, M. E.(2016), “Natural language processing in accounting,

auditing and finance: A synthesis of the literature with a roadmap for future research,”

Intelligent Systems in Accounting, Finance and Management, 23(3), 157-214.

Gaikwad, S. V., Chaugule, A., and Patil, P.(2014), “Text mining methods and techniques,”

International Journal of Computer Applications, 85(17).

Gandomi, A., and Haider, M.(2015), “Beyond the hype: Big data concepts, methods, and analytics,”

International journal of information management, 35(2), 137-144.

Goltz, N., and Mayo, M.(2017), “Enhancing regulatory compliance by using artificial intelligence text

mining to identify penalty clauses in legislation,” In MIREL 2017-Workshop on MIning and

REasoning with Legal texts June 16th.

Gupta, V. and G. S., Lehal(2009), “A survey of text mining techniques and applications,” Journal of

Emerging Technologies in Web Intelligence, 1(1), 60-76.

Janvrin, D. J., and Watson, M. W.(2017). “Big Data: A new twist to accounting,” Journal of

Accounting Education, 38, 3-8.

Kågebäck, M., O. Mogren, N. Tahmasebi., and D. Dubhashi(2014), “Extractive summarization using

continuous vector space models,” In Proceedings of the 2nd Workshop on Continuous Vector

Space Models and Their Compositionality, 31-39.

Koltringer, C., and Dickinger, A.(2015), “Analyzing destination branding and image from online

sources: a web content mining approach,” Journal of Business Research 68(9), 1836-1843.


Lee, I.(2017), “Big data: Dimensions, evolution, impacts, and challenges,” Business Horizons, 60(3),

293-303.

Noh, H., Y. Jo, and S. Lee.(2015),“Keyword selection and processing strategy for applying text mining

to patent analysis,” Expert Systems with Applications, 42(9), 4348-4360.

Paltoglou, G., and Thelwall, M.(2010), “A study of information retrieval weighting schemes for

sentiment analysis,” In Proceedings of the 48th annual meeting of the association for

computational linguistics, Association for Computational Linguistics, 1386-1395.

Pejić Bach, M., Krstić, Ž., Seljan, S., and Turulja, L.(2019), “Text mining for big data analysis in

financial sector: a literature review,” Sustainability, 11(5), 1277.

Ramos, J.(2003), “Using TF-IDF to determine word relevance in document queries,”In Proceedings of

the First Instructional Conference on Machine Learning, 242, 133-142.

Ravi, K. and V. Ravi(2015), “A survey on opinion mining and sentiment analysis: tasks, approaches and

applications,” Knowledge-Based Systems, 89, 14-46.

Rezaee, Z., and Wang, J.(2019), “Relevance of big data to forensic accounting practice and education,”

Managerial Auditing Journal, 34(3), 268-288.

Sivarajah, U., Kamal, M. M., Irani, Z., and Weerakkody, V.(2017), “Critical analysis of Big Data

challenges and analytical methods,” Journal of Business Research, 70, 263-286.

Wu, Y. and M. Ester(2015), “Flame: a probabilistic model combining aspect based opinion mining and

collaborative filtering,” In Proceedings of the Eighth ACM International Conference on Web

Search and Data Mining ACM, 199-208.

Wu, G. G. R., Hou, T. C. T., and Lin, J. L.(2019), “Can economic news predict Taiwan stock market

returns?,” Asia Pacific Management Review, 24(1), 54-59.

Yang, R., Y. Yu, M. Liu, and K. Wu.(2018), “Corporate risk disclosure and audit fee: a text mining

approach,” European Accounting Review, 27(3), 583-594.

Zhang, W., T. Yoshida, and X. Tang(2011), “A comparative study of TF-IDF, LSI and multi-words for

text classification,” Expert Systems with Applications, 38(3), 2758-2765.


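Korean Abstract

The Effect of the Introduction of Text Mining Analysis on Accounting and Finance Research and Proposals for Graduate School Education

Lee, Kun Chang* · Na, Hyung Jong**

This paper explains the impact of text mining, one of the methods for analyzing unstructured big data, on research in the accounting and finance fields, and proposes appropriate educational measures for introducing it into these fields. The limitation of empirical studies in accounting and finance is that they can only use quantified data, which restricts the scope of research. To advance research in accounting and finance, text mining techniques capable of analyzing documents, which are qualitative data, should be introduced, and a curriculum covering them should be newly established and taught in master's and doctoral programs. The contributions of this paper are as follows. First, it argues for the need for change in accounting and finance research and, as a way to respond to it, explains the need to introduce text mining techniques for analyzing unstructured text big data. Second, it introduces in detail the concepts and procedures of big data and text mining techniques. Third, as a fundamental solution for the use of text mining technology in accounting and finance research, it argues for a reform of the graduate curriculum and at the same time presents concrete improvement measures.

|Keywords| text mining, accounting, finance, graduate education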

* First author, Professor, Department of Global Business Administration, SKKU Business School, Sungkyunkwan University ([email protected])

** Corresponding author, Research Professor, SKKU Business School, Sungkyunkwan University ([email protected])


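1. Lead author

Lee, Kun Chang: [email protected]

Currently a professor in the Department of Global Business Administration at SKKU Business School, Sungkyunkwan University. Received master's and Ph.D. degrees in management from KAIST. Main research interest: convergence research on artificial intelligence and management.

2. Corresponding author

Na, Hyung Jong: [email protected]

Currently a research professor at SKKU Business School, Sungkyunkwan University. Received master's and Ph.D. degrees in accounting from Kyung Hee University. Main research interests: accounting research using text mining, machine learning, and deep learning.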