
International Journal of Data Warehousing and Mining

Material type: Text

Series: International Journal of Data Warehousing and Mining, Volume 16, Issue 3, July-September 2020

Publication details: [Place of publication not identified] : IGI Publishing, c2020

Description: 1-200 pages : illustrations ; 25 cm

ISSN: 1548-3924

Subject(s): TRAJECTORY | NETWORK EMBEDDING | DATA MINING | CLASS IMBALANCE | PATENT MINING | SEMANTIC TRAJECTORY DATA WAREHOUSE | REGION PROPOSAL NETWORK | COMPLEX NETWORK | OPINION MINING | COLLABORATIVE FILTERING

Contents:

A Novel Method for Classifying Function of Spatial Regions Based on Two Sets of Characteristics Indicated by Trajectories -- Extending LINE for Network Embedding With Completely Imbalanced Labels -- Discovering Specific Sales Patterns Among Different Market Segments -- A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem -- Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval -- Conceptual Model and Design of Semantic Trajectory Data Warehouse -- A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection -- Skeleton Network Extraction and Analysis on Bicycle Sharing Networks -- Integrating Feature and Instance Selection Techniques in Opinion Mining -- Recommender Systems Using Collaborative Tagging.


Item type:

Serials



Item type	Current library	Home library	Collection	Shelving location	Call number	Copy number	Status	Date due	Barcode
Serials	LRC - Main	National University - Manila	Gen. Ed. - CCIT	Periodicals	International Journal of Data Warehousing and Mining, Volume 16, Issue 3, July-September 2020	c.1	Available		PER000000326


Includes bibliographical references.

[Article Title: A Novel Method for Classifying Function of Spatial Regions Based on Two Sets of Characteristics Indicated by Trajectories/ Haitao Zhang, Chenguang Yu and Chenguang Yu, p. 1-19]

Abstract: Trajectory is a significant factor for classifying functions of spatial regions. Many spatial classification methods use trajectories to detect buildings and districts in urban settings. However, methods that only take into consideration the local spatiotemporal characteristics indicated by trajectories may generate inaccurate results. In this article, a novel method for classifying function of spatial regions based on two sets of characteristics indicated by trajectories is proposed, in which the local spatiotemporal characteristics as well as the global connection characteristics are obtained through two sets of calculations. The method was evaluated in two experiments: one that measured changes in the classification metric through a splits ratio factor, and one that compared the classification performance between the proposed method and methods based on a single set of characteristics. The results showed that the proposed method is more accurate than the two traditional methods, with a precision value of 0.93, a recall value of 0.77, and an F-Measure value of 0.84.

https://doi.org/10.4018/IJDWM.2020070101
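
The three figures reported in the abstract above can be cross-checked against each other. The sketch below assumes that "F-Measure" means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """General F-beta score; beta=1.0 gives the balanced F1 (harmonic mean)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values quoted in the abstract.
precision, recall = 0.93, 0.77

# Assumption: the reported F-Measure is F1.
computed = f_measure(precision, recall)
print(round(computed, 2))  # 0.84
```

Under that assumption the computed value rounds to the 0.84 given in the abstract, so the three reported figures are mutually consistent.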

[Article Title: Extending LINE for Network Embedding With Completely Imbalanced Labels/ Zheng Wang, Qiao Wang, Tanjie Zhu and Xiaojun Ye, p. 20-36]

Abstract: Network embedding is a fundamental problem in network research. Semi-supervised network embedding, which benefits from labeled data, has recently attracted considerable interest. However, existing semi-supervised methods yield biased results in the completely imbalanced label setting, where the labeled data cannot cover all classes. This article proposes a novel network embedding method that can benefit from completely imbalanced labels by approximately guaranteeing both intra-class similarity and inter-class dissimilarity. In addition, the authors prove and adopt the matrix factorization form of LINE (a well-known network embedding method) as the network structure preserving model. Extensive experiments demonstrate the superiority and robustness of this method.

https://doi.org/10.4018/IJDWM.2020070102

[Article Title: Discovering Specific Sales Patterns Among Different Market Segments/ Cheng-Hsiung Weng and Cheng-Kui Huang, p. 37-59]

Abstract: Formulating different marketing strategies to apply to various market segments is a noteworthy undertaking for marketing managers. Accordingly, marketing managers should identify sales patterns among different market segments. The study initially applies the concept of recency–frequency–monetary (RFM) scores to segment transaction datasets into several sub-datasets (market segments) and discovers RFM itemsets from these market segments. In addition, three sales features (unique, common, and particular sales patterns) are defined to identify various sales patterns in this study. In particular, a new criterion (contrast support) is also proposed to discover notable sales patterns among different market segments. This study develops an algorithm, called sales pattern mining (SPMING), for discovering RFM itemsets from several RFM-based market segments and then identifying unique, common, and particular sales patterns. The experimental results from two real datasets show that the SPMING algorithm can discover specific sales patterns in various market segments.

https://doi.org/10.4018/IJDWM.2020070103
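
The segmentation step described in the abstract above, scoring customers on recency, frequency, and monetary value and grouping them into market segments, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the article's SPMING algorithm: the binary scoring, the cut-off values, the function names `rfm_table` and `rfm_segment`, and the sample transactions are all hypothetical, and the contrast support criterion is not reproduced.

```python
from collections import defaultdict


def rfm_table(transactions):
    """Aggregate (customer, days_since_purchase, amount) tuples per customer."""
    rows = defaultdict(lambda: {"recency": None, "frequency": 0, "monetary": 0.0})
    for customer, days_ago, amount in transactions:
        row = rows[customer]
        # Recency is the age of the most recent purchase (smallest days_ago).
        if row["recency"] is None or days_ago < row["recency"]:
            row["recency"] = days_ago
        row["frequency"] += 1
        row["monetary"] += amount
    return dict(rows)


def rfm_segment(row, r_cut, f_cut, m_cut):
    """Binary score per dimension: 1 if the customer meets the cut-off, else 0."""
    r = 1 if row["recency"] <= r_cut else 0
    f = 1 if row["frequency"] >= f_cut else 0
    m = 1 if row["monetary"] >= m_cut else 0
    return f"R{r}F{f}M{m}"


# Hypothetical sample data and cut-offs, for illustration only.
transactions = [
    ("alice", 3, 40.0), ("alice", 20, 25.0), ("alice", 45, 60.0),
    ("bob", 90, 15.0),
    ("carol", 5, 300.0),
]
table = rfm_table(transactions)
segments = {
    customer: rfm_segment(row, r_cut=30, f_cut=2, m_cut=100.0)
    for customer, row in table.items()
}
print(segments)
```

Each resulting label (for example `R1F0M1`) identifies one sub-dataset; itemset mining would then be run separately within each such segment.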

[Article Title: A Boosting-Aided Adaptive Cluster-Based Undersampling Approach for Treatment of Class Imbalance Problem/ Seifedine Kadry, Debashree Devi and Suyel Namasudra, p. 60-86]

Abstract: Class imbalance is a well-investigated topic that addresses the performance degradation of standard learning models caused by an uneven distribution of classes in a dataspace. Cluster-based undersampling is a popular solution in the domain: it eliminates majority class instances from a definite number of clusters to balance the training data. However, distance-based elimination of instances is often affected by the underlying data distribution. Recently, ensemble learning techniques have emerged as an effective solution because of their weighted learning principle for rare instances. In this article, a boosting-aided adaptive cluster-based undersampling technique is proposed to facilitate the elimination of learning-insignificant majority class instances from the clusters, detected through the AdaBoost ensemble learning model. The proposed work is validated against seven existing cluster-based undersampling techniques on six binary datasets and three classification models. The experimental results establish the effectiveness of the proposed technique over the existing methods.

https://doi.org/10.4018/IJDWM.2020070104

[Article Title: Serialized Co-Training-Based Recognition of Medicine Names for Patent Mining and Retrieval/ Na Deng and Caiquan Xiong, p. 87-107]

Abstract: In the retrieval and mining of traditional Chinese medicine (TCM) patents, a key step is Chinese word segmentation and named entity recognition. However, the alias phenomenon of traditional Chinese medicines poses great challenges for Chinese word segmentation and named entity recognition in TCM patents, which directly affects the effectiveness of patent mining. Because of the lack of a comprehensive Chinese herbal medicine name thesaurus, traditional thesaurus-based Chinese word segmentation and named entity recognition are not suitable for medicine identification in TCM patents. In view of this situation, a modified and serialized co-training method is proposed that uses the language and structural characteristics of TCM patent texts to recognize medicine names in TCM patent abstracts. Experiments show that this method can maintain high accuracy at relatively low time complexity. In addition, the method can be extended to the recognition of other named entities in TCM patents, such as disease names and preparation methods.

https://doi.org/10.4018/IJDWM.2020070105

[Article Title: Conceptual Model and Design of Semantic Trajectory Data Warehouse/ Michael Mireku Kwakye, p. 108-131]

Abstract: The trajectory patterns of a moving object in a spatio-temporal domain offer varied information in terms of the management of the data generated from the movement. The query results of trajectory objects from the data warehouse are usually not enough to answer certain trend behaviours and support meaningful inferences without the associated semantic information of the trajectory object or the geospatial environment within a specified purpose or context. This article formulates and designs a generic ontology modelling framework that serves as the background model platform for the design of a semantic data warehouse for trajectories. The methodology relies on a higher granularity of data, obtained from pre-processed and extract-transform-load (ETL) data, so as to offer efficient semantic inference over the underlying trajectory data. Moreover, the modelling approach outlines the thematic dimensions that offer a design platform for predictive trend analysis and knowledge discovery in the trajectory dynamics and data processing for moving objects.

https://doi.org/10.4018/IJDWM.2020070106

[Article Title: A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection/ Gang Liu and Chuyi Wang, p. 132-145]

Abstract: Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. The common region proposal methods hunt for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) method improves accuracy and detection speed with several hundred candidate boxes. However, since the feature maps contain insufficient information, the ability of RPN to detect and locate small-sized objects is poor. This article proposes a novel multi-scale feature fusion method for the region proposal network to solve this problem. The proposed method, called multi-scale region proposal network (MS-RPN), can generate suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are each fine-tuned and compressed into a uniform space. The generated fusion feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail and context information and are sent to RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.

https://doi.org/10.4018/IJDWM.2020070107

[Article Title: Skeleton Network Extraction and Analysis on Bicycle Sharing Networks/ Kanokwan Malang, Shuliang Wang, Yuanyuan Lv and Aniwat Phaphuangwittayakul, p. 146-167]

Abstract: Skeleton network extraction has been adopted unevenly in transportation networks whose nodes are always represented as spatial units. In this article, the TPks skeleton network extraction method is proposed and applied to bicycle sharing networks. The method aims to reduce the network size while preserving key topologies and spatial features. The authors quantify the importance of nodes with an improved topology potential algorithm. Spatial clustering makes it possible to detect high traffic concentrations and allocate the nodes of each cluster according to their spatial distribution. Then, the skeleton network is constructed by aggregating the skeleton nodes indicated as most important. The authors examine the skeleton network characteristics and different spatial information using the original networks as a benchmark. The results show that the skeleton networks preserve topological and spatial information similar to that of the original networks while reducing their size and complexity.

https://doi.org/10.4018/IJDWM.2020070108

[Article Title: Integrating Feature and Instance Selection Techniques in Opinion Mining/ Zi-Hung You, Ya-Han Hu, Chih-Fong Tsai and Yen-Ming Kuo, p. 168-182]

Abstract: Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g. term frequency (TF) or term frequency–inverse document frequency (TF–IDF), can yield diverse numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade the classification performance. To solve this problem, instance selection can be adopted to filter out unrepresentative training documents. Therefore, this article investigates the opinion mining performance associated with the feature and instance selection steps together. Two combination processes, which perform feature selection and instance selection in different orders, were compared. Specifically, two feature selection methods, namely TF and TF–IDF, and two instance selection methods, namely DROP3 and IB3, were employed for comparison. Experimental results on sentiment classifiers developed with three Twitter datasets showed that TF–IDF followed by DROP3 performs best.

https://doi.org/10.4018/IJDWM.2020070109
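
The difference between the two feature weightings named in the abstract above can be seen on a toy corpus. The sketch below uses one common textbook definition (relative term frequency multiplied by log(N/df)); the article's exact TF–IDF variant is not stated in the abstract, so this formula, the function names, and the three sample documents are assumptions. The instance selection methods DROP3 and IB3 are not shown.

```python
import math
from collections import Counter


def tf(doc_tokens):
    """Relative term frequency of each token within one document."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    return {term: count / total for term, count in counts.items()}


def idf(corpus):
    """Inverse document frequency log(N / df) over a list of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(corpus):
    """TF-IDF weight of every term in every document."""
    weights = idf(corpus)
    return [{term: v * weights[term] for term, v in tf(doc).items()} for doc in corpus]


# Hypothetical three-document corpus, for illustration only.
corpus = [
    "great phone great battery".split(),
    "bad battery".split(),
    "great service".split(),
]
tf_scores = [tf(doc) for doc in corpus]
tfidf_scores = tf_idf(corpus)

# By raw TF, "great" dominates the first document; by TF-IDF, the rarer
# term "phone" outranks it because "great" occurs in two of three documents.
print(tf_scores[0]["great"] > tf_scores[0]["phone"])
print(tfidf_scores[0]["phone"] > tfidf_scores[0]["great"])
```

Both comparisons hold on this corpus, which shows how the two weightings can rank the same terms differently and therefore select different feature sets.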

[Article Title: Recommender Systems Using Collaborative Tagging/ Latha Banda, Karan Singh, Mohamed Abdel-Basset, Pham Huy Thong, Hiep Xuan Huynh, David Taniar and Le Hoang Son, p. 183-200]

Abstract: Collaborative tagging is a useful and effective way of classifying items with respect to search and of sharing information, so that users can be tagged via online social networking. This article proposes a novel recommender system for collaborative tagging in which the genre interestingness measure and gradual decay are utilized with diffusion similarity. The comparison has been done on the benchmark recommender system datasets, namely MovieLens and Amazon, against existing approaches such as collaborative filtering based on tagging using the E-FCM and E-GK clustering algorithms, hybrid recommender systems based on tagging using GA, and collaborative tagging using incremental clustering with trust. The experimental results indicate that the proposed approach achieves a maximum prediction accuracy ratio of 9.25%, averaged over various data splits of 100 users, which is higher than the 5.76% prediction accuracy obtained by the existing approaches.

https://doi.org/10.4018/IJDWM.2020070110
