Monday, January 27, 2020

Improving the Accuracy of Arabic DC System

The main goal of this research is to investigate and develop appropriate text collections, tools, and procedures for Arabic document classification (DC). The following specific objectives have been set to achieve this goal:

To investigate the impact of preprocessing tasks, including normalization, stop word removal, and stemming, on the accuracy of an Arabic DC system.

To introduce a novel technique for Arabic stemming in order to improve the accuracy of the document classification system. The new stemming algorithm tries to overcome the deficiencies of state-of-the-art Arabic stemming techniques by dealing with multiword expressions (MWEs) and foreign Arabized words and by handling the majority of broken plural forms, which it reduces to their singular form.

To use Arabic text summarization as a feature reduction technique that eliminates noise in the documents and selects the most salient sentences to represent the original documents.

To explore the impact of different feature selection techniques on the accuracy of Arabic document classification, and to propose and implement new variants of the Term Frequency Inverse Document Frequency (TFIDF) weighting method that take into account the importance of the first appearance of a word and the compactness of the word, both of which can be taken as factors that determine the important features in a document.

To implement various classifiers and compare their performance.

1.1. Problem Statement

Despite the achievements in document classification, the performance of document classification systems is far from satisfactory. Document classification tasks are characterized by natural language, so DC is closely related to natural language processing (NLP), which requires knowledge of the subject matter. In general, natural language exhibits many syntactic and semantic ambiguities in addition to its complexity [45].
In the context of DC, a researcher tries to address various problems arising from the characteristics of documents during feature extraction and feature representation, or problems emanating from the classification algorithms. The following sections outline these research problems.

1.1.1. Preprocessing Text Problem

The preprocessing stage is challenging and can affect the performance of any DC system positively or negatively. Improving the preprocessing stage for a highly inflected language such as Arabic should therefore enhance the efficiency and accuracy of an Arabic DC system. Despite the lack of standard Arabic morphological analysis tools, most previous studies on Arabic DC have proposed using preprocessing tasks to reduce the dimensionality of feature vectors without comprehensively examining their contribution to the effectiveness of the DC system.

One of the challenges facing researchers in Arabic document classification is the absence of a strong and effective stemming algorithm. Arabic is a morphologically complex language [46] that uses both inflectional and derivational morphology. Because of these two types of morphology, a single word may yield hundreds or even thousands of variant forms [47]. Stemming is important in document classification because it makes processing less dependent on particular word forms and reduces the high dimensionality of the feature space, which in turn enhances the performance of the classification system.

Compared with the rapid pace of research on other languages, Arabic still suffers from a shortage of research and development. State-of-the-art Arabic stemmers suffer from high stemming error rates due to understemming errors, overstemming errors, and their failure to handle multiword expressions (MWEs), broken plural forms, and Arabized words.
Therefore, the limitations of current Arabic stemming methods have motivated the author to investigate a novel technique for Arabic stemming, presented in chapter 5, that extracts the roots of Arabic words in order to improve the accuracy of the document classification system.

1.1.2. High Dimensionality of the Feature Space

Extremely high-dimensional feature spaces and large volumes of data are recurring problems in automatic document classification. High dimensionality arises because the dimensionality of the feature vectors grows with the number of features used in the classification process [13, 15, 48, 49]. Practical examples show that the number of features, and hence the dimensionality, can amount to thousands. A large number of these features are irrelevant to the classification task and can be removed without affecting classification accuracy, and there are several reasons to remove them. First, the performance of some classification algorithms is negatively affected by high-dimensional feature sets. Second, overfitting may occur when the classification algorithm is trained on all features. Finally, some features are common and occur in all or most of the categories [50]. To solve this problem, the dimensionality of the feature vectors must be reduced without degrading classification performance, which makes it important to extract the features with high discriminating power using various techniques.

Text summarization, feature selection, and feature weighting are common techniques used in document classification to reduce the high dimensionality of the feature space and to improve the efficiency and accuracy of the classification system.
Term frequency (TF) weighted by inverse document frequency (IDF), abbreviated TFIDF, can partially solve the problem of variation in document content and length, but it cannot account for how the important words are distributed within a document. In general, a document is written in an organized manner to describe its main topic(s). For example, the main topic of a news article may be mentioned in the title and the first part of the document to draw the reader's attention. Therefore, depending on their location, the parts of a document may contribute to its main topic(s) to different degrees [51]. In this thesis, we propose new feature weighting methods, presented in chapter 6, that address the distribution of important words within the document.

To satisfy the objectives stated above, the research questions of this study can be summarized as follows:

What is the impact of text preprocessing techniques such as normalization, stop word removal, and stemming on the performance of an Arabic DC system?

What Arabic text preprocessing methods are available to be implemented in this research? What are their advantages and disadvantages? How can their performance be compared and improved in order to improve the accuracy of the Arabic document classification system?

What is the impact of feature reduction techniques on Arabic document classification? How can the high dimensionality of the feature space, and the difficulty of selecting the features that matter for understanding a document, be overcome?

Which classification algorithms perform best when applied to different representations of an Arabic dataset?

1.2. Research Contribution

This research focuses on exploring different preprocessing and dimensionality reduction techniques and investigating their effect on Arabic document classification performance.
More specifically, the main contributions of this thesis are as follows:

We demonstrate that preprocessing tasks such as normalization, stop word removal, and stemming have a significant impact on classification accuracy for Arabic datasets, especially given the complicated morphological structure of the Arabic language. Furthermore, we demonstrate that choosing appropriate combinations of preprocessing tasks significantly improves the accuracy of document classification, depending on the feature size and the classification technique.

We propose a novel stemmer for Arabic document classification. The proposed stemmer attempts to overcome the weaknesses of the root-based and light stemming techniques, in addition to dealing with the majority of broken plural forms, MWEs, and foreign Arabized words. We compare the proposed stemmer with well-known Arabic stemmers, including root-based stemming (Khoja stemmer) and light stemming (Larkey stemmer), to study its contribution to improving the classification system. The comparison is carried out for different datasets, classification techniques, and performance measures.

We demonstrate that document summarization helps to improve the efficiency of Arabic document classification by reducing the high dimensionality of the feature space without affecting the value or content of the documents, thereby saving memory and execution time in the classification process.

We investigate the impact of different feature selection techniques, namely Information Gain (IG), the Ng-Goh-Low (NGL) coefficient, Chi-square testing (CHI), and the Galavotti-Sebastiani-Simi (GSS) coefficient, on reducing the dimensionality of the feature space and thus improving the performance of the Arabic document classification system.

We investigate the impact of feature representation schemes on the accuracy of Arabic document classification.
A document usually consists of several parts, and the important features that are most closely associated with its topic tend to appear in the first parts or to recur in several parts of the document. Therefore, the proposed weighting methods take into account the importance of the first appearance of a word and the compactness of the word, both of which can be taken as factors that determine the important features in the document.

Unfortunately, there is no free benchmark dataset for Arabic document classification. One of the aims of this research is to compile a dataset for Arabic document classification that covers different text genres. It will be used in this research and can serve in the future as a benchmark for computational linguistics research, including text mining and information retrieval. The dataset was collected from several published papers on Arabic document classification and by scanning well-known and reputable Arabic websites. Compiling freely and publicly available corpora is a step forward for the field of Arabic document classification.
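
The TFIDF weighting discussed above, and the idea of favoring words that first appear early in a document, can be illustrated with a small sketch. This is only an illustration under stated assumptions: it uses the common tf x log(N/df) form of TFIDF, the toy documents are English placeholders rather than Arabic data, and `first_appearance_boost` is a hypothetical factor invented here to show the idea. It is not the weighting method proposed in the thesis.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (tf * log(N / df))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        )
    return weights


def first_appearance_boost(doc, term):
    """Hypothetical factor in (1, 2]: the earlier a term first appears, the larger it is."""
    position = doc.index(term)  # index of the term's first occurrence
    return 1.0 + (1.0 - position / len(doc))


# Toy, already-tokenized documents (placeholders, not real data).
docs = [
    "election results announced election officials confirm".split(),
    "team wins match after late goal".split(),
    "officials announce match schedule".split(),
]

base = tfidf(docs)
# Scale each TFIDF weight by the hypothetical position factor.
boosted = [
    {t: w * first_appearance_boost(doc, t) for t, w in weights.items()}
    for doc, weights in zip(docs, base)
]
```

In this sketch a term that occurs in only one document ("election") receives a higher base weight than one shared across documents ("officials"), and the position factor then raises the weight of terms whose first occurrence is near the start of the document.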

Sunday, January 19, 2020

Web Calculator Exercise 2

Question 1
a. Mean age = 960/20 = 48
b. Standard deviation = 10.74832
Web address: http://easycalculation.com/statistics/standard-deviation.php

Frequency distribution table for denomination:

Score   f (frequency)
1       1
2       2
4       2
5       1
6       3
7       3
8       1
9       3
10      3
12      1
N = 20

c. What is the percentage of people who identify themselves as Baptist? 3/20 = .15 x 100 = 15%
What is the mode of church attendance? 5

Question 2
a. What is the Z score for a car with a price of $33,000? Z = 2.85714286
b. What is the Z score for a car with a price of $30,000? Z = 2
Web address for calculator: http://www.danielsoper.com/statcalc3/calc.spx?id=22
c. At what percentile rank is a car that sold for $30,000? 97.72%
Web calculator used: http://easycalculation.com/statistics/zscore-to-percentile.php

3. One student's Math score was 70 and the same individual's English score was 84. On which exam did the student do better?
Math: +3 points divided by 9.58 SD = .3132
English: +6 points divided by 12.45 SD = .482
The student did better on the English test.

4. Suppose you administered an anxiety test to a large sample of people and obtained normally distributed scores with a mean of 45 and a standard deviation of 4. Do not use a web calculator to answer the following questions. Instead, use the Z distribution table in Appendix A of Jackson's book.
a. If Andrew scored 45 on this test, what is his Z score? Z = (45 - 45)/4 = 0
b. If Anna scored 30 on this test, what is her Z score? Z = (30 - 45)/4 = -3.75
c. If Bill's Z score was 1.5, what is his real score on this test? 1.5 = (X - 45)/4, so X = 51
d. There are 200 students in a sample. How many of these students will have scores that fall under the score of 41? Z = (41 - 45)/4 = -1. According to Appendix A, .159 x 200 = 31.8, so the answer is that 31.8 (about 32 students) fall under 41.

5. Obtain Pearson's r and the coefficient of determination for the following relationships.
a. Between the IQ and psychology scores. r = .59231, determination = .35084
Web: http://easycalculation.com/statistics/r-squared.php
b. Between the IQ and statistics scores. r = .73667, determination = .54268
Web: http://easycalculation.com/statistics/r-squared.php
c. Between the psychology scores and statistics scores. r = .71050, determination = .50480
Web: http://easycalculation.com/statistics/r-squared.php

6. Using a web calculator, obtain the appropriate correlation coefficients. r = .85190
http://easycalculation.com/statistics/r-squared.php
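
As a cross-check on the hand calculations in question 4, the same Z scores can be reproduced in a few lines of Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function and variable names are chosen here for illustration. Because it uses the exact normal distribution rather than the rounded table value of .159, the count in part d comes out at roughly 31.7 instead of 31.8.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


mean, sd = 45, 4  # anxiety test parameters from question 4

z_andrew = z_score(45, mean, sd)  # 0.0
z_anna = z_score(30, mean, sd)    # -3.75
bill_raw = mean + 1.5 * sd        # invert the formula: X = mean + Z * sd -> 51.0
z_41 = z_score(41, mean, sd)      # -1.0

# Proportion of a standard normal distribution that lies below Z = -1.
share_below_41 = NormalDist().cdf(z_41)
students_below_41 = 200 * share_below_41

# Percentile rank for Z = 2 (question 2c).
percentile_z2 = NormalDist().cdf(2.0) * 100
```

The standard normal CDF plays the role of the Z table in Appendix A: it returns the area under the curve to the left of a given Z score.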

Saturday, January 11, 2020

Battle of King’s Mountain

Major Ferguson of the Loyalist militia was tasked with raising and organizing Loyalist units from the backcountry of South Carolina to help protect the British General Cornwallis. Ferguson gathered a few Tory units and marched toward Gilbert Town, North Carolina, where he set up a base camp. He issued a command to the opposition forces to lay down their weapons. If they refused, he stated, he would "lay waste to their country with fire and sword." Patriot militia leaders John Sevier and Isaac Shelby sent word to William Campbell in Virginia to aid an attack on Major Ferguson.

Many more militiamen and local gunmen were rallied by the Patriot leaders. These some 1,400 men became known as the "Mountain Men." Among them were two traitors who deserted the Patriots and ran off to Gilbert Town to alert Ferguson to the mass of militia converging on him. The Major called for a full retreat to Charlotte and requested reinforcements from General Cornwallis. The message did not reach Cornwallis until a day after the battle. The Patriot militia received word of Ferguson's retreat and pressed on to try to catch him.

Instead of reaching Charlotte, Ferguson's force stopped at King's Mountain, where they set camp just west of the mountain's highest point. In a rush to reach the Loyalist force, the Patriots sent over 900 men on horseback, who rode through the night and the next morning until they reached King's Mountain. The Mountain Men formed eight detachments to fully surround the Loyalist camp and attacked. Major Ferguson's force consisted only of Loyalists, not British Red Coats, and the majority of them had been rallied from South Carolina just days before.

The rebel force charged up the mountain screaming and firing their muskets from behind natural barricades. The Loyalists were unaware of the approach and were caught off guard; Ferguson rallied his troops and led charges down the hill.
Less well armed, the Patriots retreated to the forest before charging up the hill once again. A pattern formed in the battle in which each rebel charge up the hill provoked a Loyalist charge down it. The steep slope of the mountain caused the Loyalists to overshoot and completely miss the charging Patriots; it was also hard to fix on a target that held no formation and never stayed in one place.

An hour of firing resulted in large losses to the Loyalist force. However, Ferguson felt confident and would not allow a surrender. He continued charging until he was shot off his horse, dead before he hit the ground. Eventually the Patriots overwhelmed the leaderless Loyalists and forced a surrender.

The Battle of King's Mountain was a decisive and significant victory for the Patriot army. The enormous amount of bloodshed was due mainly to the Patriots' hunger for retaliation after Banastre Tarleton massacred many Continental soldiers. The defeat of Major Ferguson helped win the later battle at Cowpens, SC, because Ferguson's militia was supposed to help cover General Cornwallis's flank. King's Mountain helped shift the momentum in the American South in favor of the Patriots. The "Mountain Men" were able to destroy the Loyalists using what is one of the early recorded examples of "guerrilla warfare." British-led troops were used to fighting direct battles against troops drawn up in lines, and the evasive, shifting attack of the Patriots is what decided their fate. The Battle of King's Mountain will forever stand as one of the pivotal battles of America's fight for freedom and of American history.

Friday, January 3, 2020

Romantic Poetry

Introduction

Romantic poetry tends to embrace certain themes. One of the main themes is the sublime (addressing male themes of reason, strength, and fortitude); another is the feminine, which tends to represent beauty and domesticity. This paper explores the theme of the sublime, which has been employed effectively and creatively by both male and female poets.

The Sublime in Romantic Poetry

Scholars know that not all worthy romantic poetry was created by males, writes Christopher John Murray in his book Encyclopedia of the Romantic Era, 1760-1850. Although the best-known romantic poets happen to be British males such as Coleridge and Wordsworth, there were notable female poets who also used the sublime (also referred to as masculine) theme and who were brilliant at their art, albeit in the shadows of the male poets who received more attention. One of those female poets was Helen Maria Williams (1761-1827). Writing in the romantic genre, she addressed the horrors of the slave trade, Spanish colonialism and the native people of South America, and the French Revolution. In fact, Napoleon had her arrested in 1802 for her poem "Ode on the Peace of Amiens." Her poem "A Song" embraces the sublime/masculine theme:

No riches from his scanty store / My lover would impart / He gave me a boon I valued more / He gave me all his ...