Tuesday, January 28, 2020

Machine Learning in Malware Detection

1.0 Background Research

The theoretical basis for self-replicating programs was laid out by John von Neumann in 1949, and malware has multiplied ever since. Antivirus companies are constantly looking for the most effective method of detecting it. One of the best-known methods is signature-based detection. Over the years, however, the volume of malware has grown uncontrollably, and in recent years signature-based detection has proven ineffective against that growth. In this research, I have chosen another approach: applying machine learning to malware detection. Using the dataset from the Microsoft Malware Classification Challenge (BIG 2015), I will look for an algorithm that detects malware effectively with a low false positive rate.

1.1 Problem Statement

As technology grows, the amount of malware increases day by day. Malware is now designed to mutate, which causes enormous growth in the number of malware variants (Ahmadi, M. et al., 2016). In addition, automated malware generation tools now let novice malware authors easily produce new variants (Lanzi, A. et al., 2010). Faced with this growth, traditional signature-based malware detection has proven ineffective against the vast variety of malware (Feng, Z. et al., 2015). Machine learning methods, on the other hand, have proven effective against new malware. At the same time, machine learning methods for malware detection have a high false positive rate (Feng, Z. et al., 2015).

1.2 Objective

To investigate how machine learning can be applied to malware detection in order to detect unknown malware.
To develop malware detection software that uses machine learning to detect unknown malware.
To validate that malware detection based on machine learning can achieve a high accuracy rate with a low false positive rate.

1.3 Theoretical / Conceptual Framework

1.4 Significance

Machine learning based malware detection with high accuracy and a low false positive rate will free end users from the fear of malware damaging their computers. Organizations, in turn, will have more secure systems and files.

2.0 Literature Review

2.1 Overview

Traditional security products use virus scanners to detect malicious code. These scanners rely on signatures created by reverse engineering a malware sample. But now that malware has become polymorphic or metamorphic, the traditional signature-based detection used by antivirus products is no longer effective (Willems, G., Holz, T. and Freiling, F., 2007). In current anti-malware products, the malware analysis process carries out two main tasks: malware detection and malware classification. This paper focuses on malware detection, whose main objective is to detect malware present in the system. There are two types of analysis for malware detection: dynamic analysis and static analysis.

For effective and efficient detection, feature extraction is recommended (Ahmadi, M. et al., 2016). Among the various detection methods, the one we use works on the hex and assembly files of the malware. Features are extracted from both the hex view and the assembly view of the malware files. After the features have been extracted into their categories, all categories are combined into one feature vector for the classifier to run on (Ahmadi, M. et al., 2016). For feature selection, binary files are separated into blocks so that the similarity between malware binaries can be compared. This reduces the analysis overhead and makes the process faster (Kim, T.G., Kang, B. and Im, E.G., 2013).

To build a learning model, the extracted features and their labels undergo classification using a method such as Random Forest, Neural Network, KNN and many others, but Support Vector Machine (SVM) is recommended when noise is present in the extracted features and labels (Stewin, P. and Bystrov, I., 2016). To generate a result, the learning model is tested on a labelled dataset to produce a graph showing the detection rate and the false positive rate. To find the best result, the process is repeated with other classification methods, and each resulting model is tested on the same dataset. The best result is the graph with the highest detection rate and the lowest false positive rate (Lanzi, A. et al., 2010).

2.2 Dynamic and Static Analysis

Dynamic analysis runs the malware in a simulated environment, usually a sandbox, where it is executed and its behavior is observed. There are two approaches to dynamic analysis: comparing an image of the system before and after the malware executes, and monitoring the malware's actions during execution with the help of a debugger. The first approach usually yields a report similar to one obtained through binary observation, while the second is more difficult to implement but gives a more detailed report on the behavior of the malware (Willems, G., Holz, T. and Freiling, F., 2007).

Static analysis studies the malware without executing it, which makes it safer than dynamic analysis. With this method, we disassemble the malware executable into an assembly file and a hex file, then study the opcodes in both files and compare them with a pre-generated opcode profile in order to find malicious code within the executable (Santos, I. et al., 2013). Every malware detection approach relies on either static analysis or dynamic analysis.
In this paper, we focus on static analysis (Ahmadi, M. et al., 2016). Dynamic analysis has a drawback: it can only analyze one malware sample at a time, so the whole process takes a long time when many samples need to be analyzed (Willems, G., Holz, T. and Freiling, F., 2007). Static analysis is mainly used to analyze hex code files and assembly code files. Compared with dynamic analysis, it takes much less time and is more convenient, because it can be scheduled to scan all the files at once, even offline (Tabish, S.M., Shafiq, M.Z. and Farooq, M., 2009).

2.3 Features Extraction

For effective and efficient classification, it is wise to extract features from both the hex view file and the assembly view file in order to obtain complementary data from the two views (Ahmadi, M. et al., 2016). The types of feature extracted from the hex view and assembly view files are N-gram, Entropy, Image Representative, String Length, Symbol, Operation Code, Register, Application Programming Interface, Section, Data Define and Miscellaneous (Ahmadi, M. et al., 2016).

The N-gram feature is commonly used to classify sequences in many different areas. During feature extraction, N-grams can capture the sequence in which the malware executes (Ahmadi, M. et al., 2016).

The Entropy feature measures the uncertainty in a series of bytes in the malware executable file. This uncertainty depends on the amount of information in the executable file (Lyda, R. and Hamrock, J., 2007).

For the Image Representative feature, the malware binary file is read as a vector of 8-bit values and then organized into a 2D array. The 2D array can be visualized as a grayscale image in which each gray level corresponds to a byte of the file. This feature looks for common byte arrangements across malware binary files (Nataraj, L. et al., 2011).
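The N-gram, Entropy and Image Representative features described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the extraction pipeline of Ahmadi et al.; the function names `byte_ngrams`, `shannon_entropy` and `to_image_rows`, the default window size and the row width are assumptions chosen for illustration.

```python
import math
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping window of n consecutive bytes (hypothetical helper)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte.

    Returns 0.0 for empty input; the maximum is 8.0 when all 256
    byte values are equally frequent.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def to_image_rows(data: bytes, width: int = 4) -> list:
    """Arrange bytes into rows of `width` gray levels (0-255).

    The row width is an assumption; the last row may be shorter.
    """
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


if __name__ == "__main__":
    # A made-up byte string standing in for the contents of a hex view file.
    sample = bytes([0x90, 0x90, 0x90, 0xEB, 0xFE, 0x00, 0x00, 0x00])
    print(byte_ngrams(sample, 2).most_common(3))
    print(shannon_entropy(sample))
    print(to_image_rows(sample, 4))
```

In practice the counts, the entropy value and the flattened image rows would each become part of the combined feature vector that is passed to the classifier.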
For the String Length feature, we open each malware executable in hex view and extract all ASCII strings from it. Because it is difficult to extract only the actual strings without also picking up other, non-useful elements, the important strings have to be selected from what was extracted (Ahmadi, M. et al., 2016).

For the Operation Code feature, operation codes, also known as Opcodes, are a type of instruction syllable in machine language. In malware detection, the different Opcodes and their frequencies are extracted and compared with those of non-malicious software, since distinct sets of Opcodes are identifiable for malware and for non-malware (Bilar, D., n.d.).

For the Register feature, how often each register is used can assist malware classification, because register renaming is used to make malware harder to analyze and detect (Christodorescu, M., Song, D. and Bryant, R.E., 2005).

For the Application Programming Interface feature, API calls are code that invokes the functions of other software, which in our case is the Windows API. Malicious and non-malicious software use a large number of API call types, which makes them hard to tell apart. For this reason we focus on the most frequently used API calls in malware binaries in order to narrow down the result (Top maliciously used apis, 2017).

For the Data Define feature, not all malware contains API calls, and malware without any API calls consists mainly of the operation codes db, dw and dd. There are sets of features (DP) that are able to characterize such malware (Ahmadi, M. et al., 2016).

For the Miscellaneous feature, we choose a few words that most malware has in common from the disassembled malware files (Ahmadi, M. et al., 2016).

Among all these features, the most appropriate for our research are N-gram and Opcode, because these two features have been shown to give the highest accuracy with low logloss.
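The Opcode feature discussed above (different Opcodes and their frequencies) can be sketched as follows. This is a simplified illustration rather than the method of any cited paper: it assumes a hypothetical assembly listing with one `mnemonic operands` instruction per line, `;` comments and labels ending in `:`, which is simpler than real disassembler output. The name `opcode_frequencies` is made up for this example.

```python
from collections import Counter


def opcode_frequencies(asm_text: str) -> dict:
    """Return the relative frequency of each mnemonic in an assembly listing.

    Assumed input format (hypothetical): one instruction per line,
    optional ';' comments, labels on their own line ending with ':'.
    """
    counts = Counter()
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0].strip()  # drop trailing comments
        if not code or code.endswith(":"):    # skip blank lines and labels
            continue
        counts[code.split()[0].lower()] += 1  # first token is the mnemonic
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()} if total else {}


if __name__ == "__main__":
    listing = """
    start:
        push ebp
        mov ebp, esp   ; set up the stack frame
        MOV eax, 1
        ret
    """
    for opcode, freq in sorted(opcode_frequencies(listing).items()):
        print(opcode, freq)
```

Under the same assumed format, data-definition directives such as `db`, `dw` and `dd` would be counted as well, so the same dictionary could also feed a Data Define style feature.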
These two features appear frequently in malware files, and well-known feature sets already exist for them. The drawback of N-gram and Opcode is that they require a lot of resources and time to process (Ahmadi, M. et al., 2016). We will also try other features and compare them with N-gram and Opcode to verify the result.

2.4 Classification

In this section, we do not review the algorithm or mathematical formula behind each classifier, but rather the properties that give it an advantage under certain conditions when classifying malware features. The classifiers we review are Nearest Neighbor, Naïve Bayes, Decision Tree, Support Vector Machine and XGBoost [21] (Kotsiantis, S.B., 2007) (Ahmadi, M. et al., 2016). Because we need a classifier to train on the malware features, we review the candidates in order to choose the one most likely to give the best result.

The Nearest Neighbor classifier is one of the simplest classification methods and is normally implemented in case-based reasoning [21].

Naïve Bayes usually generates a simple, constrained model and is not suitable for irregular input data. This makes it unsuitable for malware classification, because the data in malware classification is not regular (Kotsiantis, S.B., 2007).

Decision Tree classifies samples by sorting them down a tree based on their feature values, where each node represents a feature and each branch represents a value that node can take. A Decision Tree decides true or false at each node based on the node value, which makes it difficult to deal with unknown features that are not stored in a tree node (Kotsiantis, S.B., 2007).

Support Vector Machine has a model complexity that enables it to deal with a large number of features and still obtain good results, which makes it suitable for malware classification, as malware contains a large number of features (Kotsiantis, S.B., 2007).
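To make the comparison of classifiers concrete, the sketch below implements the simplest classifier reviewed above, a 1-nearest-neighbour rule, and computes the detection rate and false positive rate used to judge a model. It is a toy illustration in plain Python with made-up two-dimensional feature vectors, and it assumes the label convention 1 = malware, 0 = benign; neither the data nor the function names come from the cited papers.

```python
def nearest_neighbor_predict(train, query):
    """Return the label of the training vector closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is
    squared Euclidean distance (a common default, assumed here).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: sq_dist(item[0], query))
    return label


def detection_and_false_positive_rate(y_true, y_pred):
    """Detection rate = TP / (TP + FN); false positive rate = FP / (FP + TN).

    Assumes labels are 1 for malware and 0 for benign.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


if __name__ == "__main__":
    # Made-up feature vectors, e.g. (frequency of 'mov', frequency of 'call').
    train = [([0.5, 0.1], 1), ([0.4, 0.2], 1), ([0.1, 0.6], 0), ([0.05, 0.7], 0)]
    test_x = [[0.45, 0.15], [0.1, 0.65], [0.3, 0.35]]
    test_y = [1, 0, 0]
    predictions = [nearest_neighbor_predict(train, x) for x in test_x]
    print(detection_and_false_positive_rate(test_y, predictions))
```

Running the same two metrics over the predictions of each candidate classifier on one shared test set gives a like-for-like comparison of detection rate against false positive rate.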
XGBoost is a scalable tree boosting system that has won many machine learning competitions by achieving state-of-the-art results. Its advantage is that it suits almost any scenario and runs faster than most other classification techniques (Chen, T., n.d.).

For our malware analysis we choose XGBoost as the classifier, because it is suitable for malware classification and is also recommended by the winners of the Microsoft Malware Classification Challenge (Ahmadi, M. et al., 2016). We will also use Support Vector Machine, which is likewise suitable for malware classification, and compare its result with that of XGBoost to obtain a more accurate result.

References

Ahmadi, M. et al., 2016. Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification. ACM Conference on Data and Application Security and Privacy, pp.183-194. Available at: http://doi.acm.org/10.1145/2857705.2857713.
Amin, M. Maitri, 2016. A Survey of Financial Losses Due to Malware. Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies ICTCS 16, pp.1-4. Available at: http://dl.acm.org/citation.cfm?doid=2905055.2905362.
Berlin, K., Slater, D. and Saxe, J., 2015. Malicious Behavior Detection Using Windows Audit Logs. Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pp.35-44. Available at: http://doi.acm.org/10.1145/2808769.2808773.
Feng, Z. et al., 2015. HRS: A Hybrid Framework for Malware Detection, (10), pp.19-26.
Han, K., Lim, J.H. and Im, E.G., 2013. Malware analysis method using visualization of binary files. Proceedings of the 2013 Research in Adaptive and Convergent Systems, pp.317-321.
Kim, T.G., Kang, B. and Im, E.G., 2013. Malware classification method via binary content comparison. Information (Japan), 16(8 A), pp.5773-5788.
Küçüksille, E.U., Yalçınkaya, M.A. and Uçar, O., 2014. Physical Dangers in the Cyber Security and Precautions to be Taken. Proceedings of the 7th International Conference on Security of Information and Networks SIN 14, pp.310-317. Available at: http://dl.acm.org.proxy1.athensams.net/citation.cfm?id=2659651.2659731.
Lanzi, A. et al., 2010. AccessMiner: Using System-Centric Models for Malware Protection. Proceedings of the 17th ACM Conference on Computer and Communications Security CCS10, pp.399-412. Available at: http://dl.acm.org/citation.cfm?id=1866353%5Cnhttp://portal.acm.org/citation.cfm?doid=1866307.1866353.
Nicholas, C. and Brandon, R., 2015. Document Engineering Issues in Document Analysis. Proceedings of the 2015 ACM Symposium on Document Engineering, pp.229-230. Available at: http://doi.acm.org/10.1145/2682571.2801033.
Patanaik, C.K., Barbhuiya, F.A. and Nandi, S., 2012. Obfuscated malware detection using API call dependency. Proceedings of the First International Conference on Security of Internet of Things SecurIT 12, pp.185-193. Available at: http://www.scopus.com/inward/record.url?eid=2-s2.0-84879830981partnerID=tZOtx3y1.
Pluskal, O., 2015. Behavioural Malware Detection Using Efficient SVM Implementation. RACS Proceedings of the 2015 Conference on research in adaptive and convergent systems, pp.296-301.
Santos, I. et al., 2013. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences, 231, pp.64-82.
Stewin, P. and Bystrov, I., 2016. Detection of Intrusions and Malware, and Vulnerability Assessment. Available at: http://dblp.uni-trier.de/db/conf/dimva/dimva2012.html#StewinB12.
Willems, G., Holz, T. and Freiling, F., 2007. Toward automated dynamic malware analysis using CWSandbox. IEEE Security and Privacy, 5(2), pp.32-39.
Tabish, S.M., Shafiq, M.Z. and Farooq, M., 2009. Malware detection using statistical analysis of byte-level file content. Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics CSI-KDD 09, pp.23-31. Available at: http://portal.acm.org/citation.cfm?doid=1599272.1599278.
Lyda, R. and Hamrock, J., 2007. Using Entropy Analysis to Find Encrypted and Packed Malware.
Nataraj, L. et al., 2011. Malware Images: Visualization and Automatic Classification.
Bilar, D., n.d. Statistical Structures: Fingerprinting Malware for Classification and Analysis. Why Structural Fingerprinting?
Christodorescu, M., Song, D. and Bryant, R.E., 2005. Semantics-Aware Malware Detection.
Top maliciously used apis, 2017. Available at: https://www.bnxnet.com/top-maliciously-used-apis/.
Weiss, S.M. and Kapouleas, I., 1989. An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods, pp.781-787.
Kotsiantis, S.B., 2007. Supervised Machine Learning: A Review of Classification Techniques, 31, pp.249-268.
Chen, T., n.d. XGBoost: A Scalable Tree Boosting System.

Monday, January 20, 2020

Charles Dickens, The Old Curiosity Shop

Charles Dickens's 1841 novel The Old Curiosity Shop, now entering its third century, strikes readers in one of two ways: either as a heartfelt, sentimental account of the plight of a homeless thirteen-year-old girl, Nell Trent, and her aged Grandfather as they wander the English countryside, keeping one step ahead of their horrible dwarf nemesis, Daniel Quilp; or as a "crude sentimental" (Harris 137) journey down a path of individual weakness that leads to the death of them both. In Dickens's day, a curiosity shop was an establishment where people went to purchase precious or antique gifts, and it is in one of these shops that thirteen-year-old Nell lives with her Grandfather. In short, the Grandfather has a gambling addiction and gambles away the money needed to run the shop, all the while borrowing money from Daniel Quilp, a nasty, goblin-like figure of a man. The losses end with Quilp taking over the shop, and Nell and the Grandfather flee to avoid him. They wander the English countryside among a throng of carnivals, sideshows, philanthropic souls who try to help them, and downtrodden people who try to exploit them. Their deaths, Nell’s especially, which Dickens wrote of in a lingering, sentimental tone, have been the center of discussion of the book for over a century and a half. The Old Curiosity Shop began as a series of short stories in a publication Dickens created in 1840 called Master Humphrey’s Clock. With a weekly circulation of over 70,000, Master Humphrey’s Clock provided the income that let Dickens finance the work on The Old Curiosity Shop. Emotionally, working under a strenuous monthly deadline proved to be a strain on Dickens.
In July of 1840, Dickens told his friend Lord Jeffrey, editor of The Edinburgh Review, that The Old Curiosity Shop "demands my constant attention" (Page 22,23), and by December of that year Dickens seemed to be on the edge of a mental collapse, telling Lord Jeffrey that the "anguish" of writing under the pressure was "unspeakable, the difficulty tremendous" (Page 30). The story was completed in early 1841, and Dickens began the painstaking process of turning the short stories into a complete novel. One of the immediate obstacles Dickens encountered (or rather his printers, since Dickens was busy completing his next novel, Barnaby Rudge) was marrying the chapters together in proper sequence.

Sunday, January 12, 2020

Coconut Tree Essay

The coconut tree, as a “tree of life”, is characteristically a food supplier. It provides fruit that is well known to be devoid of any anti-nutrient factors and is regarded as a whole food, with 5,000 years of recorded use in food preparation and recognized health benefits. The fruit is edible at any stage of maturity. It provides not only solid food but also a large volume of very safe and healthy drinking water-based juice. The fruit of the coconut palm is the main source of many food products such as coconut milk/cream, desiccated coconut, coconut chips, coconut water, nata de coco, coconut oil, copra, etc. Apart from these, the unopened inflorescence can produce coconut sap or toddy (tuba), which can be processed into high-value and nutritious food products. Coconut sap sugar, considered to be one of the best natural sweeteners, is a healthier substitute for artificial sweeteners because it is not a product of chemical laboratories, and it is neither a by-product of sugar cane nor brown sugar nor muscovado sugar. Coco sugar is good for both diabetic and non-diabetic consumers because it has a low glycemic index and therefore does not induce high blood sugar. The glycemic index (GI) is a numerical system for measuring how much of a rise in circulating blood sugar a carbohydrate triggers: the higher the number, the greater the blood sugar response. So a low GI food will cause a small rise, while a high GI food will trigger a large one. GI is about the quality of the carbohydrates, not the quantity. Coco sugar can be good for weight maintenance. (Dr. Trinidad P. Trinidad, Scientist II of the Food and Nutrition Research Institute, Department of Science and Technology.) It is also rich in various amino acids, vitamins and minerals that are essential for the human body.

Friday, January 3, 2020

History of Globalization of Nigeria

This movement was at first multiethnic, although it was largely confined to the South in the period 1930-1944, a time when the real incomes of most participants in Nigeria's money economy fell because of a deterioration in the net barter terms of trade. During this period, the Great Depression reduced Britain’s imports, investments, and spending in Nigeria. It was not until the 1970s that Nigeria started participating in the second global economy. It has remained a major participant in the international oil industry since the 1970s. Nigeria also maintains its membership of the Organization of the Petroleum Exporting Countries (OPEC), which it joined in 1971. Its status as a key petroleum producer is strongly reflected in its international relations with both developing countries like Jamaica, Ghana, and Kenya and developed nations such as China and the United States. For more than five years now, Nigeria has been a minor player in the global emerging markets for equities, bonds, and loan syndications. Its external funding from these private sources, especially loan syndications, was under a billion dollars per annum. Nigeria received external financing worth 0.5% of GDP. Its limited access to such sources of external funding was due to a global credit rating that rendered Nigeria non-creditworthy. The country was rated BB in 2005 for the first time. Nigeria could have been an active capital exporter for the last 30 years had it been politically stable. It is well known that capital outflow from Sub-Saharan Africa, including Nigeria, has been enormous over the last three decades (Bernholz 109). After the Second World War, Nigeria went through political and economic systems that have been major causes of economic decline. In the 1990s, the military regime brought bad governance and political instability.
Prolonged military rule led to political and economic decline and stagnation (Ihonvbere 45). However, the start of elected government in the early 21st century, after about 30 years of military rule, has enabled Nigeria to arrest the decline in its socio-economic growth and focus on economic revival. The end of the military regime, the arrival of an elected civilian administration, and the renewed national commitment to the country’s development, combined with the nation’s human and natural endowments, offer a basis for optimism that these governments will succeed in attracting foreign investment and speeding up Nigeria's return to the global economy.