Hiran V Nath
Assistant Professor
MB209, Department of Computer Science and Engineering, National Institute of Technology Calicut, NIT Campus PO, Calicut, Kerala - 673601, India +91-0495-2286819 hiranvnath[at]nitc[dot]ac[dot]in (email)
Portable Document Format (PDF) is a de facto standard for sharing documents. Even though PDF is a document description language, it has many features similar to those of a programming language. The add-on support for JavaScript, which can carry malicious scripts, together with the facility to embed any file into a PDF document, creates great potential for disastrous cyber attacks. Since 2008, malicious users have increasingly concentrated on embedding malicious code into PDF documents. Compared to Portable Executable (PE) files, PDF files pose a higher risk because the embedded content can be encrypted and/or encoded. Recently, multistage delivery of malware has been used for APTs and targeted attacks. In such attacks, PDF documents are used to accomplish one or more stages, as in MiniDuke, where a PDF file served as the first stage and went undetected for almost two years. These files can be considered carriers of k-ary codes. In this paper, we bring out the importance of analyzing the data encoded in the stream tag along with other structural information. We give a proof of concept by embedding JavaScript into a PDF document in a way that is not detected by any of the existing PDF parsers. Finally, we propose ensemble learning for detecting such PDF files.