Why do people keep making libraries that can only CREATE PDF files? I want one that can parse PDFs and export text and images along with their layout.
Tag Archive for 'PDF'
Here is an example of using the new AIR 2 feature that lets you run native system processes from your application. The author uses a Java tool to grab screen images.
This might not be that exciting, but Java, like AIR, runs everywhere, and this method basically lets you use Java libraries in your Flash project. I need to parse PDF files and I am pretty much desperate on this one. But using Java might be a way out: there are plenty of libraries.
Flex UI and some Java logic should be good.
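A minimal sketch of how the AIR side of this could look, assuming AIR 2 and an application packaged with the extendedDesktop profile. The jar name pdftool.jar, the Java path and the input file are all made up for illustration; this is not a real tool, just the shape of the call.

```actionscript
package {
    import flash.desktop.NativeProcess;
    import flash.desktop.NativeProcessStartupInfo;
    import flash.display.Sprite;
    import flash.events.NativeProcessExitEvent;
    import flash.events.ProgressEvent;
    import flash.filesystem.File;

    public class JavaBridge extends Sprite {
        private var process:NativeProcess;
        private var output:String = "";

        public function JavaBridge() {
            // NativeProcess is only available in the extendedDesktop profile.
            if (!NativeProcess.isSupported) {
                trace("Native processes are not supported here");
                return;
            }

            // Assumption: Java lives at this path; it differs per system.
            var java:File = new File("/usr/bin/java");
            // Hypothetical jar shipped inside the application directory.
            var jar:File = File.applicationDirectory.resolvePath("pdftool.jar");
            // Hypothetical input file.
            var input:File = File.documentsDirectory.resolvePath("test.pdf");

            var args:Vector.<String> = new Vector.<String>();
            args.push("-jar", jar.nativePath, input.nativePath);

            var info:NativeProcessStartupInfo = new NativeProcessStartupInfo();
            info.executable = java;
            info.arguments = args;

            process = new NativeProcess();
            process.addEventListener(ProgressEvent.STANDARD_OUTPUT_DATA, onOutput);
            process.addEventListener(NativeProcessExitEvent.EXIT, onExit);
            process.start(info);
        }

        // Collect whatever the Java tool prints to stdout.
        private function onOutput(event:ProgressEvent):void {
            output += process.standardOutput.readUTFBytes(
                process.standardOutput.bytesAvailable);
        }

        private function onExit(event:NativeProcessExitEvent):void {
            trace("Java exited with code " + event.exitCode);
            trace(output);
        }
    }
}
```

The Java tool would have to print its result (extracted text, for example) to standard output in some agreed format; that part is left open here.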
… says computerworld.com.
This wouldn’t be funny if it were 100% true. I found this link (and this link) at a Russian IT community where most people don’t really like Flash. So if they see “flash” and “flaw” in a topic title they usually behave like Pavlov’s dogs. I saw a lot of comments saying that Flash sucks, that it’s a good time to install flash-blocking plugins and start using Silverlight or Java. But it seems nobody really knows what is going on.
- This is actually not a bug
- This is not a Flash specific flaw
- This is a really old vulnerability
But nooo, people keep running around screaming and cursing Flash.
It is so funny reading you guys. You don’t know much about the subject but keep drawing weird conclusions. At some point in life everyone comes to understand that “you sleep better if you know less” (there should be an English proverb like that). If you are saying that this is a “fundamental flash bug” you might not know that it is not even the tip of the iceberg.
If you spend half a day googling and reading books you might find out that danger is actually everywhere; you just don’t know about it because you assume that all hardware and software is flawless. Well, this “flash” vulnerability is just one in the huge family of Cross-site Scripting vulnerabilities. A lot of them have been found and a lot of them are still hiding in your favorite browsers. Everything executed on the client side is vulnerable to XSS attacks. The most common technology is JavaScript, and almost every other attack involves JavaScript too, for example using it to retrieve sensitive data from Flash, Java or Silverlight.
So, how does it work? An XSS attack exploits a vulnerability in a website or in client software that lets an attacker inject malicious code into the client’s trusted zone and execute it. The security model assumes that if code is served from example.com it is trusted and may access all data associated with that domain.
How is this related to Flash? As much as it is to every other client-side technology; computerworld just picked one and started blaming it. So, I somehow upload my SWF to example.com, if it allows me to upload SWFs of course. If it doesn’t, I can either rename the file or append it to a file of another type; apparently Flash Player will load a file with any extension placed in the src attribute of the embed tag. Anyway, once this SWF lands in the /uploads folder of example.com it is considered trusted, because someone a long time ago assumed that a publicly accessible file on a domain could only have been put there by that domain’s admin, which is often not true, as we see. This SWF now has access to everything related to example.com via JavaScript. It would be really stupid to display it on example.com itself without allowscriptaccess=never. But the article above says that the SWF is opened through an external link, not from a page on example.com. For example, someone pretending to be your friend John sends you a link that points straight to this SWF on example.com, where you are logged in right now. Congratulations, the SWF just stole your cookie.
If I’m not mistaken, this vulnerability can be easily fixed if user-uploaded content is kept on a separate subdomain.
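To see what allowscriptaccess=never actually changes, here is a small probe written from the SWF’s point of view. It is only a sketch: the class name is made up, and it assumes the SWF is embedded in an HTML page. When the page forbids script access, ExternalInterface.call throws a SecurityError instead of reaching the page’s JavaScript.

```actionscript
package {
    import flash.display.Sprite;
    import flash.external.ExternalInterface;

    public class ScriptAccessProbe extends Sprite {
        public function ScriptAccessProbe() {
            trace(canReachPage()
                ? "page scripting is reachable"
                : "page scripting is blocked");
        }

        // Returns true only if this SWF is allowed to call into the page.
        public static function canReachPage():Boolean {
            // No browser container at all (e.g. standalone player).
            if (!ExternalInterface.available) {
                return false;
            }
            try {
                // Harmless call: just asks the page for its own address.
                ExternalInterface.call("window.location.href.toString");
                return true;
            } catch (securityError:SecurityError) {
                // The embedding page does not grant script access.
            } catch (otherError:Error) {
                // Any other failure is treated as "no access" too.
            }
            return false;
        }
    }
}
```

A site that shows uploaded SWFs with script access disabled should get “blocked” from this probe; the problem described above is that a direct link to the SWF bypasses the embedding page entirely.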
This is not related just to Flash. As you see, a lot of JavaScript is involved. And you can definitely do the same using Java. What’s more, as I said, there are ways to bypass a server’s upload restrictions. For example, it’s possible to combine a GIF and a JAR (which is actually a ZIP file), or a PDF and a JAR, into one file so that it looks like a perfect GIF (PDF) and can be executed as a perfect JAR. Did you know that? Did you know that there are still a lot of vulnerabilities in your favorite browsers? I don’t even want to mention the HUGE number of sites made by newbies which have absolutely no security and allow SQL injections and XSS JavaScript injections. And you trust them with your private information and credit card numbers? Did you know that it’s even possible to trick Google and find out your password? Did you know that saved passwords in Firefox can be retrieved by hackers too?
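The GIF plus JAR trick works because a GIF is recognized by its first bytes while a ZIP archive is located through a record near the end of the file. Here is a rough sketch of a check for such a combined file. The class and function names are made up, and this is a heuristic only, not a complete validator.

```actionscript
package {
    import flash.utils.ByteArray;

    public class PolyglotCheck {
        // True if the data starts with "GIF87a" or "GIF89a".
        public static function startsWithGif(bytes:ByteArray):Boolean {
            if (bytes.length < 6) {
                return false;
            }
            return bytes[0] == 0x47 && bytes[1] == 0x49 && bytes[2] == 0x46
                && bytes[3] == 0x38
                && (bytes[4] == 0x37 || bytes[4] == 0x39)
                && bytes[5] == 0x61;
        }

        // True if a ZIP "end of central directory" signature (PK 05 06)
        // appears where a ZIP reader would look for it: within the last
        // 22 + 65535 bytes (record size plus the maximum comment length).
        public static function hasZipDirectory(bytes:ByteArray):Boolean {
            var size:int = int(bytes.length);
            var lowest:int = Math.max(0, size - 22 - 65535);
            for (var i:int = size - 22; i >= lowest; i--) {
                if (bytes[i] == 0x50 && bytes[i + 1] == 0x4B
                    && bytes[i + 2] == 0x05 && bytes[i + 3] == 0x06) {
                    return true;
                }
            }
            return false;
        }

        // A file that passes both checks may be a GIF that also opens as a JAR.
        public static function looksLikeGifJar(bytes:ByteArray):Boolean {
            return startsWithGif(bytes) && hasZipDirectory(bytes);
        }
    }
}
```

A real upload filter runs on the server and would need to do much more than this; the point is only that checking the extension or the first few bytes is not enough.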
Did you know that these are not even 1/10 of all vulnerabilities? But why has nobody hacked you yet? Probably someone already did, you just don’t know it. Or nobody is interested in you.
How is this related to Flash? This is not Adobe’s fault. Of course they can come up with something involving crossdomain.xml and even more restrictive policies, but you can just upload your own crossdomain.xml to example.com as you did with your SWF. Again, this is not Adobe’s flaw. It is probably because in the early days of the Internet the basic protocols were not designed with security in mind. Since then people have invented new ones and upgraded old ones, patching leaks here and there.
Site owners must develop their projects with security in mind and not just blame Flash. That is stupid.
And the last one. I actually USED this vulnerability a long, long time ago against one of the Flash discussion boards. Once again, this is not new!
Since there’s no perfect tool for me I decided to create one myself.
If you want something done, do it yourself.
Right now the only platform I can create applications for without additional training is Flash/Flex/AIR. AIR looks perfect for this purpose, but I doubt Flash can handle some of the features I need. Here’s the list of features from my previous post.
- Be able to create rich text nodes,
- Insert images, movies, audio files and PDFs,
- Have everything searchable,
- Tag everything,
- Be able to sort nodes to folders and smart folders,
- Be able to crosslink files,
- Add small notes (as clouds) to everything,
- Annotate PDFs,
- Have a usable intuitive interface,
- Run on a mac offline.
AIR can run on a Mac, with Flash I can create any interface I need, I can show images, movies and mp3 files, I can edit text in WYSIWYG mode (though I have yet to find a good editor, and I am not the only one), and I can store data locally and online with a full search index. But there’s a huge problem with PDF support. Seriously, Flash PDF support sucks.
The plan.
First, I need to research a lot more about handling the PDF issue. If I fail to find a good solution I should probably stop and spend time learning Objective-C for native OS X programming. Right now there are several possible ways to go:
- pdf2swf library. It comes with C source code. The easiest way is to invoke it from the command line from an AIR application, but that doesn’t feel right. I might be able to modify it a bit and compile it to a SWC using Alchemy. The problem is that I don’t know C and have never tried to do anything with Alchemy. pdf2swf has some issues too, especially with fonts.
- Port a Java PDF library. With Flash 10 it is much easier to port a Java library to AS3; there’s even a special syntax converter, and there should be AS3 implementations of the core Java classes. This should be much easier than rewriting from scratch. This is what PDFCase has done, but I can’t contact its author. It is a port of the PDFBox Java library. I played with it a bit but am still not sure what exactly it parses PDFs into. I am getting a lot of COSObjects and don’t know what they contain or how to work with them. Also, it takes 20 seconds to parse a 60-page PDF file with several images. So in a port there must be a way to break the whole task into small subtasks so as not to hang the player. And I still have to understand how PDFBox works before I can start porting it.
- Write my own PDF parser. From what I saw PDF format is complicated. And I couldn’t find a free format spec.
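The “break the whole task into small subtasks” idea from the porting option could be sketched like this: run as many queued steps as fit into a small time budget on every frame, then give control back to the player. The class name, the 15 ms budget and the idea that parsing can be cut into an array of functions are all my assumptions here, not something PDFBox or PDFCase provides.

```actionscript
package {
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.utils.getTimer;

    public class ChunkedRunner extends Sprite {
        // Assumed budget per frame, in milliseconds.
        private static const BUDGET_MS:int = 15;

        private var tasks:Array;   // Array of Function, one per subtask
        private var index:int = 0; // next subtask to run

        public function ChunkedRunner(tasks:Array) {
            this.tasks = tasks;
            // ENTER_FRAME fires every frame, on or off the display list.
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(event:Event):void {
            var started:int = getTimer();
            // Run subtasks until the budget for this frame is used up.
            while (index < tasks.length && getTimer() - started < BUDGET_MS) {
                var task:Function = tasks[index] as Function;
                task();
                index++;
            }
            if (index >= tasks.length) {
                removeEventListener(Event.ENTER_FRAME, onEnterFrame);
                dispatchEvent(new Event(Event.COMPLETE));
            }
        }
    }
}
```

Each subtask still has to be short on its own (one page or one object, say), otherwise a single step can freeze the player just like before.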
Another question is what to do after I have parsed a PDF file into simple instructions. I need to render them with Flash. This might be a problem too, especially with fonts.
Well, if I find a solution to the problem above the rest is pretty straightforward: prototype the interface, use Flex and Swiz to code it, fix bugs.
Afterthoughts.
I think that AIR might not be the tool of choice (see the irony?) for this type of project. I might need to learn Objective-C and code this application as a native OS X app, because there are Objective-C frameworks for working with PDFs, for example PDFKit and SkimNotes. This might take some time, but I am an experienced programmer after all. Can Objective-C really be that different from ActionScript? I doubt it.
By PDF support I mean the ability to show and generate PDF files in Flash/AIR.
Show.
Flash and Flex can’t work with PDF files. Period. The only way to show a PDF file (without converting it to another format) is to use an HTML wrapper. For Flex there’s the Flex iFrame library. You can use it to show an HTML page with an embedded PDF file, which the Adobe Reader plugin then renders (assuming that it IS installed on the system). If you are publishing to AIR you can use HTMLLoader to do the same thing:
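import flash.html.HTMLLoader;
import flash.net.URLRequest;

// Load a remote PDF into an HTMLLoader (rendered by the Adobe Reader plugin).
var request:URLRequest = new URLRequest("http://www.example.com/test.pdf");
var pdf:HTMLLoader = new HTMLLoader();
pdf.height = 800;
pdf.width = 600;
pdf.load(request);
// "container" is any display object container already on the stage.
container.addChild(pdf);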
Read more about it.
Once again, this method relies on the Adobe Reader plugin being installed. But here you can at least check whether the plugin is installed using HTMLLoader.pdfCapability. There is a bug, though: if you install the Adobe Reader plugin but don’t accept its Terms of Use, pdfCapability reports OK but you will not be able to see your PDF document. What’s more, this method comes with a huge list of limitations (quoted from the link above):
- PDF content does not display in a window (a NativeWindow object) that is transparent (where the transparent property is set to true).
- The display order of a PDF file operates differently than other display objects in an AIR application. Although PDF content clips correctly according to HTML display order, it will always sit on top of content in the AIR application’s display order.
- PDF content does not display in a window that is in full-screen mode (when the displayState property of the Stage is set to StageDisplayState.FULL_SCREEN or StageDisplayState.FULL_SCREEN_INTERACTIVE).
- If certain visual properties of an HTMLLoader object that contains a PDF document are changed, the PDF document will become invisible. These properties include the filters, alpha, rotation, and scaling properties. Changing these renders the PDF file invisible until the properties are reset. This is also true if you change these properties of display object containers that contain the HTMLLoader object.
- PDF content is visible only when the scaleMode property of the Stage object of the NativeWindow object containing the PDF content is set to StageScaleMode.NO_SCALE. When it is set to any other value, the PDF content is not visible.
- Clicking links to content within the PDF file updates the scroll position of the PDF content. Clicking links to content outside the PDF file redirects the HTMLLoader object that contains the PDF (even if the target of a link is a new window).
- PDF commenting workflows do not function in AIR.
This basically means that you can’t do anything with the box where your PDF is displayed, neither animate nor control it. It’s just sitting there messing with your cool AIR app.
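For completeness, here is a minimal sketch of the pdfCapability check mentioned above, wrapped around the same HTMLLoader code. The class name is made up and the URL is the same placeholder as before. As noted, a STATUS_OK result does not guarantee the document will actually show up.

```actionscript
package {
    import flash.display.Sprite;
    import flash.html.HTMLLoader;
    import flash.html.HTMLPDFCapability;
    import flash.net.URLRequest;

    public class PdfView extends Sprite {
        public function PdfView() {
            // pdfCapability is a static property holding a status code.
            if (HTMLLoader.pdfCapability == HTMLPDFCapability.STATUS_OK) {
                var pdf:HTMLLoader = new HTMLLoader();
                pdf.width = 600;
                pdf.height = 800;
                pdf.load(new URLRequest("http://www.example.com/test.pdf"));
                addChild(pdf);
            } else {
                // Other values are the ERROR_* constants of HTMLPDFCapability,
                // e.g. ERROR_INSTALLED_READER_NOT_FOUND.
                trace("PDF cannot be displayed, status: "
                    + HTMLLoader.pdfCapability);
            }
        }
    }
}
```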
But using this method you can communicate with a PDF file, provided that you embed specific JavaScript in it using Adobe Acrobat’s JavaScript capabilities (doh?). Read further to find out how it can be done without Acrobat. I don’t know what JavaScript code inside a PDF file can do, but even if it has total control over the plugin, this method sucks.
Convert to SWF.
A more popular way is to convert the PDF to SWF and load it in your application. The only open source tool I found is pdf2swf (part of swftools). Apparently it is used by every service that makes Flash PDF viewers, like scribd.com, safaribooksonline.com, pagegangster.com and the like. I actually found out about safaribooksonline when I was looking for a place to download the AIR 1.5 Cookbook, because there’s a chapter about PDF in AIR (which actually said nothing more than what I wrote here). Safari and PageGangster don’t secure their files, but I couldn’t break the Scribd data format. I know that it’s using pdf2swf because the swftools home page mentions it.
This tool does indeed convert PDF files to SWF. The latest version publishes to AVM2 with the -T9 flag, so you can use the result in AS3. It generates a lot of StaticText objects, which of course are not TextFields, but you can use TextSnapshot to work with StaticTexts (get text, select text). It does have conversion glitches: for some reason it may convert text to bitmaps instead, and I can’t figure out how exactly it works with fonts. The latest version has a lot of bugs fixed, but in my tests I failed to make Flex select text with TextSnapshot. I’m waiting for a reply on the mailing list.
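This is roughly how the TextSnapshot part is supposed to work in plain AS3, as a sketch only. The file name page.swf and the search word are placeholders, and it assumes the converted SWF keeps its static text directly on the loaded content’s timeline, which I can’t promise for every pdf2swf output.

```actionscript
package {
    import flash.display.DisplayObjectContainer;
    import flash.display.Loader;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.net.URLRequest;
    import flash.text.TextSnapshot;

    public class PageText extends Sprite {
        private var loader:Loader = new Loader();

        public function PageText() {
            loader.contentLoaderInfo.addEventListener(Event.COMPLETE, onLoaded);
            // Placeholder: a SWF produced by pdf2swf for AVM2.
            loader.load(new URLRequest("page.swf"));
            addChild(loader);
        }

        private function onLoaded(event:Event):void {
            var page:DisplayObjectContainer =
                loader.content as DisplayObjectContainer;
            if (page == null) {
                return; // not an AVM2 container, nothing to inspect
            }

            // The snapshot covers static text placed in this container.
            var snapshot:TextSnapshot = page.textSnapshot;
            trace(snapshot.getText(0, snapshot.charCount, true));

            // Find a word and highlight it; findText returns -1 on no match.
            var word:String = "PDF";
            var hit:int = snapshot.findText(0, word, false);
            if (hit >= 0) {
                snapshot.setSelectColor(0xFFFF00);
                snapshot.setSelected(hit, hit + word.length, true);
            }
        }
    }
}
```

If the text was converted to bitmaps or outlines, charCount will simply be 0 and there is nothing to search or select.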
The tool works; the bad news is that you need to run it to convert PDF files to SWFs. That’s one more step in the chain.
Generate.
The most popular library for generating PDFs from ActionScript 3 is definitely AlivePDF. The project is active and many enthusiasts develop different extensions: templates, JavaScript injection (this is probably how you can inject JavaScript into any PDF file without Adobe Acrobat and use it to communicate with that file). The latest news is a plan to use Alchemy to speed up PDF creation. There are a couple more projects on Google Code: PDFCase, a port of Java PDFJet, and as3-pdfreader, whose Chinese author tried to port Java PDFBox but gave up halfway. So we are left with the only solution: AlivePDF.
I don’t know all the capabilities of AlivePDF, so I can’t really tell how well it works. But at least it seems that there’s one working library for creating PDFs in ActionScript 3.
Final words.
It’s a shame that PDF files can work with Flash content but ActionScript can’t work with PDF files. I’m not counting the HTML-wrapped PDF display in Flex/AIR. We need an API at least for reading PDF content and displaying it on screen, with the ability to index and search text, add annotations and handle button events.
I was hoping that in the next Flash Player we’d get such an API, but no, it looks like we will have to wait longer.
I like collecting stuff. Not collecting stamps or coins just for the sake of it, but gathering stuff I might need in the future. I have many topics that I research from time to time, downloading PDFs and saving links. I’d say that I get more pleasure from preparing for an activity than from the activity itself once all the preparation is done. This is my first problem.
My second problem is that I tend to forget stuff. It makes researching mostly a waste of time. Sometimes I download a file only to find that I already have 3 copies of the same file in several tmp folders.
That’s why I was trying to sort everything I download… It takes a lot of time and gets complicated. A much better principle is tags. I used Leap a bit, but the tags it uses break when restoring files from Time Machine (from an NTFS drive), and it is not flexible at all.
I tried Evernote and other similar apps for the Mac. Honestly, Evernote is close enough to what I need, but it lacks text formatting and crosslinking functions, it can’t create nested notebooks, and it can do nothing with PDFs apart from importing them.
I recently installed a beta of DEVONthink. I must say that it is more interesting than Evernote, but it is indeed a beta: it doesn’t even have a good WYSIWYG editor and its PDF annotation is buggy, but at least you can link PDFs from text notes. The “Similar documents” AI function doesn’t have much use. Given that it has been in beta for a year or so, we will probably have to wait another year for the release.
People advised me to try a wiki. I did, but no wiki works well with PDFs. Still, the whole idea of hypertext is probably what I need: sorting, tagging and connecting documents with ideas to form a single big idea. The problem is that I read a lot of scientific articles which come as PDFs, and saving their text as HTML, for example, is not really an option… But, actually, I’ll try to research converting PDFs to HTML with images. I checked how Adobe Acrobat exports PDFs to HTML and RTF and it’s horrible. Not to mention that it takes a whole 5 minutes to export one book.
In conclusion, I still haven’t found any software that fully satisfies me. There are applications that have several of the features I need, but none does everything I want, and it’s not that I want too much:
- Be able to create rich text nodes,
- Insert images, movies, audio files and PDFs,
- Have everything searchable,
- Tag everything,
- Be able to sort nodes to folders and smart folders,
- Be able to crosslink files,
- Add small notes (as clouds) to everything,
- Annotate PDFs,
- Have a usable intuitive interface,
- Run on a mac offline.
Are these features so hard to make?












