DVD Conversion
From Zanecorpwiki
Motivation
Space and convenience. 100 DVDs, in a mix of single discs and boxed sets, take up XX cubic feet. Dropping the boxes and storing the discs in slimline jewel cases reduces the volume to YY cubic feet. Hard drives and other gear for network access add another ZZ cubic feet. Compared to the original, as-shipped collection, the converted, usable collection takes up WW% less space.
Media stored on a hard drive is easier to access, and once converted you can skip the annoying warnings and bullshit previews.[notes 1]
In short:
- use less space to
- make media more accessible and
- protect investment
TODO: get picture of what we're talking about.
Legal Questions
Start with this excellent summary of the current state of things. In general, copying DVDs is fraught with legal and civil peril. I make notes of my own understanding throughout the writeup. I, however, am not a lawyer so none of this should be taken as anything more than a starting point. I should note, however, that even lawyers cannot tell you "what's allowed" and what isn't. This is a very murky area. In such a case, doing anything is dangerous because even if you can prevail in court, it could be very expensive and time consuming to do so.
The central legal question revolves around the Digital Millennium Copyright Act (DMCA). This law is unique and departs dramatically from previous copyright law and precedent.[notes 2] The key provision in the law makes it a civil violation (and in some cases criminal) to circumvent encryption. Commercial DVDs are (usually) encrypted with something called CSS.
Some DVDs--ones you make yourself, or a family member has produced, and very rarely some commercial DVDs--are not encrypted. As far as I know, it is perfectly legal to make copies of this material, including converting it into another format.[notes 3] This is what makes the DMCA an aberration. It tramples on all previous precedent, social norms, and common sense and turns a traditional right into a civil violation when exercising that right includes sidestepping the CSS code. It's like saying free speech is legal, but using the subjunctive is not, so speech involving the subjunctive is (effectively) criminalized even though it would otherwise be perfectly legitimate. The DMCA makes about as much sense as passing a law that says you can't make notes or highlight in a book you've bought.
Equipment
- document scanner (nice to have)
- flat bed scanner
- jewel cases
- hard drives
- Mac
- HandBrake
Phase One
Phase one is just backing up the DVD: making an image file of the original disc.
Legal risk. There are claims that a single, straight backup of a DVD which you own is legal. I haven't been able to verify this.[notes 4] Presumably, a bit-for-bit copy would be fine because you're not decrypting the DVD. The way CSS works,[notes 5] it's hard (I know of no way) to make a perfect bit-for-bit copy. You're forced to decrypt the image as part of the copying process even if you don't want or need to.
How to do this. With Linux or Mac, run the following, substituting your drive's device node for /dev/dvd (on a Mac, "diskutil list" shows the /dev/diskN name):
dd if=/dev/dvd of=/Volumes/backup01/dvd.img conv=noerror,sync,notrunc
For most DVDs, this will silently copy all the bits, making an image of the DVD which can be played directly or used to create a new DVD should the original be lost (or should you rather not use it). It's not the most convenient format, however, because the file is huge (same as the original DVD, 4 to 8 GB) and not every player can handle raw DVD images. Also, while most DVDs are "friendly" in terms of layout, some are set up in such a way as to make navigating without the DVD menu (which is encoded separately from the actual movie information) difficult.
If you do get error messages while copying the DVD, there are two possibilities. First, the disc is bad or has bad bits. This doesn't mean that the data is unusable. DVDs, like CDs, are encoded to be robust. You may not even notice the errors.
The second possibility is that the DVD was intentionally mastered with bad sectors in order to make copying harder. Either way, the "conv" options deal with this. By default, dd will halt when a read error occurs; 'noerror' says to keep going. 'sync' pads each failed or short read with 0's up to the full block size (otherwise the bad block is simply dropped, everything after it shifts position, and you're left with an unreadable disc image). 'notrunc' tells dd not to truncate an existing output file before writing; it's harmless here and only matters if the image file already exists.
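A minimal sketch of the same copy with an explicit block size, in case it's useful. DVD sectors are 2048 bytes, so reading one sector at a time means a bad sector costs only one zero-filled block. The function name image_disc and the paths in the usage comment are placeholders of mine, not part of the recipe above.

```shell
#!/bin/sh
# image_disc SRC DEST
# Copy SRC to DEST one 2048-byte DVD sector at a time.
#   noerror  keep going after a read error
#   sync     pad each short or failed read to a full block with zeros
image_disc() {
    dd if="$1" of="$2" bs=2048 conv=noerror,sync
}

# Hypothetical usage (device and volume names will differ on your machine):
#   image_disc /dev/dvd /Volumes/backup01/dvd.img
```

Because of 'sync', the image is always a whole number of 2048-byte blocks, which is what you'd expect from a disc anyway. Small blocks make the copy slower; that's the trade for losing less data around each bad spot.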
Why do this. First and foremost, for security. DVDs don't last forever. In fact, I find increasingly (in my own collection) that they're defective from day one. DVD players will often fail when trying to read the original DVD, but a computer player will be able to deal better.[notes 6]
As far as I understand, copyright laws are clear: you own the right to watch the movie and backing up media, especially when eventual failure is virtually guaranteed, is perfectly legit and, as far as copyright law itself goes, legal. It's "fair use". It's only the DMCA that makes this a problem.
Phase Two
In phase two, the movies are converted into another format which makes them more convenient.
Legal risk. This is pretty clearly in violation of the DMCA. So long as you are doing it for your own personal use, with no financial or commercial gain, then you'd "merely" be subject to civil, but not criminal, violations.[notes 7]
How to do this. Use HandBrake. There are builds for Mac, Linux, and Windows. It provides canned profiles for various targets (iPhone, high quality "network" storage, etc.).
Why do this. In terms of preserving your collection, the converted media is not as good as the bit-for-bit (sans CSS) copies. You lose some quality when you convert. However, the converted movies are far more convenient to access. The primary advantage is the use of newer codecs which allow you to get DVD quality with a much smaller file size. In general, H.264 (the most common encoding used in MP4 files) is about 3-4 times more efficient than the DVD's own encoding at equivalent quality, so the files come out 3-4 times smaller. This means you could store about 3-4 times as many movies on a given hard drive.
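HandBrake also ships a command-line build, HandBrakeCLI, which is handy for batch jobs. The following is only a sketch under my own assumptions: the function name encode_title, the file names, and the default quality of 20 are mine, and I'm relying on HandBrakeCLI picking the container from the output file's extension. Check the options against your installed version before trusting it.

```shell
#!/bin/sh
# encode_title SRC DEST [QUALITY]
# Encode SRC (a DVD image or folder) to DEST with the x264 video encoder.
# QUALITY is x264's constant-quality value; 20 is an assumed starting point.
# Set HANDBRAKE to override the program name (useful for a dry run).
encode_title() {
    ${HANDBRAKE:-HandBrakeCLI} -i "$1" -o "$2" -e x264 -q "${3:-20}"
}

# Hypothetical usage:
#   encode_title /Volumes/backup01/dvd.img /Volumes/media/movie.mkv
# Dry run that only prints the arguments:
#   HANDBRAKE=echo; encode_title dvd.img movie.mkv
```

Audio settings are left at the program's defaults here; pick an AAC encoder explicitly if your build's default is something else.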
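To put rough numbers on the storage claim, here is a back-of-envelope sketch. Every figure in it is an assumption of mine for illustration: a 1 TB drive, a 6 GB average DVD image, and the low end (3x) of the size reduction.

```shell
#!/bin/sh
# movies_per_drive DRIVE_MB MOVIE_MB: how many whole movies fit on the drive.
movies_per_drive() {
    echo $(( $1 / $2 ))
}

drive_mb=1000000                 # assumed 1 TB drive
dvd_mb=6000                      # assumed average raw DVD image
ratio=3                          # assumed size reduction from re-encoding
h264_mb=$(( dvd_mb / ratio ))    # 2000 MB per converted movie

echo "raw images: $(movies_per_drive "$drive_mb" "$dvd_mb")"
echo "converted:  $(movies_per_drive "$drive_mb" "$h264_mb")"
```

With these numbers the drive holds 166 raw images or 500 converted movies.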
Deciding on Formats
Bottom Line
H.264 + AAC in an MKV container.
What that Means and Why
There are two kinds of "things" at play in every modern video file: the container and the codecs. Codecs are the way in which the data is encoded. Most codecs are lossy compression codecs. A lossless codec (whether compressed or not) retains every bit of source data perfectly. This ends up with huge files. Top end HD resolution circa 2010 was 1,920x1,080 (1080p). With 32 bit color and 30 frames per second, you're talking about a quarter of a gigabyte per second![notes 8] That puts a 2 hour movie at around 1.8 terabytes[notes 9] for video alone.
Even losslessly compressed, these files would be huge by today's standards. It is provably impossible to losslessly compress arbitrary digital data beyond a certain point. In comes lossy compression.
Lossy compression exploits both the technical limitations of the eye--which is estimated to perceive about 10 million colors[notes 10]--and the psychology of image processing in order to "get rid" of stuff a human can't see or wouldn't notice anyway.
There are codecs for both audio and video. The best widely available format (and, AFAIK, the best format period) for video compression is H.264, sometimes loosely referred to as MPEG-4.[notes 11] H.264 contains patented technology, and there is an open source alternative, Ogg Theora, which I believe (but don't really know for certain) is technically in the same league in terms of quality to file size ratios, but currently lacks wide support. Ogg Theora (often referred to as just Ogg) may be usable in certain circumstances, but couldn't be used in my ideal setup as of this writing.
In the audio world, the codec is less important (because there's not as much audio data and it's generally easier to compress). AAC is probably the best widely supported standard. As with video, there is an open source format, Ogg Vorbis (also often called Ogg), which is technically competitive but less widely supported.
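The uncompressed numbers are easy to check by hand. Here's a small sketch using shell arithmetic; the function name is mine, and it assumes a shell with 64-bit integers (any current Mac or Linux box). It works out to roughly 249 million bytes per second and about 1.8 trillion bytes for two hours.

```shell
#!/bin/sh
# Uncompressed video data rate:
#   width x height x bytes-per-pixel x frames-per-second
raw_bytes_per_second() {
    echo $(( $1 * $2 * $3 * $4 ))
}

bps=$(raw_bytes_per_second 1920 1080 4 30)   # 32 bit color = 4 bytes per pixel
movie=$(( bps * 2 * 60 * 60 ))               # two hours of video

echo "per second: $bps bytes ($(( bps / 1024 / 1024 )) MiB, rounded down)"
echo "two hours:  $movie bytes ($(( movie / 1024 / 1024 / 1024 )) GiB, rounded down)"
```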
The container formats are what put it all together. A flexible container can accept multiple video, audio, and other tracks (generically, video overlays) as well as some other information (like, whether viewers can skip a section of video or not) and keep everything time-synced. That's how the user can switch through different angles and sound tracks in a movie.
Simple music files don't need containers; the file is just the codec-compressed piece of music. An MP3, MP4[notes 12], AAC, or OGG file is an audio file.
Video, though, usually has at least one audio and one video track, so you need a container. With movies, then, it's not "the movie track" itself which is the file, but the container for the different tracks of the movie. There are many older formats, but the best modern formats are M4P[notes 13], MKV[notes 14], and the Ogg (container) format.
M4P and MKV are very similar. M4P is certainly the best supported, but MKV is a close second[notes 15]. Again, while Ogg does have good support, it's not good enough at the moment for my ideal setup.
Handling Subtitles
DVD subtitles are in VOBSUB format. This is a time coded bitmap. The VOBSUB tracks can be embedded in MKV and (newer) M4P containers, but some (especially mobile) players are unable to read these tracks. There are two solutions. The first is to "burn in" the subtitles during format conversion. This blends the subtitle bitmaps in with the regular video stream and means it's impossible to turn subtitles off.
The second is to keep the subtitles outside the video. Eventually, one can assume players will improve to use the embedded VOBSUB tracks, but until then, it is possible to extract the subtitles into an external SRT file that encodes time codes and subtitles as text. These files can be loaded in the viewer to add subtitles.[notes 16]
To get an SRT file, first extract the VOBSUB track using MKVToolNix:
mkvmerge -i foo.mkv              # find the track number of the VOBSUB track
mkvextract tracks foo.mkv 4:foo  # where '4' would be the track number
# you now have a foo.sub and foo.idx
It's now time to OCR the IDX file with avidemux2 via Tools -> OCR (Vobsub -> SRT). Select the IDX file, specify the output file (add the '.srt' suffix) and hit go. You'll need to help train the OCR. A Mac accent chart or a video tutorial may be useful.
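If you have a pile of files, looking up the track number by hand gets old. Here's a sketch that pulls it out of the mkvmerge listing. It assumes the listing has lines shaped like "Track ID 4: subtitles (VobSub)", which is what I'd expect but may vary between MKVToolNix versions; the function names are mine.

```shell
#!/bin/sh
# vobsub_track_id: read `mkvmerge -i` output on stdin and print the ID of
# the first VobSub subtitle track (prints nothing if there is none).
vobsub_track_id() {
    sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (.*[Vv][Oo][Bb][Ss][Uu][Bb].*/\1/p' |
        head -n 1
}

# extract_vobsub FILE.mkv: write FILE.idx and FILE.sub next to the source.
extract_vobsub() {
    id=$(mkvmerge -i "$1" | vobsub_track_id)
    [ -n "$id" ] || { echo "no VobSub track found in $1" >&2; return 1; }
    mkvextract tracks "$1" "$id:${1%.mkv}"
}

# Hypothetical usage:
#   extract_vobsub foo.mkv
```

Only the first VobSub track is taken; a disc with several subtitle languages would need a loop over all matching IDs.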
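For reference, an SRT file is plain text: each subtitle is a counter line, a time range line, one or more lines of text, and then a blank line. The sketch below writes a made-up two-subtitle sample and counts the entries, which is a cheap sanity check to run on the OCR output. The function names and the sample text are mine.

```shell
#!/bin/sh
# write_sample_srt FILE: write a two-entry sample in SRT layout
# (counter, time range, text, blank line).
write_sample_srt() {
    cat > "$1" <<'EOF'
1
00:00:01,000 --> 00:00:04,000
First subtitle line.

2
00:00:05,500 --> 00:00:07,250
Second subtitle line.
EOF
}

# count_cues FILE: number of subtitles, counted by their "-->" lines.
count_cues() {
    grep -c -e '-->' "$1"
}

# Hypothetical usage:
#   count_cues foo.srt
```

If the count is wildly lower than the number of lines of dialogue you'd expect, the OCR run probably stopped early.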
Notes
- ↑ I can see being forced to sit through copyright notices once, but the second and third time? Being forced to sit through previews, as some DVDs do, is offensive.
- ↑ Some would argue it is in conflict with and in some cases nullified by copyright law, but that's where the expensive lawyers come in.
- ↑ Of course, doing many things with even unencrypted DVDs would be illegal. The material is still under copyright.
- ↑ Lower courts have ruled that 321 Studios DVD copying software was illegal, even if used for legal purposes. As far as I understand, this leaves open the question of whether a bit-for-bit copy for pure backup is itself legal. I was unable to find a statutory exception on this point or legal precedent. It could arguably be fair use, but the courts have recently been rather narrow in their interpretation of fair use.
- ↑ The CSS "keys" are written to an area of the DVD which can only be pressed.
- ↑ A recent anecdote: my copy of "Seven Samurai" from the Criterion Collection was completely unplayable. I mention the vendor because they're known for high quality stuff, so if theirs was useless... The problem was bad sectors that caused multiple DVD players (PSII and Mac) to skip small portions early on. The last 45 minutes or so (climax!) was completely unplayable. It would halt for minutes on end, then jump forward 5-10 minutes, play a few seconds, and repeat. Based on experience of ripping DVDs prior to the enactment of the DMCA, I know first hand that these problems can sometimes be fixed by copying the DVD and playing the copied image from the hard drive.
- ↑ The civil exposure, however, is huge. Some (including myself) would argue that this is tantamount to back-door criminalization. $100,000 per incident is, I believe, the number. Meaning if you copy four discs so your daughter can watch the first season of Barney on a car ride, you'd potentially be on the hook for $400,000.
- ↑ 237.30469 MiB; in 10 years, that exclamation point may seem silly, but to someone who remembers when it was cutting edge to have a megabyte floppy disk and has seen the old car wheel size HD platters that held kilobytes of info, that much data streaming by is amazing.
- ↑ 1.629 TiB (about 1.79 TB)
- ↑ I'm just stealing the reference without double checking myself, sloppy, but this isn't my day job: D. B. Judd and G. Wyszecki (1975). Color in Business, Science and Industry. Wiley Series in Pure and Applied Optics (third ed.). New York: Wiley-Interscience. pp. 388. ISBN 0471452122.
- ↑ The full designation is MPEG-4 Part 10, or AVC. The details aren't important and, AFAIK, there's no confusion around the standard.
- ↑ MPEG-1 (whose Audio Layer III is the familiar MP3), MPEG-2 (which is actually, I believe, what most commercial DVDs are encoded in), and MPEG-4 all define both audio and video compression algorithms. A 'foo.mp3' or 'bar.mp4' file, however, is almost always an audio file. Same with OGG.
- ↑ Not to be confused with MP4, though it often is.
- ↑ Also called the Matroska format.
- ↑ In my experience, haven't checked numbers.
- ↑ This is also an interesting way to add commentary, or a joke track, or whatever.


