====== Embedding Blobs in Binaries ======

I was recently looking for a way to ship an image with an executable without referring to the image as an external file. One can argue whether that is a good idea in general, but that's another story.

Since I found very little information on the subject and had to piece it together from various sources, I conclude the task with this comprehensive write-up. The mechanisms described here apply to including any resource - not just images - in an executable C program.

===== Object Files =====

There are various ways to embed generic fixed data within an application. Let's start with a basic example that should be familiar:

<code c>
const char* mystring = "hello world";
</code>

Every application is just a collection of [[wp>Data_segment|Data segments]] stored in a file. On modern PC architectures, an application is commonly described as consisting of three major segments: //Stack//, //Data// and //Code//. Without going into details, fixed data - such as constant text strings - resides in the read-only part of the //Data Segment// of the application binary, and the C compiler automatically creates a memory region for it.

When you compile the above line with ''gcc -c -o mytext.o mytext.c'' and inspect the object file with ''objdump -t mytext.o'', you'll find the data sections in the object file:

<code>
00000000 l    d  .text   00000000 .text
00000000 l    d  .data   00000000 .data
00000000 l    d  .rodata 00000000 .rodata
00000000 g     O .data   00000004 mystring
</code>

which basically says: ''mystring'' is a global (''g'') data object (''O'') of 4 bytes that can be found at offset 0x00 of the ''.data'' section. Note that this entry is the pointer itself; the text it points to is stored in the read-only ''.rodata'' section. For a detailed description of all the values see ''man objdump''.
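How the data is declared determines what the symbol in the object file refers to. The following is only a minimal sketch; the names ''mystring_ptr'', ''mystring_arr'' and the two helper functions are made up for illustration and are not part of any tool discussed here. Declaring the data as an array makes the symbol name the bytes themselves, which is the form the embedding methods in the following sections produce:

```c
#include <stddef.h>

/* A pointer variable: the symbol names the pointer (typically 4 or 8 bytes);
 * the text itself is stored elsewhere as an anonymous string literal. */
const char *mystring_ptr = "hello world";

/* An array: the symbol names the 12 bytes of text directly
 * (11 characters plus the terminating NUL). */
const char mystring_arr[] = "hello world";

/* Hypothetical helpers that report the size of each symbol's storage. */
size_t ptr_symbol_size(void) { return sizeof mystring_ptr; }
size_t arr_symbol_size(void) { return sizeof mystring_arr; }
```

Compiled with ''gcc -c'', the symbol table printed by ''objdump -t'' will typically list ''mystring_arr'' with a size of 12 bytes, while ''mystring_ptr'' has the size of a pointer on the target architecture.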
===== C Include method =====

One solution is to simply write the data in C and have the compiler take care of everything:

<code c>
const unsigned char myimage[] = "BIG BLOCK OF RESOURCE DATA ENCODED AS STRING";
</code>

To encode the data of a given file, the ''xxd'' tool that comes with the [[http://www.vim.org/|vim]] text editor can be used: ''xxd -i binary_file'' converts the given binary file to a C include file and writes a complete static array definition named after the input file:

<code>
# cat /tmp/example
Hello World
# xxd -i /tmp/example
unsigned char _tmp_example[] = {
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x0a
};
unsigned int _tmp_example_len = 12;
</code>

This variant produces portable C code and just works(TM). On the downside, the source requires at least five times the size of the original data (''0xNN,'' for every original data byte), and the source code needs to be regenerated every time the original data changes. While this approach is useful in some cases, we can do better than that.

===== Binary linking =====

The GNU linker can create object files with a custom ''.data'' section directly from any input file. The //magic// flags are ''-r'' to produce relocatable output (an object file that can be linked again) and ''-b binary'' to read the input file as raw binary data.

<code>
# ld -r -b binary -o example.o example.jpg
</code>

The resulting object file ''example.o'' can be linked with any application, simply with gcc itself, e.g. ''gcc -o myapp myapp.c example.o''.

The last missing link is to access the data in ''example.o'' from the C code in ''myapp.c''. Have a look at the symbol table of the object file and compare it with the above output for the //mystring// object.

<code>
# objdump -t example.o

SYMBOL TABLE:
00000000 l    d  .data  00000000 .data
00004fc0 g       .data  00000000 _binary_example_jpg_end
00004fc0 g       *ABS*  00000000 _binary_example_jpg_size
00000000 g       .data  00000000 _binary_example_jpg_start
</code>

The global (''g'') symbols in the ''.data'' section should ring a bell.
This data section can be referenced from the C code simply by using:

<code c>
extern const unsigned char _binary_example_jpg_start[];
</code>

and the length can be calculated from

<code c>
extern const unsigned char _binary_example_jpg_start[];
extern const unsigned char _binary_example_jpg_end[];
size_t len = _binary_example_jpg_end - _binary_example_jpg_start;
</code>

or simply by referencing the address of ''_binary_example_jpg_size'', an absolute (''*ABS*'') symbol whose value is the size. The linker resolves the extern references when linking the application, and the application uses the data just like a fixed blob in the source code. So far so good.

===== Architecture dependent binary linking =====

Now for the tricky part: the GNU linker behaves differently depending on platform and architecture. The implementations interesting for me are GNU/Linux, OSX and mingw (cross-compiling windows binaries on a GNU/Linux host).

The mingw cross-compiler behaves almost exactly like GNU ld, with one minor difference: the symbol names do not include the leading underscore: ''_binary_example_jpg_start'' vs ''binary_example_jpg_start''. -- Fine, there goes some of the elegance of the solution, but that case is easily handled with an ''#ifdef''.

However, Mac/OSX is different. The ''ld'' which is shipped with Xcode comes from //llvm version 2.7svn// and does not support the ''-b input-format'' feature. Furthermore, //universal executables// on OSX may comprise binary formats for various architectures, with the .data section format being different for each architecture. The alignment of the data may differ between 32bit and 64bit architectures, and the endianness may differ as well. Thus the creation of the data section needs to be done during compilation instead of the linking stage.
On OSX, ld's binary linking feature has been moved into their customized gcc and is available via the ''-sectcreate'' option:

<code>
gcc -sectcreate __DATA __example_jpg example.jpg -o myapp myapp.c
</code>

To create a universal build for Intel architectures, add ''-arch i386 -arch x86_64'' to the above command line.

''objdump'' is also a GNU tool which is not available on OSX. You can inspect the data section using ''%%otool -s __DATA __example_jpg /path/to/executable%%''. See ''man otool'' for details.

Due to the nature of OSX binaries, referencing the data section in the C code is not possible with a simple ''extern unsigned char''. The linker does not know which architecture will be used and cannot provide an address. The Mach-O binary format used by OSX needs to be inspected at runtime, when the architecture is known, to locate the relevant data after the application has started.

Apple provides an API for doing that, which is defined in the ''[[http://www.opensource.apple.com/source/cctools/cctools-384.1/include/mach-o/getsect.h|mach-o/getsect.h]]'' header file. If you have Xcode installed, you can read the documentation with ''man getsectbyname''.

Resolving the section can only be done at runtime, after the data section has been relocated, by calling ''getsectbyname()''. However, there is a trick to make this implicit: the meta-variable ''_section$'' is recognized by the gcc compiler on OSX. It produces the same result as calling ''getsectbyname()->addr''.

Short of reading the actual code, information about OSX linker internals is not easy to come by. ''getsectbyname()'' actually opens the executable file and searches for the relevant data section while the application is running. ''_section$'' may or may not already be resolved at link time for a given architecture ((should you know more about this or have any pointers to documentation, please contact me)).
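As a rough sketch of the runtime lookup, the size of an embedded section could be queried as follows. The helper name ''embedded_section_size'' is made up for this example, and the ''__LP64__'' distinction reflects my reading of the ''mach-o/getsect.h'' header (64bit builds get a ''struct section_64''); treat both as assumptions rather than documented practice. On non-Apple platforms the helper simply reports 0, so the file still compiles everywhere:

```c
#include <stddef.h>

#ifdef __APPLE__
#include <mach-o/getsect.h>
#endif

/* Hypothetical helper: returns the size in bytes of the given section of the
 * running executable, or 0 if the section does not exist (or if this is not
 * an Apple platform). */
size_t embedded_section_size(const char *segname, const char *sectname)
{
#ifdef __APPLE__
#ifdef __LP64__
    /* assumption: 64bit builds return a struct section_64 */
    const struct section_64 *sect = getsectbyname(segname, sectname);
#else
    const struct section *sect = getsectbyname(segname, sectname);
#endif
    /* a missing section yields a NULL pointer, so check before use */
    return sect ? (size_t)sect->size : 0;
#else
    (void)segname;
    (void)sectname;
    return 0;
#endif
}
```

For the ''-sectcreate'' command shown above, the call would be ''%%embedded_section_size("__DATA", "__example_jpg")%%''. Checking for a missing section avoids dereferencing a NULL pointer when the section name does not match the one given on the compile command.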
Update (Oct 2016 - Thanks to Eugene Gershnik): On newer versions of OSX/macOS that run executables with [[wp>Address_space_layout_randomization|ASLR]], the call to ''getsectbyname'' needs to be replaced with ''getsectiondata'' ((''getsectiondata'' serves as the replacement, see also http://stackoverflow.com/questions/28978788/crash-reading-bytes-from-getsectbyname)). However, this API is only available from OS X 10.7 onwards.

===== Architecture Abstraction -- working solution =====

Long story short, one can use a macro abstraction that works cross-platform. ''LDVAR()'' and ''LDLEN()'' are defined to access the data and its size; ''EXTLD()'' is used for the actual external declaration of the symbol:

<code c>
#ifdef __APPLE__
#include <mach-o/getsect.h>

#define EXTLD(NAME) \
  extern const unsigned char _section$__DATA__ ## NAME [];
#define LDVAR(NAME) _section$__DATA__ ## NAME
#define LDLEN(NAME) (getsectbyname("__DATA", "__" #NAME)->size)

#elif (defined __WIN32__)  /* mingw */

#define EXTLD(NAME) \
  extern const unsigned char binary_ ## NAME ## _start[]; \
  extern const unsigned char binary_ ## NAME ## _end[];
#define LDVAR(NAME) \
  binary_ ## NAME ## _start
#define LDLEN(NAME) \
  ((binary_ ## NAME ## _end) - (binary_ ## NAME ## _start))

#else /* gnu/linux ld */

#define EXTLD(NAME) \
  extern const unsigned char _binary_ ## NAME ## _start[]; \
  extern const unsigned char _binary_ ## NAME ## _end[];
#define LDVAR(NAME) \
  _binary_ ## NAME ## _start
#define LDLEN(NAME) \
  ((_binary_ ## NAME ## _end) - (_binary_ ## NAME ## _start))
#endif
</code>

Example usage in a C program:

<code c>
#include <stddef.h>
#include <stdint.h>

// declare the external variable
EXTLD(example_jpg)

void some_function() {
  // access the data (read-only, hence the const pointer)
  size_t length = LDLEN(example_jpg);
  const uint8_t *data = LDVAR(example_jpg);
}
</code>

As a final note, some care must be taken when choosing the variable identifier. ''ld'' uses the file name to generate the symbol names. Characters in the file name that are not valid in C identifiers are transformed to underscores, e.g.
''ld -r -b binary -o example.o ../images/example.jpg'' will create symbols prefixed with ''%%_binary____images_example_jpg%%''. The ''../'' as well as the slash and dot are transformed to underscores.

This is not an issue on OSX, where the identifier needs to be specified with the ''-sectcreate'' option. However, identifiers on OSX are limited to 16 characters. So in order to use the above approach cross-platform, the file path passed to ''ld'' must be shorter than 16 characters, and the same identifier needs to be specified on the OSX compile command.

A complete project that uses this approach to include a jpeg image file and a javascript text file is [[https://github.com/x42/harvid|harvid]]. It also outlines how to use a cross-platform ''Makefile'' for creating the object files and adding the relevant flags to the OSX gcc command.

===== Alternative option using asm =====

User ColaEuphoria points out that an alternative for the x86 architecture is to use assembly:

<code>
section .rodata
global _my_data
_my_data: incbin "my_data.file"
_my_data_size: dd $-_my_data
</code>

and assemble it with ''nasm -felf64 resource.s -o resource.o''. (Note that ''-felf64'' here is Linux 64bit. The option needs to be replaced with the one corresponding to the target architecture and OS.)

The data can then be referenced with gcc using

<code c>
extern const unsigned char _my_data[];
</code>

MSVC does not need the leading underscore in the C code, and one can reference the data using ''my_data''.

{{tag>development article}}