gcc_trace_functions_improved

HOWTO: Improve upon native GCC trace function support:

These instructions for tracing your C code build on the heels of this tutorial:

This tutorial provides code and instructions for multiple improvements:
  1. Runtime name-lookups of each function: 

    Instead of generating only function addresses, these few short steps let you print human-readable function names, along with their addresses, while the program is running.

  2. Easily blacklist and whitelist your function tracepoints:

    Because the list of traced functions can be quite large (particularly with very complex C programs), tracing isn't very useful without a relatively straightforward mechanism for pruning that list before the traces are recorded to a file for later (or runtime) analysis. Without pruning, the unnecessary I/O of writing every trace to disk can also slow your program down considerably.

  3. "Fuzzy"-generate tracepoints and whitelist/blacklists before running your program:

    Using some standard Linux utilities, you can search your binaries with a regular expression to automatically find interesting functions to trace. The matches are then statically compiled into the tracing code before you run your program.

  4. Easily probe external libraries linked against your program that you may also be interested in.

    As programs get very large, so do the number of dependencies, and we would like a tracing solution to be able to investigate those dependencies.

Once you have these abilities on top of the native GCC tracing support, you can do more sophisticated things. Because the trace is now significantly smaller *and* contains function names that can be analyzed, you could fit the entire trace into memory and avoid disk I/O during the run, which would let your program run even faster.
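
To make the in-memory idea concrete, here is a minimal sketch of a fixed-size trace buffer in C. It is not part of tracefunc.c: the names trace_event, trace_buffer_record and trace_buffer_flush, and the capacity of 4096 events, are assumptions made for illustration only.

```c
#include <stdio.h>
#include <stddef.h>

#define TRACE_CAPACITY 4096   /* hypothetical size; tune it to your program */

/* One buffered trace record (hypothetical layout). */
struct trace_event {
    void *func;      /* address of the traced function */
    void *caller;    /* address of the call site */
    int   is_enter;  /* 1 for function entry, 0 for function exit */
};

static struct trace_event trace_events[TRACE_CAPACITY];
static size_t trace_count = 0;

/* Write all buffered events to 'out', empty the buffer, and
 * return the number of events that were written. */
static size_t trace_buffer_flush(FILE *out)
{
    size_t i, written = trace_count;

    for (i = 0; i < trace_count; i++)
        fprintf(out, "%s %p from %p\n",
                trace_events[i].is_enter ? "enter" : "exit",
                trace_events[i].func, trace_events[i].caller);
    trace_count = 0;
    return written;
}

/* Store one event in memory; only touch the disk when the buffer is full. */
static void trace_buffer_record(FILE *out, void *func, void *caller, int is_enter)
{
    if (trace_count == TRACE_CAPACITY)
        trace_buffer_flush(out);
    trace_events[trace_count].func = func;
    trace_events[trace_count].caller = caller;
    trace_events[trace_count].is_enter = is_enter;
    trace_count++;
}
```

In a real tracer you would call something like trace_buffer_record() from the tracing hooks and trace_buffer_flush() once at program exit. Note that this sketch is not thread-safe.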


First, download the basic tracing code and helper script:

Download these files, and then skip down below to see the steps for using them in your program:
  1. Tracing code itself: download "tracefunc.c"
  2. Helper Script: download "generate_trace_input.sh"
  3. Example Main Program (optional, if you just want to test the tutorial): download "example_program.c"

Usage:

To get the improvements listed above, we first need to probe your program's already-compiled binary before we can trace it. After that, we will recompile your program with the tracing code linked in with this statically-probed information to be used during runtime while your program is being traced.
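
The runtime half of this relies on the dynamic loader to turn a function name into an address. As a rough sketch of that lookup (the helper name resolve_symbol is hypothetical and does not appear in tracefunc.c), assuming the symbol is visible in the main program or in one of the libraries loaded with it:

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Resolve a function name to its runtime address.
 * Passing NULL to dlopen() returns a handle for the main program, and
 * dlsym() on that handle also searches the libraries loaded with it.
 * Returns NULL if the name cannot be found.
 */
static void *resolve_symbol(const char *name)
{
    void *handle = dlopen(NULL, RTLD_NOW);
    void *addr;

    if (handle == NULL)
        return NULL;
    addr = dlsym(handle, name);
    dlclose(handle);
    return addr;
}
```

For example, resolve_symbol("malloc") should return a non-NULL address in a dynamically linked program. On older glibc versions you need to link with -ldl, as the compile command in Step B does.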

Step A: Configure the helper script

The beginning of the script defines a few configuration parameters that you need to change:
  1. "header": This points to a C header file to be included by our improved tracing code. 

    • This file is automatically generated by the helper script and contains the statically-probed information about your program, which is provided as input for tracing your program. It does not need to change unless you really want it to.

  2. "trace_file": Specify where the final tracing results should go.

  3. "external_libraries": Provide a list of absolute path names of libraries that you would like to trace in conjunction with your main program. 

    • If you're not interested in any external libraries right now, just set this to empty quotes ""

  4. "whitelist_functions": Specify a fuzzy list of functions that you want to trace.

    • For example, let's say you're not entirely sure of the full names of the functions you want to trace. Suppose you have a large source file in which the function names share common substrings:

            new_feature.c:

                   void new_feature_initialize() ....
                   void new_feature_cleanup() .....
                   void start_feature_thread() ....
                   void stop_feature_thread() .....

      In this case, you would like to trace *all* functions whose names contain the string "new_feature" or "feature_thread", without having to list every single function by hand.
      To do that, configure the helper script like this:

      whitelist_functions="new_feature|feature_thread"     # separate fuzzy strings with a pipe '|' character in regular expression syntax

    • At this point, the helper script will probe your program and all external libraries for these strings and output the list of function names containing these strings for later tracing.

  5. "blacklist_functions": Specify a fuzzy list of functions that you do NOT want to trace.

    • For example, sometimes, the list of whitelisted functions is too big. This option allows us to narrow it down:

               Let's say "new_feature.c" contained hundreds of functions:

                    int feature_A() ...
                    long feature_B() ...
                    char feature_C() ...

    • Again, in this example, we're not entirely sure what the full list of feature function names is, so we would do:
                
                whitelist_functions="feature"

    • However, we don't want to trace ALL the feature function names, but instead let's say we're only interested in feature function names A, but not B and C.

               Then we would do the following:

               blacklist_functions="B|C"

  6. "main_program": This is the path location of your final program binary.

    • The helper script will probe this pre-compiled binary along with any external libraries to perform the whitelist/blacklist search described above.
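
To see how the whitelist and blacklist patterns behave, here is a small sketch that applies the same kind of extended regular expression to individual function names using POSIX regcomp()/regexec(). The helper names name_matches and should_trace are hypothetical. The real helper script uses `grep -E` for this, and the tracing code applies the blacklist at runtime by address, so treat this only as an illustration of the matching rules.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if 'name' contains a match for the extended regular
 * expression 'pattern', 0 if it does not, and -1 if the pattern
 * cannot be compiled. The match is unanchored, like grep -E.
 */
static int name_matches(const char *pattern, const char *name)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, name, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* A name is traced if it hits the whitelist and misses the blacklist. */
static int should_trace(const char *name, const char *whitelist,
                        const char *blacklist)
{
    return name_matches(whitelist, name) == 1 &&
           name_matches(blacklist, name) != 1;
}
```

With whitelist "feature" and blacklist "B|C", should_trace() accepts feature_A and rejects feature_B and feature_C. Because the patterns match substrings, "B|C" would also reject any other name that happens to contain an uppercase B or C, so a more specific pattern such as "feature_B|feature_C" is safer in a large program.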

Step B: Run the helper script and compile the tracing code with your program
  1. Compile your original program normally:

    $ gcc program.c -o program

  2. Run the helper script. You should get some output similar to this:

    $ ./generate_trace_input.sh

    Example Helper Script Output

    [mrhines@salieri ~ ]$ ./generate_trace_input.sh
    Configuration:
    =================
    Runtime tracing results will go to trace.out
    Main program is: program
    Will also trace external libraries:
    /lib/i386-linux-gnu/libc.so.6

    Generating tracefunc.h input file for tracing...
    Outputting whitelisted matches from foo|bar|malloc ...
      Whitelist function match: bar
      Whitelist function match: foo
      Whitelist function match: __libc_malloc
      Whitelist function match: malloc
      Whitelist function match: malloc_get_state
      Whitelist function match: __malloc_hook
      Whitelist function match: malloc_info
      Whitelist function match: __malloc_initialize_hook
      Whitelist function match: malloc_set_state
      Whitelist function match: malloc_stats
      Whitelist function match: malloc_trim
      Whitelist function match: malloc_usable_size
    Outputting blacklisted matches from bar|malloc ...
      Blacklist function match: bar
      Blacklist function match: __libc_malloc
      Blacklist function match: malloc
      Blacklist function match: malloc_get_state
      Blacklist function match: __malloc_hook
      Blacklist function match: malloc_info
      Blacklist function match: __malloc_initialize_hook
      Blacklist function match: malloc_set_state
      Blacklist function match: malloc_stats
      Blacklist function match: malloc_trim
      Blacklist function match: malloc_usable_size


  3. Finally, compile your program with the provided tracing code like this:

    $ gcc tracefunc.c program.c -o program -finstrument-functions `pkg-config --cflags glib-2.0` `pkg-config --libs glib-2.0` -ldl

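    NOTES:

    On a Linux system, you will need to install the "libglib2.0-dev" (Debian/Ubuntu) or "glib2-devel" (Fedora/RHEL) package, because the tracing code uses GLib's hash table data structure.

    If functions from your main program do not show up in the helper script's output, add -rdynamic to both gcc commands. `nm -D`, dlsym() and dladdr() only see symbols in the dynamic symbol table, and GCC does not export an executable's own functions there by default.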

  4. Run your program!

    Every traced function that you were interested in will be written to "trace.out" (or whatever filename you chose), using the human-readable names that you configured previously with the helper script.
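
For reference, based on the helper script listed further down, the "tracefunc.h" generated by the sample run in step 2 should look roughly like this (comments abbreviated here):

```c
/*
 * AUTO GENERATED FILE. DO NOT MODIFY.
 * USE "generate_trace_input.sh" script to create this file.
 */
static const char * matches[] = {
"bar",
"foo",
"__libc_malloc",
"malloc",
"malloc_get_state",
"__malloc_hook",
"malloc_info",
"__malloc_initialize_hook",
"malloc_set_state",
"malloc_stats",
"malloc_trim",
"malloc_usable_size",
};
static const char * blacklist[] = {
"bar",
"__libc_malloc",
"malloc",
"malloc_get_state",
"__malloc_hook",
"malloc_info",
"__malloc_initialize_hook",
"malloc_set_state",
"malloc_stats",
"malloc_trim",
"malloc_usable_size",
};
static const char * programs[] = {
"/lib/i386-linux-gnu/libc.so.6",
};
static const char * output_trace_filename = "trace.out";
```

With this particular configuration every whitelisted name except "foo" is also blacklisted, so "foo" is the only function that would end up in the trace.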

Feedback welcome!


Tracing Code: tracefunc.c (also included as a downloadable attachment)

#ifndef _GNU_SOURCE  /* Needed for dladdr() and Dl_info in <dlfcn.h> */
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <dlfcn.h>
#include <string.h>
#include <glib.h>    /* the pkg-config flags provide the right compiler and linker options */

/*
 * This file will contain the list of blacklist functions and
 * whitelist functions that we want to search for during runtime.
 *
 * This file is generated with a search before compiling your program
 * and then referenced in the code below.
 */
#include "tracefunc.h"

static FILE *fp_trace;

static int nb_programs = 0;

/*
 * We will store the 'dlopen' handles used to probe each binary/library.
 */
static void ** lookup_handles;

/*
 * These hashtables store the addresses of the human-readable functions
 * that we will be looking for while the program is running.
 * 
 * Each time a function is traced, we look it up here. We print it if we
 * get a hit in the match table and skip it if we get a hit in the
 * blacklist table.
 */
GHashTable * match_addrs, * black_addrs;

/*
 * The following attributes and prototypes are important:
 *
 * 'constructor' / 'destructor' make GCC run trace_begin() before main()
 * and trace_end() after it, which sets up and tears down the tracing.
 * 
 * 'no_instrument_function' keeps GCC from instrumenting the tracing code
 * itself. Without it, GCC will trace our tracing code, which easily leads
 * to infinite recursion and will cause the program to segfault.
 */
void __attribute__ ((constructor,no_instrument_function)) trace_begin (void);
void __attribute__ ((destructor,no_instrument_function)) trace_end (void);
void __attribute__ ((no_instrument_function)) print(const char * direction, void *func, void * caller);
void __attribute__ ((no_instrument_function)) __cyg_profile_func_enter (void *func, void *caller);
void __attribute__ ((no_instrument_function)) __cyg_profile_func_exit (void *func, void *caller);

/*
 * Perform the translation between function name and address 
 * and store it in a hashtable for later usage.
 */
void __attribute__ ((no_instrument_function)) lookup_function(const char * function_name, GHashTable * table)
{
    int x;
    void * result = NULL;

    /* Try each opened binary/library in turn and stop at the first hit. */
    for(x = 0; x < nb_programs; x++) {
        result = dlsym(lookup_handles[x], function_name);
        if(result) {
            g_hash_table_insert(table, result, (void *) 1);
            break;
        }
    }

    if(result == NULL) {
        printf("match function %s cannot be found in any of the libraries\n", function_name);
        exit(1);
    }
}

/*
 * Invoke 'dlopen' for each external library we want to query during tracing,
 * including ourselves (the main program).
 */
void __attribute__ ((no_instrument_function)) load_library(const char * library, int entry)
{
    /* A NULL library name asks dlopen() for a handle to the main program. */
    lookup_handles[entry] = dlopen(library, RTLD_NOW);
    if(lookup_handles[entry] == NULL) {
        printf("Could not open binary/library named: %s, because: %s\n", library ? library : "(main)", dlerror());
        exit(1);
    }
    printf("Opened library located at %s\n", library ? library : "(main)");
}

void __attribute__ ((constructor,no_instrument_function)) trace_begin (void)
{
    int match_count, black_count;
    int x = 0;

    printf("program start\n");
    /*
     * First, use 'dlopen' to open a handle to each binary/library
     * that we intend to probe while tracing the program.
     */
    nb_programs = sizeof(programs) / sizeof(programs[0]);

    printf("Will probe %d libraries, including main(), for tracing...\n", nb_programs + 1);

    lookup_handles = malloc((nb_programs + 1) * sizeof(void *));
    if(lookup_handles == NULL) {
        perror("malloc");
        exit(1);
    }

    /* Open the external libraries */
    for(x = 0; x < nb_programs; x++)
        load_library(programs[x], x);

    /* Open the main program */
    load_library(NULL, nb_programs);
    nb_programs++;

    /*
     * Initialize the match and blacklist hashtables.
     */
    match_addrs = g_hash_table_new(g_direct_hash, g_direct_equal);
    black_addrs = g_hash_table_new(g_direct_hash, g_direct_equal);

    /*
     * Open the output trace results file.
     * If this fails, print() sees the NULL pointer and records nothing.
     */
    fp_trace = fopen(output_trace_filename, "w");

    if(fp_trace == NULL) {
        perror("fopen");
        printf("Failed to open output trace file: %s\n", output_trace_filename);
    }

    match_count = sizeof(matches) / sizeof(matches[0]);
    black_count = sizeof(blacklist) / sizeof(blacklist[0]);

    /*
     * Now, go through each of the requested human-readable function names and
     * find the corresponding address of each function. Store that address
     * into the hashtables so that we can figure out whether or not to output
     * each traced function while the program is running.
     */
    printf("Looking up addresses for %d whitelisted functions...\n", match_count);

    for(x = 0; x < match_count; x++)
        lookup_function(matches[x], match_addrs);

    printf("Looking up addresses for %d blacklisted functions...\n", black_count);

    for(x = 0; x < black_count; x++)
        lookup_function(blacklist[x], black_addrs);
}

void __attribute__ ((destructor,no_instrument_function)) trace_end (void)
{
    if(fp_trace != NULL) { fclose(fp_trace); }
    printf("program end\n");
}

void __attribute__ ((no_instrument_function)) print(const char * direction, void *func, void * caller)
{
     Dl_info dl1, dl2;

     if(fp_trace == NULL)
         return;

     /*
      * Did the currently traced function 'hit' in the match table?
      */
     if(!g_hash_table_lookup(match_addrs, func))
         return;

     /*
      * Match hit. Check the blacklist.
      * Did we hit in the blacklist? Then ignore this one.
      */
     if(g_hash_table_lookup(black_addrs, func))
         return;

     /*
      * Now that we know the addresses in question are found,
      * we need to print out human-readable results by converting
      * the addresses of both the caller and callee to function names.
      */
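     /*
      * dladdr() can fail to resolve a name: it returns 0, or leaves
      * dli_sname NULL, when no dynamic symbol covers the address.
      * Skip the event if the callee cannot be named.
      */
     if(dladdr(func, &dl1) == 0 || dl1.dli_sname == NULL)
         return;

     /* An unnamed caller is still worth printing, as "unknown". */
     if(dladdr(caller, &dl2) == 0)
         dl2.dli_sname = NULL;

     /* Finished. */
     fprintf(fp_trace, "time [%ld] addr (%p): %s call from (%s) => to (%s) \n", (long) time(NULL), func, direction,
             dl2.dli_sname ? dl2.dli_sname : "unknown",
             dl1.dli_sname);
     fflush(fp_trace);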
}

/*
 * These functions are required to be defined by GCC.
 * Each traced function results in GCC invoking these functions,
 * from which we do our more sophisticated tracing.
 */
void __attribute__ ((no_instrument_function)) __cyg_profile_func_enter (void *func, void *caller)
{
    print("enter", func, caller);
}

void __attribute__ ((no_instrument_function)) __cyg_profile_func_exit (void *func, void *caller)
{
    print("exit", func, caller);
}
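
The 'no_instrument_function' attribute only protects the functions in tracefunc.c itself. If the hooks ever end up calling other instrumented code, they can still re-enter themselves. One common defensive pattern is a per-thread guard flag. The following is a sketch of that idea, not something tracefunc.c does: inside_hook, events_seen and handle_event are hypothetical names, and handle_event deliberately simulates a nested hook call to show the guard working.

```c
/* Per-thread flag that is set while we are inside the tracing hook. */
static __thread int inside_hook = 0;
static unsigned long events_seen = 0;

void __attribute__ ((no_instrument_function)) __cyg_profile_func_enter (void *func, void *caller);

/* Stand-in for the real tracing work (print() in tracefunc.c). */
static void __attribute__ ((no_instrument_function)) handle_event(void *func, void *caller)
{
    events_seen++;
    /* Simulate the tracing work calling an instrumented function,
     * which would invoke the hook again. */
    __cyg_profile_func_enter(func, caller);
}

void __attribute__ ((no_instrument_function)) __cyg_profile_func_enter (void *func, void *caller)
{
    if (inside_hook)
        return;            /* nested call from our own tracing work: ignore */
    inside_hook = 1;
    handle_event(func, caller);
    inside_hook = 0;
}
```

Without the inside_hook check, the simulated nested call would recurse until the stack overflows. With it, each outer call is counted exactly once.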

Helper Script: generate_trace_input.sh (also included as a downloadable attachment)

#!/usr/bin/env bash

# Change these variables to match the program whose functions you are trying to trace.

header="tracefunc.h"         # To be autogenerated. Will hold trace function names.
trace_file="trace.out"     # Where should your tracing results go?

external_libraries="/lib/i386-linux-gnu/libc.so.6"      # Anything else to trace?
# Separate each name with a space.

whitelist_functions="foo|bar|malloc" # Which functions are you looking for?
blacklist_functions="bar|malloc"        # Separate each name with a pipe '|' 
# (regular expression syntax)

main_program="program"       # Where is your main executable?
# You should compile the main executable at least once 
# without tracing before running this script.
# We will probe the executable for the functions 
# addresses that you are interested in tracing.

echo "Configuration:"
echo "================="
echo "Runtime tracing results will go to $trace_file"

if [ -e "${main_program}" ] ; then
    echo "Main program is: $main_program"
else
    echo "Error: Main program \"${main_program}\" does not exist. Please compile it at least once before attempting to trace it."
    exit 1
fi

if [ x"$external_libraries" != x ] ; then
    for lib in ${external_libraries} ; do
        if [ ! -e "$lib" ] ; then
            echo "Error: External library does not exist, try again: $lib"
            exit 1
        fi
    done
fi

echo -e "Will also trace external libraries:\n$external_libraries\n"

if [ "$(echo "${whitelist_functions}" | wc -w)" -gt 1 ] ; then
    echo "Error: list of whitelisted functions cannot contain spaces. Function names must be separated by the '|' pipe character. Try again."
    exit 1
fi

if [ "$(echo "${blacklist_functions}" | wc -w)" -gt 1 ] ; then
    echo "Error: list of blacklisted functions cannot contain spaces. Function names must be separated by the '|' pipe character. Try again."
    exit 1
fi

echo "Generating $header input file for tracing..."

rm -f $header

cat << EOF > $header
/* 
 * AUTO GENERATED FILE. DO NOT MODIFY.
 * 
 * USE "generate_trace_input.sh" script to create this file.
 */

/*
 * Names of the whitelisted functions that should be traced.
 */
static const char * matches[] = {
EOF

echo "Outputting whitelisted matches from $whitelist_functions ..."

for program in ${main_program} ${external_libraries} ; do
    # nm -D lists the dynamic symbols; the third space-separated field is the name.
    nm -D "${program}" | cut -d " " -f 3 | sort | uniq | grep -E "($whitelist_functions)" | while read line ; do
        echo "  Whitelist function match: $line"
        echo -e "\"$line\"," >> $header
    done
done

echo "};" >> $header

cat << EOF >> $header
static const char * blacklist[] = {
EOF

echo "Outputting blacklisted matches from $blacklist_functions ..."

for program in ${main_program} ${external_libraries} ; do
    nm -D "${program}" | cut -d " " -f 3 | sort | uniq | grep -E "($blacklist_functions)" | while read line ; do
        echo "  Blacklist function match: $line"
        echo -e "\"$line\"," >> $header
    done
done

echo "};" >> $header 

cat << EOF >> $header
static const char * programs[] = {
EOF

for lib in $external_libraries ; do
    echo -e "\"$lib\"," >> $header
done

echo "};" >> $header 
 
cat << EOF >> $header

/*
 * Where should the output trace results be written to?
 */
static const char * output_trace_filename = 
EOF

echo " \"$trace_file\";" >> $header
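
The `nm -D ... | cut -d " " -f 3` step in the script relies on the layout of nm's output: a defined symbol is printed as "address type name", while an undefined symbol has blanks where the address would be, so its third space-separated field is empty. As a rough illustration of that parsing rule in C (parse_nm_line is a hypothetical helper, not part of the downloads):

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the symbol name from one line of `nm -D` output, e.g.
 *   "0000000000001139 T foo"
 * Returns 1 and fills 'name' for a defined symbol (a line that starts
 * with an address), or 0 for anything else, such as undefined symbols.
 */
static int parse_nm_line(const char *line, char *name, size_t name_size)
{
    char symbol[256];

    /* Undefined symbols have blanks where the address would be. */
    if (line[0] == ' ' || line[0] == '\0')
        return 0;

    /* Skip the address and the type letter, then read the name. */
    if (sscanf(line, "%*s %*c %255s", symbol) != 1)
        return 0;

    if (strlen(symbol) >= name_size)
        return 0;
    strcpy(name, symbol);
    return 1;
}
```

Given "0000000000001139 T foo" this yields the name "foo". A line for an undefined symbol, which starts with blanks, is skipped.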


Attachments:
  example_program.c (0k), Michael Hines, Jan 28, 2013, 9:05 AM
  generate_trace_input.sh (3k), Michael Hines, Jan 28, 2013, 8:34 AM
  tracefunc.c (6k), Michael Hines, Jan 28, 2013, 8:08 AM