Gallery

Biking Yinz-ward, Part 2: In Which Matters of the Hamstring Are Considered, Narcissism Cross Technology Debated, And School Bus Induced Time Portals Propositioned

Screen Shot 2014-06-09 at 4.14.37 PMedit

Here’s Part 1 if you missed it.


13. Turn right onto the Lower Trail

Screen Shot 2014-06-02 at 10.51.52 PM

(Estimated) Time: 8:05 a.m.
Time from start (5:49 a.m.): 2 hours 16 minutes
Miles this step: 12.8
Miles total: 39.2

Comments: Picture!

DSC_5760

The Lower Trail may be the finest piece of land I’ve had the pleasure of crossing. As the map shows it travels along the Juniata River; what it doesn’t show is the flush shades of green, nature’s panorama of the color ‘tween yellow and blue in all its glory and different hues. Aye, the eye marvels.

Besides that, it was nice to see this area in late spring after having seen its orange side in the fall. Maybe one day when we’ve solved the world’s more pressing situations we could let our cities’ buildings change color with the season. Imagine Manhattan’s skyline an egg blue in the spring, a soft green in summer, hard bronze in fall and pitch navy in winter. I’d be more inclined to visit, wouldn’t you?

I also saw a really intensely blue bird and stopped on a dime to take its picture. Alas, my attempt proved less than successful:

DSC_5761edit

Photographic evidence of the struggle

“O flighted one, thine color escapes my lens’s net.” I have failed the birdwatchers of past and present.

The Lower Trail sparks a certain heightened rhetoric in all who pass through it; that’s the only explanation for why I’m keeping this verbal charade up. Next.

  1. Slight left on to stay on the Lower Trail

Screen Shot 2014-06-02 at 10.51.57 PM

Time: 9:28 a.m.
Time from start: 3 hours 39 minutes
Miles this step: 3.5
Miles total: 42.7

Comments: Oh right, trails are long. I stopped at a gas station mart and ransacked their peanut butter cracker and Nutri-grain bar stock, only after locking myself in their bathroom and downing two bottles of tap water. A truck labeled Duncansville pulled into the parking lot as I sat on a picnic table eating. I wondered if I should ask anyone where I was. But I just kept going instead. Why, yes, I am a male.

  1. Continue onto Long Rd (220 ft)

Screen Shot 2014-06-02 at 10.52.01 PM

  1. Continue straight onto T444 (0.2 miles)

Screen Shot 2014-06-02 at 10.52.04 PM

  1. Turn left onto Flowing Spring Rd/T444 (486 ft)

Screen Shot 2014-06-02 at 10.52.06 PM

  1. Turn left onto US-22 W

Screen Shot 2014-06-02 at 10.52.09 PM

Time: 9:58 a.m.
Time from start: 4 hours 9 minutes
Miles this step: 0.6 + 0.2 + 706 ft (round down to 0.1) = 0.9
Miles total: 43.6

Comments: Amnesia strikes again! I don’t really remember these few steps. Probably because Google split less than a mile’s distance into four parts so my brain decided to package it as one big IDGAF.

19. Turn onto State Rte 1011

Screen Shot 2014-06-02 at 10.52.13 PM

Time: 10:01 a.m.
Time from start: 4 hours 12 minutes
Miles this step: 1.2
Miles total: 44.8

Comments: It started to get hot around now. Hell begins outside Canoe Creek State Park (in an environmental sense).

  1. Turn left onto Turkey Valley Rd

Screen Shot 2014-06-02 at 10.52.17 PM

Time: 10:13 a.m.
Time from start: 4 hours 24 minutes
Miles this step: 3.1
Miles total: 47.9

Comments: (Homer Simpson’s post-Satisfying Laugh voice) Turkey Valley Road.

At this point my technological will came to heads with my historical perspective/visceral preferring/present moment mind-state occupying will. (The initial verbal carnage set a bloody tone to the entire affair.) I was biking up this turkey road, passing lovely farms and lovelier horses; the sun had reached a respectable height; the valley lay below me like a waiting whore armchair seeming to cry for (photo)graphic embrace. It would be so seductively easy to whip out my phone and snap a picture of the miles-long landscape, upload it to Instagram – obviously shared to Facebook and Twitter – along with some kind of humblebraggy/overly self-deprecating caption like “It was inevitable I’d end up having no clue where I am #happyvalley #psu #summer2014” etc., hinting to what followers I have that “Hey, I’m out and about like a bigshot cyclist guy and I’ve biked so far that I had to let you all know how awesome I am because that’s what you’re supposed to do in today’s world, right?” Showcase those life events like a boss who manages the Humility Resources department (because God help you if it’s overt braggadocio – unless you go that route with no exceptions, then you might strike gold), and bask in the navel-gazing security of pending notifications.

But on the other side, the side I consider to have all the good qualities, like healthy self-esteem, reduced levels of neuroses, self-awareness and perspective on a variety of situations, and a willingness to remain focused when the moment calls for it, that side would say, “What is the ephemeral pleasure of harvesting Likes when compared to appreciating the eternity-spanning instant of you on this mountainside and connected to the landscape, you as a part of the whole and simply playing your role for whatever reason that may be?” This side often goes “deep” and can be hard to understand, but I have years and years to try and understand it.

Compromise: no shallow Instagramming, just a picture of hope.

photo

…so I guess the tech side won. [1]

21. Turn right onto US-22 W

Screen Shot 2014-06-02 at 10.52.21 PM

Time: 10:13 a.m. (re-adjusted based on evidence)
Time from start: 4 hours 24 minutes
Miles this step: 6.0
Miles total: 53.9

Comments: The time stamp should be more accurate on this step, since I had to stop at a gas station convenience store and refuel on crackers. I ate those in the shade and called Penn State’s Media Tech department to renew the camera I’d checked out. My phone’s call log says I made that call at 10:31, maybe 6 minutes after I stopped there. That store stands about 12 minutes after this step starts, thus 10:13 a.m. makes sense as when I most likely began this step.

First sighting of a “This is Steelers country” flag on a porch. Oh ya.

Biking along 22 took me through Hollidaysburg and Duncansville, the latter of which has a big shipping rail yard running through the center of town. It’s a smaller town (population of about 1,200), but it seems to export huge amounts of coal. Anecdotally, I thought the noise must be hard on the residents across the street.

Saw another cyclist at one intersection and for once I felt sure I was pulling the longer trek.

  1. Continue onto Old Rte 22

Screen Shot 2014-06-02 at 10.52.25 PM

  1. Turn left onto Foot of Ten Rd

Screen Shot 2014-06-02 at 10.52.28 PM

Time: 10:53 a.m.
Time from start: 5 hours 4 minutes
Miles this step: 2.0 + 0.2 = 2.2
Miles total: 56.1

Comments: After Turkey Valley, this gets my vote for best road name. It sounds so solid, like a Doric pillar.

  1. Turn left onto Valley Forge Rd

Screen Shot 2014-06-02 at 10.52.31 PM

Time: 11:07 a.m.
Time from start: 5 hours 18 minutes
Miles this step: 0.6
Miles total: 56.7

Comments: A water break fit naturally here, since I had to stop anyway to take this picture:

DSC_5766

Wheels on the bus go ’round and ’round, ’round and ’round…

There’s two possible explanations here. One, I stumbled on a portal showing some time in the future, far enough ahead that the nuclear wasteland had recovered and nature was again thriving but human wreckage had not entirely decayed, leaving traces like these for the posterity of future species to be. Or two, some kids played a prank on their bus driver who just snapped.


25. Turn right onto 6 to 10 trail (APRR)

Screen Shot 2014-06-02 at 10.52.34 PM

Time: 11:17 a.m.
Time from start: 5 hours 28 minutes
Miles this step: 2.5
Miles total: 59.2

Comments:

DSC_5768

I stopped around here to take this picture. I do not know what animal the bones belong to, or how it got this way. I was not keen on finding out, especially if the one responsible was nearby. And I needed to keep moving; fewer pictures and more pedaling, I told myself.

Of course, right before the end of the trail I came upon some construction, which lets me make the joke about thinking I had left State College to get away from all the construction. (It makes sense if you’ve lived there.) So, camera out while I walked my bike through the ROAD WORK AHEAD.

DSC_5770

  1. Turn left onto Old Rte 22 [2]

Screen Shot 2014-06-09 at 4.29.33 PM

Time: 11:32 a.m.
Time from start: 5 hours 43 minutes
Miles this step: 2.0
Miles total: 61.2

Comments: Holy crapping effing monkey jumping cheese balls, this was the worst step of the trip to this point. It was here that my now veteran hatred for Route 22 took its first halting but soon to be burgeoning steps, which incidentally is how I approached the uphill 2 miles here. Walking made me realize how little I would be able to do if someone driving by decided to abduct me. The highway system is not set up well for long individual commutes on non-motor vehicles.

Halfway up I rested outside the gated driveway of a somewhat randomly placed mansion and ate the rest of my crackers. My map told me there was a town coming up fairly soon, so I was more worried about the fact that I didn’t have more than a third of a Gatorade bottle’s water. I won’t lie: the main reason I didn’t turn around here was because I remembered that joke about blondes (a popular joke topic when I was in elementary school). One version of it goes like this:

“On a deserted island there were three women, a blonde, a brunette, and a redhead. They needed to get back to the mainland and the only way was by swimming. The redhead goes first. She makes it a quarter of the way and then she drowns. The brunette goes second, and makes it one third of the way before drowning. The blonde starts her swim last. She makes it half way and gets tired, so she swims all the way back to the island.”

I didn’t want to be the blonde.

Next time: Approaching Ebensburg, AKA The Summit Is Too Damn High!


Notes

[1] To learn more about therapeutic horseback riding, read here. The homepage for the one in the picture is here.

[2] I don’t know what’s going on with these headings. The HTML says they’re fine but then they don’t show up as the right numbers so that’s not fine. I’m going to not worry about it right now and maybe come back to it later if it seems to be distracting to reading.

 

Gallery

And I Would Link 500 i-Nodes: Moving Along Chapter 5 in “The Linux Progriming Interface”

Previously Confusing Stuff

Bitwise operation: One that operates on bit patterns or binary numerals at the individual bit level. These may be one of the fastest processor operations, beating division, multiplication and sometimes addition, and using fewer resources as well. (This difference is reduced in modern processors.)

Common bitwise operators include NOT (complement), a unary operation that performs logical negation on each bit (0 becomes 1 and vice versa); AND, taking two bit patterns of equal length and performing the logical AND operation on each corresponding pair of bits (result is 1 if bit A and bit B are both 1, otherwise result is 0); OR, taking two bit patterns of equal length and performing the logical inclusive OR (a 1 result if bit A or bit B or both are 1); and XOR, the exclusive version of OR (returns 1 if exactly one of the two bits is 1, and 0 if both bits are 0 or both are 1).

Here are the mathematical equivalents for us non-STEMers to gaze at in hypnotic incomprehensibility:

\text{NOT }x = \sum_{n=0}^{b-1}2^n\left[\left(\left\lfloor\frac{x}{2^n}\right\rfloor \bmod 2 + 1\right) \bmod 2\right]

x\text{ AND }y = \sum_{n=0}^{b-1}2^n\left(\left\lfloor\frac{x}{2^n}\right\rfloor \bmod 2\right)\left(\left\lfloor\frac{y}{2^n}\right\rfloor \bmod 2\right)

x\text{ OR }y = \sum_{n=0}^{b-1}2^n\left[\left[\left(\left\lfloor\frac{x}{2^n}\right\rfloor \bmod 2\right) + \left(\left\lfloor\frac{y}{2^n}\right\rfloor \bmod 2\right) + \left(\left\lfloor\frac{x}{2^n}\right\rfloor \bmod 2\right)\left(\left\lfloor\frac{y}{2^n}\right\rfloor \bmod 2\right)\bmod 2\right]\bmod 2\right]

x\text{ XOR }y = \sum_{n=0}^{b-1}2^n\left[\left[\left(\left\lfloor\frac{x}{2^n}\right\rfloor \bmod 2\right) + \left(\left\lfloor\frac{y}{2^n}\right\rfloor \bmod 2\right)\right]\bmod 2\right]

Where b is the number of bits in x (for the two-operand formulas, in the larger of x and y), that is, b = \lfloor\log_2 x\rfloor+1 for all x \neq 0. The sums run over bit positions 0 through b-1.

(Actually, it’s not that bad; mod means modulo, the operation that returns the remainder R of any A mod B expression. (Ex. 8 mod 3 returns 2 as a remainder.) And capital-sigma notation you may or may not remember from high school pre-calculus, but it’s fairly straightforward, though it gets applied in lots of higher level stuff.)

Last of all are bit shifts, which operate on the binary representation of an integer and are thus sometimes classified as bitwise operations, although technically this is not true since they do not operate on pairs of corresponding bits. Bit shifts include arithmetic shift (bits shift left or rightward and the ones shifted out of either end are discarded; on a right shift the vacated positions are filled with copies of the sign bit), logical shift (zeroes are shifted in to replace the discarded bits – logical and arithmetic left-shifts are the same), circular shift/bit rotation/rotate no carry (a self-contained shift in which end values “rotate” to the other end as if connected in a circle; frequently used in cryptography), rotate through carry (similar to rotate no carry but the ends of the register are separated by the carry flag, a bit indicating that an arithmetic carry or borrow has been generated out of the most significant ALU bit position (the largest value bit at the head of a bit array)), and various other shifts in C, C++, C#, Python, Java and Pascal.

See Bitwise operation

Some Bonus Confusing Stuff From Outside Reading

Escrow: It can mean a contractual agreement in which a third party receives and distributes money or documents for the primary transacting parties according to their agreed conditions; an account established by a broker (TIL: brokers cannot represent both the seller and the buyer, for clear reasons) for holding funds on behalf of the broker’s principal (a person who authorizes an agent to act to create one or more legal relationships with a third party, relying on the principle qui facit per alium, facit per se) or another person until the completion of the transaction; or a trust account held in the borrower’s name to pay obligations such as property taxes and insurance premiums.

Several types of escrow exist. In the U.S., the most common context would be in real estate, in which a mortgage lender unwilling to take on the risk of a homeowner not paying property tax would require that an escrow company hold the money, as most mortgage terms specify. eBay is an example of an escrow company that facilitates person-to-person remote auctions. In the UK, escrow accounts are used to hold solicitors’ clients’ money (or deposits) for private property transactions until they are completed. Other examples include covering the warranty period of the purchase of a second hand car, property rental deposits (Aha! those bastards), and granting provision of construction services until the work is complete to a defined standard.

Internet escrow works conceptually the same way. The Payment Services Directive, an EU regulatory service, began on 1 November 2009 to allow low-cost Internet escrow services with proper licensing and government regulation, which could enhance security in commercial transactions at the cost of cents rather than thousands of dollars. These are listed on government registers, a security necessity with the prevalence of “bogus escrow,” in which a person creates a phony escrow service to broker an illegitimate transaction with an unaware third party.

Other areas where escrow is used include automated banking (automated teller machines that can return money if the customer is unsatisfied by the deposit) and vending machines (same thing, which I had never thought of but find funny). Source code escrow agents hold software source code in the event of technical problems which would drive up the costs of a software client who does not have an escrow agreement in place. Examples of escrow intellectual property include song music and lyrics, manufacturing designs and notes, and TV and movie scripts and treatments (the “short story” the film/show tells).

See Escrow

Moore’s Law: Coined around 1970, the term describes an observation made by Gordon E. Moore, co-founder of Intel Corp., in a 1965 paper, namely that over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. In the hilarious but inspiring way life imitates ideas, his prediction has come true in part because the law came to be used as a guide for long-term planning and setting development targets in the semiconductor industry. Electronic device capabilities like processing speed, memory capacity, sensor and pixel number and size have all been linked to Moore’s Law.

The Moore’s law period is/was often quoted as 18 months because of a prediction by David House (Intel) about doubling chip performance. Recently, predictions have gotten more bearish on limitations of the “Law” going forward, with Moore himself stating in 2005 that he could not see the exponential rate continuously doubling. However, several futurists such as Bruce Sterling, Vernor Vinge, and most famously Ray Kurzweil, believe that Moore’s Law will lead to a technological singularity (a widely contested term due to human predictive limits; it is generally held as the time when artificial intelligence will have progressed to the point of human intelligence, and the consequences thereof).

Similar observations to Moore’s, which initially only applied to components, have been seen in areas such as Dennard scaling (performance per watt; this trend appears to have broken down), quality adjusted price of IT equipment (prices declined an average of 16% annually from 1959-2009; it has slowed to 2% since 2010, although my guess is that if you looked at each year from the past five decades there would be 3-5 year periods where the average was comparable, if not as low), hard disk storage cost per unit of information (sometimes called Kryder’s Law), network capacity (Butters’ law, a halving of network transmission cost), pixels per dollar, the great Moore’s law compensator (AKA bloat or Wirth’s law, where successive generations of computer software acquire enough bloat to offset the performance gains predicted by Moore’s law), library expansion, and the Carlson Curve (the biotechnological equivalent of Moore’s Law named after author Rob Carlson, who predicted a similar doubling time of DNA sequencing technologies).

A lesser mentioned corollary to Moore’s law is Rock’s law, that as the cost of computing power to consumers falls, the cost to producers rises.

There is so much more on this topic (which innovations helped to sustain Moore’s law, its effects on the early days of the 2003 Iraq invasion, how the problem of obsolescence is involved), but we must get to TLPI.

See Moore’s law


Chapter 5 File I/O: Further Details

5.4 Relationship Between File Descriptors and Open Files

It is possible and useful for multiple descriptors to refer to the same open file. Here, there are three relevant data structures maintained by the kernel: 1.) the per-process file descriptor table, 2.) the system-wide table of open file descriptions, and 3.) the file system i-node table. For each process the kernel maintains a table of open file descriptors, each entry of which includes a set of flags controlling the operation of the file descriptor and a reference to the open file description.

The system-wide table of open file descriptions is called the open file table, and its entries are sometimes called open file handles. An open file description stores the current file offset, status flags specified when opening the file, the file access mode, settings relating to signal-driven I/O (covered in later sections) and a reference to the i-node object for the file. The i-node (recall that the “index node” stores file metadata) includes the file type (regular, socket, FIFO) and permissions, a pointer to a list of locks held on this file, and various properties such as file size and time of access.

File descriptors in different processes can refer to a single open file description because of a fork() (parent-child processes), and different open file descriptions can refer to the same i-node table entry if, say, two processes each called open() for the same file. Within one process, multiple descriptors can point to the same open file description as a result of a call to dup(), dup2(), or fcntl().

The phrase scope rules is used in the sentence: “Similar scope rules apply when retrieving and changing the open file status flags (e.g., O_APPEND, O_NONBLOCK, and O_ASYNC) using the fcntl() F_GETFL and F_SETFL operations.” That should make sense by next post; cutting it short this time and ending here.


Italicized words sections

Locks, scope rules. Bonus leftovers: neural networks, deep linking learning (let’s change it to that).

Until next time.

Gallery

Pittsburgh Pictures

Recently I watched Finding Vivian Maier in State College’s premier old-style theater, The State Theatre. (They even spell it all hoity-toity.) It told the story of Ms. Maier, a mid-1900s nanny whose hobby for photography went beyond all possible expectations of those who knew her. Her total work amounts to more than 100,000 photographs. Most of her work (featured here) was in black and white. A few of my favorites:

Undated, Chicago, IL

Undated, New York, NY

(This one may be perfect.)

Emmett Kelly as the clown figure "Weary Willie", Undated

Undated

Undated

Undated, New York, NY

November 1953, New York, NY

1954, New York, NY

August 16, 1956, Chicago, IL

September 1956, New York, NY

(This is the look that only older people give you nowadays if you try to take their picture. The younger generation prefers to turn away the face or throw up their arms to obscure themselves. Just an observation.)

August 22, 1956. Chicago, IL

September 24, 1959, New York, NY

1960s. Chicago, IL

(My first thought was Rorschach.)

April 19, 1971. Chicago, IL

An obvious but important point: every photograph is of people, as is most of Maier’s work and all of her best work. That was the biggest takeaway I got, as someone who likes taking landscape and nature photography but is so-so with people. I walked out of the Theatre (‘Oh, the Theatre,’ he opined, swishing his cape and tussling his neck scarf) resolving to make sure 4 of every 5 photos I took from then on would be of people.

Most of my three days in Pittsburgh were spent walking around town and taking pictures, and I kept this in mind. Clearly I have the option of just showing you the people pictures and saying, “See? I totally followed through on my resolution!” The truth is I was still about 50-50 for pictures with people versus without, but that in itself is an improvement so I’m optimistic.

Thanks to Jon, Brad, Eric and Bernadette for letting me crash at their place while I was there, and for giving me a ride back to State College. I feel incredibly lucky to know you all.

Also thanks to the residents of Pittsburgh for putting up with me.*

DSC_5928

DSC_5805

DSC_5988

DSC_5828

DSC_5868

DSC_5873

DSC_5871

DSC_5879

DSC_5876

DSC_5878

 

DSC_5883edit

DSC_5889

(The bar we went to each night. I haven’t decided if I ever need to go back.)

DSC_5902

(I crack up every time I look at the deer on the right.)

DSC_5925

DSC_6090

DSC_5933

DSC_5934

DSC_5944edit

DSC_5954

DSC_6008

DSC_6012

DSC_6017

DSC_6020edit

(This place smelled sooooooooooooooooooooooooooooooooooooo good. Sooooo good.)

DSC_6029

DSC_6043

DSC_6054

DSC_6064

DSC_6073

DSC_6097

DSC_6098

DSC_5959

(Hey, Maier took selfies, too.)

DSC_5881

(Last but not least, my gracious hosts.)

*All photos not cropped or edited in any way, not that it seems like they were.

Image

5…5…5 Pending I/Oooooooo’s: Chapter 5 of “The Linux Programming Interface”

Previously Confusing Stuff

Ordinal byte position: The location identifier of a byte within a file or array. The byte commencing a string of data would be the first byte, the following the second, and so on.

See Ordinal

File hole: File holes are found in sparse files, a type which attempts to use file system space more efficiently when blocks allocated to the file are mostly empty or unused. Metadata representing the empty blocks is written to the disk, which the file system converts to “real” blocks of zero bytes at runtime, unbeknownst to the application. Naturally, the holes are filled when the block receives real (non-empty) data in that disk space. The disadvantage to the sparse file system is potential fragmentation (inefficient storage space use), overwrites and other errors, and copying such files onto systems or with programs that do not support them.

See Sparse file

The Bonus Confusing Stuff From Last Time

SQL: Stands for Structured Query Language, a specialized programming language for managing data in relational database management systems (RDBMSs), the most commonly used form of databases today, storing financial records, manufacturing and logistical information, personnel data, etc. The highest revenue relational database vendors were Oracle, IBM, Microsoft, SAP with Sybase, and Teradata (2011). Oracle Database, Microsoft SQL Server and MySQL are used on 50 percent or more of the total database sites.

SQL evolved from Edgar F. Codd’s relational model of databases, which comes with twelve rules (thirteen; 0-12) proposed by Codd.

Rule 0: The Foundation rule – In short, qualified systems must be relational (formed of tuples, or ordered sets), databasal (formed as a base of data), and managerial. (Ok, I mean “must act as management systems.” And databasal is sadly not a word. It’s just that the Wiki graf I’m writing from is drier than a humorless cactus.) The system must use exclusively relational facilities, otherwise no dice.

Rule 1: The information rule – All information is represented only one way, as a value in a table.

Rule 2: The guaranteed access rule – All data must be accessible, or in a parallel sense, uniquely addressable.

Rule 3: Systematic treatment of null values – Each field must be able to remain null, and the database must include a distinct data type and value to indicate that.

Rule 4: Active online catalog based on the relational model – The metadata section of the database (how the ‘base’s structure is set up) must be accessible by the same relational methods as the data it contains.

Rule 5: The comprehensive data sublanguage rule – The system must be interactive and able to be edited/updated/improved using human readable syntax.

Rule 6: The view updating rule – All views that are theoretically updatable should also be updatable by the system in practice.

Rule 7: High-level insert, update, and delete – The system must allow these operations beyond the scope of a single row in a table, as well as multiple data set retrieval.

Rule 8: Physical data independence – Changes to how the data is stored and other physical properties should not require a change to an application based on the earlier properties.

Rule 9: Logical data independence – Changes to the level of tables, columns, and rows should not require a change in corresponding applications. A note is written that this rule is more difficult than the previous.

Rule 10: Integrity independence – Existing applications should not be affected by alterations to data integrity constraints. (Three types: entity – the primary key, meaning each row has its own ID; referential – the foreign key, which deals with connections to primary keys; and domain – the allowed values for a pool of data items.)

Rule 11: Distribution independence – The database’s distribution actions should be abstracted away for the user. Any updates to the database in this area should not affect existing applications.

Rule 12: The nonsubversion rule – If the system has a low-level/record-at-a-time interface, that interface cannot be used to subvert the system. (Query: is that ‘cannot’ or ‘should not’?)

SQL breaks several of these rules, such as using lists over tuples in the results and having its NULL feature introduced without founding it directly on the relational model. Implementation variance from vendor to vendor is another common criticism. The most recent standardization was in 2011.

One final point to remember is the distinction between alternatives to SQL as a language and alternatives to the relational model itself.

See SQL

SEO: AKA search engine optimization. The process of affecting the visibility of a website or web page in a search engine’s natural (unpaid/organic) search results. Types of SEO include image, local, video, academic, news and industry-specific search. The growing Internet marketing strategy considers how SE’s work, what people search for, the actual keywords they type, and which SE’s they prefer. Optimization involves editing content, HTML and other coding to increase the target’s relevance to the search terms and reduce the barriers to SE’s indexing activities (often using a web crawler to process a large number of pages for later indexing and easier search access). Backlinks (inbound links) are another important characteristic in determining SEO.

Past and present search engines use various algorithms to ensure that websites cannot game the system by stuffing keywords or links on their pages as a way of shooting to the top. This creates an arms race between both sides of the table, and forces algorithms to become ever more complicated and far-reaching. Looking at Google, updates over the years have been implemented to cover keyword stuffing, link trading, link farming, link spamming, content copying, and other techniques considered unfair. The rise of personalized search, introduced by Google in 2005, has been called the death of page ranking by some.

Websites that rely on SE’s must adapt to what are often daily changes to search algorithms. In 2010 Google changed their search algorithm more than 500 times, each time possibly affecting websites’ search result placement and thus their overall traffic. This is held as one reason to not overly rely on traffic from search.

The dominant search engine in the United States, Google is even more popular abroad, covering almost 90 percent of searches in the UK and Germany, with a similar share in other countries. Some exceptions are China, Japan, South Korea, Russia, and the Czech Republic, whose market leaders are Baidu, Yahoo! Japan, Naver, Yandex, and Seznam, respectively.

See Search engine optimization

Node.js: A software platform developed by Ryan Dahl for scalable server-side and networking applications. Written in JavaScript and running on Mac OS X, Windows and Linux, Node.js applications are meant to maximize throughput and efficiency, using non-blocking I/O and asynchronous events. (Recall that asynchronous means less dependent on a single terminal.) Applications run single-threaded, while file and network events use multiple threads. Node.js uses the Google V8 JavaScript engine (open source) to execute code. Using HTTP and socket support allows Node.js to act as a web server without additional web server software such as Apache.

Places where node is recommended: chat servers, API on top of object DB (database), Queued Inputs, Data Streaming, Proxy (intermediary servers that process client requests), Application Monitoring Dashboard (tracks usage and more info of active applications) and System Monitoring Dashboard. Some drawbacks of node include rapid library turnover, issues with standards, learning curve concerns and the necessary complexity of asynchronous, event-driven code versus synchronous.

See Node.js


Chapter 5 File I/O: Further Details

In which atomicity, fcntl(), kernel data structures, file duplication, extended RDWR functionality, non-blocking I/O and temporary files with randomly generated unique names are covered.

5.1 Atomicity and Race Conditions

All system calls are executed atomically, meaning the kernel guarantees that all of the steps in a system call are completed as a single operation, without being interrupted by another process or thread. Chiefly, atomicity allows systems to avoid race conditions (alt. race hazards), which result when two processes/threads share resources and interact in unexpected ways depending on the order in which each gets the CPU.

Last chapter we noted that O_EXCL along with O_CREAT causes open() to return an error if the file already exists; this allows the process to ensure that it is the file’s creator. This action is performed atomically, and the following example shows why this is important:

(from fileio/bad_exclusive_open.c)

fd = open(argv[1], O_WRONLY);       /* Open 1: check if file exists */
if (fd != -1) {                     /* Open succeeded */
    printf("[PID %ld] File \"%s\" already exists\n",
            (long) getpid(), argv[1]);
    close(fd);
} else {
    if (errno != ENOENT) {          /* Failed for unexpected reason */
        errExit("open");
    } else {
        /* WINDOW FOR FAILURE */
        fd = open(argv[1], O_WRONLY | O_CREAT, S_IRUSR | S_IWUSR);
        if (fd == -1)
            errExit("open");

        printf("[PID %ld] Created file \"%s\" exclusively\n",
                (long) getpid(), argv[1]);      /* MAY NOT BE TRUE! */
    }
}

This code contains a bug. (Hey, I’m just following the book. I didn’t see it either.) Hypothetically, let’s say that at the time of the first open() call the file did not exist, but by the time of the second call some other process had created it. This could happen if the kernel scheduler decided that the first process’s time slice had expired and gave control to the other process, or if both processes were running concurrently on a multiprocessor system. A diagram shows how Process A might make an open() call that fails – open(…, O_WRONLY); – after which the kernel gives the CPU to Process B, which makes the same failed open() call but then tries again – open(…, O_WRONLY | O_CREAT, …) – and succeeds in creating the file. Control might then return to Process A, which finally gets around to its own open(…, O_WRONLY | O_CREAT, …). That call also succeeds, giving Process A the false impression that it created the file. One way to demonstrate this is to insert the following lines of code at the point marked by the WINDOW FOR FAILURE comment above:

printf("[PID %ld] File \"%s\" doesn't exist yet\n", (long) getpid(), argv[1]);
if (argc > 2) {             /* Delay between check and create */
    sleep(5);               /* Suspends execution for 5 seconds */
    printf("[PID %ld] Done sleeping\n", (long) getpid());
}

Running two simultaneous instances of this program will cause both to claim exclusive creation of the file:

$ ./bad_exclusive_open tfile sleep &
[PID 3317] File "tfile" doesn't exist yet
[1] 3317
$ ./bad_exclusive_open tfile
[PID 3318] File "tfile" doesn't exist yet
[PID 3318] Created file "tfile" exclusively
$ [PID 3317] Done sleeping
[PID 3317] Created file "tfile" exclusively          Lie! (i.e. not true)

Because the first process was interrupted between the existence check and the creation of the file, both processes now think they exclusively created it. (Think of it like petty patent spats.) This is why using the O_EXCL flag alongside the O_CREAT flag is important: it guarantees that the check and the creation are performed as a single atomic step.
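For comparison, here is a minimal sketch of what the race-free version might look like. This is not the book’s listing: create_exclusively() is a hypothetical helper name I made up, and the return convention (1, 0, -1) is my own assumption for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if this call created the file,
 * 0 if the file already existed, and -1 on any other error. */
int create_exclusively(const char *path)
{
    /* O_CREAT | O_EXCL makes "check for existence" and "create" one atomic step */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return (errno == EEXIST) ? 0 : -1;
    if (close(fd) == -1)
        return -1;
    return 1;
}
```

With this version there is no window between a check and a create, so if two processes race on the same pathname, only one of them should ever see the "created" result.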

The other case where atomicity comes in is when multiple processes are appending data to the same file, such as a global log file. Consider the following code:

if (lseek(fd, 0, SEEK_END) == -1)
    errExit("lseek");
if (write(fd, buf, len) != len)
    fatal("Partial/failed write");

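We find that a problem similar to the one above can occur if the first process executing the code is interrupted between lseek() and write() by a second process doing the same thing: both processes set their offsets to the same location before writing, and one overwrites the other’s data. Avoiding this problem requires that the seek to the end of the file and the write happen atomically, which opening the file with the O_APPEND flag guarantees.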
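A minimal sketch of the O_APPEND approach, assuming a log file that several processes write to. The helper name append_log_line() and its return convention are hypothetical, not from the book.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: appends one line to a shared log file.
 * Returns 0 on success, -1 on error. */
int append_log_line(const char *path, const char *line)
{
    /* With O_APPEND, the seek to end-of-file and the write are one atomic step */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    size_t len = strlen(line);
    ssize_t written = write(fd, line, len);
    int status = (written == (ssize_t) len) ? 0 : -1;

    if (close(fd) == -1)
        return -1;
    return status;
}
```

Note that there is no lseek() call at all here; the kernel positions each write at the current end of the file on our behalf.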

5.2 File Control Operations: fcntl()

fcntl() takes a cmd argument – int fcntl(int fd, int cmd, …); – which can specify a wide range of operations. Some are covered below, and others will be covered in later chapters.

5.3 Open File Status Flags

fcntl() can retrieve or modify the access mode and open file status flags of an open file. To retrieve them, use the cmd F_GETFL:

int flags, accessMode;

flags = fcntl(fd, F_GETFL); /* Third argument is not required here */
if (flags == -1)
    errExit("fcntl");

The test for whether the file was opened for synchronized writes:

if (flags & O_SYNC)
    printf("writes are synchronized\n");

Checking for the file’s access mode is more complex since the O_RDONLY(0), O_WRONLY(1), and O_RDWR(2) constants don’t correspond to single bits in the open file status flags. (I am not really sure what this means, so if anyone is still with us at this point do feel free to answer or speculate in the comments.) In any case, the check occurs by masking (setting multiple bits on, off, or otherwise inverted in order to work a single bitwise operation) the flags value with the constant O_ACCMODE, then testing for equality:

accessMode = flags & O_ACCMODE;
if (accessMode == O_WRONLY || accessMode == O_RDWR)
    printf("file is writable\n");
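To speculate a little on what "don’t correspond to single bits" means: a flag like O_SYNC is one bit that is either on or off, so flags & O_SYNC works as a test. The access mode is instead a small number (0, 1 or 2) stored in the low bits of the flags. Since O_RDONLY is 0, an expression like flags & O_RDONLY would always be 0 no matter how the file was opened, which is why we mask with O_ACCMODE and then compare. A minimal sketch, with is_writable() as a hypothetical helper name of my own:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is open for writing, 0 if not,
 * and -1 if fcntl() fails. */
int is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* The access mode is a number (0, 1 or 2) held in the low bits,
     * not a set of independent flag bits, so mask first, then compare. */
    int accessMode = flags & O_ACCMODE;
    return accessMode == O_WRONLY || accessMode == O_RDWR;
}
```

A pipe makes a handy test subject, since pipe() hands back one read-only descriptor and one write-only descriptor.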

The F_SETFL command can modify the O_APPEND, O_NONBLOCK, O_NOATIME, O_ASYNC, and O_DIRECT flags. This is useful when the file was not opened by the calling program, which thus had no control over the flags used in open() (e.g. the file is one of the three standard descriptors, opened before the program was started), or when the file descriptor was obtained from a system call other than open(), such as pipe(), which creates a pipe and returns two file descriptors referring to the pipe’s ends, or socket(), which creates a socket and returns a file descriptor referring to it.

To modify the flags, we use fcntl() to retrieve a copy of the existing flags, modify the bits to be changed, and make a further call to fcntl() to update the flags. Example:

int flags;

flags = fcntl(fd, F_GETFL);
if (flags == -1)
    errExit("fcntl");
flags |= O_APPEND;
if (fcntl(fd, F_SETFL, flags) == -1)
    errExit("fcntl");


As far as homework from last time, the answers to the exercises from Section 4.10 will be put on hold until I get myself a proper Linux console. Your patience is appreciated.


Confusing Stuff section

Single bitwise operation

Just one? Let’s again add some outside reading vocab: escrow, Moore’s Law, neural network, and deep linking.

Until next time.

 

Gallery

Biking Yinz-ward, Part 1 of Several

Screen Shot 2014-06-03 at 9.12.51 PM

The best part about biking from State College, Pa. to Pittsburgh is people’s faces when I tell them.

The worst part was doing it.

That’s not a holistic condemnation of the trip. Getting to see a chunk of western Pennsylvania for the first time, following a long trail step-by-step, avoiding getting lost, meeting new people, and resting were some of the highlights I experienced. I wouldn’t want anyone thinking about doing something similar to feel as if there isn’t any pleasure to be had. It’s just that the pain is much worse.

Now a week later, rubbing the upper part of my left knee still unleashes a tender sort of mewling from those muscles. My left shoulder spent most of today shedding sunburned layers of skin. And tomorrow morning I’ll plaster some more cream onto my legs just in case those stubborn red patches really are poison ivy.

As far as I can tell, I got off lucky. On a less fortunate ride I might have met a bear on the trails, or gotten hit by a car on the highway (more likely), or that deer I gestured at menacingly might have charged me (even more likely). So, in the spirit of a previously trapped person emerging from a cave and looking around wonderingly at everything he thought he might never see again, I thought I’d recap the 29 hour adventure by going through each step on Google Maps’ route. I’ll carefully indicate on those maps where I went off from what the instructions said. I’ve no idea how many posts this will take, but I will try to keep it entertaining.

Here we go.


Route: State College, Pa. to Pittsburgh, Pa. 161 miles. Estimated 14 hours 46 minutes via Ghost Town trail.

1. Head southeast on S Allen St toward Highland Alley

Screen Shot 2014-06-03 at 8.16.23 PM

Time: 5:49 a.m., Tuesday, May 26, 2014
Time from start: 0 minutes
Miles this step: 0.7
Miles total: 0.7

Comments: I wanted to wake up at 4:45, but when I did I remembered that I’d set another alarm for 5:27 and deviously fell asleep again until then. This meant I ended up leaving after it was already light out. The forecast said 80+ degrees, which made it important that I cover as much ground as I could while it was still cool.

I wore blue and neon green sneakers, white low-cut ankle socks, white cotton shorts, a beater, sunglasses, and an American flag bandana, which I assumed would grant me unharmed passage from start to finish. I wonder whether counting on patriotism rates as superstition. I carried one backpack filled with the following:

2 long-sleeved shirts
2 pairs of underwear
1 pair of socks
1 pair of shorts
1 beater (also known as an A-shirt)
1 lightweight hoodie
1 iPhone 4S with charger
1 Nikon D7000 with battery charger
1 bike lock
1 wallet with driver’s license
1 keys
1 red spiral notebook
1 copy of Octavia Butler’s Patternmaster
The first three chapters of Richard Hamming’s The Art of Science and Engineering
Chapters 3 and 4 of Robert Sapolsky’s Why Zebras Don’t Get Ulcers (clearly I expected some sort of book review section of the trip)
1 pair of Crewsaver biking gloves
1 Old Spice deodorant
2 water bottles, one large, one small
Several pens and pencils
$2 in quarters
$250 in bills
Several pairs of foam earplugs

The last item was the only one I didn’t use at least once. Overall, this was too much weight (said Mr. Hindsight).

2. Turn right onto Westerly Pkwy

Screen Shot 2014-06-02 at 10.50.06 PM

Time: 6:04 a.m.
Time from start: 15 minutes
Miles this step: 0.3
Miles total: 1.0

Comments: I stopped at the Weis Markets to pick up an 8-pack of the 6-pack peanut butter crackers (48 crackers). I think it cost $3, which is a dollar more expensive than the same amount of crackers at Port Matilda’s Lykens Market. But I wasn’t about to take that 12-mile detour.

3. Keep left to stay on Westerly Pkwy

Screen Shot 2014-06-02 at 10.50.11 PM

Time: 6:05 a.m.
Time from start: 16 minutes
Miles this step: 0.3
Miles total: 1.3

Comments: I may not go through every single step if some of them are just really small adjustments like “Turn left and go 50 ft., then turn right,” or like this one.

4. Turn Left

Screen Shot 2014-06-03 at 8.16.30 PM

Time: 6:06 a.m.
Time from start: 17 minutes
Miles this step: 0.8
Miles total: 2.1

Comments: I passed a Welch pool where a group of kids and parents had already started swimming for the day. Dedication, or an overwhelming dislike of crowded pools? Both are justified.

5. Slight left toward Stonebridge Dr

Screen Shot 2014-06-02 at 10.51.03 PM

Time: 6:16 a.m.
Time from start: 27 minutes
Miles this step: 0.6
Miles total: 2.7

Comments: I thought I grew up in the suburbs, but seeing the houses this bike path passes through made me think where I grew up was a ghetto comparatively. Teenagers living here must tread a precarious line between exasperation and ennui most days.

6. Turn left onto Stonebridge Dr

Screen Shot 2014-06-02 at 10.51.09 PM

7. Turn right onto State Rte 3024

Screen Shot 2014-06-02 at 10.51.13 PM

Time: 6:20 a.m.
Time from start: 31 minutes
Miles this step: 0.7
Miles total: 3.4

Comments: Whitehall Rd. has beautiful scenery for bikers, open flat farmland on both sides of the road, peppered with an array of differently colored barns. By the way, I may forget to use the proper term cyclist, so for future reference “biker” will be used interchangeably.

8. Turn left onto PA-26 S

Screen Shot 2014-06-02 at 10.51.15 PM

Time: 6:22 a.m.
Time from start: 33 minutes
Miles this step: 1.4
Miles total: 4.8

Comments: This was the first downhill slope I could relax on, an easy descent curving right at the bottom. Appreciating the times when you don’t have to pedal is essential to cross-country biking.

9. Slight left onto PA-45 W/Pine Grove Rd (Continue to follow PA-45 W)

Screen Shot 2014-06-02 at 10.51.20 PM

Time: 6:26 a.m.
Time from start: 37 minutes
Miles this step: 20.4
Miles total: 25.2

Comments: The first big step. At 7 I pulled over to get water and take a selfie, two actions of equal priority.

photo (2)

Tiens, NSA!

The feature presentation here was the Russell E. Larson Agricultural Research Center, dozens of farm plots and small buildings spread across the 2,000 acre site. [1] I’ve been living in the State College area for almost three years, and it always surprises me how little I know of my surroundings.

This step taught me the importance of listening for cars and trucks behind you. Headphones aren’t for the safety-minded (says the guy who doesn’t wear a helmet, WHICH I DO NOT CONDONE AT ALL PLEASE ALWAYS WEAR A HELMET THANK YOU).

10. Slight left onto PA-453 S/State Rte 45

Screen Shot 2014-06-02 at 10.51.25 PM

Time: 7:45 a.m. (Time estimates will be more estimate-y from here on.)
Time since commencing: 1 hour 56 minutes
Miles this step: 0.7
Miles total: 25.9

Comments: On the big highway for a moment. A pickup truck was idling on the side of the road. Never know what you’ll get with idlers. They’re sort of like sleeping dogs. I say, let idlers be idlers.

11. Turn left onto US-22 E

Screen Shot 2014-06-02 at 10.51.29 PM

Time: 7:48 a.m.
Time since commencing: 1 hour 59 minutes
Miles this step: 0.7 (but not really; see next step)
Miles total: 26.3 (ditto)

Comments: At this intersection a vast cliff entered my view and I realized I had been to this place before. Last October I went on a date with a lovely lady, who probably does not want to be named here, to Blair County’s Lower Trail, a 16.8 mile trail transformed by the Rails-to-Trails Conservancy, a national nonprofit group that turns old unused railroads into rail trails for walking, biking, and even horseback riding. These trails are usually wide and flat; in other words idyllic. As was that day. I stopped to snack briefly and snap a couple photos commemorating my return visit.

12. Turn left onto Logging Rd 31101 Use Previous Knowledge and Take Shortcut

Screen Shot 2014-06-02 at 10.51.32 PMedit

Time: 7:55 a.m.
Time since commencing: 2 hours 6 minutes
Miles this step: Like 0.05 but let’s round up
Miles total: 26.4

Comments: Google Maps is many things, but like things that are many things (bureaucracies), it does not do flexibility well. The given route says to go down Logging Drive and wrap around at an official entrance to the trail. But an easier way would be to get off the bicycle and walk it along the area marked on the above map. The triumph I felt at this maneuver in no way corresponded to the magnitude of difference it would make by the trip’s end. (I felt really excited.) Still, it was a victory, and one I’d need as motivation in more dire times ahead.

Next time: Entering Mirkwood Forest the Lower Trail


Notes

[1] Ag Progress Days are hosted on the Russell Larson grounds each August and feature the latest technology and research, as well as guided tours and many more activities. This year’s will be on Aug. 12-14. Directions can be found here.

Image

Checking In

Hey everyone. No Linux posts, maybe for the next few days. I’m visiting Pittsburgh and have no idea whether I’ll have computer access. If I can I’ll post some but otherwise it’ll be mostly quiet until the weekend. In other words, the perfect time to reread the older posts! Or live your lives, I suppose. Anyway, talk to you again in a bit.

Image

LSeek(And Ye Shall Find): Closing Chapter 4 in “The Linux Programming Interface”

Previously Italicized words

Data integrity: The condition of uncorrupted and retrievable data. The two rough categories are physical integrity – against which challenges include electromechanical faults, design flaws, material fatigue, corrosion, outages, and other environmental hazards – and logical integrity, concerning the correctness or rationality of a piece of data, such as its referential integrity and whether its key (unique set of identifying characteristics) has integrity.

Ways to combat threats to physical integrity include uninterruptible power supplies, radiation-hardened chips, robust and clustered (multi-server) file systems, error-correcting code (ECC) memory, and other redundancy techniques like RAID arrays. Logical integrity is protected with check constraints (whether values make sense), foreign key constraints (checking data relationships for validity), and other run-time sanity checks.

The parent-child relationship between related records helps here: when a parent record owns one or more child records, the database itself ensures that no child is left orphaned and no parent is wiped while it still owns children.

See Data Integrity

32-bit systems: In a 32-bit microprocessor, a register can store 2^32 different values. As unsigned integers these run from 0 through 4,294,967,295, which allows access to 4 GiB of byte-addressable memory. (GiB = Gibibyte, very similar to Gigabyte: 1 GiB ≈ 1.074 GB.) The external address and data buses are often wider than 32 bits, but both addresses and data are stored and manipulated internally in the processor as 32-bit quantities.

See 32-bit

Resource limit: A set of parameters specified by the system admin to prevent individual queries or transactions from monopolizing the server resources. These parameters typically involve time ranges in which the administrator enforces the limits, such as killing huge reports or costly sessions during the critical times of the day.

See Resource limits

Consumable resource: A resource available only in a finite amount, which is consumed by the jobs allocated to it. E.g. a computer system has a finite number of processors.

See Resources in the job definition


Chapter 4 Universal I/O Model

4.7 Changing the File Offset: lseek()

The kernel records a file offset, AKA a pointer or read-write offset, for each open file, marking where the next read()/write() operation will commence. The file offset is expressed as an ordinal byte position relative to the start of the file: the first byte of the file is at offset 0, and each read() or write() advances the offset to the byte just past the one last read or written.

#include <unistd.h>

off_t lseek(int fd, off_t offset, int whence);
/* Returns new file offset if successful, -1 on error */

offset specifies a value in bytes, and whence indicates the base point from which offset is to be interpreted, taking one of three possible values:

SEEK_SET (set offset bytes from beginning)
SEEK_CUR (adjusted by offset bytes relative to the current file offset)
SEEK_END (set to the file size plus offset)

A diagram shows how whence is interpreted for a file containing N bytes of data, numbered 0 through N – 1. SEEK_SET refers to byte number zero, SEEK_END refers to byte N (the first byte past the end of the data), and SEEK_CUR refers to the current file offset, wherever that happens to be. Bytes N, N+1, etc., beyond EOF, are the unwritten bytes.

A call to retrieve the current location of the file offset without changing it: curr = lseek(fd, 0, SEEK_CUR);

Other calls include:

lseek(fd, 0, SEEK_SET);         /* Start of file */
lseek(fd, 0, SEEK_END);         /* Next byte after the end of the file */
lseek(fd, -1, SEEK_END);        /* Last byte of file */
lseek(fd, -10, SEEK_CUR);       /* Ten bytes prior to current location */
lseek(fd, 10000, SEEK_END);     /* 10001 bytes past last byte of file */

lseek() fails (ESPIPE) on calls to a pipe, FIFO, socket or terminal, but can work for some device drivers.

It is possible to write bytes at an arbitrary point past the EOF; the space between the previous end of the file and the newly written bytes is called a file hole. While holes don’t take up any disk space, the bytes in a hole are considered to exist, and reading from a hole returns a buffer of null bytes (0). The advantage of a file hole is that space need not be allocated for null bytes in a sparsely populated file. This topic is covered further in later sections.
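Here is a minimal sketch of creating a hole and reading from it, assuming a scratch file we are free to overwrite. The helper first_byte_of_hole() is a hypothetical name of mine, and it only makes sense for a hole_size greater than zero.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes "abc" at offset hole_size in a fresh file,
 * then reads back the first byte of the file, which lies in the hole.
 * Returns that byte (expected to be 0), or -1 on any error. */
int first_byte_of_hole(const char *path, off_t hole_size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    unsigned char byte = 0xff;      /* Deliberately non-zero before the read */
    int result = -1;

    if (lseek(fd, hole_size, SEEK_SET) != -1 &&     /* Seek past EOF */
        write(fd, "abc", 3) == 3 &&                 /* Writing here creates the hole */
        lseek(fd, 0, SEEK_SET) != -1 &&             /* Back to the start */
        read(fd, &byte, 1) == 1)                    /* Read one byte from the hole */
        result = byte;

    if (close(fd) == -1)
        return -1;
    return result;
}
```

After a call with hole_size set to 100000, the file should report a size of 100003 bytes, even though only three bytes were ever written.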

The book’s example demonstrates how lseek() works with read() and write(). The supported operations are soffset (seek to byte offset from the start of the file), rlength (read length bytes from the file, starting at the current file offset, and display them in text form), Rlength (same as r, but display in hex), and wstr (write the string of characters specified in str at the current file offset).

The example:

#include <sys/stat.h>
#include <fcntl.h>
#include <ctype.h>
#include "tlpi_hdr.h"

int
main(int argc, char *argv[])
{
    size_t len;
    off_t offset;
    int fd, ap, j;
    char *buf;
    ssize_t numRead, numWritten;

    if (argc < 3 || strcmp(argv[1], "--help") == 0)
        usageErr("%s file {r|R|w|s}...\n",
                 argv[0]);

    fd = open(argv[1], O_RDWR | O_CREAT,
              S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP |
              S_IROTH | S_IWOTH);                   /* rw-rw-rw- */
    if (fd == -1)
        errExit("open");

    for (ap = 2; ap < argc; ap++) {
        switch (argv[ap][0]) {
        case 'r':   /* Display bytes at current offset, as text */
        case 'R':   /* Display bytes at current offset, in hex */
            len = getLong(&argv[ap][1], GN_ANY_BASE, argv[ap]);
            buf = malloc(len);
            if (buf == NULL)
                errExit("malloc");

            numRead = read(fd, buf, len);
            if (numRead == -1)
                errExit("read");

            if (numRead == 0) {
                printf("%s: end-of-file\n", argv[ap]);
            } else {
                printf("%s: ", argv[ap]);
                for (j = 0; j < numRead; j++) {
                    if (argv[ap][0] == 'r')
                        printf("%c", isprint((unsigned char) buf[j]) ?
                                        buf[j] : '?');
                    else
                        printf("%02x ", (unsigned int) buf[j]);
                }
                printf("\n");
            }

            free(buf);
            break;

        case 'w':   /* Write string at current offset */
            numWritten = write(fd, &argv[ap][1], strlen(&argv[ap][1]));
            if (numWritten == -1)
                errExit("write");
            printf("%s: wrote %ld bytes\n", argv[ap], (long) numWritten);
            break;

        case 's':   /* Change file offset */
            offset = getLong(&argv[ap][1], GN_ANY_BASE, argv[ap]);
            if (lseek(fd, offset, SEEK_SET) == -1)
                errExit("lseek");
            printf("%s: seek succeeded\n", argv[ap]);
            break;

        default:
            cmdLineErr("Argument must start with [rRws]: %s\n", argv[ap]);
        }
    }

    exit(EXIT_SUCCESS);
}

An example shell session shows (3xfast) how this program is used when we attempt to read bytes from a file hole:

$ touch tfile                         Create new, empty file
$ ./seek_io tfile s100000 wabc        Seek to offset 100,000, write "abc"
s100000: seek succeeded
wabc: wrote 3 bytes
$ ls -l tfile                         Check size of file
-rw-r--r--  1 mtk  users  100003 Feb 10 10:35 tfile
$ ./seek_io tfile s10000 R5           Seek to offset 10,000, read 5 bytes from hole
s10000: seek succeeded
R5: 00 00 00 00 00                    Bytes in the hole contain 0

4.8 Operations Outside the Universal I/O Model: ioctl()

ioctl() is the general-purpose mechanism for file and device operations that fall outside the universal I/O model:

#include <sys/ioctl.h>

int ioctl(int fd, int request, ... /* argp */);
/* Value returned on success depends on request, -1 on error */

Device-specific header files define constants that can be passed in the request argument. The ellipsis (…) indicates that the third argument can be of any type (pointer to an integer, to a structure) or unused. It will come up again in later sections.
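To make the request/argp pairing a bit more concrete, here is a minimal sketch using the FIONREAD request, which asks how many bytes are waiting to be read on a descriptor. This example is mine, not the book’s: bytes_waiting() is a hypothetical helper name, FIONREAD is not part of the universal I/O model, and I am assuming a Linux system where it is available through <sys/ioctl.h> and works on pipes.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes waiting to be read
 * on fd, or -1 if the ioctl() request fails. */
int bytes_waiting(int fd)
{
    int n = 0;
    /* For FIONREAD, the third argument points to an int that receives the answer */
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

Here the third argument is a pointer to an integer; other requests expect a pointer to a structure or nothing at all, which is why the prototype uses an ellipsis.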

4.9 Summary/Typing practice

In order to perform I/O on a regular file, we must first obtain a file descriptor using open(). I/O is then performed using read() and write(). After performing all I/O, we should free the file descriptor and its associated resources using close(). These system calls can be used to perform I/O on all types of files. The fact that all file types and device drivers implement the same I/O interface allows for universality, meaning that a program can typically be used with any type of file without requiring code that is specific to the file type.

For each open file, the kernel maintains a file offset, which determines the location at which the next read or write will occur. The file offset is implicitly updated by reads and writes. Using lseek() we can explicitly reposition the file offset to any location within the file or past the EOF. Writing data at a position beyond the previous end of the file creates a hole in the file. Reads from a file hole return bytes containing zeros.

The ioctl() system call is a catchall for device and file operations that don’t fit into the standard I/O model.

4.10 Homework

The exercise:

“The tee command reads its standard input until end-of-file, writing a copy of the input to standard output and to the file named in its command-line argument. (We show an example of the use of this command when we discuss FIFOs in Section 44.7.) Implement tee using I/O system calls. By default, tee overwrites any existing file with the given name. Implement the -a command-line option (tee -a file), which causes tee to append text to the end of a file if it already exists. (Refer to Appendix B for a description of the getopt() function, which can be used to parse command-line options.)

Write a program like cp that, when used to copy a regular file that contains holes (sequences of null bytes), also creates corresponding holes in the target file.”

(DONE CHAPTER FOUR! WHOO!!!!!!!!!!!!!)

Ahem … Good luck.


Italicized words section

ordinal byte position, file hole

Only two is low so how about some outside reading vocab? Let’s throw in SQL, SEO, and node.js.

Until next time.

Image

Silly Caller, _RDONLYs Aren’t For Writing: I/O In “The Linux Programming Interface”

Previously italicized words

Truncate: Truncation means limiting the number of digits to the right of the decimal point by discarding the least significant ones. How many digits are kept varies, but four is typical. More officially, the truncation error can be (up to) twice the maximum error in rounding. The most common truncation feature, or error, in programming occurs when a decimal number is assigned to an int, because the int datatype cannot store real numbers that are not integers.

See Truncation
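The int case from the definition above can be sketched in a few lines of C. The helper name to_int() is hypothetical; the point is only that converting a floating-point value to an int discards the fractional part rather than rounding it.

```c
/* Hypothetical helper: converts a double to an int, discarding the
 * fractional part just as a plain assignment to an int would. */
int to_int(double x)
{
    int n = (int) x;    /* Conversion truncates toward zero; it does not round */
    return n;
}
```

So 3.99 becomes 3 rather than 4, and -3.99 becomes -3 rather than -4.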

Append: The append function attaches one thing, usually a list, to another. Other operations include appending files, or fields to records. Related but distinct functions include insert, which places the thing in between, and construct, which creates a new list with the first list as an element within the second. Like so:

append -> (1,2,3,4) (5,6,7,8) -> (1,2,3,4,5,6,7,8)
construct -> (1,2,3,4) (5,6,7,8) -> ((1,2,3,4), 5,6,7,8)

See The append Function

Unsafe program: Generally, a poorly written program or a program written with malicious or viral intent. Unsafe can also refer to languages; for example, C/C++ is not a type-safe language (one that prevents using a piece of memory as an incorrect type). These languages allow more experienced programmers control over low-level issues or performance tasks that would be too costly or inconvenient in other high-level languages.

The safety risks in C include values created in a type-unsafe way, and unchecked array accesses, casts and de/allocations that can corrupt memory or leave dangling pointers. (Cast means that any object can be assigned a type regardless of whether that object is actually of the assigned type, because C does not dynamically check them.) Another way C classifies as an unsafe language is its unrestricted manual memory management.

See CSE 341: Unsafe languages (C)


Chapter 4 Universal I/O Model

4.3 Opening a File: open() (Part 2)

O_CREAT: Creates a new, empty file. The mode argument must be specified or the file permissions will be set to a random value from the stack.

O_DIRECT: Allows file I/O to bypass the buffer cache. This feature is described later in the book.

O_DIRECTORY: Designed specifically for implementing opendir() and returns an error if the pathname is not a directory.

O_DSYNC: Performs file writes according to the requirements of synchronized I/O data integrity completion. This feature is described later in the book.

O_EXCL: Used in conjunction with O_CREAT, this flag allows the caller to ensure that it is the process creating the file: if the file already exists, open() fails. The call also fails if the pathname is a symbolic link, since the link’s presence means there is already something at the location desired for the new file.

O_LARGEFILE: Opens large files on 32-bit systems; has no effect on 64-bit implementations.

O_NOATIME: Stops the file last-access-time from updating when reading from the file. This call requires a privileged process or matching user and owner IDs, or an error will occur. Intended to reduce disk activity caused by indexing and backup programs, although I can see that if implemented creatively it might allow for some subterfuge in accessing files; true or false?

O_NOCTTY: Prevents a terminal device being opened from becoming the controlling terminal. This feature is described later in the book.

O_NOFOLLOW: Prevents the open() standard practice of dereferencing symbolic links, by failing to open if the pathname is a symbolic link.

O_NONBLOCK: Opens the file in nonblocking mode. This feature is described later in the book.

O_SYNC: Opens the file for synchronous I/O. This feature is described later in the book.

O_TRUNC: If the file already exists and is a regular file, truncates it to zero length, destroying any existing data. The caller must have write permission on the file, regardless of whether it is being opened for reading or writing.

The possible errors (return: -1) that occur when trying to open the file, as identified by errno: 

EACCES: The file permissions don’t allow the caller access, or the file did not exist and could not be created.

EISDIR: The specified file is a directory, and the caller tried to open it for writing.

EMFILE: The process resource limit on the number of open file descriptors has been reached.

ENFILE: The system-wide limit on the number of open files has been reached.

ENOENT: The specified file doesn’t exist and O_CREAT was not specified, or O_CREAT was specified but part of the pathname doesn’t exist (e.g. a missing directory or a dangling symbolic link).

EROFS: The specified file is on a read-only file system, and the caller tried to open it for writing.

ETXTBSY: The specified file is an executable (a program) that is currently executing. This means it must first terminate before it can be modified or opened for writing.

Later descriptions of system calls or library functions will not be listed in such a fashion, since each can be found in the corresponding manual page. This is done here simply because it is the first call we’ve gone through in detail, and looking at the array of reasons why an open() might fail illustrates some of the precautions and preconditions that users will need to be aware of. (Even the errors listed here are not a complete reference for open().)

Looking specifically at the creat() call:

#include <fcntl.h>

int creat(const char *pathname, mode_t mode);
/* Returns file descriptor, or -1 on error */

This is equivalent to the open() call: fd = open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode);

…which allows more control (we can choose O_RDWR instead of O_WRONLY, for example) than creat(), which has become somewhat obsolete.

4.4 Reading from a File: read()

The call:

#include <unistd.h>

ssize_t read(int fd, void *buffer, size_t count);
/* Returns number of bytes read, 0 on EOF, or -1 on error */

count specifies the maximum number of bytes to read, and buffer supplies the address of the memory buffer into which the input data is to be placed; the buffer must be at least count bytes long. EOF means end-of-file. size_t is an unsigned integer type while ssize_t is signed (count can never be negative, while the call itself must be able to return -1 on error).

One notable feature of read() is that it does not place a terminating null byte at the end of the data, which a string passed to printf() needs. This is necessary since read() can be used on any sequence of bytes, be it text or binary, and has no way of knowing which it is handling. The difference can be seen in the following two pieces of code:

A.

#define MAX_READ 20
char buffer[MAX_READ];

if (read(STDIN_FILENO, buffer, MAX_READ) == -1)
    errExit("read");
printf("The input data was: %s\n", buffer);

B.

/* definition */
char buffer[MAX_READ + 1];
ssize_t numRead;

numRead = read(STDIN_FILENO, buffer, MAX_READ);
if (numRead == -1)
    errExit("read");

buffer[numRead] = '\0';
printf("The input data was: %s\n", buffer);

A’s output will probably include characters in addition to the string actually entered, because nothing adds an explicit terminating null byte as B does (the '\0'). This terminating null byte requires memory as well, so the size of buffer must be at least one greater than the largest string we expect to read.

4.5 Writing to a File: write()

The call is parallel to read(), with the only differences being the syntax of the command and no return of 0 on EOF. write() returns the number of bytes actually written, which may be less than count if the disk was filled or the process resource limit was reached (this is called a partial write). How the kernel interacts with the buffering of disk I/O during write() calls will be covered later in the book.
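Since a partial write is not reported as an error, a caller that needs every byte written has to check the return value and try again with whatever is left over. A minimal sketch, assuming a blocking file descriptor; write_all() is a hypothetical helper name and not something the book defines at this point:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until all count bytes
 * have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buffer, size_t count)
{
    const char *p = buffer;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)     /* Interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;                     /* Skip past the bytes already written */
        count -= (size_t) n;
    }
    return 0;
}
```

If the disk really is full, the retry fails with an error instead of another partial write, so the loop ends either way.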

4.6 Closing a File: close()

#include <unistd.h>

int close(int fd);
/* Returns 0 on success, -1 on error */

Closing unneeded file descriptors explicitly is generally good practice, and helps make code more reliable and readable in the face of modifications. File descriptors are a consumable resource, and long-lived programs that deal with multiple files, such as shells or servers, can run out of descriptors if they fail to close them properly. The same error-checking code used for other calls should wrap close(): if (close(fd) == -1) errExit("close");


Italicized words section

Data integrity, 32-bit systems, resource limit, terminating null byte, consumable.

Until next time, patient friends and readers.